Tuesday, July 15, 2014

Check for palindrome

In the Input and Output chapter of A Byte of Python, the code of a simple palindrome checker is presented. As homework exercise, we are required to improve it, so that it would accept as a palindrome something like "Rise to vote, sir." (courtesy of The Simpsons). That is, we have to skip punctuation and spaces, and not to care about case.

The main part of the code I have written gets a string as raw data from input, than is going to call a palindrome() function that returns True only when a palindrome is detected:
txt = raw_input('your input: ')
print 'This is',
if not palindrome(txt):
    print 'not',
print 'a palindrome'
Not much to say about it, I guess.

Following the Swaroop's suggestion to use a tuple to store the ignored characters and strip them from the original string in input, I wrote the palindrome() function in this way:
def palindrome(text):
    text = strip(text) # 1
    return text == text[::-1] # 2
1. A strip() function removes all the uninteresting characters from the original input.
2. I compare the stripped user input with its reverted representation, and return True only if they match. The string reverting is adequately commented by Swaroop. In brief, I create a copy of the original text by slicing, but saying that I want the entire interval to be considered. The third parameter says that I want to step negatively, and this result in reading the string form right to left.

And finally, here is how I have written my stripping function:
def strip(text):
    ignore = (',', '.', ' ') # 1

    stripped = '' # 2
    for c in text: # 3
        if c not in ignore: # 4
            stripped += c

    return stripped
1. This is the suggested ignore tuple. Actually, I put in it just three characters. Question mark, colon, semicolon, are just a few of the many candidates to enter in this selection.
2. The resulting string is initially empty.
3. Loop on all the characters in input.
4. If the current character is not in the ignore tuple I can add it to the output string.

This is it. Homework done. Still, the above #1 line it is quite painful. Can't we get rid of this manual initialization?

If we can relax a bit the original requirements, we could end up with a sleeker piece of code.
Say that is alright if we can happily get rid of any non-alphanumeric character. In this case we can use a very simple regular expression, and simplify the palindrome() function in this way:
import re # 1


def palindrome(text):
    text = re.sub(r'\W+', '', text) # 2
    return text == reverse(text)
1. I am using the python regular expression library.
2. I ask to re to substitute each non-alphanumeric subsequence in the original text with an empty one.

If you don't want to be so aggressive, you could write your own specific regular expression and strip just the characters you need to.