Topcoder recently allowed the usage of Python in their single-round matches. On a side-note, I would recommend these to anyone who reached an intermediate level in programming and wants to stay sharp (Java, C++, C#, Python are now all possible). I haven’t done overly well there so far (don’t ask for my account name just yet), but I keep learning new approaches and they keep me on my toes.
Now, in the last match I participated, I thought I found a decent algorithm to solve the problem, but it was just not fast enough on Python. The problem is detailed here: http://community.topcoder.com/stat?c=problem_statement&pm=12967&rd=15840 (local copy: tc_srm607_prob2_div2)
You’re essentially given a string of up to 5000 characters and you should count the number of palindromic substrings (for instance ‘aaa’ has 6; 3x’a’, 2x’aa’ and ‘aaa’ itself) and all this has to happen in two seconds which is the main issue for Python. Many of the competitors failed because they tried to evaluate all the possible strings in for-loops which timed out even in faster programming languages.
I was a bit disappointed because the solution I had would come in at 3s in the worst case as I found after examining this further. I implemented the same thing in C++ where it was a lot faster and today, I tried how well it performs when you compile it using Cython.
My algorithm goes over the string from left to right and then starts to move outwards from any letter to see if there is palindrome with it in the center. I tested it by comparing to the naive implementation where I’m reasonably sure that it’s correct. Even if it isn’t, the speed argument still stands though.
The Cython as well as the main Python code is below. Note how little was changed between the count method in the main file and the count function in the cython file (lines 4-9 were annotated with types via cdef). I hadn’t used Cython before and put this together with only their tutorial from http://docs.cython.org/src/tutorial/cython_tutorial.html#the-basics-of-cython which says something about how convenient it is. Caveat: On Linux that is; On Windows where I started trying it, it requires the right Visual Studio version that matches your Python installation. This looked too tedious, so I skipped it.
Full sourcecode can be downloaded here: PalindromicSubstringsDiv2
#File: PalindromicSubstringsDiv2_cython.pyx def count( S1, S2): S_temp = ''.join(S1) + ''.join(S2) cdef char* S = S_temp cdef int i, k cdef int N = len(S) cdef int numPal = N for i in range(N): k=1 while(i+k<N and i-k>=0 and S[i+k]==S[i-k]): numPal +=1 k+=1 k=1 #left first while(i+k<N and i-k+1>=0 and S[i+k]==S[i-k+1]): numPal +=1 k+=1 return numPal
#File: PalindromicSubstringsDiv2.py (main file) import time import math import pyximport; pyximport.install() import PalindromicSubstringsDiv2_cython class Timer: def __enter__(self): self.start = time.clock() return self def __exit__(self, *args): self.end = time.clock() self.interval = self.end - self.start class PalindromicSubstringsDiv2: def checkPal(self, Sin): if len(Sin)==1: return True N = math.ceil(len(Sin) / 2.0) for i in range(int(N)): if not Sin[i]==Sin[-i-1]: return False return True def count_old(self, S1, S2): S = ''.join(S1) + ''.join(S2) numPal = 0 for iLen in range(1,len(S)+1): for i in range(len(S)-iLen+1): if self.checkPal(S[i:i+iLen]): numPal += 1 return numPal def count(self, S1, S2): S = ''.join(S1) + ''.join(S2) N = len(S) numPal = N for i in range(N): k=1 #check for odd strings with i in the center while(i+k<N and i-k>=0 and S[i+k]==S[i-k]): numPal +=1 k+=1 k=1 #check for even strings with i on the left of the double center while(i+k<N and i-k+1>=0 and S[i+k]==S[i-k+1]): numPal +=1 k+=1 return numPal if __name__ == '__main__': Plist = [ [, ["daata"]], [["top"], ["coder"]], [["zaz"], ], [["a", "a", ""], ["a"]], [["a"*20] * 10, ["a"*20] * 10], [["a"*40] * 10, ["a"*40] * 10], [["a"*50] * 50, ["a"*50] * 50], ] for pi in Plist: p0 = pi p1 = pi with Timer() as t1: cnt = PalindromicSubstringsDiv2_cython.count(p0, p1) print 'Cython - cnt: ' + str(cnt) + ', calculated in ' + str(t1.interval) with Timer() as t1: P= PalindromicSubstringsDiv2() cnt = P.count(p0, p1) print 'Python - cnt: ' + str(cnt) + ', calculated in ' + str(t1.interval) with Timer() as t1: P= PalindromicSubstringsDiv2() if len(p0) + len(p1) > 20: cnt = -1 else: cnt = P.count_old(p0, p1) print 'Naive - cnt: ' + str(cnt) + ', calculated in ' + str(t1.interval)
Results for the larger examples on the bottom are as follows:
Cython - cnt: 320400, calculated in 0.0 Python - cnt: 320400, calculated in 0.07 Naive - cnt: 320400, calculated in 4.82 Cython - cnt: 12502500, calculated in 0.03 Python - cnt: 12502500, calculated in 2.97 Naive - cnt: -1, calculated in 0.0
For reference, the C++ solution I programmed (see source code archive above) is actually slower than Cython here (0.11s vs 0.03s). Could be because Cython translates Python to C and then compiles it more efficiently than C++ (my bet is on Cython resulting in char* whereas I used string in C++) or because I programmed it in slightly different ways. So yes, I am deeply impressed by the speed that is possible with Cython!
However, even though I am very pleased, I’ll probably still go back to using C++ for Topcoder because they do not allow Cython at this moment.