Or you might get right. Sorry for the confusion I created in the previous post. I added a couple of words to it, which are marked with bold in the text, and I hope it makes more sense now.
python-stablesort is an implementation of the previously featured timsort as a C extension module. This module provides a convenient way to use the algorithm in Python versions from 1.5.2 to 2.2.1.
Notice the caveats mentioned in the home page: Note: this package uses unstable source code from the Python CVS version, so there could be unexpected problems. I don't think there has been any reported bugs for the algorithm yet, though.
In short, module compiles to stablesort.so which you can use like this:
import stablesort l = [6, 4, 2, 1] stablesort.sort(l)
You can replace the list.sort() (for a specific module) with the stable sort implementation by
import stablesort stablesort.install()
For a naiive performance illustration let us provide the following. First, make the required imports and initialize lists:
from time import clock import stablesort N = 1000000 end_old = range(N) # two lists that are in sorted order but have end_new = range(N) # a nasty special case in the end of the list end_old.insert(N-5, 1) end_new.insert(N-5, 1) middle_old = range(N) # two lists that have a nasty special middle_new = range(N) # case in the middle of the list middle_old.insert(N/2, 1) middle_new.insert(N/2, 1)
The we have the timer() function:
def timer(func, *args): t1 = clock() func(*args) return clock()-t1
that we utilize as
txt = "[0 .. %s, 1, %s .. %s]" % ((N/2)-1, N/2, N) print 'sorted "special case in the end" with old sort in', timer(end_old.sort) print 'sorted "special case in the end" with new sort in', timer(stablesort.sort, end_new) print 'sorted', txt, 'with old sort in', timer(middle_old.sort) print 'sorted', txt, 'with new sort in', timer(stablesort.sort, middle_new)
that with N being exactly one million (1000000) yields
sorted "special case in the end" with old sort in 0.64 sorted "special case in the end" with new sort in 0.6 sorted [0 .. 499999, 1, 500000 .. 1000000] with old sort in 11.15 sorted [0 .. 499999, 1, 500000 .. 1000000] with new sort in 0.59
This is, of course, an extreme case -- sequences are rarely this odd --, but demonstrates the power of an adaptive algorithm. The first sequence type, with an inserted special case in the near end, the old sort manages as effectively as the new, since there is an special case for almost ordered and reverse ordered lists in the implementation; latter described in a comment in the listobject.c, too:
/* Check for the array already being reverse-sorted. Typical benchmark-driven silliness. */
The second sequence, that has an special case in the middle, is a hopeless case for the old sort and this is where the adaptive algorithm shows its strength. It is an extreme case, but still, it is assumed that the so called real world data tends to have some preordering in it, which the adaptive sort algorithm can exploit.
If you happen to have some real world data that you can easily use from Python, it would be interesting to see how that adaptive algorithm would behave, and I think the original author, Tim Peters, would gladly see such results. An example of such data would be some attribute of a record that is sorted by default by some other attribute.
If you happen to make some benchmarks, you can hook up with timbot (Tim Peters, that is) on comp.lang.python USENET newsgroup or the corresponding mailing list python-list, if he happens not to be busy at the moment. You can mail results to me, too, perhaps.
I mean, if 10 years from now, when you are doing something quick and dirty, you suddenly visualize that I am looking over your shoulders and say to yourself, "Dijkstra would not have liked this", well that would be enough immortality for me. -- E.W. Dijkstra.
According to an e-mail I received, one of the true great thinkers and doers in computer science, -- the humble programmer -- E.W. Dijkstra has died yesterday. This is sad news, rest in peace Edsger.
[I do not have any other more reliable source for the news than the e-mail; you'll have to confirm this by yourself, if need be.]
[update: Now it is in the Linux Weekly News, too.]