Since I Recently was told that we discussed too less about assembly on this forum, I decided to do something to try reverting this situation.
While I was working on some database stuff, I needed to implement a fast sorting routine and so I came across R22’s implementation http://www.asmcommunity.net/board/index.php?topic=24563.0 of the well known “Radix Sort Algorithm” for dwords. Reading a bit on the internet I didn’t find any implementation in assembler so I took R22’s code and the ideas I found here http://www.codercorner.com/RadixSortRevisited.htm and reworked the algorithms for dwords, sdwords, floats and for a list of pointers.
Timings are very promising and the idea behind the “Radix Sort” is really brilliant http://en.wikipedia.org/wiki/Radix_sort . Once you get the idea, you may adapt the code for your own needs. I commented the most important lines to facilitate the interpretation of the procedures.

Merry Christmas!  ;)

Posted on 2009-12-24 02:17:37 by Biterider
Yup, Radix Sort is an O(N) sorting algorithm.
It can be used on floating point quite easily, especially if you only have non-negative numbers. IEEE floating point numbers are constructed in such a way that integer comparisons are valid on them. And as such, masking part of the bits and doing a bucket sort, works aswell.
For negative numbers you need a small correction. They actually work in the 'opposite' direction from negative integers. So greater than turns into less than, and vice versa. This means that negative numbers will be sorted in reverse order... but if you just empty your buckets in reverse order aswell, you still end up with a sorted list of numbers.

I used this trick in my Java 3D engine to support fast per-poly sorting of geometry based on depth (useful for rendering without a z-buffer... for example with transparent/translucent objects).
Posted on 2009-12-24 02:32:20 by Scali
You are correct in this. The implemented procedures take care of negative floats as for negative dwords.
The trick is to analyze the sign change witch affects the last 128 counts. Changing the order you process the counts you can deal with these the special cases.

Interesting is the fact that the processing order for sdwords is NOT the same as for floats.

Once the algorithm is understood, extending it for (s)words or (s)qwords is quite easy.

Posted on 2009-12-24 02:42:37 by Biterider