To use the insertsort2 in QuickSort you have to use this COMPARE2 macro, and you have the working InserSort for the string case:

COMPARE2 macro item1,item2
LOCAL next,done
push edi
push ecx
push edx
push eax
mov ecx,item2
mov edi,item1
next: mov al,
inc edi
mov dl,
inc ecx
cmp al,dl
jne done
cmp al,0
jne next
done: pop eax
pop edx
pop ecx
pop edi
insertsort2 proc Arr:DWORD, lb:DWORD, ub:DWORD
push ebx
push esi
push edi
mov esi, Arr
mov ecx, 1
inc ub
mov ebx, ub
iStart: mov edi,
mov edx, ecx
inner: mov eax,
COMPARE2 eax, edi
jle wExit
mov , eax
dec edx
jz wExit
mov eax,
COMPARE2 eax, edi
jle wExit
mov , eax
dec edx
jnz inner
wExit: mov , edi
inc ecx
cmp ecx, ebx
jl iStart
pop edi
pop esi
pop ebx
insertsort2 endp

Now there is a test in QUICKSORT :
sub edi,esi
jle @exit

This is where insersort should be called like this:
sub edi,esi
cmp edi,10
invoke insertsort2,ebx,0,10
jle @exit

IT should work but when i test then it doesn't....:(
There is somewhere a little little bug in parameter calling or something and i'm working on
Help is welcome!!
Posted on 2002-06-24 08:34:40 by eisodur
Hi there all,

my last repost is now history this is the code to fill in QuickSort:

sub edi,esi
jle @exit

.if edi<10
mov eax,edi
push esi
shl esi,2
add esi,ebx
invoke insertsort2,esi,0,eax
pop esi
jmp @exit

Posted on 2002-06-25 08:43:46 by eisodur
I wrote a few notes on quicksort a while back (they were meant for Hugi, but I never finished them).

I thought some of you might find them interesting (though they are quite dull, and as mentioned not quite finished).
Posted on 2002-06-30 17:29:04 by Jibz
While googling on suffix sorts I stumbled upon this site... OK, OK, it's in icky C... get over it :tongue:

McIllroy's Source code... you know from Bell Labs

Seems he also uses the method of splitting up sorts into different algos (great minds).

BitRAKE or others should be able to asminize... err assemblize... no assemble-ate... dang where's my book I use to thesaurize words :grin:
Posted on 2002-07-01 14:45:20 by rafe
Lost my last post via a big t-boom.

Due to a need at work, I've been looking at variants of quicksort out the wazoo. And...

I hereby take back my last post.

At least on a penguin, the quick/insertion sort hybid is slower except for when the entries get large (n>10M) and the duplicates are also large (say ~4 dups on avg.). However, even then the hybrid is beaten by a quicksort variant that chains the swaps rather than swaps each pair (think bubble sort vs insertion sort). That variant is usually slower too.

bitRAKE, using your macros etc. my PARTITION code weighs in at 26 lines (vs 32)... smaller isn't always faster but given that we're dealing w/ pointers a direct approach is proving the better way so far.

done w/ testing for now (@work)... now on to suffix sorting.

let me gas -> win32asmize these & post later

One hundred times...
I will test before I post!
I will test before I post!
I will ....
Posted on 2002-07-09 11:57:09 by rafe
rafe, it is great to here someone is testing some sorting routines. What are you sorting? I was aiming for branch prediction and execution concurrency where possible when I coded the algo, but await your post with much interest.
Posted on 2002-07-09 12:21:54 by bitRAKE

Yes... OK... good question

Because I was trying to look at the sort code rather than the compare code (which is where most of the time may go... another issue entirely) I was testing on pointers to ints randomly generated to ensure a certian level of duplicate keys. In c:

a = rand() % (SIZE / dupfactor);
p1 = &a;

My ultimate need is to sort very long strings from one char up to ~230MB long. Most of the work is done in peguin-c (to keep my job) but I drop to asm to make sure I really really understand what's going on. Anywho, I was messing with these to get a grip on the trade-offs I'll be facing later.

You are so right! Branch prediction is key. Moving already touched pointers is cheap. & this is why the "chained swap" version or other fancy versions are usually slower... too many jxx's.

Execution concurancy is probobly where you'll get me tho... like I said above, not claiming faster just smaller.


PS: Learned some good asm tricks looking at your code. thanks. Now to convince them at work to use asm :D
Posted on 2002-07-09 12:45:38 by rafe
So, you'll be generating hash codes for the strings and then be sorting the hash codes? Need to look into creating hash keys that are indicative of the string weight, and then use a radix sort on the keys - can't be beat. :grin: And do it in ASM - that way you really keep your job - give them a C version for compatiblity, but the ASM version just because your that good. :grin:
Posted on 2002-07-09 13:03:58 by bitRAKE
1st draft at least will actually be using a hybrid radix/ternary quick sort as per

Manber & Meyers,
Bentley & Sedgewick,
Sadakane & Imai
Sadakane & Larsson papers.
(last 2 most relevant)

I really tried to use Sadakane's code (good stuff) but it breaks down on the huge size of the sorts. ~8G records with some very long strings for the 1st go. Much more later.

Hence, relearning the basics of qsort. I found that time spent on planning & practice pays off on big projects. yes it's playing around but coding directly is usually a big mistake if you don't play a little.

I finally got those post-doc biologists to program... some... yea it's in Perl & SQL but there's less of the spoon feeding vHLL code to them, so now I've got time to work on a real project. What do you know, pigs CAN fly... if you hurl them hard enuf :grin: :grin:

yes... i WILL be sneaking in some asm code (in external libs... thanks f0dder) but the stuff i present to the boss & the board can't be "lower" than c. these guys think that java is da' bomb & write much crap in it. it's taken them about a year to get to c & it'll take another decade to get them to asm. :gentle sobbing:
Posted on 2002-07-09 13:32:18 by rafe
I feel for you rafe - it's the same everywhere almost! They just did a huge project here in Java Script - that isn't a joke, unfortunately. If we were playing Jeapardy, you'd now ask: "How do you bring today's fastest network infrastructures to their knees?" And you'd get 1000 points. :grin: It's nice if you can get paid for doing what you enjoy doing. Once your familiar with the language the coding is easy, and it's easy to get trapped into jumping into that part of it. You really have to do much before jumping to the low-level. Sounds like you figured that out long ago. :grin: I haven't look into the specific implementations you have mentioned - I will though.
Posted on 2002-07-09 14:45:39 by bitRAKE
OK, OK! It takes me forever to get back to this board:grin: Sorry but this isn't a "c" board &...

Bias alert: My data is pointers to dwords. The dwords can take on almost all the 2^32 possible values but there is a heavy bias for 2^16 of those dword values.

I'm only did all of this testing because I have to insert code into the qsort for my own evil purposes.

A week or two ago I did about a gazillion tests on these, I've gotta say that the standard quicksort is crapola. Suitable for maybe a million records or so. If you're going to do a heap-o-sorting on real data go to the 3-way quicksort & then spend the cycles on choosing a reasonable pivot. Median of 9 recs works well.

With all of the testing that I've done you sometimes gain a few cycles by doing a quadratic sort (insertion or bubble or select) on the smallest partitions but only a few centisecs on 10M dwords. Enuf to do it but it's not where the biggest gains are. Just to whip dead Nellie some more... most gains are via the 3-way, then pivot choice, then trick it, then (distantly) quadratic on small parts.

If you're going to trick out a 3-way qsort in asm, may I recommend Bently/McIlroy's as a starting point. See the link above.

As promised, it's shorter... but who cares if it's not the fastest. Oh yea, turn's out to not be very far from compiled c. Translation: uncreative & sophmoric. Also note that I haven't had the time to test it in win32asm but weeks are turning into months &... Dispite all that it's still the fastest of the lame-o qsorts I bothered with.

; eax/edx = scratch & partition index
; esi = current lo & absoulte lo indices
; edi = current hi & absoulte hi indices
; ebx = partition
; This method requires that you have a sentinel
; value that is bigger than any possible key
; easy to do when sorting pointers.
cmp esi, edi
jnl @qexit
push edi
push esi
mov eax, esi
inc edi
dec edi
COMPARE edi, eax
jg @qhi ;For some odd reason moving the equals works quite well?!
inc esi
COMPARE edi, eax
jl @lo
; If you don't like sentinels then you must
; gard in this loop against edi growing past hi index
cmp esi, edi
jge @F
SWAP esi, edi
jmp @qlo
pop esi
SWAP esi, edi
push edi
dec edi
pop esi
pop edi
inc esi

I will need to get back to sorting later. If I can get the time (yea right) I'll make win32asm versions for all to heckle & scoff at.
Posted on 2002-07-24 18:40:55 by rafe
COMPARE edi, eax
jg @qhi ;For some odd reason moving the equals works quite well?!

Well the reason why it works quite well is that suppose you have to sort an array of 7 million strings, and further suppose you have 5000 same strings, then at some point QuickSort could call himself with the partition of these 5000 elements. When you do not swap these equal elements then esi or edi will wind up to the partition bounds and call a subpartition of 4999 elements. This is the worst performance for quickSort and nothing will happen for a long long time. (Actually quickSort is working O=(N2), quadratic).

But when you swap, like you did, then there is an equal partitioning of 2000 elements, Now you algo is O=(N log N).:)

I was sorting with this little bug some days ago and when i changed this little difference the algo went from taking an hour or so to 3 minutes. Really unbelievable !! Just remember STOP at equal elements and swap no matter what.

(By the way it is better/safer to choose the pivot in the middle when your array is pre-sorted in some way)
Posted on 2002-07-29 10:05:51 by eisodur
Here is the general forms of the macros for array of structures (not useful for large structures - better to sort pointers :)):

; ...
value DWORD ? ; can be WORD/BYTE as well
; ...

COMPARE MACRO cARRAY, cIndex1, cIndex2, cREG
push cIndex1
push cIndex2

imul cIndex1, cIndex1, SIZEOF MY_STRUCT
imul cIndex2, cIndex2, SIZEOF MY_STRUCT

mov cREG, [cARRAY + cIndex1].MY_STRUCT.value
cmp cREG, [cARRAY + cIndex2].MY_STRUCT.value

pop cIndex2
pop cIndex1

EXCHANGE MACRO eARRAY, eIndex1, eIndex2, exREG, eyREG


imul cIndex1, cIndex1, SIZEOF MY_STRUCT
imul cIndex2, cIndex2, SIZEOF MY_STRUCT

lea cIndex1, [eARRAY + cIndex1 + SIZEOF MY_STRUCT]
lea cIndex2, [eARRAY + cIndex2 + SIZEOF MY_STRUCT]

@@: mov al, [eIndex1 + exREG]
mov ah, [eIndex2 + exREG]
mov [eIndex2 + exREG], al
mov [eIndex1 + exREG], ah
inc exREG
jne @B

Don't blame me, someone asked for them. :)
Posted on 2003-09-14 11:10:43 by bitRAKE
Heh :)

I thought we finished this fine algo a year ago..... Still not good at reading the macro's but I didn't try that hard yet. Swapping pointers on the stack is the way to gofor me :grin:
Posted on 2003-09-19 08:55:58 by eisodur
Hay bitRake,
Do you think you could compare your proc with standard C function qsort() and compare results in speed?
Posted on 2003-09-22 17:29:07 by Mikky