I had already tried that, and if I leave even *one* TESTPROC in (Nico, bitRAKE, Thomas2, etc.), I still get that application error. Thanks for all the time you're spending to help me :)
Strange? Must be something else?
Posted on 2002-03-29 13:37:15 by bitRAKE
Thanks anyway :) I'll try to recode the testing procedure.
Posted on 2002-03-29 13:39:00 by jademtech
It will be very interesting to see the performance characteristics of the algos on other processors, seeing as you have a P1 and a P3 there... ;) You're welcome.
Posted on 2002-03-29 13:53:10 by bitRAKE
Actually, I have the following processors in my house:
8086 (somewhere...)
286 (8MHz, I think) in a box
386 (several, all at 25 MHz) - plugged in
No 486s :(
P133
PPro150
P200MMX
PII300
PIII667

Much variety :)

Oh... I also realised I was using a file that was too small: it was over 2,000,000 bytes, but under 2 MB. About to run the test... :)
Posted on 2002-03-29 14:01:09 by jademtech
PIII 667MHz (RAM: 128 MB, Win2K)

Nico :[B584D210], 2443 ms [100x2MB], 81.86 MB/s
BitRAKE :[B584D210], 3746 ms [100x2MB], 53.39 MB/s
BitRAKE2 :[B584D210], 3755 ms [100x2MB], 53.26 MB/s
Thomas2 :[B584D210], 3685 ms [100x2MB], 54.27 MB/s
Thomas3 :[B584D210], 3976 ms [100x2MB], 50.30 MB/s
Svin2 :[B584D210], 4046 ms [100x2MB], 49.43 MB/s
Nico2 :[B584D210], 2093 ms [100x2MB], 95.55 MB/s
Thomas3AndSvin :[B584D210], 1462 ms [100x2MB], 136.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 1412 ms [100x2MB], 141.64 MB/s

Pentium 133MHz (RAM: 48 MB, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 19.16 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 9.71 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 9.73 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 15.05 MB/s
Thomas3 :[B584D210], 6129 ms [100x2MB], 32.63 MB/s
Svin2 :[B584D210], 6088 ms [100x2MB], 32.85 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 22.82 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 37.11 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 20.95 MB/s

PII 300MHz (RAM: 128 MB, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 44.09 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 26.07 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 26.34 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 26.91 MB/s
Thomas3 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 52.01 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 74.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s

PPro 150MHz (RAM: 64 MB, Win95)
Nico :[B584D210], 14214 ms [100x2MB], 14.07 MB/s
BitRAKE :[B584D210], 27275 ms [100x2MB], 7.33 MB/s
BitRAKE2 :[B584D210], 25229 ms [100x2MB], 7.92 MB/s
Thomas2 :[B584D210], 26381 ms [100x2MB], 7.58 MB/s
Thomas3 :[B584D210], 34196 ms [100x2MB], 5.84 MB/s
Svin2 :[B584D210], 28674 ms [100x2MB], 6.97 MB/s
Nico2 :[B584D210], 10204 ms [100x2MB], 19.60 MB/s
Thomas3AndSvin :[B584D210], 9078 ms [100x2MB], 22.03 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 7745 ms [100x2MB], 25.82 MB/s
P.S. I hope you find this interesting :)
Posted on 2002-03-29 14:31:26 by jademtech
Yes I found something interesting:

Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 52.01 MB/s

Very interesting math :)
According to it:
8192/8763=24.41/52.01
Posted on 2002-03-29 15:23:22 by The Svin
Sorry... this was the correct line:
Nico2 :[B584D210], 3845 ms [100x2MB], 52.01 MB/s

(I copied the results out of a message box into another file and re-transcribed them, because I couldn't get the program to output to the console, and I had copied and pasted the times from the P133 table.)
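Each row's MB/s figure follows from its time: every run hashes 100 × 2 MB = 200 MB, so MB/s = 200 / (ms / 1000), and the printed figure appears to be truncated (not rounded) to two decimals. A small C sketch of that cross-check; the helper names `mb_per_s` and `row_consistent` are hypothetical and not part of the test program used in this thread:

```c
/* Throughput for one test row: total megabytes hashed divided by seconds. */
double mb_per_s(double total_mb, double ms) {
    return total_mb / (ms / 1000.0);
}

/* Nonzero when a row's printed MB/s matches its time (100 passes over
   2 MB) to within 0.01, which allows for two-decimal truncation. */
int row_consistent(double ms, double printed_mbps) {
    double diff = mb_per_s(100.0 * 2.0, ms) - printed_mbps;
    return diff > -0.01 && diff < 0.01;
}
```

For example, 3845 ms works out to about 52.016 MB/s (printed as 52.01), while 8763 ms gives only about 22.82 MB/s, which is the P133 Nico2 figure the time was copied from.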
Posted on 2002-03-29 15:33:04 by jademtech
Do you have any test scripts? I guess there is no
practical way of doing 64/128 integer division?

Have not tested it, but it may be useful:

Posted on 2002-03-29 15:35:28 by bdjames
Wow, it appears we work great together. :)
Thank you, jademtech.
Posted on 2002-03-29 15:52:34 by bitRAKE
Tests for PMMX200.
Svin2 and Thomas3AndSvin are identical here (I don't see why we need test procs with bugs, so I just replaced it).


eax = Nico : [A2EAC21F], 6630 ms [100x2MB], 30.16 MB/s
eax = BitRAKE : [A2EAC21F], 13900 ms [100x2MB], 14.38 MB/s
eax = BitRAKE2 : [A2EAC21F], 13769 ms [100x2MB], 14.52 MB/s
eax = Thomas2 : [A2EAC21F], 6069 ms [100x2MB], 32.95 MB/s
eax = Thomas3 : [A2EAC21F], 3736 ms [100x2MB], 53.53 MB/s
eax = Svin2 : [A2EAC21F], 3214 ms [100x2MB], 62.22 MB/s
eax = Nico2 : [A2EAC21F], 5318 ms [100x2MB], 37.60 MB/s
eax = Thomas3AndSvin : [A2EAC21F], 3214 ms [100x2MB], 62.22 MB/s
eax = Thomas3AndSvinAndBitRAKE : [A2EAC21F], 5798 ms [100x2MB], 34.49 MB/s

Some more comments on the previous "Pentium" tests:
movzx works well only on the PPro and later.
But, bitRAKE, if I were you I would rearrange the instructions to
remove dependencies in the main loop. It certainly increases speed; I checked.
The plain Pentium has a very poor branch prediction algorithm, which explains everything.
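Svin doesn't show his rearranged loop, so this is only a C sketch of one way to shorten the dependency chain, not his code: process four bytes per iteration and add them to b with weights 4, 3, 2, 1 (plus 4·a), so b no longer waits on every intermediate value of a. The function name `adler32_unroll4` is hypothetical, and the modulo is deferred to every 5552 bytes, as zlib does, so the accumulators cannot overflow 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32, four bytes per iteration (illustrative sketch). */
uint32_t adler32_unroll4(const unsigned char *p, size_t len) {
    const uint32_t BASE = 65521u;   /* largest prime below 2^16 */
    const size_t NMAX = 5552;       /* multiple of 4, so leftovers occur only at the end */
    uint32_t a = 1, b = 0;
    while (len > 0) {
        size_t n = len < NMAX ? len : NMAX;
        len -= n;
        for (; n >= 4; n -= 4, p += 4) {
            uint32_t d0 = p[0], d1 = p[1], d2 = p[2], d3 = p[3];
            /* Same result as four sequential "a += d; b += a;" steps,
               but the four byte loads are independent of each other. */
            b += 4 * a + 4 * d0 + 3 * d1 + 2 * d2 + d3;
            a += d0 + d1 + d2 + d3;
        }
        for (; n > 0; n--) {        /* 0..3 trailing bytes */
            a += *p++;
            b += a;
        }
        a %= BASE;
        b %= BASE;
    }
    return (b << 16) | a;
}
```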
Posted on 2002-03-29 17:02:12 by The Svin
Sorry, I don't understand: what dependencies?
Which algo do you mean?
Thomas's "bitRAKE2" is not the same as bitRAKE's "bitRAKE2" :)
Posted on 2002-03-29 17:26:38 by bitRAKE
Posted on 2002-03-29 17:29:25 by bdjames
Hi, Thomas.
Please check for me whether my proc is correct and fast for you.



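GetAdler proc lpData:DWORD, DataSize:DWORD
push ebp
push ebx
push esi
push edi
mov edi, lpData           ; edi -> data; read the parameter while ebp still addresses the frame
xor edx, edx              ; edx = byte index
mov ebx, 0FFFF000Fh       ; bias = 2^32 - 65521; ebx = b + bias (b starts at 0)
mov ebp, DataSize         ; ebp = byte count (frame parameters are unusable from here on)
lea eax, [edx+1+ebx]      ; eax = a + bias (a starts at 1)
mov esi, ebx              ; esi = bias constant
AdLoop:
cmp edx, ebp
je Ad_2
movzx ecx, byte ptr [edi+edx] ; next data byte (lpData itself is a stack slot, not the data)
add eax, ecx              ; a += byte; CF set when a reaches 65521
jc Ad_1
lea ecx, [eax+0FFF1h]     ; ecx = a without the bias
inc edx                   ; INC leaves CF alone
add ebx, ecx              ; b += a; CF set when b reaches 65521
jnc AdLoop
add ebx, esi              ; b wrapped to b - 65521: restore the bias
jmp AdLoop
Ad_1:
add ebx, eax              ; a wrapped, so eax holds the reduced a without bias: b += a
inc edx
lea eax, [eax+esi]        ; restore the bias on a (LEA leaves flags alone)
jnc AdLoop                ; CF is still from "add ebx, eax"
add ebx, esi              ; b wrapped: restore the bias
jmp AdLoop
Ad_2:
sub ebx, esi              ; remove the bias from b
sub eax, esi              ; remove the bias from a
shl ebx, 16
pop edi
pop esi
add eax, ebx              ; eax = (b << 16) | a
pop ebx
pop ebp
ret
GetAdler endp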
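For checking a proc like this against the others, a plain C rendering of what I read as its idea may help: keep a and b below 65521 with a compare-and-subtract after every add instead of using DIV. My reading is that GetAdler keeps a and b in registers with a bias of 0FFFF000Fh (2^32 - 65521) added, so the carry flag fires exactly when a value reaches 65521; this sketch uses an explicit comparison instead, and the name `adler32_condsub` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time Adler-32 with conditional subtraction instead of DIV. */
uint32_t adler32_condsub(const unsigned char *data, size_t size) {
    const uint32_t BASE = 65521u;  /* largest prime below 2^16 */
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < size; i++) {
        a += data[i];
        if (a >= BASE) a -= BASE;  /* a < 65521 + 256, so one subtraction suffices */
        b += a;
        if (b >= BASE) b -= BASE;  /* b < 2 * 65521, likewise */
    }
    return (b << 16) | a;
}
```

Any proc in the thread should agree with this on the same buffer; "Wikipedia", for instance, gives 0x11E60398.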
Posted on 2002-03-29 18:17:24 by buliaNaza
Thomas, a typo in your code above executes the bitRAKE algo twice, so bitRAKE2 never gets executed.
Nico :[038217AA], 1262 ms [100x2MB], 158.47 MB/s
bitRAKE :[038217AA], 1603 ms [100x2MB], 124.76 MB/s
bitRAKE2 :[038217AA], 721 ms [100x2MB], 277.39 MB/s
Thomas2 :[038217AA], 1462 ms [100x2MB], 136.79 MB/s
Thomas3 :[038217AA], 711 ms [100x2MB], 281.29 MB/s
Nico2 :[038217AA], 1302 ms [100x2MB], 153.60 MB/s
Thomas3AndSvin (Svin2) :[038217AA], 711 ms [100x2MB], 281.29 MB/s
Thomas3AndSvinAndBitRAKE :[038217AA], 671 ms [100x2MB], 298.06 MB/s
BitRAKE3 :[038217AA], 230 ms [100x2MB], 869.56 MB/s
buliaNaza :[038217AA], 1843 ms [100x2MB], 108.51 MB/s
bdjames2 :[0000259C], 1643 ms [100x2MB], 121.72 MB/s
bitRAKE4 :[038217AA], 671 ms [100x2MB], 298.06 MB/s
This is the best of 10 runs of each algo on a 1.333 GHz Athlon Thunderbird (TB) with DDR memory.
bitRAKE4 is bitRAKE3 without the prefetch.

buliaNaza, I had to change the 'movzx ecx, byte ptr [lpData+edx]' line so that it reads through the pointer, because it wouldn't work otherwise (lpData is a pointer, not the data).

bdjames, your proc didn't work right? bdjames2 returned [0000259C] instead of [038217AA].

Notes: DIV is fast enough on the Athlon to make Svin2 as fast as Thomas3. bitRAKE2 is slower than Thomas3, yet the combination, Thomas3AndSvinAndBitRAKE, is faster than all three.
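Since these notes hinge on how expensive DIV is, here is one standard way to take it almost entirely out of the inner loop, the approach zlib's adler32() uses: accumulate without any reduction for up to 5552 bytes (the largest count for which b cannot overflow 32 bits), then take both modulos once. This is a C sketch, not any of the procs benchmarked above; the name `adler32_deferred` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* max bytes between reductions without 32-bit overflow */

/* Adler-32 with two divisions per 5552-byte block instead of per byte. */
uint32_t adler32_deferred(const unsigned char *p, size_t len) {
    uint32_t a = 1, b = 0;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {      /* no modulo in the hot loop */
            a += *p++;
            b += a;
        }
        a %= ADLER_BASE;   /* one reduction per block */
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

With this layout, a 2 MB buffer needs only a few hundred modulo pairs in total, so the speed of DIV matters far less than in a per-byte loop.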
Posted on 2002-03-29 20:09:12 by bitRAKE
Same as my post above, but with BitRAKE3 also run on the PPro, PII, and PIII (not on the plain Pentium). I might dig out my P200MMX sometime to run it. There is a considerable difference between BitRAKE3 and the other speeds on the PPro. I wonder why that is.
PIII 667MHz (RAM: 128 MB, Win2K)

Nico :[B584D210], 2443 ms [100x2MB], 81.86 MB/s
BitRAKE :[B584D210], 3746 ms [100x2MB], 53.39 MB/s
BitRAKE2 :[B584D210], 3755 ms [100x2MB], 53.26 MB/s
Thomas2 :[B584D210], 3685 ms [100x2MB], 54.27 MB/s
Thomas3 :[B584D210], 3976 ms [100x2MB], 50.30 MB/s
Svin2 :[B584D210], 4046 ms [100x2MB], 49.43 MB/s
Nico2 :[B584D210], 2093 ms [100x2MB], 95.55 MB/s
Thomas3AndSvin :[B584D210], 1462 ms [100x2MB], 136.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 1412 ms [100x2MB], 141.64 MB/s
BitRAKE3 :[B584D210], 1402 ms [100x2MB], 142.65 MB/s

PII 300MHz (RAM: 128 MB, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 44.09 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 26.07 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 26.34 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 26.91 MB/s
Thomas3 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 3845 ms [100x2MB], 52.01 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 74.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s
BitRAKE3 :[B584D210], 2663 ms [100x2MB], 75.10 MB/s

PPro 150MHz (RAM: 64 MB, Win95)
Nico :[B584D210], 14214 ms [100x2MB], 14.07 MB/s
BitRAKE :[B584D210], 27275 ms [100x2MB], 7.33 MB/s
BitRAKE2 :[B584D210], 25229 ms [100x2MB], 7.92 MB/s
Thomas2 :[B584D210], 26381 ms [100x2MB], 7.58 MB/s
Thomas3 :[B584D210], 34196 ms [100x2MB], 5.84 MB/s
Svin2 :[B584D210], 28674 ms [100x2MB], 6.97 MB/s
Nico2 :[B584D210], 10204 ms [100x2MB], 19.60 MB/s
Thomas3AndSvin :[B584D210], 9078 ms [100x2MB], 22.03 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 7745 ms [100x2MB], 25.82 MB/s
BitRAKE3 :[B584D210], 6002 ms [100x2MB], 33.32 MB/s
Posted on 2002-03-29 21:27:04 by jademtech
PII 300MHz (RAM: 128 MB, WinNT)

Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s
BitRAKE3 :[B584D210], 2663 ms [100x2MB], 75.10 MB/s
These figures look wrong. :)

Thanks, jademtech. I was curious whether the prefetch instructions would hinder or improve performance on any of the other processors.
Posted on 2002-03-29 21:37:17 by bitRAKE
That should be 2554 ms for Thomas3AndSvinAndBitRake... sorry. In case there are any more typos, here are the original datasets I collected for the PII and PPro (ms:MB/s, in the same order as in the tables):

PPro
14214:14.07
27275:7.33
25229:7.92
26381:7.58
34196:5.84
28676:6.97
10204:19.60
9078:22.03
7745:25.82
6002:33.32

PII
4536:44.09
7671:26.07
7591:26.34
7431:26.91
8192:24.41
8192:24.41
3845:52.01
2674:74.79
2554:78.30
2663:75.10
Posted on 2002-03-29 21:44:51 by jademtech

There is a considerable difference between BitRAKE3 and the other speeds on the PPro. I wonder why that is.
Prefetch is a beautiful thing and the PPro supports it. :)
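For readers following along in C: software prefetch can be requested with the GCC/Clang builtin `__builtin_prefetch`, which emits a prefetch instruction on targets that have one and no code otherwise. This is a sketch of the idea, not BitRAKE3. The 256-byte prefetch distance and the assumed 32-byte cache line are guesses to tune per CPU, and the name `adler32_prefetch` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 256  /* assumed distance; tune per processor */

/* Adler-32 with deferred modulo plus a software prefetch hint once per
   assumed 32-byte cache line. */
uint32_t adler32_prefetch(const unsigned char *p, size_t len) {
    const uint32_t BASE = 65521u;
    const size_t NMAX = 5552;
    uint32_t a = 1, b = 0;
    size_t i = 0;
    while (i < len) {
        size_t end = (len - i < NMAX) ? len : i + NMAX;
        for (; i < end; i++) {
            /* Hint the line PREFETCH_AHEAD bytes ahead; the bounds check keeps
               the pointer arithmetic inside the buffer. */
            if ((i & 31) == 0 && i + PREFETCH_AHEAD < len)
                __builtin_prefetch(p + i + PREFETCH_AHEAD, 0 /* read */, 0 /* no reuse */);
            a += p[i];
            b += a;
        }
        a %= BASE;
        b %= BASE;
    }
    return (b << 16) | a;
}
```

Whether the hint helps depends on memory bandwidth relative to the loop's speed. Comparing this against the same loop without the `__builtin_prefetch` line mirrors the bitRAKE3 vs bitRAKE4 comparison above.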
Posted on 2002-03-29 22:03:01 by bitRAKE
Options?:

A word-based lookup table adl(lookup+ebx)

Posted on 2002-03-30 02:44:24 by bdjames
Thomas, if I may, let me ask you again: must the adler proc work only with 1 MB-aligned data, or not?
I'm asking because after multiple tests with real data (8-25 MB), different procs return different adlers
when the data is not 1 MB-aligned.
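Whatever "aligned" means here (start address or total length), a correct Adler-32 does not depend on it. One way to test a proc for this is to compute the checksum incrementally over arbitrary pieces, carrying a and b between calls, and compare the result with a single pass over the whole buffer. Below is a C sketch with a hypothetical `adler32_update`, similar in shape to zlib's `adler32(adler, buf, len)`. It uses a slow per-byte modulo for clarity.

```c
#include <stddef.h>
#include <stdint.h>

/* Continue an Adler-32 from a previous value (start with 1). Because a and b
   are carried across calls, the result cannot depend on where the data is
   split; a proc whose answer changes with the split point or the length is
   likely mishandling a partial block. */
uint32_t adler32_update(uint32_t adler, const unsigned char *p, size_t len) {
    const uint32_t BASE = 65521u;
    uint32_t a = adler & 0xFFFFu;   /* low half: running byte sum */
    uint32_t b = adler >> 16;       /* high half: running sum of a */
    while (len--) {
        a = (a + *p++) % BASE;
        b = (b + a) % BASE;
    }
    return (b << 16) | a;
}
```

For example, feeding "Wiki" and then "pedia" gives the same 0x11E60398 as feeding "Wikipedia" in one call, and starting from an odd address does not change it either.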
Posted on 2002-03-31 07:31:24 by The Svin