i had already tried that and if i leave even *one* TESTPROC in (nico,bitRAKE,Thomas2,etc), i still get that app error. thanks for all this time you're spending to help me :)
Strange? Must be something else?
thanks anyway :) i'll try to recode the testing procedure.
It will be very interesting to see the performance characteristics of the algos on other processors. Seeing as how you have a P1 & P3 there... ;) You're welcome.
actually, i have the following processors in my house:
8086 (somewhere...)
286 (8MHz, i think) in a box
386 (several, all at 25 MHz) - plugged in
no 486s :(
P133
PPro150
P200MMX
PII300
P!!!667
much variety :)
oh... i also realised: i was using a file that was too small... it was over 2000000 bytes, but under 2MBs. about to run the test... :)
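The gap between "over 2000000 bytes" and 2 MB is easy to put a number on. A minimal sketch in Python, assuming the test harness wants a full 2 MB buffer in the binary sense (2 x 1024 x 1024 bytes); `TWO_MB` and `is_big_enough` are names made up for this illustration, not part of the harness:

```python
# Hypothetical helper: the harness's exact size requirement is an assumption.
TWO_MB = 2 * 1024 * 1024  # 2,097,152 bytes

def is_big_enough(size_in_bytes: int) -> bool:
    """Return True if a file of this size can fill a 2 MB test buffer."""
    return size_in_bytes >= TWO_MB

# A file of exactly 2,000,000 bytes falls this many bytes short of 2 MB.
shortfall = TWO_MB - 2_000_000
print(is_big_enough(2_000_000), shortfall)
```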
P!!! 667MHz (RAM:128 MBs, Win2K)
Nico :[B584D210], 2443 ms [100x2MB], 81.86 MB/s
BitRAKE :[B584D210], 3746 ms [100x2MB], 53.39 MB/s
BitRAKE2 :[B584D210], 3755 ms [100x2MB], 53.26 MB/s
Thomas2 :[B584D210], 3685 ms [100x2MB], 54.27 MB/s
Thomas3 :[B584D210], 3976 ms [100x2MB], 50.30 MB/s
Svin2 :[B584D210], 4046 ms [100x2MB], 49.43 MB/s
Nico2 :[B584D210], 2093 ms [100x2MB], 95.55 MB/s
Thomas3AndSvin :[B584D210], 1462 ms [100x2MB], 136.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 1412 ms [100x2MB], 141.64 MB/s
Pent 133MHz (RAM:48MBs, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 19.16 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 9.71 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 9.73 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 15.05 MB/s
Thomas3 :[B584D210], 6129 ms [100x2MB], 32.63 MB/s
Svin2 :[B584D210], 6088 ms [100x2MB], 32.85 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 22.82 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 37.11 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 20.95 MB/s
PII 300MHz (RAM:128MBs, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 44.09 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 26.07 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 26.34 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 26.91 MB/s
Thomas3 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 52.01 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 74.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s
PPro 150MHz (RAM:64MBs, Win95)
Nico :[B584D210], 14214 ms [100x2MB], 14.07 MB/s
BitRAKE :[B584D210], 27275 ms [100x2MB], 7.33 MB/s
BitRAKE2 :[B584D210], 25229 ms [100x2MB], 7.92 MB/s
Thomas2 :[B584D210], 26381 ms [100x2MB], 7.58 MB/s
Thomas3 :[B584D210], 34196 ms [100x2MB], 5.84 MB/s
Svin2 :[B584D210], 28674 ms [100x2MB], 6.97 MB/s
Nico2 :[B584D210], 10204 ms [100x2MB], 19.60 MB/s
Thomas3AndSvin :[B584D210], 9078 ms [100x2MB], 22.03 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 7745 ms [100x2MB], 25.82 MB/s
p.s. i hope you find this interesting :)
Yes I found something interesting:
Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 8763 ms [100x2MB], 52.01 MB/s
Very interesting math :)
According to it:
8192/8763=24.41/52.01
sorry... this was the correct line:
Nico2 :[B584D210], 3845 ms [100x2MB], 52.01 MB/s
(i copied it out of a message box into another file and re-transcribed again 'cuz i couldn't get it to output to console - and i copied and pasted the lines from the P133 table)
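The mismatch is quick to check mechanically, since the time and the rate on each line describe the same run. A minimal sketch in Python, assuming the harness counts 100 passes over 2 MB as 200 MB and truncates the rate to two decimals (both inferred from the posted figures, not from the harness source); `rate_hundredths` and `is_consistent` are illustrative names:

```python
# Assumption: 100 passes over 2 MB = 200 MB total, rate truncated to 2 decimals.
TOTAL_MB = 100 * 2

def rate_hundredths(ms: int) -> int:
    """MB/s multiplied by 100 and truncated, using integer math only."""
    return TOTAL_MB * 1000 * 100 // ms

def is_consistent(ms: int, reported_mb_s: str) -> bool:
    """Compare a reported 'NN.NN' rate with the rate implied by the time."""
    whole, frac = reported_mb_s.split(".")
    return rate_hundredths(ms) == int(whole) * 100 + int(frac)

print(is_consistent(8763, "52.01"))  # the mistyped Nico2 line -> False
print(is_consistent(3845, "52.01"))  # the corrected Nico2 line -> True
print(is_consistent(8192, "24.41"))  # the Svin2 line -> True
```

Running every posted line through a check like this would flag any other row where the time was copied from the wrong table.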
Do you have any test scripts? I guess there is no
practical way of doing 64/128 integer division?
Have not tested it, but it may be useful:
Posted on 2002-03-29 15:35:28 by bdjames
Wow, it appears we work great together. :)
Thank you, jademtech.
Tests for PMMX200.
Svin2 and Thomas3AndSvin are identical here (I don't know why we need test procs with bugs, so I just replaced it)
eax = Nico : [A2EAC21F], 6630 ms [100x2MB], 30.16 MB/s
eax = BitRAKE : [A2EAC21F], 13900 ms [100x2MB], 14.38 MB/s
eax = BitRAKE2 : [A2EAC21F], 13769 ms [100x2MB], 14.52 MB/s
eax = Thomas2 : [A2EAC21F], 6069 ms [100x2MB], 32.95 MB/s
eax = Thomas3 : [A2EAC21F], 3736 ms [100x2MB], 53.53 MB/s
eax = Svin2 : [A2EAC21F], 3214 ms [100x2MB], 62.22 MB/s
eax = Nico2 : [A2EAC21F], 5318 ms [100x2MB], 37.60 MB/s
eax = Thomas3AndSvin : [A2EAC21F], 3214 ms [100x2MB], 62.22 MB/s
eax = Thomas3AndSvinAndBitRAKE : [A2EAC21F], 5798 ms [100x2MB], 34.49 MB/s
Some more comments on the previous "pentium" tests.
movzx works well only on PPro and later.
But, bitRAKE, if I were you I would rearrange the instructions to
remove dependencies in the main loop. It increases speed for sure, I checked.
PPlain has a very bad branch prediction algorithm; that explains everything.
But, bitRAKE, if I were you I would rearrange the instructions to
remove dependencies in the main loop. It increases speed for sure, I checked. PPlain has a very bad branch prediction algorithm; that explains everything.
What algo do you mean?
Thomas's "bitRAKE2" is not the same as bitRAKE's "bitRAKE2" :)
Posted on 2002-03-29 17:29:25 by bdjames
Hi, Thomas
Please check for me whether my proc is correct and fast for you.
GetAdler proc lpData:DWORD, DataSize:DWORD
push ebp
push ebx
push esi
xor edx, edx
mov ebx, 0FFFF000Fh
mov ebp, DataSize
lea eax, [edx+1+ebx]
mov esi, ebx
AdLoop:
cmp edx, ebp
je Ad_2
movzx ecx, byte ptr [lpData+edx]
add eax, ecx
jc Ad_1
lea ecx, [eax+0FFF1h]
inc edx
add ebx, ecx
jnc AdLoop
add ebx, esi
jmp AdLoop
Ad_1:
add ebx, eax
inc edx
lea eax, [eax+esi]
jnc AdLoop
add ebx, esi
jmp AdLoop
Ad_2:
sub ebx, esi
sub eax, esi
shl ebx, 16
pop esi
add eax, ebx
pop ebx
pop ebp
ret
GetAdler endp
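One way to check a proc like this for correctness is to compare its result with a slow, obviously correct version. A minimal sketch in Python (not the thread's MASM), assuming the procs here are meant to compute standard Adler-32 with modulus 65521; `adler32_reference` is a name chosen for this illustration, and `zlib.adler32` is the Python standard library's implementation:

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16

def adler32_reference(data: bytes) -> int:
    """Byte-at-a-time Adler-32, written for clarity rather than speed."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER   # running sum of the bytes, plus 1
        b = (b + a) % MOD_ADLER      # running sum of the successive a values
    return (b << 16) | a

sample = b"Wikipedia"
print(hex(adler32_reference(sample)))                      # 0x11e60398
print(adler32_reference(sample) == zlib.adler32(sample))   # True
```

Feeding the same buffer to the assembly proc and to a reference like this gives a quick yes/no answer on correctness before any timing is done.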
Thomas, a typo in your code above executes bitRAKE algo twice and bitRAKE2 doesn't get executed.
Nico :[038217AA], 1262 ms [100x2MB], 158.47 MB/s
bitRAKE :[038217AA], 1603 ms [100x2MB], 124.76 MB/s
bitRAKE2 :[038217AA], 721 ms [100x2MB], 277.39 MB/s
Thomas2 :[038217AA], 1462 ms [100x2MB], 136.79 MB/s
Thomas3 :[038217AA], 711 ms [100x2MB], 281.29 MB/s
Nico2 :[038217AA], 1302 ms [100x2MB], 153.60 MB/s
Thomas3AndSvin (Svin2) :[038217AA], 711 ms [100x2MB], 281.29 MB/s
Thomas3AndSvinAndBitRAKE :[038217AA], 671 ms [100x2MB], 298.06 MB/s
BitRAKE3 :[038217AA], 230 ms [100x2MB], 869.56 MB/s
buliaNaza :[038217AA], 1843 ms [100x2MB], 108.51 MB/s
bdjames2 :[0000259C], 1643 ms [100x2MB], 121.72 MB/s
bitRAKE4 :[038217AA], 671 ms [100x2MB], 298.06 MB/s
This is the best of 10 runs of each algo on a 1.333 GHz TB DDR.
bitRAKE4 is bitRAKE3 without the prefetch.
buliaNaza, I had to change this 'movzx ecx, byte ptr ' line to 'movzx ecx, byte ptr ' because it wouldn't work otherwise (lpData is a pointer, not the data).
bdjames, didn't work right?
Notes: DIV is fast enough on Athlon to make Svin2 = Thomas3. bitRAKE2 is slower than Thomas3, yet combined together Thomas3AndSvinAndBitRAKE is faster than all three.
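The "best of 10 runs" approach can be sketched as follows. This is a Python illustration under stated assumptions, not the harness used for the numbers above: it times one pass per run instead of 100, `best_of` is a made-up name, and `zlib.adler32` only stands in for an algo under test.

```python
import time
import zlib

def best_of(func, data: bytes, runs: int = 10) -> float:
    """Return the fastest wall-clock time in seconds over `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)   # keep the run least disturbed by the OS
    return best

buffer_2mb = bytes(2 * 1024 * 1024)          # zero-filled stand-in for the test file
seconds = best_of(zlib.adler32, buffer_2mb)  # stand-in for an algo under test
if seconds > 0:
    print(f"{2 / seconds:.2f} MB/s")
```

Taking the minimum rather than the average filters out runs that were slowed down by task switches or cache-cold starts.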
Same post as above, from myself, but with BitRAKE3 executed on the PPro,PII,and P!!! (no PPlain). i might dig out my P200MMX some time to run it. There is a considerable difference between BitRAKE3 and the other speeds on the PPro. i wonder why that is.
P!!! 667MHz (RAM:128 MBs, Win2K)
Nico :[B584D210], 2443 ms [100x2MB], 81.86 MB/s
BitRAKE :[B584D210], 3746 ms [100x2MB], 53.39 MB/s
BitRAKE2 :[B584D210], 3755 ms [100x2MB], 53.26 MB/s
Thomas2 :[B584D210], 3685 ms [100x2MB], 54.27 MB/s
Thomas3 :[B584D210], 3976 ms [100x2MB], 50.30 MB/s
Svin2 :[B584D210], 4046 ms [100x2MB], 49.43 MB/s
Nico2 :[B584D210], 2093 ms [100x2MB], 95.55 MB/s
Thomas3AndSvin :[B584D210], 1462 ms [100x2MB], 136.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 1412 ms [100x2MB], 141.64 MB/s
BitRAKE3 :[B584D210], 1402 ms [100x2MB], 142.65 MB/s
PII 300MHz (RAM:128MBs, WinNT)
Nico :[B584D210], 10435 ms [100x2MB], 44.09 MB/s
BitRAKE :[B584D210], 20579 ms [100x2MB], 26.07 MB/s
BitRAKE2 :[B584D210], 20550 ms [100x2MB], 26.34 MB/s
Thomas2 :[B584D210], 13289 ms [100x2MB], 26.91 MB/s
Thomas3 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Svin2 :[B584D210], 8192 ms [100x2MB], 24.41 MB/s
Nico2 :[B584D210], 3845 ms [100x2MB], 52.01 MB/s
Thomas3AndSvin :[B584D210], 5388 ms [100x2MB], 74.79 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s
BitRAKE3 :[B584D210], 2663 ms [100x2MB], 75.10 MB/s
PPro 150MHz (RAM:64MBs, Win95)
Nico :[B584D210], 14214 ms [100x2MB], 14.07 MB/s
BitRAKE :[B584D210], 27275 ms [100x2MB], 7.33 MB/s
BitRAKE2 :[B584D210], 25229 ms [100x2MB], 7.92 MB/s
Thomas2 :[B584D210], 26381 ms [100x2MB], 7.58 MB/s
Thomas3 :[B584D210], 34196 ms [100x2MB], 5.84 MB/s
Svin2 :[B584D210], 28674 ms [100x2MB], 6.97 MB/s
Nico2 :[B584D210], 10204 ms [100x2MB], 19.60 MB/s
Thomas3AndSvin :[B584D210], 9078 ms [100x2MB], 22.03 MB/s
Thomas3AndSvinAndBitRake :[B584D210], 7745 ms [100x2MB], 25.82 MB/s
BitRAKE3 :[B584D210], 6002 ms [100x2MB], 33.32 MB/s
PII 300MHz (RAM:128MBs, WinNT)
Thomas3AndSvinAndBitRake :[B584D210], 9543 ms [100x2MB], 78.30 MB/s
BitRAKE3 :[B584D210], 2663 ms [100x2MB], 75.10 MB/s
These figures look wrong. :)
Thanks, jademtech - I was curious if the prefetch instructions would hinder/improve performance on any of the other processors.
2554 ms... sorry. in case there are any more typos, here are the original datasets (ms:MB/s) i collected for PII and PPro
PPro
14214:14.07
27275:7.33
25229:7.92
26381:7.58
34196:5.84
28676:6.97
10204:19.60
9078:22.03
7745:25.82
6002:33.32
PII
4536:44.09
7671:26.07
7591:26.34
7431:26.91
8192:24.41
8192:24.41
3845:52.01
2674:74.79
2554:78.30
2663:75.10
There is a considerable difference between BitRAKE3 and the other speeds on the PPro. i wonder why that is.
Options?:
A word-based lookup table adl(lookup+ebx)
Posted on 2002-03-30 02:44:24 by bdjames
Thomas, if I may, let me ask you again: must the adler proc work only with 1 MB aligned data or not?
I'm asking 'cause after multiple tests with real data (8 - 25 MB), if
the data is not aligned to a MB, different procs return different adlers.
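Standard Adler-32 is defined for input of any length, so procs that disagree only on unaligned sizes are worth testing against a reference at odd lengths. One common cause (not confirmed for the procs in this thread) is a blocked loop that mishandles the leftover bytes after the last full block. A minimal Python sketch of a blocked version that handles a short final block; `adler32_blocks` is an illustrative name, and `NMAX = 5552` is the block bound zlib uses for 32-bit sums:

```python
import zlib

MOD_ADLER = 65521
NMAX = 5552  # zlib's bound: most bytes that can be summed before a 32-bit overflow

def adler32_blocks(data: bytes, block: int = NMAX) -> int:
    """Adler-32 with the modulo deferred to the end of each block."""
    a, b = 1, 0
    for start in range(0, len(data), block):
        for byte in data[start:start + block]:  # the last slice may be short
            a += byte
            b += a
        a %= MOD_ADLER
        b %= MOD_ADLER
    return (b << 16) | a

# Sizes chosen to include empty, tiny, just past one block, and non-MB-aligned.
for size in (0, 1, NMAX + 1, 1024 * 1024, 1024 * 1024 + 123):
    data = bytes(i % 251 for i in range(size))
    assert adler32_blocks(data) == zlib.adler32(data)
```

Running each assembly proc over the same set of sizes would show which ones drop or double-count the tail.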
I'm asking 'cause after multiple tests with real data (8 - 25 mb) if
the data unaligned by mb - defferent procs return defferent adlers.