Thanks for getting me started, Ekted! :)
While this particular test might not be that important, nor especially hard to get performing well, it has prompted me to start work on a testbed, which should this time hopefully become rather easy to adapt for other tests (plugin based), more thorough, and perhaps even with a portable core.
Furthermore, it's interesting to see what can be done with a P4 vs. Athlon. I still hope somebody will provide nicely optimized Athlon code.
If anybody cares, here's a new yodel version. It has conformance testing and lets you easily write your own test routines (as a DLL) without having to run through ALL the other timings. It's mostly uploaded as a vague hint of what is to come (I hope this will end up as a nice, flexible benchmarking tool that compiles on both Windows and Linux), and so that I can grab the source while at work tomorrow :)
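For anyone wondering what a conformance test might look like here, a rough sketch follows. It is not yodel's actual interface: the `mul92_routine` signature, the function names and the choice of wrapping 16-bit words are all assumptions made for illustration. The idea is simply to run a candidate over every possible 16-bit input and compare it against a plain reference before bothering to time it.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical signature for a routine under test (an assumption, not
   yodel's real plugin interface): multiply every 16-bit word in buf
   by 92 in place, wrapping on overflow. */
typedef void (*mul92_routine)(uint16_t *buf, size_t count);

/* Plain reference implementation used as the known-good answer. */
static void mul92_reference(uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = (uint16_t)(buf[i] * 92u);
}

/* Deliberately wrong candidate (multiplies by 91) to show a failure. */
static void times91(uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = (uint16_t)(buf[i] * 91u);
}

/* Returns 1 if the candidate matches the reference for every possible
   16-bit input value, 0 otherwise. */
static int conforms(mul92_routine candidate)
{
    static uint16_t expected[65536];
    static uint16_t actual[65536];

    for (uint32_t i = 0; i < 65536; i++) {
        expected[i] = (uint16_t)i;
        actual[i] = (uint16_t)i;
    }
    mul92_reference(expected, 65536);
    candidate(actual, 65536);

    for (uint32_t i = 0; i < 65536; i++)
        if (expected[i] != actual[i])
            return 0;
    return 1;
}
```

Calling `conforms(times91)` reports a mismatch, while `conforms(mul92_reference)` trivially passes; a real candidate would be checked the same way.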
Ekted,
Let's hope that some of the suggestions were useful to you in designing the code you were after. Don't take any notice of the nonsense that went on in here; it's normal from one or two people here.
Hope you get a good result.
Regards,
hutch@movsd.com
<flame on>
I hope you're referring to yourself, hutch. All I tried to do was to be helpful. I've spent quite some time programming the test suite and getting people to run it across a wide range of hardware. Your insulting comments were rather uncalled for, and I believe I deserve an apology.
</flame off>
hutch, the floating-point version ran rather well on Athlon (badly on the P4, surprise surprise :-). There's a problem with it, though: all the other routines work fine in overflow situations, but the FP code treats the word loaded from the array as signed, and does the same when storing. I couldn't think of any nice way to avoid this; my workaround loads the word, stores it as a dword, then loads that into the FPU, and does the same in reverse when storing. This totally ruined the instruction timings on all platforms :( Got any ideas?
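To make the signed/unsigned problem concrete, here is a small C sketch of the two load paths and a wrapping store. It only models the behaviour described above; the function names are made up for illustration, and the store assumes the value is non-negative and fits in 32 bits.

```c
#include <stdint.h>

/* Models a 16-bit integer load into the FPU: the word is read as signed,
   so anything above 32767 comes out negative. (The uint16_t to int16_t
   cast wraps on the usual two's-complement compilers.) */
static double load_word_signed(uint16_t w)
{
    return (double)(int16_t)w;
}

/* Models the workaround: zero-extend the word to a dword first, then do
   a 32-bit signed load. The value is below 2^31, so the sign bit is clear. */
static double load_word_unsigned(uint16_t w)
{
    uint32_t widened = w;               /* zero-extension, like MOVZX */
    return (double)(int32_t)widened;
}

/* Store path: convert to a 32-bit integer and keep only the low word,
   which matches the wrapping behaviour of the integer routines.
   Assumes 0 <= v < 2^32. */
static uint16_t store_word_wrapping(double v)
{
    return (uint16_t)(uint32_t)v;
}
```

For example, the word 0xC000 loads as -16384 through the signed path and as 49152 through the widened one, which is why the results diverge from the integer routines once values go past 32767.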
f0dder,
The forum rules cover arguments: if you want to start them, do it in the "Crusades" forum, where you can rave away to your heart's content, but remember that this is a technical forum where starting an argument is not within the rules.
Feel free to post in the "Crusades" forum any old time, that's what it's there for, but try to keep your polemic out of places where members post questions looking for ideas or help.
Regards,
hutch@movsd.com
as a solution to the original problem ( x * 91 ) i came up with:
int eax = 542;       // x
int edx = eax;
int ecx = eax;       // keep a copy of x
eax = eax << 6;      // 64x
edx = edx << 5;      // 32x
eax = eax + edx;     // 96x
edx = ecx << 2;      // 4x
eax = eax - edx;     // 92x
eax = eax - ecx;     // 91x
obviously it's not assembly, but i don't have an assembler here... :)
but i imagine it might be slower because the original problem was with SHL .. anyway, just thought i'd add my 2c.
abc123,
If you tweak the code to do the multiply by 92, f0dder has a thread for benchmarking the different methods that have been suggested, and he can probably give you a comparison on the PIV he is using.
Regards,
hutch@movsd.com
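For reference, the tweak should amount to dropping the final subtraction from the x * 91 version, since 92 = 64 + 32 - 4. A minimal sketch in C (the function name `mul92_shift` is made up, and unsigned arithmetic is assumed so that overflow wraps cleanly):

```c
#include <stdint.h>

/* Multiply by 92 using shifts only: 92x = 64x + 32x - 4x. */
static uint32_t mul92_shift(uint32_t x)
{
    uint32_t r = x << 6;   /* 64x */
    r += x << 5;           /* 96x */
    r -= x << 2;           /* 92x */
    return r;
}
```

Whether this beats a plain multiply depends on the CPU, which is exactly what the benchmark thread is for.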
abc, we moved the thread here:
http://www.asmcommunity.net/board/index.php?topic=12817
(the focus will probably shift from the mul-by-92 to generic timing)