Thanks for getting me started, Ekted! :)
While this particular test is neither that important nor especially hard to get performing well, it has prompted me to start work on a testbed. This time it should hopefully be rather easy to adapt for other tests (plugin-based), more thorough, and perhaps even have a portable core.

Furthermore, it's interesting to see what can be done on a P4 vs. an Athlon. I still hope somebody will provide nicely optimized Athlon code.
Posted on 2003-04-24 12:12:22 by f0dder
If anybody cares, here's a new yodel version. It has conformance testing and lets you easily write your own test routines (as a DLL) without having to run through ALL the other timings. It's mostly uploaded as a vague hint of what is to come (I hope this will end up as a nice, flexible benchmarking tool that compiles on both Windows and Linux), and so that I can grab the source while at work tomorrow :)
Posted on 2003-04-24 17:55:58 by f0dder
Ekted,

Let's hope that some of the suggestions were useful to you in designing the code you were after. Don't take any notice of the nonsense that went on in here, it's normal from one or two people here.

Hope you get a good result.

Regards,

hutch@movsd.com
Posted on 2003-04-24 19:38:58 by hutch--
<flame on>
I hope you're referring to yourself, hutch. All I tried to do was to be helpful. I've spent quite some time programming the test suite and getting people to run it across a wide range of hardware. Your insulting comments were rather uncalled for, and I believe I deserve an apology.
</flame off>

hutch, the floating-point version ran rather well on Athlon (badly on P4, surprise surprise :-). However, there's a problem with it: all the other routines work fine in overflow situations, but the FP code treats the word load from the array as signed, and the store likewise. I couldn't really think of any nice way to avoid this; my workaround code loads the word, stores it to a dword, then loads that into the FPU, and does the same in reverse when storing. This totally ruined the instruction timings on all platforms :( Got any ideas?
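
To illustrate the signedness problem described above, here is a rough C sketch (not the actual test-suite code). The x87 integer load (fild) interprets its memory operand as signed, so a 16-bit value of 0x8000 or above comes out negative. Widening the word to a dword first, as in the workaround, keeps it positive. The function names are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: mimics what a signed 16-bit FPU load sees.
   Bit patterns 0x8000..0xFFFF are interpreted as negative numbers. */
double load_word_as_signed(uint16_t w)
{
    int32_t v = w;          /* 0..65535 */
    if (v >= 0x8000)
        v -= 0x10000;       /* reinterpret as two's-complement 16-bit */
    return (double)v;
}

/* Hypothetical helper: the workaround. Zero-extend the word into a
   dword first; a signed 32-bit load of that dword is always >= 0. */
double load_word_zero_extended(uint16_t w)
{
    int32_t d = w;          /* high 16 bits are zero */
    return (double)d;
}
```

For the word 0xFFFF the first function yields -1.0 while the second yields 65535.0, which is why a routine that loads words directly misbehaves once values overflow past 0x7FFF.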
Posted on 2003-04-25 02:37:04 by f0dder
f0dder,

The forum rules cover arguments: if you want to start one, do it in the "Crusades" forum, where you can rave away to your heart's content, but remember that this is a technical forum where starting an argument is not within the rules.

Feel free to post in the "Crusades" forum any old time, that's what it's there for, but try and keep your polemic out of where members post questions looking for ideas or help.

Regards,

hutch@movsd.com
Posted on 2003-04-25 03:12:55 by hutch--
as a solution to the original problem ( x * 91 ) i came up with:



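int eax = 542;      // x, the value to multiply
int edx = eax;
int ecx = eax;      // untouched copy of x

eax = eax << 6;     // eax = 64*x
edx = edx << 5;     // edx = 32*x
eax = eax + edx;    // eax = 96*x
edx = ecx << 2;     // edx = 4*x
eax = eax - edx;    // eax = 92*x
eax = eax - ecx;    // eax = 91*x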


obviously it's not assembly, but i don't have an assembler here... :)

but i imagine it might be slower because the original problem was with SHL .. anyway, just thought i'd add my 2c.
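
For anyone who wants to check the shift sequence above without an assembler, here is a small C sketch. It uses unsigned 32-bit arithmetic so that overflow wraps the way the registers would, and the function names are invented for this example. Dropping the final subtraction gives the multiply-by-92 variant.

```c
#include <stdint.h>

/* x*91 = 64x + 32x - 4x - x, using only shifts, adds and subtracts. */
uint32_t mul91_shifts(uint32_t x)
{
    return (x << 6) + (x << 5) - (x << 2) - x;
}

/* x*92 = 64x + 32x - 4x: the same sequence without the last subtract. */
uint32_t mul92_shifts(uint32_t x)
{
    return (x << 6) + (x << 5) - (x << 2);
}
```

Comparing mul91_shifts(x) against x * 91u (and mul92_shifts(x) against x * 92u) over a range of inputs is a quick conformance check. It says nothing about speed, which still has to be measured on the actual CPU.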
Posted on 2003-04-28 22:54:32 by abc123
abc123,

if you tweak the code to do the multiply by 92, f0dder has a thread to bench the different methods that have been suggested and he can probably give you a comparison on the PIV he is using.

Regards,

hutch@movsd.com
Posted on 2003-04-28 23:49:42 by hutch--
abc, we moved the thread here:
http://www.asmcommunity.net/board/index.php?topic=12817
(the focus will probably shift from the mul-by-92 to generic timing)
Posted on 2003-04-29 02:28:21 by f0dder