I've started properly exploring the world of C++, and for the time being I'm playing around with the latest Intel C++ compiler to see what it's like. Already it's annoyed me :mad: .

I have a small loop which calculates the logs of values in an array. I ran the program and everything froze, so I went back to check the code and stepped through it in a debugger; nothing seemed to be wrong. So I tried running the program again and left it for a while. Sure enough it worked. There was no problem after all; it's just that the log function in whatever library it's using is painfully slow.

Mind you, if I recompile with all P4 optimisations on then it compiles into some SSE2 code and uses a different library function which seems to run at an appropriate speed, but this is a crazy thing. When I replaced the library log function with my own
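inline double asmLog(double x) {
    double y;
    _asm {
        fldln2          // push ln(2)
        fld x           // push x
        fyl2x           // st(0) = ln(2) * log2(x) = ln(x)
        fstp y          // store the result and pop
    }
    return y;
}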

then things run sufficiently fast when compiling without the P4 optimisations, but now if I decide to turn them back on I'll have to replace calls to my routine with the standard library one, as my little one wouldn't be SSE friendly. (Though it'd be worth looking at the SSE log function to see what it actually does.)
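
For anyone wanting to try the same idea without inline assembly, here is a minimal portable sketch. It assumes the x87 approach in question is the usual FLDLN2/FYL2X one, which relies on the identity ln(x) = ln(2) * log2(x). The name lnViaLog2 is made up for illustration and is not from the post above.

```cpp
#include <cmath>

// Hypothetical helper (not from the original post): natural log computed
// through the identity ln(x) = ln(2) * log2(x), the same identity an x87
// FLDLN2 / FYL2X sequence relies on.
inline double lnViaLog2(double x) {
    const double ln2 = 0.6931471805599453;  // ln(2) rounded to double
    return ln2 * std::log2(x);
}
```

Comparing lnViaLog2(x) against std::log(x) for a few inputs is a quick sanity check; the two may differ in the last bit or so, since the multiplication adds one extra rounding step.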

Anyway, I don't mean this as an anti-C rant. I actually quite like the language, and the SSE2 code produced looked very good; I'm quite looking forward to exploring the vectorised code the compiler can produce. But honestly, how badly written is that standard log routine? Libraries are the one thing I thought you could rely on for being efficient since so many programmers use them, though that's probably just me being naive.
Posted on 2004-04-12 11:41:56 by Eóin
Could you post the source + compiler switches used? If intrinsics aren't enabled and it's generating a function call to do log(), then things are probably going to be pretty slow...
Posted on 2004-04-12 12:21:45 by f0dder
The command line is
ICL /nologo /QIfist /Qipo_obj /W3 /Gr /GX /G7 /MD /Oi /QxMi /FD /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_MBCS" /c Flame.cpp

I'm also playing with the umdev editor and these are the switches it generates. I'm still in the process of learning them myself from the docs.

The loop is fairly simple
for(i=0; i<entries; i++) {
    mainBuffer[i].alpha = log(mainBuffer[i].alpha);
}

And it gets compiled to
00401199 |> DD07 FLD QWORD PTR DS:[EDI]
0040119B |. 83C4 F8 ADD ESP,-8
0040119E |. DD1C24 FSTP QWORD PTR SS:[ESP]
004011A1 |. E8 7A020000 CALL <JMP.&libmmd.log>
004011A6 |. 8B55 C0 MOV EDX,DWORD PTR SS:[EBP-40]
004011AB |. 83C4 08 ADD ESP,8
004011AE |. 83C7 10 ADD EDI,10
004011B1 |. 83C6 01 ADD ESI,1
004011B4 |. 3BF2 CMP ESI,EDX
004011B6 |.^7C E1 JL SHORT TestProj.00401199

For SSE it'd probably have to generate some sort of function call; obviously there would be overhead, but with the log function I wrote things are still sufficiently fast even when it's not inlined (sufficient as in I click a button and things seem to be instant; I haven't done any timings). Also, sin and cos don't generate function calls, just log.
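
Since no timings were taken, a rough way to get some numbers is to wrap the loop in a timer. This is only a sketch using std::chrono from the standard library; timeLogLoop is a made-up name, and it works on a plain vector of doubles rather than the mainBuffer structure from the post.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper: replaces every element with its natural log and
// returns the elapsed wall-clock time in milliseconds.
inline double timeLogLoop(std::vector<double>& values) {
    const auto start = std::chrono::steady_clock::now();
    for (double& v : values) {
        v = std::log(v);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running it a few times on a buffer of realistic size, once per set of compiler switches, should show whether the library call or something else is the slow part. A single run on a small buffer is too noisy to mean much.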
Posted on 2004-04-12 12:44:44 by Eóin
My apologies for jumping to conclusions, but things were slow because I was calculating logs of zeros, which you shouldn't do.... :rolleyes:

When I add a constant to ensure there are no zeros, things are fast again. Still, it shouldn't really be that slow, or use a function call when there's an FPU instruction which does the job faster.
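
As an alternative to adding a constant, the input can be clamped before the call. The sketch below is an assumption about how one might do that, not what the program here actually does; safeLog is a made-up name. On IEEE 754 systems std::log(0.0) returns negative infinity, so the clamp keeps the result finite.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: clamps zero and negative inputs to the smallest
// positive normal double before taking the log, so the result is always
// finite for non-NaN input. NaN input still yields NaN.
inline double safeLog(double x) {
    const double tiny = std::numeric_limits<double>::min();
    return std::log(x < tiny ? tiny : x);
}
```

Unlike adding a constant, clamping leaves every value above the threshold unchanged, which matters if the logs feed into later calculations.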
Posted on 2004-04-12 12:58:34 by Eóin
Hm, perhaps you have to explicitly enable the intrinsic version of log? The Microsoft compiler supports intrinsics for Interlocked*, but you have to manually enable them (even when you enable intrinsics on the command line). Perhaps Intel has the same for log? Check the manual. Or perhaps, for some weird reason, there's no intrinsic and the library version sucks? :p
Posted on 2004-04-12 13:54:32 by f0dder