I have just started writing a small raytracer as a 3D tutorial; diffuse Phong shading works OK.

However, when dealing with specular highlights I need a fast way to calculate pow(x,y), i.e. x raised to the power y (x^y), for those of us who did not learn math in an English-language school ...

To my surprise the FPU lacks such an instruction, hmmm ....

So I understand I could do it via e^x and a logarithm... but this looks damn slow and unclear :( to me...

So, any examples (in asm)? And/or optimizations?

BTW, I want to make it a realtime raytracer, so speed is of the essence; however, there will also be some unoptimized source code for the tutorial ... eh, also I am using TASM as always ...

One last thing: I think I can make do with real4 (single-precision float)...

Thanks all
Posted on 2002-12-30 01:09:57 by BogdanOntanu
Posted on 2002-12-30 01:25:08 by bitRAKE

Thanks man :) !!

One last thing: there once was a link around here to a library with source that claimed it could do real4 floating-point operations and functions in software much faster than the FPU... ?
Posted on 2002-12-30 01:31:05 by BogdanOntanu
http://www.bmath.net/bmath/index.html :)

Fixed point will be faster because we can make more assumptions, but it is harder to implement at that performance level, imho. MMX/SSE should be used as well because it is over twice as fast on some algorithms.
Posted on 2002-12-30 01:32:59 by bitRAKE
Here is the way, using ln:

y^x = e^(x * ln y)

and that will give you high-speed calculations.
Try it.

I have added some math macros here:

;st to the power st(1): pow(st,st(1)) = 2^(st(1)*log2(st))
fPow2 MACRO ; 2^st, 98 clocks
sub esp,16
fist dword ptr [esp+12]
fstp tbyte ptr [esp]
fisub dword ptr [esp+12]
mov eax,[esp+12]
add [esp+8],eax
fld tbyte ptr [esp]
add esp,16

fPow MACRO ; st^st(1), 200 clocks

This code is not mine; I took it from a math include file.
Posted on 2002-12-30 01:35:09 by amr
Thanks all

I see some TBYTE references, so I guess the above optimizations (they look like Agner's) are for real10?
Do they also work for real4...?

Still, I wonder if real4 will be OK for a realtime raytracer ....
Posted on 2002-12-30 01:47:45 by BogdanOntanu
I tested the two versions below:

; calculates Y^X
; st=Y,st(1)=X
;fyl2x ;x*log2Y
;f2xm1 ;Y^x-1
;fld1 ;1,Y^x-1
;faddp ;Y^x

and this one (bitRAKE's):

fmul ; A
fld st ; A A
frndint ;*B A
fld1 ; 1 B A
fscale ;*C B A
fxch st(2) ; A B C
fsubp st(1),st ; D C
f2xm1 ; E C
fmul st,st(1) ; F C
fadd ; G

the only code above those macros is:

specular_factor real4 0.2
specular_exponent real4 2.01

; .....
fld [specular_exponent] ;X
fld [reflect_dot_p] ;Y

But I still get negative results even though both specular_exponent and reflect_dot_p are positive.
AFAIK reflect_dot_p varies within [0.0, 1.0].

I must be doing something wrong... but how do I get negative values for Y^X when both values are positive?

Posted on 2002-12-30 04:09:39 by BogdanOntanu
That thread is mainly about e^x :)
F2XM1 is limited to range -1 to +1.
Posted on 2002-12-30 05:07:17 by bitRAKE
Yeah, but this is y^x?

fyl2x ;x*log2Y
f2xm1 ;Y^x-1
fld1 ;1,Y^x-1
faddp ;Y^x

and still I get negative values (also e>0 AFAIK ;) )
Posted on 2002-12-30 05:12:10 by BogdanOntanu
fld fpc(<0.5>) ;X
fld fpc(<0.5>) ;Y
fyl2x ;x*log2Y
f2xm1 ;Y^x-1
fld1 ;1,Y^x-1
faddp st(1), st ;Y^x
Result = 0.7071067811865475728 okay

The other macro doesn't work. :(
Posted on 2002-12-30 05:23:45 by bitRAKE
Should Y and X be in any special normalized ranges?

I mean, I am sure the dot_product is in the 0..1 range, but the specular_exponent can be 25.67 :) or 0.1
Posted on 2002-12-30 05:44:00 by BogdanOntanu
F2XM1 is limited to range -1 to +1. Therefore, ABS(x*log2Y) <= 1.
Posted on 2002-12-30 05:53:23 by bitRAKE