Hi

I have just started to make a small raytracer as a tutorial for 3D; diffuse Phong shading works OK.

However, when dealing with specular highlights I need a fast way to calculate pow(x,y), i.e. x raised to the power y (x^y), for those of us who did not learn maths in an English school ...

To my surprise FPU lacks such an instruction, hmmm ....

So I understand I could do it via e^x and a logarithm... but this looks damn slow and unclear :( to me...

So, any examples (in asm) and/or optimizations?

BTW, I want to make it a realtime raytracer, so speed is of the essence; however, there will also be some unoptimized source code for the tutorial ... eh, and I am using TASM as always ...

One last thing: I think I can make do with real4 (single-precision float)...

Thanks all

Bogdan

Hehe

Thanks man :) !!

One last thing: there once was a link around here to a library with source that claimed it could do real4 floating-point operations and functions in software much faster than the FPU... ?

http://www.bmath.net/bmath/index.html :)

Fixed point will be faster because we can make more assumptions, but it is harder to implement at that performance level, imho. MMX/SSE should be used as well because it is over twice as fast on some algorithms.

hi,

here is the way, using ln:

e^(x * ln y)

and that will give you high-speed calculations

try it
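For reference, the identity behind this, written out. The x87 instructions work in base 2 (FYL2X computes st(1)*log2(st), F2XM1 computes 2^st - 1), so the base-2 form is the one that maps most directly onto the FPU:

```latex
y^{x} = e^{x \ln y} = 2^{x \log_2 y}, \qquad y > 0
```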

I have added a math macro here:

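```
; pow(st, st(1)) = st ^ st(1) = 2^(st(1) * log2(st))

fPow2 MACRO                     ; 2^st, 98 clocks
    sub   esp,16                ; 16 bytes of scratch space on the stack
    fist  dword ptr [esp+12]    ; n = st rounded to an integer
    fld1
    fstp  tbyte ptr [esp]       ; store 1.0 as real10
    fisub dword ptr [esp+12]    ; st = st - n (fractional part)
    mov   eax,[esp+12]
    add   [esp+8],eax           ; add n to the exponent field: 1.0 becomes 2^n
    f2xm1                       ; 2^frac - 1
    fld1
    fadd                        ; 2^frac
    fld   tbyte ptr [esp]       ; 2^n
    fmul                        ; 2^frac * 2^n
    add   esp,16
EndM

fPow MACRO                      ; st^st(1), 200 clocks
    fyl2x                       ; st(1) * log2(st)
    fPow2
EndM
```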

This code is not mine; I took it from a math include file.

amr

Thanks all

I see a TBYTE reference, so I guess the optimizations above (they look like Agner's) are for real10?

Do they work for real4 also...?

Still i wonder if real4 will be ok for a realtime raytracer ....
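On the real4 question: as far as the macros above go, the TBYTE is only a scratch slot on the stack used to build 2^n, not the operand type. The FPU registers always hold the 80-bit format, so real4 values are widened on load and rounded again on store. A minimal usage sketch, assuming the fPow macro quoted above is defined and Y > 0; `specular_term` and the sample value of `reflect_dot_p` are made up for illustration:

```asm
.data
reflect_dot_p     real4 0.5     ; Y, sample value (assumption)
specular_exponent real4 2.01    ; X
specular_term     real4 ?       ; hypothetical result variable

.code
fld  dword ptr [specular_exponent]  ; st = X, real4 widened to 80 bits on load
fld  dword ptr [reflect_dot_p]      ; st = Y, st(1) = X
fPow                                ; st = Y^X (note: fPow2 clobbers eax)
fstp dword ptr [specular_term]      ; rounded back to real4 on store
```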

Tested those 2 versions below:

```
;-------------------------------
; calculates Y^X
; st=Y,st(1)=X
;-------------------------------
;fyl2x  ;x*log2Y
;f2xm1  ;Y^x-1
;fld1   ;1,Y^x-1
;faddp  ;Y^x
```

and this (bitrake):

```
fldl2e
fmul                ; A
fld st              ; A A
frndint             ;*B A
fld1                ; 1 B A
fscale              ;*C B A
fxch st(2)          ; A B C
fsubp st(1),st      ; D C
f2xm1               ; E C
fmul st,st(1)       ; F C
fadd                ; G
```

the only code above those macros is:

```
.data
specular_factor   real4 0.2
specular_exponent real4 2.01

.code
; .....
fld [specular_exponent] ;X
fld [reflect_dot_p]     ;Y
```

But I still get negative results even though both specular_exponent and reflect_dot_p are positive.

AFAIK reflect_dot_p varies within [0.0, 1.0].

I must be doing something wrong... but how do I get negative values for Y^X when both values are positive?

:stupid:

That thread is mainly about e^x :)

F2XM1 is limited to range -1 to +1.
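To make the range limit concrete, here is a worked check with the exponent from the posted data (X = 2.01) and Y = 0.5 picked as a sample dot product (an assumption, not a value from the thread):

```latex
X \log_2 Y = 2.01 \cdot \log_2 0.5 = -2.01 \notin [-1, 1]
```

With X = 2.01 the argument stays in range only for Y >= 2^(-1/2.01), roughly 0.708. Below that the F2XM1 result is undefined, which would likely explain garbage output, negative values included.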

yeah but this is y^x ?

```
fyl2x   ;x*log2Y
f2xm1   ;Y^x-1
fld1    ;1,Y^x-1
faddp   ;Y^x
```

and still i get negative values (also e>0 afaik ;) )

```
fld fpc(<0.5>)
fld fpc(<0.5>)
fyl2x               ;x*log2Y
f2xm1               ;Y^x-1
fld1                ;1,Y^x-1
faddp st(1), st     ;Y^x
```

Result = 0.7071067811865475728, okay.
The other macro doesn't work. :(

Should Y, X have any special normalized ranges?

I mean i am sure the dot_product is in 0..1 range but the specular_exponent can be 25.67 :) or 0.1

...

F2XM1 is limited to range -1 to +1. Therefore, ABS(x*log2Y) <= 1.
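One way to lift that restriction is to split t = X*log2(Y) into an integer part n and a fraction f, feed only f to F2XM1, and apply 2^n afterwards with FSCALE. A minimal sketch in the thread's macro style; the name `fPowReduced` is made up here, and it assumes Y > 0 (Y = 0 needs a separate check) and one free FPU register beyond the two operands:

```asm
; in:  st = Y (must be > 0), st(1) = X
; out: st = Y^X, both inputs consumed
fPowReduced MACRO
    fyl2x               ; t            t = X*log2(Y)
    fld   st(0)         ; t  t
    frndint             ; n  t         n = t rounded to an integer
    fsub  st(1), st     ; n  f         f = t - n, so |f| < 1
    fxch  st(1)         ; f  n
    f2xm1               ; 2^f-1  n     argument is now within F2XM1's range
    fld1                ; 1  2^f-1  n
    faddp st(1), st     ; 2^f  n
    fscale              ; 2^f * 2^n  n
    fstp  st(1)         ; Y^X
EndM
```

This is the same split that the fPow2 macro earlier in the thread performs, with FSCALE doing the 2^n step instead of patching the exponent bits by hand, so it should cope with exponents like 25.67 as well as 0.1.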