i found http://www.asmcommunity.net/board/index.php?topic=1174.0 but there is no implementation, so i sat and wrote the following code:


#define DIV0  (1.0/1.0)
#define DIV1 -(1.0/6.0)
#define DIV2  (1.0/120.0)
#define DIV3 -(1.0/5040.0)
#define DIV4  (1.0/362880.0)
#define DIV5 -(1.0/39916800.0)
#define DIV6  (1.0/6227020800.0)
#define DIV7 -(1.0/1307674368000.0)

const double divisor[] = {DIV0, DIV1, DIV2, DIV3, DIV4, DIV5, DIV6, DIV7};

float sine3(float radians) {
    float temp;

    __asm {
        fld     dword ptr [radians]
        fsin
        fstp    dword ptr [temp]
    }
    return temp;
}

float sine2(float radian, long precision) {
    float sum;

    // note: the [memory] operands were eaten by the board software;
    // they are restored below from context
    __asm {
        mov     eax, [precision]
        xor     ecx, ecx
        fld     dword ptr [radian]
        mov     edx, offset divisor
        and     eax, 7
        mov     [sum], ecx              ; sum = 0
nxtpass:
        fld     st                      ; duplicate the current odd power of x
        fmul    qword ptr [edx+ecx*8]   ; term = x^(2n+1) * divisor[n]
        fxch    st(1)
        fmul    dword ptr [radian]
        fxch    st(1)
        fadd    dword ptr [sum]         ; sum += term
        fxch    st(1)
        fmul    dword ptr [radian]      ; advance to the next odd power of x
        fxch    st(1)
        inc     ecx
        fstp    dword ptr [sum]
        cmp     ecx, eax
        jbe     nxtpass
        fstp    dword ptr [radian]      ; discard the leftover power (radian is a by-value copy)
    }
    return sum;
}

speed test:

sine3:  0.0000969 ms (~260 cycles)

sine2:

precision:  speed (ms):              % of 'sine3' time:  value returned:        difference:

0           0.0000093 (~25 cycles)    9                  ~1.04719758033752440   ~0.18117217655308575
1           0.0000172 (~46 cycles)   18                  ~0.85580080747604370   ~0.01022459630839494
2           0.0000250 (~67 cycles)   26                  ~0.86629533767700195   ~0.00026993389256330
3           0.0000313 (~84 cycles)   32                  ~0.86602133512496948   ~0.00000406865946916
4           0.0000406 (~109 cycles)  42                  ~0.86602550745010376   ~0.00000010366566511
5           0.0000469 (~126 cycles)  48                  ~0.86602544784545898   ~0.00000004406102033
6           0.0000547 (~146 cycles)  56                  ~0.86602544784545898   ~0.00000004406102033
7           0.0000594 (~159 cycles)  61                  ~0.86602544784545898   ~0.00000004406102033

the returned value should be 0.86602540378443864676372317075294 (sine(60 deg)). "difference" is the difference between this value and the returned value for each precision.

My observations: sine2 is at least 33% faster even at the 7th precision, while the 5th is already more than necessary.

What do you guys think about it? If no one finds any math mistake here, then I'll think about writing it in 3DNow! or SSE.

-------------

the equation is from here: http://mathworld.wolfram.com/images/equations/Sine/equation4.gif

the attached zip contains c++ source file and a ready-to-run executable.

/edit

BTW: the times in the exe show the execution time of 10'000'000 iterations, so if you have a slow CPU you'll have to wait a little :P

/edit 2

Made some tweaks and got *slightly* better times.

/edit 3

Changed "degrees" to "radians" as suggested by Raymond

cute :)

Nice :) - I decided to see what speed one can get with a LUT and linear interpolation. The code below executes in 48 cycles, with 0.05% error encountered. It automatically corrects the input, with no extra cycle penalty (while the normal algo wants 0<x<pi/4). Using the value of ECX to negate the output can also speed up the proc.

Accepts input parameter in ST, overwrites ST with its sinus. Needs 3 free fpu registers on first call, 2 free in next calls.

I haven't done thorough benchmarks, I'll leave it to you if you want :)


SIN_PRECISION equ 16384

.data
usin_f1 real4 ?         ; sin_precision/pi
usin_f2 real4 0.5
sinLUT  dd 0

.code
; note: several [memory] operands were eaten by the board software;
; they are restored below from context and may differ from the original
usin proc uses eax ebx ecx      ; pass parameter in ST(0), in radians
    local x,delta,temp1
    mov     ebx,sinLUT
    test    ebx,ebx
    jz      _makeLUT
_has_LUT:
    ;----[ make 0 <= x <= pi ]-------\
    fmul    usin_f1
    mov     ecx,SIN_PRECISION       ; is_negative mask
    fld     ST
    fsub    usin_f2                 ; make truncate to floor
    mov     eax,SIN_PRECISION-1
    fistp   x
    and     eax,x
    and     ecx,x
    fisub   x
    lea     eax,[ebx+eax*4]
    or      ecx,ecx
    fstp    delta
    ;--------------------------------/
    ;------[ interpolate ]-----------\
    fld     real4 ptr [eax]         ; A
    fld     real4 ptr [eax+4]       ; B
    fsub    st(0),st(1)             ; B - A
    fmul    delta                   ; delta*(B-A)
    fadd                            ; A + delta*(B-A)
    ;--------------------------------/
    jnz     @F
    ret
@@:
    fchs
    ret
; out = A*(1-delta) + B*delta = A - A*delta + B*delta = A + delta*(B-A)

_makeLUT: ;---------------------------\
    fstp    x
    push    edx
    invoke  GetProcessHeap
    invoke  HeapAlloc,eax,0,SIN_PRECISION*4+4
    mov     sinLUT,eax
    pop     edx
    mov     ecx,SIN_PRECISION+1
    mov     temp1,ecx
    fldpi
    fidiv   temp1
    fldz
@@: ;----------------------\
    fld     ST
    fsin
    fstp    real4 ptr [eax]
    fadd    st(0),st(1)
    add     eax,4
    dec     ecx
    jnz     @B
    ;----------------------/
    fdecstp
    fdecstp
    fild    temp1
    fldpi
    fdiv
    fstp    usin_f1
    fld     x
    mov     ebx,sinLUT
    jmp     _has_LUT
    ;---------------------------------/
usin endp

Edit: Found a little speedup; from 51 cycles it became 48.

The first comment I have is that **fsin** and the formula you use need the angle in radians. Your use of **degrees** for the parameter name may be misleading for those who may not be familiar with the requirements.

I would agree that your algo may be faster than using **fsin** for your needs. However, as you have indicated indirectly, the expected precision is only as good as single-precision floats. If higher precision such as double-precision or extended double-precision is required, your algo may not be competitive with the **fsin** instruction.

In order to really compete with the **fsin** instruction, the code should also include instructions to reduce the angle at least below +/- pi/2 radians, and include the time required for those instructions in the timing. Your posted code would become much slower with angles outside that range.

Raymond

Thank you guys for the input ;)

Raymond:

You're right, of course.

I should've stated *clearly* that it's under development. I plan to implement the 'cosine' function and use the identity sin(90_degrees + alpha_degrees) = cos(alpha_degrees), and so on, and find out how fast that would be.

Ultrano:

I'll test your code when I get some free time.


I use a custom LUT organized like this: aligned to 16 bytes as sine,cos,sine,cos. I read it with one MOVAPS and rotate 2 points at a time in 3D space.

If I find out that in my inner loop I always scale these values, I can remove MULPS/DIVPS and make a prescaled LUT.

And I can easily implement Ultrano's idea, computing sine and cos simultaneously, just by adding a few opcodes:

first mov the two upper floats to the lower half of another xmm reg

add these and store in memory

use MMX to subtract with the right constants, a la Donkey's floating-point trick, but in parallel, for dividing the floats by 2

Why not just code the whole function with SSE2? If you want speed, that's where you're going to get it.

If it's for 3D rendering, I don't think a system requirement of a processor with SSE is that big of a deal.

Also, all new processors will have SSE2/3, so you might as well start coding for the future instead of using the FPU. The Newton's square root algo in SSE was faster than FSQRT at double precision.

If you reply with the "compatibility with older processors" card, I have no pity for you :P You can buy a P4 or a compatible AMD for like $100.


(22/7)/180 has almost the same value as sin(1). Maybe sin(x) = x*(22/7)/180.

realvampire,

sin(x) ~= x only for small values of x (x in radians). This can be clearly seen from the Maclaurin expansion of the sine function (which was used in the implementation of ti_mo_n's code).

In your example, sin(1) where 1 is in degrees is close to (22/7)/180 because if you convert 1 degree to radians you get pi/180, and pi ~= 22/7 (one of the most common approximations used in elementary school mathematics). Therefore the above sine approximation still holds.

However, as x becomes bigger, the approximation becomes less accurate because the other terms in the expansion become more prominent.


Yes, you are correct. It's only accurate up to about 30 degrees.

I guess then LUTs with linear interpolation win again :)

In 3D games, sin/cos precision is important, according to my observations. A 0.5% error and the animation looks awful! Actually, the accuracy of my LUT approach can be boosted at least 10-fold if we use 3-point Hermite interpolation instead of linear interpolation (15 cycles extra, iirc).

To the Ineffable All,

Has anyone ever evaluated this method?  Ratch

http://www.bmath.net/bmath/halfstaff.html


It still uses the FPU for the calculation. My math function takes 10 ms for a 10-million-iteration loop retrieving the sin or cos value. I never tested this function since I don't know how to implement it; if I'm not mistaken, it is a macro.

realvampire,

http://www.bmath.net/bmath/halfstaff.html

It still uses the FPU for the calculation.

Certainly it does, for floating point arithmetic and to build the tables. But it does not use any trig functions of the FPU. Someone really should evaluate this method. It does square roots too.

Yes, you are correct. It's only accurate up to about 30 degrees.

How accurate is accurate? Sindeg(30) = Sinrad(pi/6) = 0.5, whereas pi/6 = 0.523598775598..., which is 4.7% too high. Ratch


the bmath.net site scares me. The guy seems like a religious lunatic O_o

He certainly is. I stupidly got into an email debate with him a few years ago (I was younger then ;) ), but as you'd expect, it's not possible to discuss things with someone who will quote the Bible as hard evidence.

It's interesting to me because he clearly is no complete idiot, but then, if all religious lunatics were idiots they probably wouldn't be as well organised as they are, nor as big a headache.

Well, enough on the topic of religion - not allowed by board rules etc. Is his code good? :)

I was brushing up against the no-religious-discussion rule, I admit, but my rant was meant to refer to fanatics in general, not specifically religious ones. Though while I was typing it I had creationists in mind, hence the emphases in my post; apologies.

As for his code, I don't know. When I was looking into it I was a bit paranoid about precision; I was doing some 3D work at the time and didn't realise that precision would make a whole load of difference. I had emailed him originally to ask about the level of precision in his code, but when I got embroiled in the debate I forgot about it :sad: .

f0dder,

Well, enough on the topic of religion - not allowed by board rules etc. Is his code good?

Relax, just mentioning that someone has a religious predisposition is not a religious topic. You did not discuss doctrine, or advocate any kind of worship of a deity.

His code works. Does that mean it is good? I analyzed his code a while back. It is not documented very well. As mentioned earlier, he uses a look-up table and does a linear interpolation to calculate in-between values. He does something clever in that he uses the same table for the sine and cosine. The entry points for the table are 90° out of phase with each other, as are the sine and cosine. His method also incorporates a parallel table of difference values for each sine/cosine entry. He uses that difference table for interpolation. That is a lot of memory, since each entry in the sine/cosine and difference tables is a quad word. I sent him an e-mail a while back suggesting that he use the incremental Taylor series instead of the linear interpolation method. He could get rid of the difference table and also achieve more accuracy. I never got an answer.

Eóin,

When I was looking into it I was a bit paranoid about precision,.....

Well, you know what the precision of a Real4 or Real8 word is. I think you really mean accuracy. Let's analyze his method. Since he uses linear interpolation, the worst accuracy will be where the function direction changes the fastest. On a sine curve, that is at 90° and 270°. So what is his accuracy close to 90°, say at bittian 256.5? The FPU FSIN value is 0.999995293809576. His bittian method gives 0.999990587641301. We can expect his values to be low at that location because the sine curves above the two points. At 270°, his values will be high. My 3-term Taylor series method gives 0.999995293805885. The differences between the "actual" value and his and my methods are -4.706168275636635E-6 and -3.691380534576183E-12 respectively. I hope this gives you a feeling for the accuracy. I am enclosing some code for the 3-term Taylor series method. This idea should really be timed to see if it is faster than the FPU. Ratch


;*******************************************************************************
;                                                                              ;
; Sin of bittians using the bittian method                                     ;
; Assumes bittian is already in st(0)                                          ;
; Uses three Taylor series terms                                               ;
;                                                                              ;
;*******************************************************************************
; note: the memory operands were eaten by the board software; TEMPI, SINTAB,
; COSTAB and HALF below are reconstructed placeholder names
BSIN3 MACRO
  FIST  TEMPI        ;save the integer part of bittian in temporary storage
  FISUB TEMPI        ;TOS contains fractional part of bittian
  MOV   EAX,TEMPI
  FMUL  B2R          ;convert from bittians to radians for 2nd Taylor series term
  SHL   EAX,22       ;clear all but lower 10 bits
  FLD   ST(0)        ;**duplicate TOS
  SHR   EAX,19       ;restore EAX and multiply by 8 for index into table
  FMUL  COSTAB[EAX]  ;compute 2nd Taylor series term by multiplying by cos
  FADD  SINTAB[EAX]  ;add 1st term of Taylor series
  FXCH               ;**prepare to compute 3rd term
  FMUL  ST,ST(0)     ;**square TOS
  FMUL  SINTAB[EAX]  ;**multiply by sin
  FMUL  HALF         ;**divide by 2 to complete 3rd term of Taylor series
  FSUB               ;**complete sin by subtracting 3rd term of Taylor series
ENDM

;*******************************************************************************