i found http://www.asmcommunity.net/board/index.php?topic=1174.0 but there is no implementation, so i sat and wrote the following code:



#define DIV0? (1.0/1.0)
#define DIV1 -(1.0/6.0)
#define DIV2? (1.0/120.0)
#define DIV3 -(1.0/5040.0)
#define DIV4? (1.0/362880.0)
#define DIV5 -(1.0/39916800.0)
#define DIV6? (1.0/6227020800.0)
#define DIV7 -(1.0/1307674368000.0)

const double divisor[]? = {DIV0, DIV1, DIV2, DIV3, DIV4, DIV5, DIV6, DIV7};

float sine3(float radians) {
? ? float temp;
? ?
? ? __asm {
? ? ? ? fld? ? ?dword ptr
? ? ? ? fsin
? ? ? ? fstp? ? dword ptr
? ? }
? ? return temp;
}

float sine2(float radian, long precision) {
? ? float sum;

? ? __asm {
? ? ? ? mov? ? ?eax,
? ? ? ? xor? ? ?ecx, ecx
? ? ? ? fld? ? ?dword ptr
? ? ? ? mov? ? ?edx, offset divisor
? ? ? ? and? ? ?eax, 7
? ? ? ? mov? ? ?, ecx
nxtpass:
? ? ? ? fld? ? ?st
? ? ? ? fmul? ? qword ptr
? ? ? ? fxch? ? st(1)
? ? ? ? fmul? ? dword ptr
? ? ? ? fxch? ? st(1)
? ? ? ? fadd? ? dword ptr
? ? ? ? fxch? ? st(1)
? ? ? ? fmul? ? dword ptr
? ? ? ? fxch? ? st(1)
? ? ? ? inc? ? ?ecx
? ? ? ? fstp? ?
? ? ? ? cmp? ? ?ecx, eax
? ? ? ? jbe? ? ?nxtpass

? ? ? ? fstp? ? dword ptr
? ? }
? ? return sum;
}


speed test:
sine3:? 0.0000969 ms (~260 cycles)

sine2:
precision:? speed (ms):? ? ? ? ? ? ? % of 'fsin3' time:? value returned:? ? ? ? ?difference:
0? ? ? ? ? ?0.0000093 (~25 cycles)? ? 9? ? ? ? ?? ? ? ? ?~1,04719758033752440? ? ~0,18117217655308575
1? ? ? ? ? ?0.0000172 (~46 cycles)? ?18? ? ? ? ?? ? ? ? ?~0,85580080747604370? ? ~0,01022459630839494
2? ? ? ? ? ?0.0000250 (~67 cycles)? ?26? ? ? ? ? ?? ? ? ?~0,86629533767700195? ? ~0,00026993389256330
3? ? ? ? ? ?0.0000313 (~84 cycles)? ?32? ? ? ? ? ?? ? ? ?~0,86602133512496948? ? ~0,00000406865946916
4? ? ? ? ? ?0.0000406 (~109 cycles)? 42? ? ? ? ? ?? ? ? ?~0,86602550745010376? ? ~0,00000010366566511
5? ? ? ? ? ?0.0000469 (~126 cycles)? 48? ? ? ? ? ? ?? ? ?~0,86602544784545898? ? ~0,00000004406102033
6? ? ? ? ? ?0.0000547 (~146 cycles)? 56? ? ? ? ? ? ?? ? ?~0,86602544784545898? ? ~0,00000004406102033
7? ? ? ? ? ?0.0000594 (~159 cycles)? 61? ? ? ? ? ? ? ?? ?~0,86602544784545898? ? ~0,00000004406102033


the returned value should be: 0,86602540378443864676372317075294? (sine(60 deg)). "difference' is the difference between this value and the returned value for each precision.

My observations: sine2 is about 33% faster even on 7th precision, while 5th is more than necessary.

What do you guys think about it? If noone will find any math mistake here, then I'll think about writing it in 3dnow or SSE.

-------------
the equation is from here:? http://mathworld.wolfram.com/images/equations/Sine/equation4.gif

the attached zip contains c++ source file and a ready-to-run executable.



/edit
BTW: the times in the exe show execution time of 10'000'000 iterations, so if you have slow CPU you have to wait a little :P

/edit 2
Made some tweaks and got a *little* better times.

/edit 3
Changed "degrees" to "radians" as suggested by Raymond
Attachments:
Posted on 2005-07-08 16:50:53 by ti_mo_n
cute :)
Posted on 2005-07-08 17:22:41 by f0dder
Nice :)  - I decided to see what speed one can get with a LUT and linear interpolation. The code below executes in 48 cycles, 0.05% error encountered. Automatically corrects the input, with no extra cycles penalty. (while the normal algo wants 0<x<pi/4)... Using the value of ECX to negate the output can also speed up the proc.
Accepts input parameter in ST, overwrites ST with its sinus. Needs 3 free fpu registers on first call, 2 free in next calls.
I haven't done thorough benchmarks, I'll leave it to you if you want :)


SIN_PRECISION equ 16384

.data
usin_f1 real4 ? ; sin_precision/pi
usin_f2 real4 0.5
sinLUT dd 0
.code

usin proc uses eax ebx ecx ; pass parameter in ST(0), in radians
local x,delta,temp1
mov ebx,sinLUT
test ebx,ebx
jz _makeLUT
    _has_LUT:
;----[ make 0<= x <= pi ]-- ------\
fmul usin_f1
mov ecx,SIN_PRECISION ; is_negative
fld ST
fsub usin_f2 ; make truncate to floor
mov eax,SIN_PRECISION-1
fistp x
and eax,x
and ecx,x
fisub x
lea eax,
or ecx,ecx
fstp delta
;--------------------------------/
;------[ interpolate ]------------\
fld real4 ptr
fld real4 ptr
fsub st(0),st(1)
fmul delta
fsub
;---------------------------------/
jnz @F
ret
@@:
fchs
ret

; out = A*(1-delta) + B*delta = A - A*Delta +B*Delta = A - Delta*(B-A)

_makeLUT: ;---------------------------\
fstp x
push edx
invoke GetProcessHeap
invoke HeapAlloc,eax,0,SIN_PRECISION*4+4
mov sinLUT,eax
pop edx

mov ecx,SIN_PRECISION+1
mov temp1,ecx
fldpi
fidiv temp1
fldz
@@: ;----------------------\
fld ST
fsin
fstp real4 ptr
fadd st(0),st(1)
add eax,4
    dec ecx
    jnz @B
;--------------------------/
fdecstp
fdecstp

fild temp1
fldpi
fdiv
fstp usin_f1

fld x

mov ebx,sinLUT
jmp _has_LUT
;-------------------------------------/

usin endp



: Found a little speedup, from 51cycles became 48.
Posted on 2005-07-08 20:21:47 by Ultrano
The first comment I have is that fsin and the formula you use need the angle in radians. Your use of degrees for the parameter name may be misleading for those who may not be familiar with the requirements.

I would agree that your algo may be faster than using fsin for your needs. However, as you have indicated indirectly, the expected precision is only as good as single-precision floats. If higher precision such as double-precision or extended double-precision is required, your algo may not be competitive with the fsin instruction.

In order to really compete with the fsin instruction, the code should also include instructions to reduce the angle at least below +/- pi/2 radians and include the time required for those instructions in the timing. Your posted code would become much slower with angles outside that range.

Raymond
Posted on 2005-07-08 21:28:48 by Raymond
Thank you guys for the input ;)

Raymond:

You're right, of course.

I should've stated *clearly* that it's under development. I plan to implement the 'cosine' function and use the equation:  sin(90_degrees+alpha_degrees) = cos(aplha_degrees)  and so on, and find out how fast would that be.

Ultrano:

I'll test your code when I get some free time.
Posted on 2005-07-08 22:24:42 by ti_mo_n
I use a custom LUT organized like this: aligned 16 bytes sine,cos,sine,cos and reads this with one MOVAPS and rotates 2 points at a time in 3d space

if I find out in my innerloop I always scale these values, I can remove MULPS/DIVPS and make a prescaled LUT
Posted on 2005-07-09 01:54:37 by daydreamer
and I can easily implement Ultranos idea, just by add a few opcodes, simultanously sine and cos
first mov two upper to lower in another .xmm reg
add these and store in memory
use mmx to subtract with the right constants, ala Donkeys floatingpoint trick, but in parallel, for making  div 2 on floats




Posted on 2005-07-09 04:09:20 by daydreamer
Why not just code the whole function with SSE2 if you want speed that's where your going to get it.

If its for a 3d rendering I don't think a system requirement of processor w/ SSE is that big of a deal.

Also all new processors will have SSE2/3 so might as well start coding for the future instead of using FPU.  The Newton's Square Root algo in SSE was faster than FSQRT with double precision.

If you reply with the "Compatibility with older processors" card I have no pitty for you :P you can buy a p4 or compatible amd for like $100.
Posted on 2005-07-09 15:15:00 by r22
(22/7)/180 almost have the same value with sin(1). Maybe sin(x)=x*(22/7)/180.
Posted on 2005-07-29 12:32:19 by realvampire
realvampire,

sin(x) ~= x only for small values of x (x is in radians). This could be clearly be seen from maclaurin expansion of sine function (which was used in the implementation of the ti_mo_n's code)

In your example, sin(1) where 1 is in degrees is close to (22/7)/180, because if you convert 1 degrees to radians, you get pi/180. pi ~= 22/7 (this approximation is one of the most common approximations used in elementary school mathematics). Therefore the above sine approximation still holds.

However, as x becomes bigger, the approximation becomes less accurate because of the other terms in the expansion becomes more prominent.
Posted on 2005-07-30 06:54:51 by roticv

realvampire,

sin(x) ~= x only for small values of x (x is in radians). This could be clearly be seen from maclaurin expansion of sine function (which was used in the implementation of the ti_mo_n's code)

In your example, sin(1) where 1 is in degrees is close to (22/7)/180, because if you convert 1 degrees to radians, you get pi/180. pi ~= 22/7 (this approximation is one of the most common approximations used in elementary school mathematics). Therefore the above sine approximation still holds.

However, as x becomes bigger, the approximation becomes less accurate because of the other terms in the expansion becomes more prominent.


Yes you are correct. It only accurate until 30 degree.
Posted on 2005-07-30 14:05:23 by realvampire
I guess then LUTs with linear interpolation win again :)
In 3D games, sin/cos precision is important, according to my observations. 0.5% error - and animation looks awful! Actually, the accuracy of my LUT approach can be boosted at least 10 fold if we use 3-point hermite interpolation instead of linear interp (15 cycles extra, iirc)
Posted on 2005-07-30 14:55:53 by Ultrano
To the Ineffable All,
? ? ?Has anyone ever evaluated this method?? Ratch

http://www.bmath.net/bmath/halfstaff.html
Posted on 2005-08-03 23:12:23 by Ratch

To the Ineffable All,
    Has anyone ever evaluated this method?  Ratch

http://www.bmath.net/bmath/halfstaff.html


It still use FPU for the calculation. My math function took 10 ms on 10 million loop for retrieving the sin or cos value. I never test this function since I dont know how to implement it, If I dont made mistake it is a macro.
Posted on 2005-08-05 07:24:14 by realvampire
realvampire,

http://www.bmath.net/bmath/halfstaff.html


It still use FPU for the calculation.


? ? ?Certainly it does, for floating point arithmetic and to build the tables.? But it does not use any trig functions of the FPU.? Someone really should evaluate this method.? It does square roots too.


Yes you are correct. It only accurate until 30 degree.


? ? ?How accurate is accurate?? Sindeg(30)=Sinrad(pi/6)=0.5 , whereas pi/6=.523598775598..., which is 4.7% too high.? Ratch
Posted on 2005-08-05 08:32:58 by Ratch
the bmath.net site scares me. The guy seems like a religious lunatic O_o
Posted on 2005-08-05 09:42:05 by f0dder
He certainly is, I stupidly got into an email debate with him a few years ago (I was younger then ;) ) but as you'd expect its not possible to discuss things with someone who will quote the bible as being hard evidence.

It's interesting to me because he clearly is no complete idiot, but sure if all religious lunatics were idiots then they probably wouldn't be as well organised as they are, nor as big a headache.
Posted on 2005-08-05 10:29:10 by Eóin
Well, enough on the topic of religion - not allowed by board rules etc. Is his code good? :)
Posted on 2005-08-05 12:48:57 by f0dder
I was brushing up against the no religious discussion rule I admit, but my rant was meant to be referring to fanatics in general, not specifically religious ones. Though while I was typing it I had creationists in mind, hence the emphasises in my post, apologies.

As for his code, I don't know. When I was looking into it I was a bit paranoid about precision, I was doing some 3d work at the time and didn't realise that precision would make a whole load of a difference. I had emailed him originally to ask about the level of precision in his code but when I got embroiled in the debate I forgot about it :sad: .
Posted on 2005-08-05 13:30:51 by Eóin
f0dder,


Well, enough on the topic of religion - not allowed by board rules etc. Is his code good?


Relax, just mentioning that someone has a religious predisposition is not a religious topic. ?You did not discuss doctrine, or advocate any kind of worship of a deity.

His code works. ?Does that mean it is good? ?I analyzed his code a while back. ?Not documented very good. ?As mentioned earlier, he uses a look up table and does a linear interpolation to calculate in between values. ?He does something clever in that he uses the same table for the sine and cosine. ?The entry points for the table are 90? out of phase with each other, as is the sine and cosine. ?His method also incorporates a parallel table of difference values for each sine/cosine entry. ?He uses that difference table for interpolation. ?That is a lot of memory, since each entry for the sine/cosine and difference table is a quad word. ? I sent him a e-mail a while back suggesting that he use the incremental Taylor series instead of the linear interpolation method. ?He could get rid of the difference table and also achieve more accuracy. ?I never got an answer.

E?in,


When I was looking into it I was a bit paranoid about precision,.....


? ? Well, you know what the precision of a Real4 or Real8 word is. ?I think you really mean accuracy. ?Let's analyze his method. ?Since he uses linear interpolation, the worse accuracy will be where the function direction changes the fastest. ?On a sine curve, that is at 90? and 270?. ?So what is his accuracy close to 90?, say at bittian 256.5? ?The FPU FSIN value is 0.999995293809576. ?His bittian method gives 0.999990587641301. ?We can expect his values to be low at that location because the sine curves above the two points. ?At 270?, his values will be high. ?My 3 term Taylor series method gives 0.999995293805885. ?The differences between the "actual" value of his and my methods are ?-4.706168275636635E-6 and -3.691380534576183E-12 respectively. ?I hope this gives you a feeling for the accuracy. ?I am enclosing some code for the 3 term Taylor series method. ?This idea should really be timed to see if it is faster than the FPU. ?Ratch


;*******************************************************************************
; ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?;
; Sin of bitians using bitian method ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ;
; Assumes bitian is already in st(0) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ;
; Uses three Taylor series terms ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ;
; ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?;
;*******************************************************************************
BSIN3 MACRO
? FIST ? ? ? ;save the integer part of bitian in temporary storage
? FISUB ? ? ?;TOS contains fractional part of bitian
? MOV EAX,
? FMUL B2R ? ? ? ? ? ;convert from bitians to radians for 2nd Taylor series term
? SHL EAX,22 ? ? ? ? ;clear all but lower 10 bits
? FLD 0 ? ? ? ? ? ? ?;**duplicate TOS
? SHR EAX,19 ? ? ? ? ;restore EAX and multiply by 8 for index into table
? FMUL ;compute 2nd Taylor series term by multiplying by cos
? FADD ;add 1st term of Taylor series
? FXCH ? ? ? ? ? ? ? ;**prepare to compute 3rd term
? FMUL 0 ? ? ? ? ? ? ;**square TOS
? FMUL ;**multipy by sin
? FMUL ? ? ? ;**divide by 2 to complete 3rd term of Taylor series
? FSUB ? ? ? ? ? ? ? ;**complete sin by subtracting 3rd term of Taylor series
ENDM
;*******************************************************************************


Posted on 2005-08-12 01:30:04 by Ratch