I was wondering if anyone could suggest a faster method of number crunching than the one outlined below:

I'm thinking MMX would probably be best, but I'm too much of a novice to "design" it right (I'm still learning) :rolleyes:


[b].data[/b]
Base dd (1 SHL 15)

Pout1 dq 0
Pin1 dq 0
Pout2 dq 0
Pin2 dq 0

; Corner selection values, one column per corner: 200 Hz (active) ; 100 Hz ; 10 Hz ; 1000 Hz
A dq 0.02769840546830 ;0.01404650652530 ;0.00142273056281 ;0.12397764071599
B dq 0.94460318906340 ;0.97190698694940 ;0.99715453887439 ;0.75204471856802

[b].code[/b]
LPFilter PROC uses ebx esi edi lpHdr:DWORD
mov ebx, lpHdr
mov esi, [ebx].WAVEHDR.lpData ; Get the start address of data
mov ecx, [ebx].WAVEHDR.dwBufferLength ; Get # of bytes to process
shr ecx, 2 ; div by 4: 2 channels x 2 bytes/channel

.while ( ecx > 0)
fild SWORD PTR [esi] ; Left Channel Data 16 bits
fild DWORD PTR Base ; Normalize it
fdiv
fld st(0) ; Copy for Pin (prev input) update later on
fmul QWORD PTR A ; 1) In * A
fld QWORD PTR Pin1 ; 2) PrevIn * A
fmul QWORD PTR A ;
fld QWORD PTR Pout1 ; 3) PrevOut * B
fmul QWORD PTR B ;
fadd
fadd ; St0 = In*A + PrevIn*A + PrevOut*B
fxch ;
fstp QWORD PTR Pin1 ; Update the Prev In, pop
fst QWORD PTR Pout1 ; Update PrevOut with output
fild DWORD PTR Base ; Load base again
fmul ; Denormalize
fistp SWORD PTR [esi] ; pop 16 bit sword == Output

add esi,2 ; Goto Right Channel
fild SWORD PTR [esi]
fild DWORD PTR Base
fdiv
fld st(0)
fmul QWORD PTR A
fld QWORD PTR Pin2 ; Right Prev In
fmul QWORD PTR A
fld QWORD PTR Pout2 ; Right Prev Out
fmul QWORD PTR B
fadd
fadd
fxch
fstp QWORD PTR Pin2
fst QWORD PTR Pout2
fild DWORD PTR Base
fmul
fistp SWORD PTR [esi]
add esi,2
dec ecx
.endw
@@:
ret
LPFilter ENDP


ESI points to the CURRENT WORD in the buffer (In), and the data members Pin1, Pin2, Pout1, Pout2 hold the previous input/output values from earlier "crunches" of this algorithm:

Out = A*In + A*Pin + B*Pout
Pin= In
Pout = Out

The same calculation is done in parallel for left and right, using (Pin1, Pout1) and (Pin2, Pout2). Out is written back where the input was found, but the previous values are still kept separately. This is because the buffer is not continuous: the data arrives in separate chunks, so the filter state has to survive from one buffer to the next!
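For comparison with the assembly, here is a minimal scalar sketch of the same recurrence in C. The names (lp_state, lp_step, lp_filter_stereo) are made up for illustration and are not part of the posted source; the coefficients are the 200 Hz pair from the .data section. Rounding is half-away-from-zero, which can differ from fistp's default mode on exact ties.

```c
#include <stddef.h>
#include <stdint.h>

/* Filter state for one channel: previous input and previous output. */
typedef struct {
    double pin;
    double pout;
} lp_state;

/* The active (200 Hz) coefficient pair from the .data section. */
static const double LP_A = 0.02769840546830;
static const double LP_B = 0.94460318906340;

/* One normalized sample: Out = A*In + A*Pin + B*Pout, then update state. */
static double lp_step(lp_state *s, double in)
{
    double out = LP_A * in + LP_A * s->pin + LP_B * s->pout;
    s->pin = in;
    s->pout = out;
    return out;
}

/* Round to the nearest integer, halves away from zero. */
static int16_t round_to_i16(double x)
{
    return (int16_t)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Filter interleaved 16-bit stereo frames in place. The state lives
   outside the buffer, so consecutive buffers join without a glitch. */
static void lp_filter_stereo(int16_t *buf, size_t frames,
                             lp_state *left, lp_state *right)
{
    const double base = 32768.0; /* 1 SHL 15, as in the assembly */
    for (size_t i = 0; i < frames; ++i) {
        double l = lp_step(left,  buf[2 * i]     / base) * base;
        double r = lp_step(right, buf[2 * i + 1] / base) * base;
        buf[2 * i]     = round_to_i16(l);
        buf[2 * i + 1] = round_to_i16(r);
    }
}
```

Calling lp_filter_stereo once on a whole buffer, or twice on its two halves with the same state objects, should give identical output, which is the point of keeping Pin/Pout outside the buffer.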


For background:
--------------------
The entire algorithm above is a low-pass filter with its corner set to 200 Hz. It will properly filter any 16-bit, dual-channel wave file and play it. It's currently hard-coded to "beatles.wav", as I'm testing it on a ripped mp3->wav.
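One property of the posted coefficient pairs that is easy to check: in every column B equals 1 - 2*A, so a constant input passes through with a gain of 1. The sketch below just tabulates the four (A, B) pairs from the .data comments and computes that steady-state gain; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* (A, B) pairs copied from the columns of the .data section. */
typedef struct {
    int corner_hz;
    double a;
    double b;
} lp_coeffs;

static const lp_coeffs LP_TABLE[] = {
    {   10, 0.00142273056281, 0.99715453887439 },
    {  100, 0.01404650652530, 0.97190698694940 },
    {  200, 0.02769840546830, 0.94460318906340 },
    { 1000, 0.12397764071599, 0.75204471856802 },
};

/* With a constant input, In == Pin and Out == Pout, so
   Out = 2*A*In + B*Out, which gives Out/In = 2*A / (1 - B). */
static double lp_dc_gain(const lp_coeffs *c)
{
    return 2.0 * c->a / (1.0 - c->b);
}
```

If a fixed-point version rounds A and B independently, this relation can break by a count or so, which is worth keeping in mind when converting the constants.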

I plan to write *a lot* of similar code, which will get more complex, so I'm hoping someone can help me see the most optimized solution for the basic building block shown above. Then I can learn and "multiply" that basic knowledge onto the bigger algorithms.

Thanks a lot in advance. All I can give in return is my source for you to play with ;)
:alright:
NaN
Posted on 2002-05-21 04:04:21 by NaN
http://www.musicdsp.org
- nice searchable collection of articles and code

http://www.digitalfishphones.com/main.php?item=3&subItem=2
- link collection

hope it helps somehow
TBD
Posted on 2002-05-21 06:14:30 by TBD
NaN, here is the basic outline:

Two channels are done in parallel
    [*]load qword (In1,In2,In1,In2)
    [*]unpack 2x (In1,0,In2,0)
    [*]add previous 2x (Pin1,Pout1,Pin2,Pout2)
    [*]multiply add 2x (A,B,A,B) constant
    [*]pack 2x (In1',In2',In1',In2')
    [*]update previous 2x (shift/or)?
    [*]store qword

The equation can be reduced to: Out = A*(In + Pin) + B*Pout

    The middle part of the outline is doubled over to ease reading, but you must wait for the results of the first part of the middle to update the previous for the second part of the middle - am I confusing you yet? I'd code it up, but I'm at work and also still in the process of moving.
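To make the integer arithmetic behind this outline concrete, here is a scalar C sketch of the reduced equation in 16-bit fixed point. It rests on assumptions the post does not state: A and B are scaled by 32768 (Q15), B is derived as 32768 - 2*A so the DC gain stays at 1 after rounding, and the names are hypothetical. The saturating add stands in for what paddsw would do and the multiply-accumulate for one pmaddwd lane.

```c
#include <stdint.h>

/* Assumed Q15 scaling: 0.02769840546830 * 32768 rounds to 908. */
enum {
    Q15_ONE  = 32768,
    LP_A_Q15 = 908,
    LP_B_Q15 = Q15_ONE - 2 * LP_A_Q15
};

/* Per-channel state, kept as 16-bit words like the packed layout. */
typedef struct {
    int16_t pin;
    int16_t pout;
} lp_state_q15;

/* Clamp a 32-bit value into the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Out = A*(In + Pin) + B*Pout, in integer arithmetic. */
static int16_t lp_step_q15(lp_state_q15 *s, int16_t in)
{
    int32_t sum = sat16((int32_t)in + s->pin);                  /* saturating add */
    int32_t acc = LP_A_Q15 * sum + LP_B_Q15 * (int32_t)s->pout; /* two products, one sum */
    int16_t out = sat16(acc / Q15_ONE); /* drop the Q15 scale; truncates toward zero */
    s->pin = in;
    s->pout = out;
    return out;
}
```

Because Pout is stored with only 16 bits, the output can stall a few counts short of the exact steady-state value (a dead band), so the results will not match the FPU version bit for bit.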
Posted on 2002-05-21 10:18:16 by bitRAKE
	mov ecx,samples

mov esi,source
mov edi,dest
shr ecx,1 ; two samples per loop
lea esi,[esi+ecx*8]
lea edi,[edi+ecx*8]
neg ecx

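pxor mm7,mm7 ; zero
movq mm6,previous ; zero?
movq mm5,ABAB

_0: movq mm0,[esi + ecx*8] ; two stereo samples (In1,In2,In1,In2)
movq mm1,mm0

punpcklwd mm0,mm7 ; first sample: (In1,0,In2,0)
punpckhwd mm1,mm7 ; second sample: (In1,0,In2,0)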

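;sample one
paddsw mm0,mm6 ; add previous (Pin1,Pout1,Pin2,Pout2)
pmaddwd mm0,mm5 ; A*(In+Pin) + B*Pout per channel
packssdw mm0,mm7

psrld mm6,16 ; shift previous values (direction still to be verified)
por mm6,mm0

;sample two
paddsw mm1,mm6
pmaddwd mm1,mm5
packssdw mm1,mm7

psrld mm6,16
por mm6,mm1


punpckldq mm0,mm1 ; combine the two packed pairs (low dwords of mm0 and mm1)
movq [edi+ecx*8],mm0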

inc ecx
jne _0

emms ; clear MMX state before any later FPU code
...maybe something like this? At least you should get the gist of what I mean. Forgive me, I'm programming without a computer again, but I'll debug this tomorrow when I have power at my new home.
Posted on 2002-05-21 13:27:29 by bitRAKE
Thanks for your help. I will try to carry on from here ;) Much appreciated!

I have timings from the FPU example, and I'm curious to see the savings, if any, here...

TBD: Thanks for the links! They look great!

Thanks again
:alright:
NaN
Posted on 2002-05-21 14:23:46 by NaN