I'm writing a modular DSP environment in C++, and since I'm almost done with the framework
I'm going to start writing the algorithms that do the actual DSP soon.
I've done a few test objects such as oscillators and filters, and now I'm trying to get them to run faster.
I'm quite into the maths so that's not a problem, but I've noticed and read about a few things
that you guys might help me with:

First, I would like to ask if there's a URL or paper or something on general optimization of floating-point (or generic) algorithms?
Second (and the most recent reason why I'm asking): I've noticed in some of the test algorithms
I've written that the main crook in the drama is float-to-int conversions! They're really time consuming, at least
in the context I've tried them in, and I wonder if there's a faster way to do them?
The code is something like: out = wavetable[(int)(phase*sizeOfTable)];

I'm no master of asm, in fact I have never even messed with floating-point asm (so I guess that's a start :) ),
but I thought you guys might be able to help me get started, especially with the nasty float-to-int conversion :)

thanks a bunch :)
Posted on 2003-06-12 20:46:53 by edmund
I would agree with you that converting floats to ints is slow (approx. 30 clocks) if you have a lot to do.

Q1. Do you need to convert them all or are you using some in further computations? If the latter, storing them as floats would be faster (approx. 7 clocks).

Q2. Have you given any thought to doing all your computations with the CPU in fixed point maths? That could be a lot faster than using floats. If your maximum absolute value is less than 32767, a library of fixed point math functions (with the equivalent of 5 decimal places of precision) already exists. You could also use the source code as an example to write your own functions (with somewhat less overhead) if you need to improve speed further, or if you need a larger range and can afford less precision.
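
One place where fixed point fits this kind of loop without touching the audio samples themselves is the oscillator phase. The following is only a sketch under assumptions not stated in the thread: the table size is a power of two, and the names (PhaseAccOsc, tableBits, step) are made up for illustration. The phase is kept as a 32-bit unsigned integer, so the table index is a shift rather than a float-to-int conversion.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical wavetable oscillator; names are illustrative, not from the thread.
// Assumes table.size() == (1u << tableBits) with tableBits in 1..31.
struct PhaseAccOsc {
    std::vector<float> table;   // one cycle of the waveform
    unsigned tableBits;         // log2 of the table size
    std::uint32_t phase = 0;    // 0 .. 2^32 maps onto one full cycle
    std::uint32_t step  = 0;    // phase increment per sample

    PhaseAccOsc(std::vector<float> t, unsigned bits)
        : table(std::move(t)), tableBits(bits) {}

    // One float-to-int conversion per frequency change instead of per sample.
    // Assumes 0 <= freqHz < sampleRateHz so the result fits in 32 bits.
    void setFrequency(double freqHz, double sampleRateHz) {
        step = static_cast<std::uint32_t>(freqHz / sampleRateHz * 4294967296.0);
    }

    float next() {
        // The top tableBits bits of the phase are the table index.
        float out = table[phase >> (32u - tableBits)];
        phase += step;          // unsigned overflow wraps, which is the phase wrap
        return out;
    }
};
```

The samples stay as floats, so the precision of filters and other recursive parts is unaffected; only the index calculation becomes integer work.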

Posted on 2003-06-12 22:31:09 by Raymond
hi Raymond,
thanks for your reply and sorry for not answering 'til now..

The thing is, it's a loop and a new phase needs to be calculated each iteration.
This phase will be multiplied by an integer which represents the size of the array.
The result will be used as an index into the array. ( Oh, I just noticed I typed that
in my previous post :rolleyes: , ahh well. I'm including it anyway :tongue: )
So I guess the answer to this question is no. ( I didn't really understand the question though, sorry :) )

I have never actually messed with fixed-point arithmetic and it might come in handy, but
the thing is, in situations concerning audio, precision is invaluable, and from what I know (???)
using fixed-point arithmetic means less precision. Precision is especially needed in
all those recursive algorithms that many audio algorithms use, or else the algorithms would become
highly unstable. ( Although I know of many DSP implementors using fixed-point arithmetic to solve speed issues )

thanks for the tips :)

Posted on 2003-06-17 13:45:44 by edmund
Fixed point math has as much precision as you want to give it. For example, a 32-bit DWORD could be divided right down the middle to form your fixed point number: 16 bits for the integer part and 16 bits for the fraction.


This would give you a range of (in hex) -8000.0000 to 7FFF.FFFF in increments of 1/10000. (In decimal that's -32768 to roughly 32767.99998 in increments of 1/65536.)

If that's not enough precision for you, then figure out exactly how much you need and accommodate it by adding more bit places. For example, you could use a QWORD instead to squeeze more precision out (xxxx.yyyyyyyyyyyy) and so on.
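
A minimal sketch of the 16.16 split described above, assuming a signed 32-bit value. The names (fix16, kOne, fromDouble, mul and so on) are invented for this example and do not come from any particular library.

```cpp
#include <cstdint>

// 16.16 fixed point: upper 16 bits are the integer part, lower 16 the fraction.
// All names here are illustrative only.
using fix16 = std::int32_t;
constexpr fix16 kOne = 1 << 16;   // 1.0 is stored as 0x00010000

// Conversions; fromDouble truncates toward zero and assumes v is in range.
constexpr fix16 fromDouble(double v) { return static_cast<fix16>(v * kOne); }
constexpr double toDouble(fix16 v)   { return static_cast<double>(v) / kOne; }

// Addition and subtraction are plain integer operations.
constexpr fix16 add(fix16 a, fix16 b) { return a + b; }

// Multiplication needs a 64-bit intermediate, then a shift to drop the
// extra 16 fraction bits. Shifting a negative value right is assumed to be
// an arithmetic shift, which holds on mainstream compilers.
constexpr fix16 mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<std::int64_t>(a) * b) >> 16);
}

// Integer part of a non-negative value, with no float-to-int conversion.
constexpr int toInt(fix16 v) { return v >> 16; }
```

Widening the type to 64 bits and moving the binary point follows the same pattern, with kOne and the shift counts changed to match.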
Posted on 2003-06-18 01:00:15 by iblis
If you've got the processor support, you could use SSE or 3DNow!

As for float->int conversions, they are slow - and in most C++ environments, they are "even slower", because a helper routine is called to do the work. If you're using Microsoft Visual C++, however, there's a compile-time option to suppress the _ftol call and use fist instead. Can't remember when it was introduced (VC6?), but the option should be /QIfist .

The reason for not doing this automatically, I guess, is that the C/C++ standards require a float->int conversion to truncate toward zero, while fist rounds according to the current x87 rounding mode (round-to-nearest by default). So you should be sure to set the x87 control flags before & after your intensive code, and you might want to only use /QIfist for specific modules.
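
To make the rounding difference concrete, here is a small sketch using only standard C++ (std::lrint from <cmath>) rather than any compiler switch. The helper names and the power-of-two table size are assumptions for the example. Many compilers can turn std::lrint into a single conversion instruction when optimizing, but that is not guaranteed.

```cpp
#include <cmath>

// A cast always truncates toward zero, as the language requires.
inline int truncToInt(float x) { return static_cast<int>(x); }

// std::lrint rounds using the current floating-point rounding mode
// (round-to-nearest-even unless the program has changed it).
inline long roundToInt(float x) { return std::lrint(x); }

// Hypothetical table lookups; phase is assumed to be in [0, 1).
inline float lookupTruncated(const float* table, int size, float phase) {
    return table[truncToInt(phase * size)];
}

// With rounding, the index can reach the table size, so it has to wrap.
// sizePow2 is assumed to be a power of two so the wrap is a simple mask.
inline float lookupRounded(const float* table, int sizePow2, float phase) {
    long i = roundToInt(phase * sizePow2);
    return table[i & (sizePow2 - 1)];
}
```

Note that the two lookups do not pick the same sample: the rounded version selects the nearest table entry rather than the one below, so switching between them changes the output slightly.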

For other compilers, you'll have to rtfm :)
Posted on 2003-06-18 01:56:02 by f0dder
thanks for the help guys! :)
I skipped the fixed-point arithmetic for now and
rewrote the algo in asm, and it's now at least 50% faster!

I'm also going to optimize it for both SSE and 3DNow!, and hence I've got a little question: ;)
how should I approach it?
Is it best to write 3 functions: 1 with no optimization, just plain x86 instructions, 1 with 3DNow! instructions
and 1 with SSE, and use function pointers and load the appropriate function at startup, or is there a smarter way (although I'm not saying that's not a smart way, I'm just wondering how it's usually done)?
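
For reference, the function-pointer idea described above could look roughly like the sketch below. It is written with compiler intrinsics instead of asm to stay short, it covers only a plain and an SSE variant (a 3DNow! variant would be a third entry selected the same way), and every name in it (GainFn, gainPlain, gainSse, cpuHasSse, selectGain) is hypothetical. The CPU check relies on the GCC/Clang builtin __builtin_cpu_supports; other compilers need their own CPUID query.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_BUILD 1
#else
#define HAVE_SSE_BUILD 0
#endif

// Common signature shared by every variant of the routine.
using GainFn = void (*)(float* buf, std::size_t n, float gain);

// Plain version: works on any CPU.
void gainPlain(float* buf, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) buf[i] *= gain;
}

#if HAVE_SSE_BUILD
// SSE version: four samples per iteration, unaligned loads and stores.
void gainSse(float* buf, std::size_t n, float gain) {
    const __m128 g = _mm_set1_ps(gain);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(buf + i, _mm_mul_ps(_mm_loadu_ps(buf + i), g));
    for (; i < n; ++i) buf[i] *= gain;   // leftover samples
}
#endif

// Hypothetical runtime check.
bool cpuHasSse() {
#if !HAVE_SSE_BUILD
    return false;
#elif defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
#else
    return true;   // assumption: this build already requires SSE
#endif
}

// Called once at startup; the hot loop then calls through the pointer.
GainFn selectGain() {
#if HAVE_SSE_BUILD
    if (cpuHasSse()) return gainSse;
#endif
    return gainPlain;
}
```

The pointer is chosen once, so the per-call cost is an indirect call per buffer rather than per sample. In a real build the SSE variant would normally live in its own source file compiled with SSE enabled, so the rest of the program still runs on older CPUs.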

thanks a lot dudes :)

Posted on 2003-06-22 06:17:06 by edmund