costs. Especially when doing matrix multiplication with sparse matrices.

starting with:

;...
mul ecx
;...

my idea was:

;...
jecxz no_mul
xchg eax,ecx
jecxz no_mul
mul ecx
;...

will that speed up the code? Or is the every time checking slowing it down so much that takes even longer?
Is there a better alternative to prevent a multiplication with zero?

Ideas, suggestions? Thanks.
Posted on 2007-12-29 03:55:03 by atcl
conditional jumps are very slow, you don't want them inside a loop.

On the other hand, if you can detect that a matrix is sparse "cheaply" (perhaps because you _know_ it's always going to be sparse), it can pay off to use specialized routines... but not generic routines with a lot of conditionals.

You might want to check out http://en.wikipedia.org/wiki/BLAS
Posted on 2007-12-29 04:42:34 by f0dder
Man I dont know about that but if you're gonna save much by this then I firstly assume that your matrices are pretty large - and/or that you are using them intensively.
If thats the case, then I suggest that you key your matrices.
Create a little bitmap where each bit represents an empty / used cell in your matrix.
After all, a matrix is just an array.
Now write your math routines around the concept of walking that bitkey, and now you can use your test to skip up to 32 cells at a time :)

What we've done is implement a simple 'culling' scheme, which is typical in modern game development (and in medical imaging), where the datasets can be dauntingly large and yet quite sparse.
For large arrays, tree-based culling ('partitioning') schemes are most efficient.
We wish to describe which parts of the array are 'active', and preferably, find them as quickly (with as few operations) as possible. For trees, that means a sorted insertion as we build our array (matrix), resulting in a tree that ONLY contains active entries.... and in the case of a bi-tree, can be described via a flat array and a bitkey :P
Posted on 2007-12-29 09:04:29 by Homer
To put things simply, atcl, you won't save much by simply conditional-skipping any multiplications. And this is even more true with SEE where you can multiply a whole row by column in 2-3 instructions. It could work nicer if it was divisions. But still, there are more clever ways to do things if you really need to.
Posted on 2007-12-29 16:12:03 by ti_mo_n
thanks for the answer f0dder.
Posted on 2007-12-30 05:54:27 by atcl
hmph :P
Posted on 2007-12-30 06:52:13 by Homer
Can't you insert a divide overflow handler? There should be a way in GetLast Error
Posted on 2007-12-30 13:01:22 by mrgone
You can set up exception handling for div-by-zero, but that is expensive.

Do take timon's and puum...homer's advice into consideration as well, unless you _know_ that you will have a very sparse matrix, you're probably better just brute-forcing through it with SSE :)
Posted on 2007-12-30 13:10:25 by f0dder
I dont think branch prediction is an issue here....

...the issue is that a mul is no slower than a comparison. Infact, its faster on most desktops.
Posted on 2008-01-05 03:14:55 by Rockoon