costs, especially when doing matrix multiplication with sparse matrices.
starting with:
;...
mul ecx
;...
my idea was:
;...
jecxz no_mul   ; skip if the factor in ecx is zero
xchg eax,ecx   ; swap so the other factor is in ecx
jecxz no_mul   ; skip if that one is zero too
mul ecx        ; operands are swapped, but the product is the same
;...
Will that speed up the code? Or does checking every time slow it down so much that it ends up taking even longer?
Is there a better alternative to avoid multiplying by zero?
Ideas, suggestions? Thanks.
Conditional jumps are slow (especially when they mispredict); you don't want them inside an inner loop.
On the other hand, if you can detect that a matrix is sparse "cheaply" (perhaps because you _know_ it's always going to be sparse), it can pay off to use specialized routines... but not generic routines with a lot of conditionals.
You might want to check out http://en.wikipedia.org/wiki/BLAS
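Just to give a rough picture of what such a specialized routine could look like, here is a sketch of an inner loop over one row stored in a packed (CSR-style) form - the layout, register assignments and labels are purely my own assumptions for illustration, not anything from BLAS:
; esi -> packed non-zero values of the row, edi -> their column indices
; ebx -> the dense vector, ecx = number of non-zeros in the row (assumed > 0)
; ebp is used as the accumulator here - save/restore it in real code
xor ebp,ebp
next_nz:
mov eax,[esi]        ; next non-zero value
mov edx,[edi]        ; its column index
imul eax,[ebx+edx*4] ; value * vector[column] (32-bit products assumed)
add ebp,eax          ; accumulate the dot product
add esi,4
add edi,4
dec ecx
jnz next_nz          ; only a loop-closing branch, which predicts well
Since the loop only ever visits non-zero entries, there is no per-element zero test at all.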
Man, I don't know about that, but if you're going to save much by this, then I assume your matrices are pretty large and/or that you are using them intensively.
If that's the case, then I suggest that you key your matrices.
Create a little bitmap where each bit represents an empty / used cell in your matrix.
After all, a matrix is just an array.
Now write your math routines around the concept of walking that bitkey, and a single test can skip up to 32 empty cells at a time :)
What we've done is implement a simple 'culling' scheme, which is typical in modern game development (and in medical imaging), where the datasets can be dauntingly large and yet quite sparse.
For large arrays, tree-based culling ('partitioning') schemes are most efficient.
We wish to describe which parts of the array are 'active' and, preferably, find them with as few operations as possible. For trees, that means sorted insertion as we build our array (matrix), resulting in a tree that ONLY contains active entries... and in the case of a bi-tree, it can be described via a flat array and a bitkey :P
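To make the bitkey-walking idea a bit more concrete, here is a rough sketch - the register choices and the 4-byte cell size are just assumptions on my part:
; eax = 32-bit key for the next block of 32 cells (bit set = cell is non-zero)
; esi -> the 32 cells themselves (4 bytes each assumed)
next_bit:
test eax,eax
jz block_done        ; key is empty - the rest of the block is skipped in one go
bsf ecx,eax          ; index of the lowest set bit = next non-zero cell
btr eax,ecx          ; clear that bit so we don't visit it again
mov edx,[esi+ecx*4]  ; fetch the non-zero element
; ... multiply/accumulate edx here ...
jmp next_bit
block_done:
If the key is zero to begin with, the whole block of 32 cells is skipped with a single test.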
To put things simply, atcl, you won't save much by simply conditional-skipping any multiplications. And this is even more true with SSE, where you can multiply a whole row by a column in 2-3 instructions. It would matter more if these were divisions. But still, there are more clever ways to do things if you really need to.
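For reference, a minimal sketch of that SSE brute-force idea - single-precision floats, and I'm assuming the 'column' has been laid out contiguously (e.g. by transposing the second matrix) and that the length is a multiple of 4:
; esi -> row, edi -> column (stored contiguously), ecx = element count / 4
xorps xmm0,xmm0      ; four running partial sums
next4:
movups xmm1,[esi]    ; 4 row elements
movups xmm2,[edi]    ; 4 column elements
mulps xmm1,xmm2      ; 4 products at once
addps xmm0,xmm1      ; accumulate
add esi,16
add edi,16
dec ecx
jnz next4
movhlps xmm1,xmm0    ; fold the 4 partial sums into one
addps xmm0,xmm1
movaps xmm1,xmm0
shufps xmm1,xmm1,1
addss xmm0,xmm1      ; low dword of xmm0 = dot product
Whether an element is zero or not makes no difference to the cost, which is rather the point.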
Thanks for the answer, f0dder.
hmph :P
Can't you install a divide-overflow handler? There should be a way via GetLastError.
You can set up exception handling for div-by-zero, but that is expensive.
Do take timon's and puum...homer's advice into consideration as well; unless you _know_ that you will have a very sparse matrix, you're probably better off just brute-forcing through it with SSE :)
I don't think branch prediction is an issue here...
...the issue is that a mul is no slower than a comparison. In fact, it's faster on most desktops.