Hi.
I'm trying to find the maximum of the absolute value of an array of numbers.
Assuming: eax = new sample (next number in array)
ecx = current maximum
edx = scrap register
I currently use:
cdq
add eax, edx
xor eax, edx
cmp eax, ecx
jle (next sample)
mov ecx, eax
I'd like to do this without branching, though. I came up with:
cdq
add eax, edx
xor edx, eax
mov eax, ecx
sub eax, edx
cdq
and eax, edx
sub ecx, eax
Can anyone come up with something faster? (I only have the one available scrap register and it must run on all processors, so no conditional moves, etc.)
Thanks.
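For anyone who wants to sanity-check the branchless sequence above without stepping through a debugger, here is a rough C model of it. It is only a sketch: the registers are modelled as `uint32_t` variables, and the names `cdq_mask_op` and `update_max_branchless` are made up for illustration. Each line mirrors one instruction from the post.

```c
#include <stdint.h>

/* Hypothetical helper: the value cdq leaves in edx for a given eax,
   i.e. all ones if bit 31 is set, otherwise zero. */
static uint32_t cdq_mask_op(uint32_t x)
{
    return 0u - (x >> 31);
}

/* Model of the eight-instruction branchless update.
   eax = new sample, ecx = current maximum; returns the new ecx. */
uint32_t update_max_branchless(uint32_t eax, uint32_t ecx)
{
    uint32_t edx;

    edx = cdq_mask_op(eax);  /* cdq                                         */
    eax += edx;              /* add eax, edx                                */
    edx ^= eax;              /* xor edx, eax  -> edx = abs(sample)          */
    eax = ecx;               /* mov eax, ecx                                */
    eax -= edx;              /* sub eax, edx  -> eax = max - abs(sample)    */
    edx = cdq_mask_op(eax);  /* cdq           -> all ones if eax < 0        */
    eax &= edx;              /* and eax, edx  -> difference if negative, else 0 */
    ecx -= eax;              /* sub ecx, eax  -> abs(sample) or unchanged max   */
    return ecx;
}
```

Calling `update_max_branchless` once per sample, feeding the returned value back in as the maximum, should track the same result as the compare-and-branch version for ordinary inputs.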
This should work for you. Assume eax is the current value and ecx contains the running maximum. There is no compare to be done; just loop through it. Dunno if you can get much faster, since the trouble is all the forward dependencies: each instruction needs the result of the one before it. I'll think about it some more, but until then...
cdq ;also mov edx,eax/sar edx,31
add eax,edx
xor eax,edx ;eax should now equal abs(eax)
sub ecx,eax ;assume we hold the current max in ecx
sbb edx,edx
not edx
and ecx,edx
add ecx,eax ;ecx contains max (old max,eax)
--Chorus
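The sbb trick above can be a bit opaque, so here is a hedged C model of the same eight instructions. The carry flag is modelled as a plain variable (`borrow`), and the function name `update_max_sbb` is hypothetical. Note that the borrow comes from an unsigned comparison of the old maximum against abs(sample).

```c
#include <stdint.h>

/* Model of the cdq/add/xor/sub/sbb/not/and/add sequence.
   eax = new sample, ecx = running maximum; returns the new ecx. */
uint32_t update_max_sbb(uint32_t eax, uint32_t ecx)
{
    uint32_t edx;
    uint32_t borrow;

    edx = 0u - (eax >> 31);  /* cdq          -> sign mask of the sample      */
    eax += edx;              /* add eax,edx                                  */
    eax ^= edx;              /* xor eax,edx  -> eax = abs(sample)            */
    borrow = (ecx < eax);    /* carry flag that the next sub will produce    */
    ecx -= eax;              /* sub ecx,eax  -> ecx = max - abs(sample)      */
    edx = 0u - borrow;       /* sbb edx,edx  -> all ones if there was a borrow */
    edx = ~edx;              /* not edx      -> zero if abs(sample) > max    */
    ecx &= edx;              /* and ecx,edx  -> difference, or 0 on borrow   */
    ecx += eax;              /* add ecx,eax  -> old max, or abs(sample)      */
    return ecx;
}
```

When there is no borrow, the difference survives the mask and adding abs(sample) back restores the old maximum; when there is a borrow, ecx is zeroed and ends up equal to abs(sample).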
You can also use this code for absolute values
mov eax, -10
mov edx, eax
sar edx, 31
xor eax, edx
sub eax, edx
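As a quick check that the two absolute-value idioms in this thread agree, here is a small C sketch of both. The function names `abs_add_xor` and `abs_xor_sub` are invented for this example, and the sign mask stands in for either cdq or mov/sar 31.

```c
#include <stdint.h>

/* cdq / add eax,edx / xor eax,edx */
uint32_t abs_add_xor(uint32_t eax)
{
    uint32_t edx = 0u - (eax >> 31);  /* sign mask: all ones if negative */
    eax += edx;                       /* subtract 1 when negative        */
    eax ^= edx;                       /* then flip every bit             */
    return eax;
}

/* mov edx,eax / sar edx,31 / xor eax,edx / sub eax,edx */
uint32_t abs_xor_sub(uint32_t eax)
{
    uint32_t edx = 0u - (eax >> 31);  /* same sign mask                  */
    eax ^= edx;                       /* flip every bit when negative    */
    eax -= edx;                       /* then add 1 (subtracting -1)     */
    return eax;
}
```

Both are the two's complement negation applied only when the mask is all ones; they differ just in whether the adjustment by one happens before or after the bit flip.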
Just my 2 cents. :)
Stryker, do you happen to know if the mov edx,eax/sar edx,31 is any faster than the cdq? I mentioned briefly above that you could use the method you posted, but I'm not sure which is actually quicker.
--Chorus
I don't know, but I've heard cdq is slow on older processors and that sar register32, 31 is faster. Since I don't have any older CPUs, I can't test which is actually faster. :)
It can be faster on newer processors, too. But you have to be able to space the instructions out to remove dependencies. And you get the side benefit of being able to use registers other than EDX.
Thanks!