Hi.

I'm trying to find the maximum of the absolute value of an array of numbers.

Assuming: eax = new sample (next number in array)

ecx = current maximum

edx = scrap register

I currently use:

cdq

add eax, edx

xor eax, edx

cmp eax, ecx

jle (next sample)

mov ecx, eax

I'd like to do this without branching, though. I came up with:

cdq

add eax, edx

xor edx, eax

mov eax, ecx

sub eax, edx

cdq

and eax, edx

sub ecx, eax
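One way to sanity-check the branchless sequence above is to mirror it in C, one statement per instruction. This is only a sketch: the names `sign_mask` and `max_abs_step` are made up for illustration, and the registers are modeled as `uint32_t` so the arithmetic wraps the way 32-bit registers do.

```c
#include <stdint.h>

/* Models what cdq leaves in edx: all ones if bit 31 of v is set, else 0. */
static uint32_t sign_mask(uint32_t v) {
    return (v & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical helper: one step of the running max of |sample|.
   eax = new sample, ecx = current maximum, edx = scrap register. */
static uint32_t max_abs_step(uint32_t sample, uint32_t cur_max) {
    uint32_t eax = sample;
    uint32_t ecx = cur_max;
    uint32_t edx;

    edx = sign_mask(eax);   /* cdq                                      */
    eax += edx;             /* add eax, edx                             */
    edx ^= eax;             /* xor edx, eax   edx = |sample|            */
    eax = ecx;              /* mov eax, ecx                             */
    eax -= edx;             /* sub eax, edx   eax = max - |sample|      */
    edx = sign_mask(eax);   /* cdq            all ones if max < |sample| */
    eax &= edx;             /* and eax, edx   keep the difference or 0  */
    ecx -= eax;             /* sub ecx, eax   max - (max - |sample|)    */
    return ecx;
}
```

With these definitions, `max_abs_step((uint32_t)-7, 3)` follows the path where the new sample replaces the maximum, and `max_abs_step(5, 9)` follows the path where the maximum is kept.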

Can anyone come up with something faster? (I only have the one available scrap register and it must run on all processors, so no conditional moves, etc.)

Thanks.

This should work for you. Assume eax is the current value and ecx contains the running maximum. There is no compare to be done; just loop through it. Dunno if you can get much faster; the trouble is all the forward dependencies. I'll think about it some more, but until then...

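```
cdq             ;also mov edx,eax/sar edx,31
add eax,edx
xor eax,edx     ;eax should now equal abs(eax)
sub ecx,eax     ;assume we hold the current max in ecx
sbb edx,edx     ;edx = -1 if the subtract borrowed (old max < eax), else 0
not edx         ;edx = 0 if eax is the new max, else -1
and ecx,edx     ;ecx = 0 if eax is the new max, else old max - eax
add ecx,eax     ;ecx contains max (old max,eax)
```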

--Chorus
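For anyone who wants to step through the borrow trick above, here is a rough C model of it. The function name `max_abs_step_sbb` is hypothetical, and the carry flag is modeled with an explicit unsigned comparison, which is an assumption of this sketch based on how `sub` sets the carry on an unsigned borrow.

```c
#include <stdint.h>

/* Hypothetical helper: models the cdq/add/xor/sub/sbb/not/and/add sequence.
   eax = new sample, ecx = running maximum, edx = scrap register. */
static uint32_t max_abs_step_sbb(uint32_t sample, uint32_t cur_max) {
    uint32_t eax = sample;
    uint32_t ecx = cur_max;
    uint32_t edx;
    uint32_t cf;

    edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;  /* cdq                */
    eax += edx;                 /* add eax,edx                            */
    eax ^= edx;                 /* xor eax,edx   eax = |sample|           */
    cf = (ecx < eax);           /* borrow that the next sub will produce  */
    ecx -= eax;                 /* sub ecx,eax   ecx = max - |sample|     */
    edx = cf ? 0xFFFFFFFFu : 0u;/* sbb edx,edx   edx = 0 - CF             */
    edx = ~edx;                 /* not edx       0 if |sample| is bigger  */
    ecx &= edx;                 /* and ecx,edx   0, or max - |sample|     */
    ecx += eax;                 /* add ecx,eax   |sample|, or old max     */
    return ecx;
}
```

Because the borrow is an unsigned comparison, a sample of 0x80000000 (whose absolute value does not fit in a signed 32-bit integer) is still ordered above every other value in this model.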

You can also use this code for absolute values:

```
mov eax, -10
mov edx, eax
sar edx, 31
xor eax, edx
sub eax, edx
```
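The two absolute-value idioms in this thread (cdq/add/xor and mov/sar/xor/sub) can be compared side by side with a small C model. This is only a sketch: the function names `abs_add_xor` and `abs_xor_sub` are invented here, and the sign mask is computed with a bit test to stand in for both `cdq` and `sar edx, 31`.

```c
#include <stdint.h>

/* Models: cdq / add eax,edx / xor eax,edx */
static uint32_t abs_add_xor(uint32_t eax) {
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u; /* cdq */
    eax += edx;   /* subtract 1 when negative       */
    eax ^= edx;   /* then flip all bits: -(x)       */
    return eax;
}

/* Models: mov edx,eax / sar edx,31 / xor eax,edx / sub eax,edx */
static uint32_t abs_xor_sub(uint32_t eax) {
    uint32_t edx = eax;                                     /* mov edx, eax */
    edx = (edx & 0x80000000u) ? 0xFFFFFFFFu : 0u;           /* sar edx, 31  */
    eax ^= edx;   /* flip all bits when negative    */
    eax -= edx;   /* then add 1: -(x)               */
    return eax;
}
```

Both functions return the same value for every input in this model, so the choice between them comes down to timing and register pressure rather than results.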

Just my 2 cents. :)

Stryker, do you happen to know if the mov edx,eax/sar edx,31 is any faster than the cdq? I mentioned briefly above that you could use the method you posted, but I'm not sure which is actually quicker.

--Chorus

I don't know, but I've heard cdq is slow on older processors and that sar register32, 31 is faster. Since I don't have any older CPUs, I can't test which one is actually faster. :)

It can be faster on newer processors, too, but you have to be able to space the instructions out to remove dependencies. You also get the side benefit of being able to use registers other than EDX.

Thanks!