This code subtracts a fixed number (32*4) from esi when ecx is a multiple of 32. Any ideas on how to optimize this? I'm looking for a way to remove the jump.

EDIT: And I have only eax available as a temp register....

```
test ecx,32-1
jnz @F
sub esi,32*4
@@:
```

```
lea eax, [ecx + 32 - 1]
; (space for instruction)
xor eax, ecx
; (space for instruction)
and eax, 32
; (space for instruction)
lea esi, [esi + eax*4][-128]
```

EAX is zero only in the case that ECX is a multiple of 32. ;)

Ok, I've made the changes. Now my inner loop looks like this:

Is this optimal?

Great piece of work btw, I couldn't have figured that out myself... wow.

```
CellLoop:
    mov eax,[esi]
    inc ecx
    mov [edi],eax
    lea eax,[ecx+32-1]
    add esi,4
    xor eax,ecx
    add edi,4
    and eax,32
    cmp ecx,[ebx]
    lea esi,[esi+eax*4][-128]
    jne CellLoop
```
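For readers following along without an assembler, here is a rough C model of what that inner loop appears to do. It is only a sketch under assumptions that are not stated in the thread: the name `cell_loop` and its parameters are made up, `src` is taken to point at the start of a 32-dword row, `start` is taken to be a multiple of 32, and `limit` stands in for the value at `[ebx]`. Pointers are modelled as dword indices.

```c
#include <stdint.h>

/* Hypothetical model of CellLoop. Copies dwords from src to dst while the
   counter runs from `start` up to `limit` (limit must be greater than start).
   Assumes src points at the start of a 32-dword row and start is a multiple
   of 32, so the source index never leaves the row.
   Returns the final source index. */
static int32_t cell_loop(const uint32_t *src, uint32_t *dst,
                         uint32_t start, uint32_t limit)
{
    int32_t  si  = 0;       /* esi, as a dword index relative to src */
    uint32_t di  = 0;       /* edi, as a dword index into dst */
    uint32_t ecx = start;

    do {
        dst[di++] = src[si++];                     /* mov, mov, add esi,4, add edi,4 */
        ++ecx;                                     /* inc ecx */
        uint32_t eax = ((ecx + 31u) ^ ecx) & 32u;  /* lea, xor, and: 0 or 32 */
        si += (int32_t)eax - 32;                   /* lea esi,[esi+eax*4][-128], in dwords */
    } while (ecx != limit);                        /* cmp ecx,[ebx] / jne CellLoop */

    return si;
}
```

Under those assumptions the source index wraps back to the start of the row every time the counter reaches a multiple of 32, so `dst[i]` ends up equal to `src[i % 32]`.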

bitRake, Qweerdy,

Treat me as a newbie on this one.

bitRake, how does your algo work? And why do you leave spaces for instructions in between?

Also, how does "test ecx,32-1" check ecx for being a multiple of 32?

Whatever I am asking may sound completely stupid, but I am pretty new to asm.

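```
1100000 = 32 * 3 in binary
 100000 = 32     in binary
  11111 = 32-1   in binary
```

So if the lower 5 bits of a number are clear, the number is a multiple of 32. "test ecx,32-1" ANDs ECX with that mask without storing the result and sets the zero flag when the result is zero, so the jnz skips the subtraction for every value that is not a multiple of 32.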
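If it helps to see the mask test outside of assembly, here is a minimal C sketch of the same check. The helper name `is_multiple_of_32` is made up for illustration and does not come from the thread.

```c
#include <stdint.h>

/* Hypothetical helper: models `test ecx,32-1` followed by a check of the
   zero flag. Returns 1 when x is a multiple of 32, otherwise 0. */
static int is_multiple_of_32(uint32_t x)
{
    return (x & (32u - 1u)) == 0;   /* low 5 bits clear => multiple of 32 */
}
```

For example, 96 is binary 1100000, so the AND with 11111 gives zero, while 100 is binary 1100100 and leaves bit 2 set.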

**gladiator**, it is binary math. It works by checking whether bit 5 changes when 31 is added. If bit 5 changes, then ECX is not a multiple of 32 (e.g. 32/32 = 1; 32+31 = 63; 63/32 = 1; but 33/32 = 1; 33+31 = 64; 64/32 = 2). The spaces are left for older processors that can't execute instructions out of order: other instructions can go there to break up the dependencies between consecutive instructions. Agner Fog's optimization guide would be a good read for you.
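To convince yourself that the bit-5 trick matches the original jump, the two versions can be compared in C. This is only a sketch: the function names are made up, and the registers are modelled as plain 32-bit unsigned values, with one statement per instruction of the branchless snippet.

```c
#include <stdint.h>

/* Original version: subtract 32*4 only when ecx is a multiple of 32. */
static uint32_t adjust_branchy(uint32_t esi, uint32_t ecx)
{
    if ((ecx & 31u) == 0)            /* test ecx,32-1 / jnz @F */
        esi -= 32u * 4u;             /* sub esi,32*4 */
    return esi;
}

/* Branchless version, one statement per instruction. */
static uint32_t adjust_branchless(uint32_t esi, uint32_t ecx)
{
    uint32_t eax = ecx + 32u - 1u;   /* lea eax,[ecx+32-1] */
    eax ^= ecx;                      /* xor eax,ecx : bit 5 set if it flipped */
    eax &= 32u;                      /* and eax,32  : leaves 0 or 32 */
    return esi + eax * 4u - 128u;    /* lea esi,[esi+eax*4][-128] */
}

/* Returns 1 if both versions agree for every ecx in [0, limit). */
static int versions_agree(uint32_t limit)
{
    for (uint32_t ecx = 0; ecx < limit; ++ecx)
        if (adjust_branchy(0x10000u, ecx) != adjust_branchless(0x10000u, ecx))
            return 0;
    return 1;
}
```

When ECX is a multiple of 32, EAX ends up as 0 and only the -128 remains. Otherwise EAX is 32, and the +128 from `eax*4` cancels the -128.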

**Qweerdy**, [ebx] should not be accessed in the loop - the number of loops needed can be calculated. I didn't assume ECX is zero on entry to your snippet. This could be trimmed up a little if the left side is always aligned.

```
    lea eax, [ecx+32-1]
    and ecx, 32-1
    and eax, -32
    lea esi, [esi + ecx*4]
    sub eax, [ebx]
    neg eax
CellLoop:
    rep movsd
    mov ecx, 32
    sub esi, 128
    sub eax, 32
    jnc CellLoop
    ; do right unaligned dwords
    add ecx, eax
    rep movsd
```

I'm sorry, but I couldn't get your snippet to work :(

Since you've apparently already downloaded the complete source from my website, could you please post the whole proc?

Thanks, bitRake and Qweerdy.

I understand now.

P.S. Where can I learn these tricks about binary math?

http://www.math.grin.edu/~rebelsky/Courses/152/97F/Readings/student-binary.html

http://www.learnbinary.com/Binary2Dec.html

I learnt from my first ASM book and I use binary operations all the time. There really aren't any tricks - it looks that way sometimes, but it is just experience.