What's the proper way to unroll this loop?



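lea esi,szstring                       ; esi -> source string
lea edi,outbuffer                      ; edi -> output buffer
xor ecx,ecx                            ; ecx = index
movzx eax,byte ptr [esi+ecx]           ; fetch first byte
.while eax != 0
    mov al,byte ptr [cipertable+eax]   ; translate byte through the table
    mov byte ptr [edi+ecx],al          ; store translated byte
    inc ecx
    movzx eax,byte ptr [esi+ecx]       ; fetch next byte
.endw
mov byte ptr [edi+ecx],0               ; terminate the output string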


I'm having trouble testing for the end condition when I unroll it. That is, all the speed gains I get from unrolling it 2 or 3 times are offset by multiple tests for the end of the input string. So what's the correct way to unroll something like this?
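For reference, here is a C model of the loop above. It is a sketch that pins down what any unrolled version must preserve: each source byte is translated through a 256-entry table until the zero terminator, and then the output is terminated. The function name cipher_rolled and the table parameter (standing in for cipertable, whose contents the post doesn't show) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical C model of the original .while loop.
   "table" stands in for cipertable; its contents are an assumption. */
void cipher_rolled(char *out, const char *in, const unsigned char table[256])
{
    size_t i = 0;                          /* ecx */
    unsigned int c = (unsigned char)in[i]; /* movzx eax,byte ptr [esi+ecx] */

    while (c != 0) {                       /* .while eax != 0 */
        out[i] = (char)table[c];           /* translate and store at [edi+ecx] */
        i++;                               /* inc ecx */
        c = (unsigned char)in[i];          /* fetch next byte */
    }
    out[i] = '\0';                         /* mov byte ptr [edi+ecx],0 */
}
```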
Posted on 2002-04-21 05:23:47 by grv575
grv575,

I would be inclined to remove the .WHILE/.ENDW and hand-code the loop test first, before you attempt to unroll the loop, as the code the macro generates may not be the most efficient way to do the comparisons.

If you can get that down to a minimum instruction count, there may then be room to unroll it.
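A rough C model of what hand-coding the test can look like, assuming the usual loop-rotation arrangement: jump once to a single test at the bottom of the loop, so the body and the exit test form one tight loop with one conditional branch per iteration. The labels mirror _2/_3/_x in bitRAKE's reply in this thread. cipher_rotated and table are hypothetical names, and a C compiler is free to emit different code, so this only models the control flow.

```c
#include <stddef.h>

/* Hypothetical C model of the rotated loop; "table" stands in for cipertable. */
void cipher_rotated(char *out, const char *in, const unsigned char table[256])
{
    size_t i = 0;                   /* ecx */
    unsigned int c;                 /* eax (zero-extended byte) */

    goto loop_test;                 /* jmp _3: enter the loop at the test */
loop_body:                          /* _2: */
    i++;                            /* inc ecx */
    out[i - 1] = (char)table[c];    /* translate and store at [edi+ecx-1] */
loop_test:                          /* _3: */
    c = (unsigned char)in[i];       /* movzx eax,byte ptr [esi+ecx] */
    if (c != 0)                     /* zero test; jne _2 */
        goto loop_body;
    out[i] = '\0';                  /* _x: terminate the output */
}
```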

Regards,

hutch@movsd.com
Posted on 2002-04-21 07:05:22 by hutch--
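    lea esi,szstring
    lea edi,outbuffer
    xor ecx,ecx
    jmp _3                             ; enter the loop at the test

_2: inc ecx
    mov al,byte ptr [cipertable+eax]   ; translate byte
    mov byte ptr [edi+ecx-1],al        ; store it (ecx already advanced)
_3: movzx eax,byte ptr [esi+ecx]       ; fetch next source byte
    or byte ptr [esi+ecx],0            ; ZF set at the terminator (also writes the byte back)
    jne _2

_x: mov byte ptr [edi+ecx],0           ; terminate the output string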
To unroll, just duplicate this section before the _3 label:
    movzx eax,byte ptr [esi+ecx]       ; fetch next source byte
    or byte ptr [esi+ecx],0            ; terminator?
    je _x                              ; yes: leave the loop
    inc ecx
    mov al,byte ptr [cipertable+eax]
    mov byte ptr [edi+ecx-1],al
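The unrolled-by-two result, modeled in C as a sketch (cipher_unrolled2 and table are hypothetical names; the labels follow the asm). It shows why the end tests don't go away: the duplicated section keeps its own je _x, so there is still one zero test per character, and what unrolling saves is one backward jump per two characters.

```c
#include <stddef.h>

/* Hypothetical C model of the loop unrolled twice; "table" stands in for cipertable. */
void cipher_unrolled2(char *out, const char *in, const unsigned char table[256])
{
    size_t i = 0;                   /* ecx */
    unsigned int c;                 /* eax */

    goto loop_test;                 /* jmp _3 */
loop_body:                          /* _2: first copy of the body */
    i++;
    out[i - 1] = (char)table[c];

    /* duplicated section: it needs its own end test */
    c = (unsigned char)in[i];
    if (c == 0)                     /* je _x */
        goto done;
    i++;
    out[i - 1] = (char)table[c];
loop_test:                          /* _3: */
    c = (unsigned char)in[i];
    if (c != 0)                     /* jne _2 */
        goto loop_body;
done:                               /* _x: */
    out[i] = '\0';
}
```

Odd and even string lengths both leave through the same terminator store, one via je _x and the other via the fall-through of jne _2.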
Note: I've also removed some forward dependencies. It would be faster to find the length of the string and do the ciphering in parallel, rather than serially. This might not work at all. :tongue:
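One possible reading of the note about finding the length separately is a two-pass version: locate the terminator once with strlen, then cipher with a counted loop unrolled by four, which needs only one loop test per four characters plus a short cleanup loop for the last 0 to 3 bytes. This is an assumption about what was meant, not bitRAKE's code. It reads the string twice, so whether it beats the single-pass loop on a PIII would have to be measured. cipher_two_pass and table are hypothetical names.

```c
#include <string.h>

/* Hypothetical two-pass variant; "table" stands in for cipertable. */
void cipher_two_pass(char *out, const char *in, const unsigned char table[256])
{
    size_t n = strlen(in);          /* pass 1: find the terminator once */
    size_t i = 0;

    /* pass 2: four characters per iteration, one loop test per four bytes */
    for (; i + 4 <= n; i += 4) {
        out[i]     = (char)table[(unsigned char)in[i]];
        out[i + 1] = (char)table[(unsigned char)in[i + 1]];
        out[i + 2] = (char)table[(unsigned char)in[i + 2]];
        out[i + 3] = (char)table[(unsigned char)in[i + 3]];
    }

    /* cleanup: the remaining 0..3 characters */
    for (; i < n; i++)
        out[i] = (char)table[(unsigned char)in[i]];

    out[n] = '\0';                  /* terminate the output */
}
```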
Posted on 2002-04-21 10:03:15 by bitRAKE
hutch: Yeah, I moved to jmp/cmp instructions and that saved me 2 cycles.

bitRAKE: Thanks for the help. The routine is a bit faster now after unrolling it two times. I was unsure how to do this for while loops.

Also, I replaced the memory test (or byte ptr [esi+ecx],0) with (test eax,eax), which seems to run faster on a PIII.
There's no partial register penalty for the al/eax movs on a PIII, is there? Moving the inc ecx to the top of the loop eliminates a read-after-write dependency with eax, correct? Kinda new to this optimization stuff.

Thanks again,
grv
Posted on 2002-04-21 16:18:23 by grv575