hi


Optimizing code help!
usenet MIME decode
Posted on 2002-11-29 06:25:51 by playh
Change
cmp eax, 0

to
test eax,eax

or
or eax,eax
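
For anyone wondering why: a rough sketch of the three forms side by side. The label is_zero is made up for illustration, and the byte counts assume the assembler picks the short immediate encoding for cmp.

```asm
; Sketch only: is_zero is a hypothetical label.
; All three leave ZF = 1 exactly when eax == 0, so the jump that follows behaves the same.

    cmp  eax, 0         ; 3 bytes (opcode, modrm, imm8)
    je   is_zero

    test eax, eax       ; 2 bytes, eax is only read
    jz   is_zero

    or   eax, eax       ; 2 bytes, but eax is also written back
    jz   is_zero
```

test is usually the safer pick of the two short forms, since it does not write eax.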


I will look through your code and give you a more updated comment later.
Posted on 2002-11-29 07:56:13 by roticv
decode proc input:DWORD,output:DWORD,count:DWORD

    LOCAL count2:DWORD              ; characters consumed in the current group of 4

    xor ebx,ebx                     ; ebx = bytes written to output
    xor ecx,ecx                     ; ecx = index into input
    mov esi,input
    lea edi,decodetable             ; hoisted out of the inner loop
loop@@@:
;   mov count2,0
    and count2,0
    xor eax,eax                     ; eax accumulates four 6-bit values
loop@@:

;   .if count2 == 4
;       jmp end1
;   .endif
    cmp count2,4
    je end1
;   .if ecx == count
;       jmp end2
;   .endif
    cmp [count],ecx
    je end2
    movzx edx,byte ptr [esi+ecx]    ; next input character
;   lea edi,decodetable
    movzx edx,byte ptr [edi+edx]    ; look up its 6-bit value
    shl eax,6h
    or eax,edx
    inc ecx
    inc count2
    jmp loop@@


end1:
    mov edx,output
    mov [edx+ebx+2],al              ; store the 24 bits as 3 bytes, low byte last
    shr eax,8
    mov [edx+ebx+1],al
    shr eax,8
    mov [edx+ebx],al
    add ebx,3

;   .if count > ebx
;       jmp loop@@@
;   .endif
    cmp [count],ebx
    ja loop@@@

end2:
;   mov eax,ebx
    xchg eax,ebx                    ; return bytes written; ebx is not preserved anyway, and xchg eax,reg is only 1 byte
    ret
decode endp
Posted on 2002-11-29 08:37:08 by roticv
thx roticv

u good coder!

:alright:
Posted on 2002-11-29 09:17:59 by playh
You'll hit 3 partial register stalls in the "end1" section of the code. It'll cripple loop performance on the Pentium II and above processors.

You can remove 1 by using ah on the middle move (it'll save the shift on other processors too).
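
A sketch of how the ah version might look, as one reading of the suggestion above. It reuses the register roles from roticv's routine (eax holds the 24 decoded bits, ebx is the output offset) and is untested.

```asm
end1:
    mov edx,output
    mov [edx+ebx+2],al      ; low byte goes to the third output position
    mov [edx+ebx+1],ah      ; middle byte straight from ah, no shift needed
    shr eax,16              ; bring the top byte down into al
    mov [edx+ebx],al        ; first output byte
    add ebx,3
```

That is one shift instead of two, with the same three byte stores.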

If you know that the output is big enough to hold 1 extra byte, then you can:


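end1:
    mov edx,output
    shl eax, 8                  ; 00AABBCC -> AABBCC00
    bswap eax                   ; AABBCC00 -> 00CCBBAA (bswap needs a 486 or later)
    mov [edx + ebx], eax        ; memory gets AA BB CC 00: 3 data bytes plus 1 spare zero byte
;   mov [edx+ebx+2],al
;   shr eax,8
;   mov [edx+ebx+1],al
;   shr eax,8
;   mov [edx+ebx],al
    add ebx,3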


I don't know if that'll be possible though.

Also, roticv, "mov count2, 0" will probably be faster: "and mem, immed" needs a read-modify-write operation on the memory, while "mov mem, immed" is a simple write.
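
To make the trade-off concrete, here are the two forms next to each other. This assumes count2 is the first LOCAL in a standard ebp frame, so MASM addresses it as [ebp-4]; the byte counts follow from that assumption.

```asm
; Assumption: count2 lives at [ebp-4].

    and count2,0        ; and dword ptr [ebp-4],0 : 4 bytes (imm8), reads memory then writes it
    mov count2,0        ; mov dword ptr [ebp-4],0 : 7 bytes (imm32), write only
```

So the and form is the smaller one and the mov form avoids the memory read. Which matters more depends on whether you are after size or speed.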

Mirno
Posted on 2002-11-29 09:42:09 by Mirno