I am trying to optimize RudeBoy's MD5 implementation. I searched, read several articles, and found a way to optimize the round functions.
In the MD5 description they are called F, G, H and I.
According to these papers only F and G can be optimized. In addition, lea can be used instead of adding two registers and a constant. So I tried to change these functions.

MD5 defines these functions:
#define F(x,y,z) (((x) & (y)) | ((~(x)) & (z)))
#define G(x,y,z) (((x) & (z)) | ((y) & (~(z))))

In RudeBoy's source they are implemented like this:
; eax = X
; ebx = Y
; edx = Z
_MD5Trans1 MACRO

;F------------------------
and ebx, eax ; ebx = Y & X
not eax ; eax = ~X
and eax, edx ; eax = ~X & Z
or eax, ebx ; eax = (X & Y) | (~X & Z)
;---------------------------
ENDM
_MD5Trans2 MACRO
;G--------------------------
and eax, edx ; eax = X & Z
not edx ; edx = ~Z
and ebx, edx ; ebx = Y & ~Z
or eax, ebx ; eax = (X & Z) | (Y & ~Z)
;---------------------------
ENDM

These can be rewritten as:
#define F(x,y,z) ((((y) ^ (z)) & (x)) ^ (z))
#define G(x,y,z) ((((x) ^ (y)) & (z)) ^ (y))
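
A quick way to convince yourself that the rewritten forms are equivalent: all four macros work bit by bit, so it is enough to try every combination of all-zero and all-one words. This is only a sketch. The names F_REF, F_OPT, G_REF, G_OPT and count_mismatches are made up for the check and do not come from RudeBoy's source or SSLeay.

```c
#include <stdint.h>

/* Reference forms, as given in the MD5 description. */
#define F_REF(x, y, z) ((uint32_t)(((x) & (y)) | (~(x) & (z))))
#define G_REF(x, y, z) ((uint32_t)(((x) & (z)) | ((y) & ~(z))))

/* Rewritten forms: one AND and two XORs each. */
#define F_OPT(x, y, z) ((uint32_t)((((y) ^ (z)) & (x)) ^ (z)))
#define G_OPT(x, y, z) ((uint32_t)((((x) ^ (y)) & (z)) ^ (y)))

/* Hypothetical helper: counts disagreements between the two forms.
 * Each bit position is independent, so the 8 combinations of
 * all-zero / all-one inputs cover the whole truth table. */
int count_mismatches(void)
{
    int bad = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint32_t x = (i & 4) ? 0xFFFFFFFFu : 0u;
        uint32_t y = (i & 2) ? 0xFFFFFFFFu : 0u;
        uint32_t z = (i & 1) ? 0xFFFFFFFFu : 0u;
        if (F_REF(x, y, z) != F_OPT(x, y, z)) bad++;
        if (G_REF(x, y, z) != G_OPT(x, y, z)) bad++;
    }
    return bad;
}
```

If count_mismatches() returns 0, the rewritten macros compute the same values as the originals for every input.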

This is also pointed out in the SSLeay package ftp://ftp.psy.uq.oz.au/pub/Crypto/SSL/ and in the articles below:
http://citeseer.nj.nec.com/bosselaers96fast.html
http://citeseer.nj.nec.com/bosselaers97even.html

So in assembly:


; eax = X
; ebx = Y
; edx = Z
_MD5Trans1 MACRO
;F------------------------
xor ebx,edx ; ebx = Y ^ Z
and ebx,eax ; ebx = (Y ^ Z) & X
xor ebx,edx ; ebx = ((Y ^ Z) & X) ^ Z
;-------------------------
ENDM

_MD5Trans2 MACRO
;G------------------------
xor eax,ebx ; eax = X ^ Y
and eax,edx ; eax = (X ^ Y) & Z
xor eax,ebx ; eax = ((X ^ Y) & Z) ^ Y
;-------------------------
ENDM

Everything seems correct. However, the G function works whereas the F function doesn't: it gives a bad result. It only works if I change it to


xor ebx,edx ; ebx = Y ^ Z
and eax,ebx ; eax = X & (Y ^ Z)
xor eax,edx ; eax = (X & (Y ^ Z)) ^ Z


But this does not fit the definition of the algorithm. The SSLeay package also contains an optimized asm source of MD5 (it is badly formatted and I couldn't manage to make it work), which looks like this:


;R0 first round
xor edi, edx ; Y ^ Z
and edi, ebx ; (Y ^ Z) & X ; this and the lea can be exchanged
lea eax, DWORD PTR 3614090360[ebp*1+eax]
mov ebp, DWORD PTR 4[esi]
xor edi, edx ; ((Y ^ Z) & X) ^ Z
add eax, edi
mov edi, ebx
rol eax, 7
add eax, ebx
;first round ends
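
For anyone who finds the asm hard to follow, here is a rough C sketch of what this one step appears to compute. It rests on some assumptions of mine: that ebx, edi and edx hold X, Y and Z as the comments say, that ebp holds the current message word, and that 3614090360 is the first MD5 round constant (it equals 0xd76aa478). The function names are invented for the illustration.

```c
#include <stdint.h>

/* Reference F from the MD5 description. */
#define F_REF(x, y, z) ((uint32_t)(((x) & (y)) | (~(x) & (z))))

/* Rotate left; s must be in 1..31. */
static uint32_t rotl32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* Mirrors the asm instruction by instruction (hypothetical name). */
uint32_t md5_step_asm_order(uint32_t a, uint32_t x, uint32_t y,
                            uint32_t z, uint32_t word)
{
    uint32_t t = y;          /* edi starts out holding Y          */
    t ^= z;                  /* xor edi, edx                      */
    t &= x;                  /* and edi, ebx                      */
    a += word + 0xd76aa478u; /* lea eax, 3614090360[ebp*1+eax]    */
    t ^= z;                  /* xor edi, edx                      */
    a += t;                  /* add eax, edi                      */
    a = rotl32(a, 7);        /* rol eax, 7                        */
    return a + x;            /* add eax, ebx                      */
}

/* The same step written straight from the MD5 description. */
uint32_t md5_step_reference(uint32_t a, uint32_t x, uint32_t y,
                            uint32_t z, uint32_t word)
{
    return x + rotl32(a + F_REF(x, y, z) + word + 0xd76aa478u, 7);
}
```

Both functions should return the same value for any inputs, which shows that the second xor with the same register is what finishes F rather than undoing the first one.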

It can be changed to the following (pairing and cycle counts are given in the trailing comments):

xor edi, edx ; Y ^ Z ; paired
lea eax, DWORD PTR 3614090360[ebp*1+eax] ; 1 cycle
and edi, ebx ; Y & X ; paired
mov ebp, DWORD PTR 4[esi] ; 1 cycle
xor edi, edx ; Y ^ Z ; paired
add eax, edi ; 1 cycle
mov edi, ebx ; paired
rol eax, 7 ; 1 cycle
add eax, ebx ; 1 cycle


As you can see, it XORs the same registers twice. I tried to call this proc like this:


invoke MDxInit,addr msum
invoke MDxPad,addr myData, 11h, 11h
push 11h
push offset myData
push offset msum
call _md5_block_x86 ; it is in SSLeay package.

It seems to work. However, if I hash continuously, it gives a stack error after 86351 operations. I guess it is related to a buffer overflow. RudeBoy's source can be downloaded from
http://thor.prohosting.com/~win32asm
Thanks for any response
Posted on 2001-09-05 16:51:07 by LaptoniC
What is the calling convention of _md5_block_x86?
If it is a C function, then the caller needs to clean up the stack after the call, and this could be why you end up with a stack overflow.

86351 * 4 * 3 = 1,036,212 bytes, almost 1 MB of data (which I think is the default allocation of stack space).
That seems like too much of a coincidence to me!
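
To spell out the arithmetic, here is a small C sketch. It assumes three DWORD arguments per call and a 1 MB (1,048,576 byte) stack; the names are mine and only serve the illustration.

```c
#include <stdint.h>

enum {
    ARGS_PER_CALL = 3,             /* three pushes before each call   */
    BYTES_PER_ARG = 4,             /* each push is one DWORD          */
    ASSUMED_STACK_BYTES = 1048576  /* assumed 1 MB stack reservation  */
};

/* Bytes left behind on the stack after `calls` calls when nobody
 * removes the pushed arguments (hypothetical helper). */
uint32_t leaked_bytes(uint32_t calls)
{
    return calls * ARGS_PER_CALL * BYTES_PER_ARG;
}
```

leaked_bytes(86351) is 1,036,212, which leaves only 12,364 bytes of the assumed 1 MB for everything else the program keeps on the stack.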

As for your other problem, in what way is it a bad result?
Where are you expecting your result?
In the 'G' macro the result is held in eax, in 'F' it should be in ebx.
I'm sorry if you know this already, but sometimes we miss the obvious :D

Mirno
Posted on 2001-09-06 05:17:52 by Mirno
How can I free the stack? I don't know much about the stack. The original asm source is like this:
TITLE md5-586.asm
.386
.model FLAT
_TEXT SEGMENT
PUBLIC _md5_block_x86

_md5_block_x86 PROC NEAR
..
..
_md5_block_x86 ENDP
_TEXT ENDS
END

In the SSLeay source code and the articles I saw that they XOR the same registers twice, which puzzled me.
Thanks
Posted on 2001-09-06 07:08:18 by LaptoniC
To check whether the stack is the problem, compare the value of esp before and after the call (i.e. before all the pushes, and after the call has returned).
If the stack is the problem, you'll find that esp is smaller by 12 after the call. This is because you are pushing 3 dwords (12 bytes) and nothing removes them.
You can solve this by adding 12 to esp after the call, which is what MASM does for you when the calling convention is C (see the MASM generated code listing below for an example)!


00000000 .code
00000000 start:
invoke myPROC, 1, 2
00000000 1 6A 02 * push +000000002h
00000002 1 6A 01 * push +000000001h
00000004 1 E8 0000000A * call myPROC
00000009 1 83 C4 08 * add esp, 000000008h

invoke ExitProcess, 0
0000000C 1 6A 00 * push +000000000h
0000000E 1 E8 00000000 E * call ExitProcess

00000013 myPROC PROC C a:DWORD, b:DWORD
00000013 1 55 * push ebp
00000014 1 8B EC * mov ebp, esp
00000016 1 8B 45 08 mov eax, a
00000019 2 03 45 0C add eax, b
ret
0000001C 3 C9 * leave
0000001D 2 C3 * ret 00000h
0000001E myPROC endp

end start
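
If the push/call/ret bookkeeping is hard to picture, here is a toy model in C. It only tracks the value of esp. The starting value is arbitrary, and the callee is assumed to end with a plain ret that pops nothing but the return address. All names are made up.

```c
#include <stdint.h>

/* Toy CPU state: only the stack pointer matters here. */
typedef struct { uint32_t esp; } Cpu;

static void push_dword(Cpu *c) { c->esp -= 4; }

static void call_with_plain_ret(Cpu *c)
{
    c->esp -= 4; /* call pushes the return address */
    c->esp += 4; /* plain ret pops it again        */
}

/* Returns how far esp has moved after three pushes and one call.
 * caller_cleans_up != 0 models the extra "add esp, 12". */
int32_t esp_delta_after_call(int caller_cleans_up)
{
    Cpu cpu = { 0x0012FFC0u }; /* arbitrary starting esp */
    uint32_t before = cpu.esp;

    push_dword(&cpu);          /* push 11h           */
    push_dword(&cpu);          /* push offset myData */
    push_dword(&cpu);          /* push offset msum   */
    call_with_plain_ret(&cpu);

    if (caller_cleans_up)
        cpu.esp += 12;         /* add esp, 12 */

    return (int32_t)cpu.esp - (int32_t)before;
}
```

Without the cleanup the delta is -12 per call, and with it the delta is 0, so esp no longer creeps downward on every hash.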


I've also looked at the macros RudeBoy wrote; they return the value in eax. In your speedy macro the result will be in ebx. You can solve this by re-ordering the variables in the macro, by changing your code so it reads the result from ebx instead (both will need a coding change), or by simply adding:
   mov eax, ebx 

to the end of your macro for F.
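
Here is a small C model of the register traffic, in case it helps to see it outside of asm. It is only a sketch: the Regs struct and the function names are invented, and each statement stands for one instruction of the corresponding macro.

```c
#include <stdint.h>

/* The three registers the F macros touch. */
typedef struct { uint32_t eax, ebx, edx; } Regs; /* X, Y, Z on entry */

/* RudeBoy's original F: result ends up in eax. */
Regs f_original(Regs r)
{
    r.ebx &= r.eax;  /* and ebx, eax */
    r.eax = ~r.eax;  /* not eax      */
    r.eax &= r.edx;  /* and eax, edx */
    r.eax |= r.ebx;  /* or eax, ebx  */
    return r;
}

/* The fast F: same value, but left in ebx; eax still holds X. */
Regs f_fast(Regs r)
{
    r.ebx ^= r.edx;  /* xor ebx, edx */
    r.ebx &= r.eax;  /* and ebx, eax */
    r.ebx ^= r.edx;  /* xor ebx, edx */
    return r;
}

/* The variant that worked: AND and final XOR target eax instead. */
Regs f_workaround(Regs r)
{
    r.ebx ^= r.edx;  /* xor ebx, edx */
    r.eax &= r.ebx;  /* and eax, ebx */
    r.eax ^= r.edx;  /* xor eax, edx */
    return r;
}

/* The fast F with the extra mov appended. */
Regs f_fast_fixed(Regs r)
{
    r = f_fast(r);
    r.eax = r.ebx;   /* mov eax, ebx */
    return r;
}
```

In this model f_fast leaves the right value in ebx while eax is untouched, and both f_workaround and f_fast_fixed put the same value in eax as f_original does, which matches what was observed.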

Mirno
Posted on 2001-09-06 08:46:22 by Mirno
Thanks, it worked when I added add esp, 000000012 after the call. I appreciate your help. Now MD5 is faster :):alright:
Posted on 2001-09-06 15:57:10 by LaptoniC