Hello,

I tried to write a StrReverse proc. What do you think about it?



strrev proc uses ESI EDI lpString: LPSTR

MOV ESI, lpString
invoke lstrlen, ESI
MOV ECX, EAX
LEA EDI, [ESI+ECX-1] ; last byte of lpString
PUSH ECX ; strlen

SHR ECX, 3 ; strlen / 8 = number of iterations (2 DWORD blocks each)
TEST ECX, ECX
JE @@next
LEA EDI, [EDI-3] ; ptr to last DWORD block

; copy 4 Bytes from ESI, 4 Bytes from EDI, swap both reg, ESI <-> EDI
@@loop4:
MOV EAX, [ESI]
MOV EDX, [EDI]
BSWAP EAX
BSWAP EDX
MOV [ESI], EDX
MOV [EDI], EAX
LEA ESI, [ESI+4]
LEA EDI, [EDI-4]
DEC ECX
JNZ @@loop4
LEA EDI, [EDI+3] ; EDI points to a DWORD start here, move it back to the last unprocessed byte

@@next:
POP ECX ; POP strlen
AND ECX, 7
SHR ECX, 1
TEST ECX, ECX
JE @@exit

@@loop1:
MOV AL, [ESI]
MOV AH, [EDI]
MOV [EDI], AL
MOV [ESI], AH
INC ESI
DEC EDI
DEC ECX
JNZ @@loop1

@@exit:
ret
strrev endp
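
For readers who prefer C, here is a minimal sketch of the same two-phase idea: swap byte-reversed DWORDs from both ends, then finish the middle with single bytes. The names strrev_blocks and bswap32 are only illustrative and not part of the proc above, and this version keeps the tail pointer on the last unprocessed byte throughout, so the byte loop can reuse it directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value, like BSWAP on a register. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse s in place: DWORD blocks from both ends first, then bytes. */
static void strrev_blocks(char *s)
{
    size_t len = strlen(s);
    size_t n;
    char *lo, *hi;

    if (len < 2)
        return;                     /* nothing to swap */

    lo = s;                         /* role of ESI: first unprocessed byte */
    hi = s + len - 1;               /* role of EDI: last unprocessed byte */

    for (n = len >> 3; n != 0; n--) {       /* SHR ECX, 3 */
        uint32_t a, b;
        memcpy(&a, lo, 4);          /* first 4 unprocessed bytes */
        memcpy(&b, hi - 3, 4);      /* last 4 unprocessed bytes */
        a = bswap32(a);
        b = bswap32(b);
        memcpy(lo, &b, 4);          /* store crosswise */
        memcpy(hi - 3, &a, 4);
        lo += 4;
        hi -= 4;                    /* hi is again a "last byte" pointer */
    }

    for (n = (len & 7) >> 1; n != 0; n--) { /* AND ECX, 7 / SHR ECX, 1 */
        char t = *lo;
        *lo++ = *hi;
        *hi-- = t;
    }
}
```

With an 11-byte string such as "abcdefghijk", the block loop runs once and the byte loop swaps one pair in the middle; the centre byte stays where it is.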


And a very small version:



strrev2 proc uses ESI EDI lpString: LPSTR
MOV ESI, lpString
invoke lstrlen, ESI
LEA EDI, [ESI+EAX-1]

@@loop:
CMP ESI, EDI
JAE @@exit ; done once the pointers meet or cross (unsigned compare for addresses)
MOV AL, [ESI]
MOV AH, [EDI]
MOV [ESI], AH
MOV [EDI], AL
INC ESI
DEC EDI
JMP @@loop

@@exit:
ret
strrev2 endp
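
The small version is the classic two-pointer swap. As a rough C sketch (strrev_bytes is a made-up name for illustration), the loop runs only while the front pointer is still below the back pointer:

```c
#include <stddef.h>
#include <string.h>

/* Reverse s in place by swapping single bytes from both ends. */
static void strrev_bytes(char *s)
{
    size_t len = strlen(s);
    char *lo, *hi;

    if (len < 2)
        return;                 /* empty or one char: nothing to do */

    lo = s;                     /* role of ESI */
    hi = s + len - 1;           /* role of EDI: last byte */

    while (lo < hi) {           /* stop when the pointers meet or cross */
        char t = *lo;
        *lo++ = *hi;
        *hi-- = t;
    }
}
```

For an odd length the two pointers meet on the middle byte, which does not need to move.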



Have a nice day, Manu.
Posted on 2004-02-24 07:21:33 by other
Hi other,

might be faster to do it in DWORDs, the main part would be :

mov edi,[lpDest]

mov esi,[lpSource]
mov ecx,[nBytes]
add esi,ecx
sub esi,4

shr ecx,2
:
mov eax,[esi]
bswap eax
mov [edi],eax
sub esi,4
add edi,4
dec ecx
jnz <


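Here ECX is the string length, ESI is the source and EDI is the destination. You will have to figure out how to handle the remainder when the length is not a multiple of 4 (and lengths below 4, where ECX is 0 after the SHR and the loop has to be skipped). I have never needed the function, so I never put much thought into it.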
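
One possible way to handle the remainder, sketched in C rather than assembly: copy the whole DWORDs from the end of the source first, then copy the leftover 0 to 3 bytes at the start of the source one by one. The names reverse_copy and bswap32 are assumptions for illustration, and the buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value, like BSWAP on a register. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Copy n bytes from src to dst in reverse order (no terminator is written). */
static void reverse_copy(char *dst, const char *src, size_t n)
{
    const char *p = src + n;        /* one past the last source byte */
    size_t blocks, r;

    for (blocks = n >> 2; blocks != 0; blocks--) {  /* skipped when n < 4 */
        uint32_t v;
        p -= 4;                     /* step back to the previous DWORD */
        memcpy(&v, p, 4);
        v = bswap32(v);
        memcpy(dst, &v, 4);
        dst += 4;
    }

    for (r = n & 3; r != 0; r--)    /* leftover bytes at the start of src */
        *dst++ = *--p;
}
```

Because the loop condition is checked before the first iteration, a length below 4 falls straight through to the byte loop.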
Posted on 2004-02-24 08:00:56 by donkey
Hello,


"might be faster to do it in DWORDs, the main part would be :"


Look at the first version above :-). In the fast algo I already use DWORDs and swap 2 DWORDs at 'once' :).





add esi,ecx
sub esi,4


Could a single LEA ESI, [ESI+ECX-4] be better here?



shr ecx,2


With SHR ECX, 3 you can swap 2 blocks per iteration (head to tail and vice versa) ...


Have a nice day, Manuel.
Posted on 2004-02-24 08:09:38 by other
Hi other,

I was commenting on the second (short version), I didn't really look at the first (it was a bit long for something I am not likely to use). I agree that lea would be better in the example, I didn't actually think too much about it, just sort of typed it.
Posted on 2004-02-24 08:22:55 by donkey
Hello,


"I was commenting on the second (short version), I didn't really look at the first (it was a bit long for something I am not likely to use). I agree that lea would be better in the example, I didn't actually think too much about it, just sort of typed it."


Sorry. I misunderstood you. :-)


Have a nice day, Manuel.
Posted on 2004-02-24 10:44:23 by other
other,

"And a very small version:.."

here is a smaller one,
just 25 bytes...



OPTION PROLOGUE:NONE ; turn it off
OPTION EPILOGUE:NONE ;
StrRev proc lpString:DWORD ;
;
pop edx ; edx->return address
pop eax ; eax->lpString
push edx ; edx->return address
push esi ; save esi
cld ; clears the Direction Flag
xor esi, esi ; esi = 0
push edi ; save edi
mov edi, eax ; edi->lpString
xchg eax, esi ; esi->lpString; eax = 0
L_1: ; saving the string in the stack
push eax ;
lodsb ; mov al, [esi] -> inc esi
test eax, eax ; is it end of the string?
jne L_1 ;
L_2: ; restoring the string from the stack
pop eax ;
stosb ; mov [edi], al -> inc edi
dec eax ; is it end of the string from the stack?
jns L_2 ;
pop edi ; restore esi and edi
pop esi ;
ret ; 25 bytes
StrRev endp ;
OPTION PROLOGUE:PROLOGUEDEF ; turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF ;
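
To make the stack trick easier to follow, here is a rough C model of the two loops. It is only a sketch: strrev_stack is an illustrative name, and a heap array stands in for the CPU stack, which is why it needs a strlen call for sizing that the assembly version does not.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Reverse s in place via a push/pop stack with a 0 sentinel at the bottom.
   Returns 0 on success, -1 if the temporary stack cannot be allocated. */
static int strrev_stack(char *s)
{
    size_t cap = strlen(s) + 1;             /* chars plus the sentinel */
    int *stack = malloc(cap * sizeof *stack);
    const unsigned char *in = (const unsigned char *)s;
    char *out = s;
    size_t top = 0;
    int c = 0;

    if (stack == NULL)
        return -1;

    do {
        stack[top++] = c;       /* push eax (the first push is the 0 sentinel) */
        c = *in++;              /* lodsb */
    } while (c != 0);           /* test eax, eax / jne L_1 */

    do {
        c = stack[--top];       /* pop eax */
        *out++ = (char)c;       /* stosb */
    } while (c != 0);           /* dec eax / jns L_2: stop after storing the 0 */

    free(stack);
    return 0;
}
```

The sentinel does double duty: popping it ends the second loop, and storing it rewrites the string terminator in its original place. Note that the assembly version spends 4 bytes of stack per character, so very long strings are a concern there.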


Regards,
Lingo
Posted on 2004-02-24 12:08:53 by lingo12
Hello,


other,

"And a very small version:.."

here is smaller one
just 25 bytes...


xchg eax, esi ; esi->lpString; eax = 0
L_1: ; saving the string in the stack
push eax ;
lodsb ; mov al, [esi] -> inc esi
test eax, eax ; is it end of the string?
jne L_1 ;
L_2: ; restoring the string from the stack
pop eax ;
stosb ; mov [edi], al -> inc edi
dec eax ; is it end of the string from the stack?
jns L_2 ;





:cool:

Cool. Nice idea to use the stack :-). Do you think there is a bottleneck in the first version?


Regards Manuel.
Posted on 2004-02-24 13:20:44 by other
"Do you think, there is a bottleneck in the first version?"

- "standard" lstrlen is slow and we can skip it here

- LEA EDI, -> lea is a slow instruction on the P4

- ;copy 4 Bytes from ESI, 4 Bytes from EDI, swap both reg, ESI <-> EDI
slow because EDI and ESI are not DWORD-aligned;
additional clocks for BSWAP;
it will be faster to read/write single bytes

- DEC ECX -> dec/inc are slow instructions on the P4

- SHR ECX, 3 -> shr is a slow instruction on the P4
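
As a sketch of the first and third points only (no library length call, single-byte reads and writes), something like the following C would do. The name strrev_inline is made up for illustration, and no timing claim is attached to it; whether it is actually faster has to be measured on the target CPU.

```c
/* Reverse s in place without calling a strlen-style function:
   one inline scan finds the end, then single bytes are swapped. */
static void strrev_inline(char *s)
{
    char *lo = s;
    char *hi = s;

    while (*hi != '\0')         /* inline scan instead of lstrlen */
        hi++;

    if (hi == s)
        return;                 /* empty string */

    hi--;                       /* step back from the terminator to the last char */

    while (lo < hi) {           /* byte-wise swap, no alignment concerns */
        char t = *lo;
        *lo++ = *hi;
        *hi-- = t;
    }
}
```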


Regards,
Lingo
Posted on 2004-02-24 13:50:26 by lingo12