Hello,

I have written a dwtohex version and tested it against the version in masm-lib.

It seems to be up to 10 times faster on my PIV.


Only on my PIV?




_dwtohex proc USES EDI num: DWORD, buffer: DWORD

MOV EAX, num
MOV EDI, buffer
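MOV ECX, 8                  ; 8 hex digits per DWORD


@@loop:
ROL EAX, 4                  ; bring the next-highest nibble into the low 4 bits
MOV EDX, EAX
AND EDX, 0Fh                ; isolate it (0..15)
ADD EDX, '0'
CMP EDX, '9'
JBE @@SkipChar
ADD EDX, 'A' - '9' - 1      ; 10..15 -> 'A'..'F'

@@SkipChar:
MOV [EDI], DL               ; store one byte (a DWORD store writes past a 9-byte buffer)
INC EDI
DEC ECX
JNZ @@loop
MOV BYTE PTR [EDI], 0       ; zero-terminate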

ret
_dwtohex endp
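To make the logic of this first loop explicit, here is a portable C sketch of the same rotate-and-branch idea. The function name dwtohex_rol is hypothetical, and the buffer is assumed to hold at least 9 bytes (8 digits plus the terminator).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C equivalent of the ROL-based loop; buf needs 9 bytes. */
static void dwtohex_rol(uint32_t num, char *buf)
{
    for (int i = 0; i < 8; i++) {
        /* ROL EAX, 4: the next-highest nibble moves into the low 4 bits */
        num = (num << 4) | (num >> 28);
        unsigned c = (num & 0x0Fu) + '0';
        if (c > '9')
            c += 'A' - '9' - 1;   /* skip the 7 characters between '9' and 'A' */
        buf[i] = (char)c;         /* one byte per digit, like MOV [EDI], DL */
    }
    buf[8] = '\0';
}
```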


Regards, Manuel.
Posted on 2004-03-27 09:07:28 by other
Hello other
I benchmarked your procedure against the original dw2hex on my PIII 500 using TickCounter, the performance-monitoring object of ObjAsm32, and measured a time improvement of about 14% with the following code:

mov pTC, $New(TickCounter, Init, NULL)


mov ebx, RETRIES
OCall pTC::TickCounter.Start
.while ebx != 0
invoke _dwtohex, 0FFFFAAAAh, addr Buffer
dec ebx
.endw
OCall pTC::TickCounter.Stop
OCall pTC::TickCounter.GetTicks
PrintHex eax
PrintString Buffer

OCall pTC::TickCounter.Reset
mov ebx, RETRIES
OCall pTC::TickCounter.Start
.while ebx != 0
invoke dw2hex, 0FFFFAAAAh, addr Buffer
dec ebx
.endw
OCall pTC::TickCounter.Stop
OCall pTC::TickCounter.GetTicks
PrintHex eax
PrintString Buffer

Destroy pTC
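For readers without ObjAsm32, a minimal portable C sketch of the same measurement loop: call the routine RETRIES times and compare the elapsed times. It assumes the standard C clock() function, which measures processor time at a much coarser resolution than TickCounter, QueryPerformanceCounter or RDTSC, so large retry counts are needed. The names time_hex and hex_snprintf are hypothetical; hex_snprintf is only a C-library baseline standing in for a library routine.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef void (*hexfn)(uint32_t value, char *buf);

/* Hypothetical baseline built on the C library. */
static void hex_snprintf(uint32_t value, char *buf)
{
    snprintf(buf, 9, "%08lX", (unsigned long)value);
}

/* Calls fn `retries` times and returns the processor time used, in seconds. */
static double time_hex(hexfn fn, uint32_t value, char *buf, long retries)
{
    clock_t start = clock();
    for (long i = 0; i < retries; i++)
        fn(value, buf);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Time your own routine with the same call and compare; repeating each measurement a few times helps, since short runs are easily disturbed by other processes.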



If you use the STOSB instruction, you can cut the time by about 20% with the following code:



_dwtohex proc uses edi num: DWORD, buffer: DWORD
mov edx, num
mov edi, buffer
mov ecx, 8

@@loop:
rol edx, 4
mov eax, edx
and eax, 0Fh
add eax, '0'
cmp eax, '9'
jbe @@SkipChar
add eax, 'A' - '9' - 1

@@SkipChar:
stosb                   ; store AL at [EDI] and advance EDI (direction flag clear)
dec ecx
jnz @@loop
xor eax, eax
stosb                   ; zero-terminate

ret
_dwtohex endp



Regards,

Biterider
Posted on 2004-03-27 10:50:48 by Biterider
Hello,

Originally posted by Biterider

If you use the STOSB instruction, you can cut the time by about 20% with the following code:



...




Nice, but STOSB is very slow on my PIV. Your code was 20% slower than the first one :-).


Regards, Manuel.
Posted on 2004-03-27 11:06:15 by other
That's a major problem with optimization. You may get better results with a given algo on some processors but worse results on others.

Newer processors are not necessarily "backward compatible" with older ones in how their architecture processes instructions: code tuned for one generation can run slower on the next.

Differences between the architectures of different manufacturers compound the problem.

On top of that, some people occasionally waste their time and energy trying to save a few nanoseconds when the result will then stay on display for a relative eternity waiting for user input.

Raymond
Posted on 2004-03-27 11:33:48 by Raymond
Hello,



__dwtohex proc USES EDI num: DWORD, buffer: DWORD

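MOV EAX, num
BSWAP EAX               ; most significant byte now in AL
MOV EDI, buffer
MOV ECX, 4              ; 4 bytes, 2 digits each
INC EDI                 ; EDI -> second character of the first pair

@@loop:
MOV EDX, EAX
AND EDX, 0Fh            ; low nibble = second digit of the pair
ADD EDX, '0'
CMP EDX, '9'
JBE @F
ADD EDX, 'A' - '9' - 1

@@:
MOV [EDI], DL
DEC EDI                 ; back to the first character of the pair

SHR EAX, 4
MOV EDX, EAX
AND EDX, 0Fh            ; high nibble = first digit of the pair
ADD EDX, '0'
CMP EDX, '9'
JBE @F
ADD EDX, 'A' - '9' - 1

@@:
MOV [EDI], DL

ADD EDI, 3              ; -> second character of the next pair
SHR EAX, 4              ; next byte
DEC ECX
JNZ @@loop

DEC EDI                 ; EDI = buffer + 8
MOV BYTE PTR [EDI], 0   ; zero-terminate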

ret
__dwtohex endp
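The BSWAP variant handles one byte (two digits) per iteration, starting with the most significant byte. A rough C equivalent, with the byte swap written out as shifts; dwtohex_bswap and hex_digit are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

static char hex_digit(unsigned d)          /* d in 0..15 */
{
    return (char)(d < 10 ? '0' + d : 'A' + (d - 10));
}

static void dwtohex_bswap(uint32_t num, char *buf)
{
    /* BSWAP: the most significant byte moves into the low 8 bits */
    uint32_t v = (num >> 24) | ((num >> 8) & 0x0000FF00u)
               | ((num << 8) & 0x00FF0000u) | (num << 24);
    for (int i = 0; i < 4; i++) {
        buf[2 * i + 1] = hex_digit(v & 0x0Fu);  /* low nibble: second char of the pair */
        v >>= 4;
        buf[2 * i] = hex_digit(v & 0x0Fu);      /* high nibble: first char of the pair */
        v >>= 4;
    }
    buf[8] = '\0';
}
```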


It is 30% faster than the first version (on PIV).

Regards, Manuel.
Posted on 2004-03-27 11:58:40 by other
^ Ditto what Raymond said.
Posted on 2004-03-27 12:32:43 by iblis
Hello,


^ Ditto what Raymond said.


Ditto?
Yes, I know that the results differ between processors.

Here is my latest version:



_dwtohex2 proc USES EDI num: DWORD, buffer: DWORD

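MOV EAX, num
MOV EDI, buffer
MOV ECX, 3                  ; pair index 3..0, filled from the end
MOV BYTE PTR [EDI+8], 0     ; zero-terminate first

@@loop:
MOV EDX, EAX
AND EDX, 0Fh
ADD EDX, '0'
CMP EDX, '9'
JBE @F
ADD EDX, 'A' - '9' - 1

@@:
MOV [EDI+ECX*2+1], DL       ; low nibble -> second char of the pair

SHR EAX, 4
MOV EDX, EAX
AND EDX, 0Fh
ADD EDX, '0'
CMP EDX, '9'
JBE @F
ADD EDX, 'A' - '9' - 1

@@:
MOV [EDI+ECX*2], DL         ; high nibble -> first char of the pair
SHR EAX, 4
DEC ECX
JNS @@loop                  ; loop until ECX goes negative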

ret
_dwtohex2 endp
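Because this version computes each pair's position from the loop counter and fills the buffer from the end, it no longer needs BSWAP. A hedged C sketch of the same right-to-left fill (function names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

static char hex_digit(unsigned d)          /* d in 0..15 */
{
    return (char)(d < 10 ? '0' + d : 'A' + (d - 10));
}

static void dwtohex_from_end(uint32_t num, char *buf)
{
    buf[8] = '\0';
    for (int i = 3; i >= 0; i--) {         /* like DEC ECX / JNS: i = 3, 2, 1, 0 */
        buf[2 * i + 1] = hex_digit(num & 0x0Fu);
        num >>= 4;
        buf[2 * i] = hex_digit(num & 0x0Fu);
        num >>= 4;
    }
}
```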


This is finally ca. 15 times faster than the version in the masm32.lib.


Regards, Manuel.
Posted on 2004-03-27 12:47:15 by other
other,

"..and testet it against the version in the masm-lib.
It seems up to 10 times faster on my PIV."


Most of them are written by HLL pseudo-programmers,
so beating them isn't a big deal.

"..This is finally ca. 15 times faster than the version in the masm32.lib."

Your algo is too short to be fast...
Try that:
OPTION PROLOGUE:NONE    ; turn it off
OPTION EPILOGUE:NONE    ;
Align 16                ;
Dwtohex3 proc num: DWORD, lpBuffer: DWORD
mov eax, [esp+2*4]      ; eax -> lpBuffer
mov ecx, [esp+1*4]      ; ecx = num
add eax, 8              ; eax -> end of the string
mov edx, [esp+0*4]      ; edx = return address
mov [esp+2*4], edx      ; move the return address up over lpBuffer
mov edx, ecx            ; ecx = edx = num
mov [esp+1*4], esi      ; save esi over num
add esp, 1*4            ; drop the old return-address slot
mov byte ptr [eax], 0   ; zero-terminated string
Dwloop:
and edx, 0Fh
add eax, -1
cmp edx, 0Ah
sbb esi, esi            ; esi = -1 if digit < 10, else 0
add edx, 37h            ; 37h + 10 = 'A'
and esi, 7
sub edx, esi            ; digits 0..9: 37h - 7 = 30h = '0'
shr ecx, 4              ; sets ZF when no digits remain
mov [eax], dl
mov edx, ecx            ; MOV leaves the flags unchanged
jnz Dwloop
pop esi                 ; restore esi
ret                     ; eax -> starting string's address
Dwtohex3 endp           ; without leading zeroes
OPTION PROLOGUE:PROLOGUEDEF ; turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF
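This loop replaces the branch with CMP/SBB arithmetic and stops as soon as no nonzero digits remain, returning a pointer into the buffer. A C sketch of that idea, assuming a 9-byte buffer; the function name is hypothetical, and the do/while makes sure that 0 still produces "0".

```c
#include <stdint.h>
#include <string.h>

/* Returns a pointer to the first significant digit inside buf (9 bytes). */
static char *dwtohex_nolead(uint32_t num, char *buf)
{
    char *p = buf + 8;
    *p = '\0';
    do {
        uint32_t d = num & 0x0Fu;
        /* CMP EDX, 0Ah / SBB ESI, ESI: all ones when d < 10, else zero */
        uint32_t lt10 = 0u - (uint32_t)(d < 10);
        /* start from d + 37h ('A' - 10), take 7 off again for 0..9 */
        *--p = (char)(d + 0x37u - (lt10 & 7u));
        num >>= 4;
    } while (num != 0);
    return p;
}
```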


Regards,
Lingo
Posted on 2004-03-27 18:41:04 by lingo12
Hello,


other,

Most of them are written by HLL pseudo-programmers,
so beating them isn't a big deal.


:-). Sorry, I didn't know that.
I thought the algorithms in masm32.lib were very fast.


"..This is finally ca. 15 times faster than the version in the masm32.lib."

Your algo is too short to be fast...


:-)


Try that:

...


Your algo is about 300% faster than my last one. Very nice :-).


Regards, Manuel.
Posted on 2004-03-28 00:56:55 by other
Hi,
How do you know that a procedure is 10 times faster, or 300% faster?
Which monitoring program should I use, and where can I get it?

TQ
Posted on 2004-03-28 02:18:45 by QS_Ong
Hello,

Originally posted by lingo12
Your algo is too short to be fast...
Try that:
...


Is it possible that this algorithm contains errors?



shr ecx, 4
mov [eax], dl
mov edx, ecx
jnz Dwloop


0x00001234

Once the pointer (EAX) reaches the digit 1, you leave the loop.


Or am I too tired, and overlooking something important?

Regards, Manuel.
Posted on 2004-03-28 02:52:56 by other
Hello,



How to know the procedure is 10 times faster, 300% faster...


I use QueryPerformanceCounter from the Win32 API (possibly based on RDTSC?).


Regards, Manuel.
Posted on 2004-03-28 02:54:10 by other
Hello other
I agree that results differ substantially from machine to machine, but another important point is how code performance is measured.

I tested the latest proposed alternatives to dw2hex on my machine and found that a little unrolling gains a bit more speed, using this code:



MakeHexChar macro reg:req, pos:req
cmp reg, "9"
jbe @F
add reg, 7
@@:
mov byte ptr [eax + pos], reg
endm

dwtohex proc dNum: DWORD, pBuffer: DWORD
mov edx, dNum
mov ecx, edx
shr edx, 4

and edx, 0F0F0F0Fh      ; high nibble of each byte
and ecx, 0F0F0F0Fh      ; low nibble of each byte
add edx, "0000"         ; add '0' (30h) to every byte
add ecx, "0000"

mov eax, pBuffer

MakeHexChar cl, 7
MakeHexChar dl, 6
MakeHexChar ch, 5
MakeHexChar dh, 4
shr ecx, 16
shr edx, 16
MakeHexChar cl, 3
MakeHexChar dl, 2
MakeHexChar ch, 1
MakeHexChar dh, 0
mov byte ptr [eax + 8], 0

ret
dwtohex endp
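The unrolled version splits the DWORD into two registers holding one nibble per byte and adds '0' to all four bytes at once; only the fix-up for 'A'..'F' is done per byte. A C sketch of the same data flow, written as a loop that the compiler may or may not unroll (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

static void dwtohex_swar_branch(uint32_t num, char *buf)
{
    uint32_t lo = (num & 0x0F0F0F0Fu) + 0x30303030u;        /* low nibbles + '0' */
    uint32_t hi = ((num >> 4) & 0x0F0F0F0Fu) + 0x30303030u; /* high nibbles + '0' */
    for (int k = 0; k < 4; k++) {        /* byte k, least significant first */
        unsigned cl = (lo >> (8 * k)) & 0xFFu;
        unsigned ch = (hi >> (8 * k)) & 0xFFu;
        if (cl > '9') cl += 7;           /* MakeHexChar: ':' (3Ah) -> 'A' */
        if (ch > '9') ch += 7;
        buf[7 - 2 * k] = (char)cl;
        buf[6 - 2 * k] = (char)ch;
    }
    buf[8] = '\0';
}
```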


Can you test it on your machine?

Regards,

Biterider
Posted on 2004-03-28 03:01:27 by Biterider
Hello,



I agree that results differ substantially from machine to machine, but another important point is how code performance is measured.

I tested the latest proposed alternatives to dw2hex on my machine and found that a little unrolling gains a bit more speed, using this code:


Funny, because I wanted to try writing similar code in a few minutes :-).



dwtohex2 proc num: DWORD, buffer: DWORD

MOV EAX, num
MOV EDX, buffer
MOV BYTE PTR [EDX+8], 0

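MOV ECX, EAX
SHR EAX, 4
AND EAX, 0F0F0F0Fh      ; high nibble of each byte (0F0F0F0F0h would keep the wrong nibbles)
AND ECX, 0F0F0F0Fh      ; low nibble of each byte

ADD EAX, 30303030h      ; add '0' to every byte
ADD ECX, 30303030h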

...

ret
dwtohex2 endp


Now I am trying to find a fast way to do the conversion to hex.




...


Can you test it on your machine?


Yes, and it is very fast:
about 30% faster than my last algo.

Nice work.

Regards, Manuel.
Posted on 2004-03-28 04:10:46 by other
Hello,

;) ... long is beautiful ;)




dwtohex2 proc num: DWORD, buffer: DWORD

PUSH EBX
PUSH ESI

MOV EDX, buffer
MOV EAX, num
MOV BYTE PTR [EDX+8], 0

MOV ECX, EAX

SHR EAX, 4
AND EAX, 0F0F0F0Fh
AND ECX, 0F0F0F0Fh

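ADD EAX, 36363636h      ; 0..9 -> 36h..3Fh, 10..15 -> 40h..45h
ADD ECX, 36363636h      ; (with 37h, digit 9 becomes 40h and is treated as a letter)

; -- --------------

MOV EBX, EAX
AND EBX, 40404040h      ; bit 6 is set only for digits 10..15

MOV ESI, EBX
SHR ESI, 4
SHR EBX, 5
OR ESI, EBX
SHR EBX, 1
OR ESI, EBX             ; ESI = 07h in each byte holding a letter digit, else 00h

SUB EAX, 06060606h      ; 36h - 06h = 30h = '0'
ADD EAX, ESI            ; 7 more for 'A'..'F'

; ----------------

MOV EBX, ECX
AND EBX, 40404040h

MOV ESI, EBX
SHR ESI, 4
SHR EBX, 5
OR ESI, EBX
SHR EBX, 1
OR ESI, EBX

SUB ECX, 06060606h
ADD ECX, ESI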

; ----------------

MOV BL, AL
MOV BH, CL

SHL EBX, 16

MOV BL, AH
MOV BH, CH

MOV [EDX+4], EBX

; ----------------

SHR EAX, 16
SHR ECX, 16

; ----------------

MOV BL, AL
MOV BH, CL

SHL EBX, 16

MOV BL, AH
MOV BH, CH

MOV [EDX], EBX

POP ESI                 ; restore in reverse order of the pushes
POP EBX

ret
dwtohex2 endp
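The branchless idea in this listing (test bit 6 of each byte, then turn it into 07h with shifts) can be checked in C. This sketch deliberately uses a 36h/06h bias instead of 37h/07h, because with 37h the digit 9 becomes 40h, sets bit 6 and would come out as '@'. Function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Each byte of `nibbles` holds 0..15; returns the four ASCII hex digits. */
static uint32_t swar_ascii_bit6(uint32_t nibbles)
{
    uint32_t t = nibbles + 0x36363636u;   /* 0..9 -> 36h..3Fh, 10..15 -> 40h..45h */
    uint32_t m = t & 0x40404040u;         /* bit 6 set only for 10..15 */
    m = (m >> 4) | (m >> 5) | (m >> 6);   /* 40h -> 07h, stays inside its byte */
    return t - 0x06060606u + m;           /* +30h for 0..9, +37h for 10..15 */
}

static void dwtohex_swar_bit6(uint32_t num, char *buf)
{
    uint32_t lo = swar_ascii_bit6(num & 0x0F0F0F0Fu);
    uint32_t hi = swar_ascii_bit6((num >> 4) & 0x0F0F0F0Fu);
    for (int k = 0; k < 4; k++) {         /* byte k, least significant first */
        buf[7 - 2 * k] = (char)((lo >> (8 * k)) & 0xFFu);
        buf[6 - 2 * k] = (char)((hi >> (8 * k)) & 0xFFu);
    }
    buf[8] = '\0';
}
```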



Regards, Manuel.
Posted on 2004-03-28 06:02:08 by other
Hello other
Notice the bug in the last lines of your code (push pop should be pop pop).
I tested it on my machine and got no improvement; it falls back to the performance of the first routines. How are the results on your machine? Have you considered using the RDTSC instruction to measure performance?

Regards,

Biterider
Posted on 2004-03-28 06:50:50 by Biterider
Hello,

Originally posted by Biterider
Hello other
Notice the bug in the last lines of your code (push pop should be pop pop).


Thank you.


I tested it on my machine and got no improvement; it falls back to the performance of the first routines. How are the results on your machine? Have you considered using the RDTSC instruction to measure performance?


It is slower on my machine than yours (about 20%), but it can be optimized a lot.
I have some ideas, but I don't know how to put them into practice.


Regards, Manuel.
Posted on 2004-03-28 06:58:47 by other
other,

Is it possible that this algorithm contains errors?

No

0x00001234
Once the pointer (EAX) reaches the digit 1, you leave the loop.


Yes

I don't need leading zeroes in the final string
(see the eax-> comment at the end of the code).

Regards,
Lingo
Posted on 2004-03-28 08:11:01 by lingo12
Sorry,


0x00001234
Once the pointer (EAX) reaches the digit 1, you leave the loop.

Yes

I don't need leading zeroes in the final string


But I hadn't read the comment :-(.


Regards, Manuel.
Posted on 2004-03-28 08:34:30 by other
Hello,



dwtohexf proc dwNumber: DWORD, buffer: DWORD

PUSH EBX

MOV EBX, dwNumber
MOV ECX, EBX

SHR EBX, 4
AND ECX, 0F0F0F0Fh
AND EBX, 0F0F0F0Fh ; mask hex-digit
MOV EAX, ECX
MOV EDX, EBX

ADD EDX, 80808080h - 0A0A0A0Ah ; build mask to discern digit > 9
ADD EAX, 80808080h - 0A0A0A0Ah
SHR EDX, 4
SHR EAX, 4
NOT EDX
NOT EAX

AND EDX, 07070707h ; mask digit > 9 ... mask = 0111
AND EAX, 07070707h
ADD EDX, EBX ; add 7 ('A' - '9' - 1) if digit > 9
ADD EAX, ECX
ADD EDX, 30303030h ; add ascii '0'
ADD EAX, 30303030h

MOV EBX, buffer
MOV BYTE PTR [EBX + 8], 0
; <-- not optimized swap (there are surely better ways) ---> ;
MOV CH, AL
MOV CL, DL
SHL ECX, 16
MOV CH, AH
MOV CL, DH
MOV [EBX+4], ECX

SHR EAX, 16
SHR EDX, 16

MOV CH, AL
MOV CL, DL
SHL ECX, 16
MOV CH, AH
MOV CL, DH
MOV [EBX+0], ECX

POP EBX

ret
dwtohexf endp
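The core of dwtohexf builds the "digit > 9" mask without comparisons: after adding 76h (80h - 0Ah), bits 4 to 6 of a byte are 111 for 0..9 (76h..7Fh) and 000 for 10..15 (80h..85h), so shifting right by 4, inverting and masking with 07h yields 7 exactly for the letter digits. A C sketch of just this per-byte step, leaving out the byte interleaving (the function name is hypothetical):

```c
#include <stdint.h>

/* Each byte of `nibbles` holds 0..15; returns the four ASCII hex digits. */
static uint32_t swar_ascii_bit7(uint32_t nibbles)
{
    uint32_t t = nibbles + (0x80808080u - 0x0A0A0A0Au);  /* +76h per byte, no carries */
    /* bits 4..6 of each byte of t: 111 for 0..9, 000 for 10..15;
       after >> 4 they sit in bits 0..2 of the same byte */
    uint32_t mask = ~(t >> 4) & 0x07070707u;             /* 7 only for 10..15 */
    return nibbles + mask + 0x30303030u;                  /* '0'..'9' or 'A'..'F' */
}
```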


The swap is ... ;)


Regards, Manuel.
Posted on 2004-03-28 14:39:24 by other