Hello other
I can confirm your speed improvement on another machine (P4, 2.8 GHz). It is about 30 times faster.

I also tested your new proposal on this machine, but it is around 16% slower than the "proposal with the macro". I think some optimization can bring the necessary speed, since your approach is very interesting. :alright:

Regards,
Biterider
Posted on 2004-03-28 23:03:27 by Biterider
Hello other
A little speed improvement can be achieved with the following code:



align 4
dwtohex4 proc uses edi dNumber: dword, pBuffer: dword
mov edx, dNumber
mov ecx, edx
shr edx, 4
and edx, 0F0F0F0Fh
and ecx, 0F0F0F0Fh

mov eax, edx
mov edi, ecx

add edx, 80808080h - 0A0A0A0Ah ; Build mask to discern digit > 9
add ecx, 80808080h - 0A0A0A0Ah
shr edx, 4
shr ecx, 4
not edx
not ecx
and edx, 07070707h ; Keep 7 in each byte whose digit > 9, else 0
and ecx, 07070707h
add edx, eax ; Add 7 ('A' - '9' - 1) if digit > 9
add ecx, edi
add edx, 30303030h ; Add ascii '0'
add ecx, 30303030h

mov edi, pBuffer ; Using edi is faster
mov byte ptr [edi + 7], cl
mov byte ptr [edi + 6], dl
mov byte ptr [edi + 5], ch
mov byte ptr [edi + 4], dh
shr ecx, 16
shr edx, 16
mov byte ptr [edi + 3], cl
mov byte ptr [edi + 2], dl
mov byte ptr [edi + 1], ch
mov byte ptr [edi + 0], dh
mov byte ptr [edi + 8], 0

ret
dwtohex4 endp
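
For anyone who wants to check the masking arithmetic without an assembler, here is a minimal C sketch of the same idea. It is only a model of the proc above, not a replacement for it: the names nibbles_to_ascii and dwtohex4_model are made up for illustration, and the C version says nothing about how fast the assembly runs.

```c
#include <stdint.h>

/* Hypothetical helper: x holds one hex digit (0..15) in each byte.
   Returns the four matching ASCII characters, one per byte. */
static uint32_t nibbles_to_ascii(uint32_t x)
{
    /* Adding 76h per byte sets bit 7 only where the digit is >= 10.
       No carry crosses a byte boundary, since 0Fh + 76h = 85h. */
    uint32_t t = x + (0x80808080u - 0x0A0A0A0Au);

    t >>= 4;            /* bits 6..4 of each byte drop into bits 2..0 */
    t = ~t;             /* those bits were 111 for 0..9 and 000 for A..F */
    t &= 0x07070707u;   /* now 7 where the digit is >= 10, else 0 */

    /* '0' + digit, plus 7 for A..F ('A' = '0' + 10 + 7) */
    return x + t + 0x30303030u;
}

/* Hypothetical C model of dwtohex4: writes 8 uppercase hex digits
   and a terminating zero into buf (at least 9 bytes). */
void dwtohex4_model(uint32_t n, char *buf)
{
    uint32_t hi = nibbles_to_ascii((n >> 4) & 0x0F0F0F0Fu); /* like edx */
    uint32_t lo = nibbles_to_ascii(n & 0x0F0F0F0Fu);        /* like ecx */

    /* Byte i of n (least significant first) lands in buf[6-2i], buf[7-2i],
       which is the same order as the byte stores in the proc. */
    for (int i = 0; i < 4; i++) {
        buf[6 - 2 * i] = (char)((hi >> (8 * i)) & 0xFF);
        buf[7 - 2 * i] = (char)((lo >> (8 * i)) & 0xFF);
    }
    buf[8] = '\0';
}
```

Calling dwtohex4_model(0x1234ABCD, buf) with a 9-byte buffer leaves the string 1234ABCD in buf.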


The result shows that the proc with the macro is still the fastest :grin:

Regards,

Biterider
Posted on 2004-03-29 00:13:35 by Biterider
Hello,

Biterider wrote: "A little speed improvement can be achieved with the following code:"

I will test it at home :-).

Biterider wrote: "The result shows that the proc with the macro is still the fastest :grin:"

Yes, it is 5-10% faster on my P4 (2.6 GHz).

I thought eliminating every jmp would make the algo faster than yours, but it didn't :-).


Regards, Manuel.
Posted on 2004-03-29 05:24:56 by other
Hello,

Your optimized algo



align 4
dwtohex4 proc uses edi dNumber: dword, pBuffer: dword
mov edx, dNumber
mov ecx, edx
shr edx, 4
and edx, 0F0F0F0Fh
and ecx, 0F0F0F0Fh


is 30% faster on a P3 (700 MHz) than your first one with the macro.

:-)


Regards, Manuel.
Posted on 2004-03-29 05:40:25 by other

lingo wrote: "Your algo is too short to be fast... Try that:"

On what machine was it tested?
I'm asking because I tried other's proc, the one you posted yours against. Yours took 115 clocks, other's 52.

P.S. fastadw takes 30 clocks...
The later procs use masking similar to fastadw, so I think
that after careful coding they could finally beat it.
I haven't touched the proc for years; it was written for the PMMX architecture a long time ago...
Posted on 2004-03-31 14:15:44 by The Svin