This function converts a binary number of BITSIZE bits into a hex string. BITSIZE must be a multiple of 64 because the function converts 8 bytes at a time. It reads the binary number from in1 and writes an ASCII string (two characters per byte, no terminator) to out1. If in1 and out1 are aligned on a DWORD boundary, it can convert 8 bytes in about 13 clocks. Anybody wanna help whip it into shape? thanks :)

BToS proc uses esi edi in1:DWORD,out1:DWORD
mov esi,in1
mov edi,out1

movq MM7,[BToSC1]
movq MM6,[BToSC1+8]

movq MM5,[BToSC1+16]
movq MM4,[BToSC1+24]

mov ecx,BITSIZE/64

@@: movq MM0,[esi] ;Loop start (target of jne @B): load 8 source bytes

movq MM1,MM0
movq MM2,MM0 ;MM2 will store high-nibble of each byte

pand MM0,MM6 ;Mask for low nibble
psrlq MM2,4 ;Preshift high-nibbles into low-nibbles

pcmpgtb MM0,MM4 ;Compare. Is low nibble>9?
pand MM1,MM6 ;Mask for low nibble in MM1 <-room for optimisation here?

pand MM0,MM5 ;Mask nibble to convert to ASCII
paddb MM1,MM7 ;Add 30h to each byte

paddb MM0,MM1 ;Add Alpha-offset to nibbles>9
movq MM3,MM2 ;Make second copy of shifted high-nibbles

pand MM2,MM6 ;Mask shifted h-nibs
pand MM3,MM6 ;Mask shifted h-nibs

pcmpgtb MM2,MM4 ;Compare. Is shn>9?
paddb MM3,MM7 ;everything is same as above, except for now using shn
;instead of low nib

pand MM2,MM5
movq MM1,MM0 ;Make second copy of MM0 for unpacking

paddb MM3,MM2
movq MM2,MM3 ;Make second copy of MM3 for unpacking

punpcklbw MM2,MM0 ;Unpack
punpckhbw MM3,MM1

movq [edi],MM2 ;Write
movq [edi+8],MM3

add esi,8 ;Increment source addr
add edi,16 ;Increment dest addr

dec ecx ;Next loop
jne @B

emms ;Clear MMX state so later FPU code works
ret ;Return before the constant data below

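BToSC1 dq 03030303030303030h ;ASCII '0' in every byte (loaded into MM7)
BToSC2 dq 00F0F0F0F0F0F0F0Fh ;Low-nibble mask (loaded into MM6)
BToSC3 dq 00707070707070707h ;Extra offset from '9'+1 to 'A' (loaded into MM5)
BToSC4 dq 00909090909090909h ;Compare threshold 9 (loaded into MM4)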
BToS endp
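
For checking the routine's output, a scalar model of the same per-nibble arithmetic can help. The following is a minimal sketch in C rather than MASM so it runs anywhere; the names `nibble_to_hex` and `bytes_to_hex_ref` are hypothetical and not from the post. It assumes, as the asm does, that bytes are emitted in memory order with the high nibble first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one byte lane: add 30h, then add 7 more
   when the nibble is above 9, so that 10..15 map to 'A'..'F'. */
static char nibble_to_hex(uint8_t nibble)
{
    assert(nibble <= 0x0F);
    uint8_t is_alpha = (uint8_t)-(nibble > 9);   /* FFh or 00h, like pcmpgtb */
    return (char)(nibble + 0x30 + (is_alpha & 0x07));
}

/* Convert len bytes at in to 2*len hex characters at out.
   No terminator is written, matching the asm. */
static void bytes_to_hex_ref(const uint8_t *in, char *out, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        out[2 * i]     = nibble_to_hex((uint8_t)(in[i] >> 4));
        out[2 * i + 1] = nibble_to_hex((uint8_t)(in[i] & 0x0F));
    }
}
```

Comparing the 16 characters this produces for an 8-byte block against what the MMX loop stores at out1 is a quick way to confirm the masks and the unpack order.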
Posted on 2002-03-30 16:31:34 by jademtech
With the right alignment and data caching, this algo will execute in 9 cycles per loop on an Athlon. IIRC, the Pentiums took 12 cycles. ;)
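mov eax,src
mov edx,dst
mov ecx,bytes
movq mm4, mxc(<0F>) ; nibble mask
movq mm5, mxc(<30>) ; ASCII '0'
movq mm6, mxc(<39>) ; ASCII '9'
; movq mm7, mxc(<D8>) ; lowercase
movq mm7, mxc(<F8>) ; uppercase
shr ecx,3 ; 8 source bytes per iteration
@@: movq mm0,[eax]
add eax,8
movq mm1,mm0
psrlq mm0,4 ; mm0 = high nibbles
pand mm1,mm4 ; mm1 = low nibbles
pand mm0,mm4
paddb mm1,mm5 ; nibble + '0'
paddb mm0,mm5
movq mm3,mm1
movq mm2,mm0
pcmpgtb mm3,mm6 ; FFh where char > '9'
pcmpgtb mm2,mm6
psubusb mm3,mm7 ; FFh -> 07h (27h for lowercase), 00h stays 00h
psubusb mm2,mm7
paddb mm1,mm3
paddb mm0,mm2
movq mm2,mm0
add edx,16
punpckhbw mm0,mm1 ; interleave high/low chars, upper 4 bytes
punpcklbw mm2,mm1 ; interleave high/low chars, lower 4 bytes
movntq [edx - 8],mm0 ; edx was already advanced by 16
movntq [edx - 16],mm2
dec ecx
jg @B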
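
The compare-then-saturating-subtract step is the part that usually needs a second look. Here is a rough one-lane model in C (not MASM, so it can be run anywhere); `sub_sat_u8` and `lane_to_hex` are hypothetical names used only for illustration. It assumes each lane holds a nibble value 0..15 before the add.

```c
#include <stdint.h>

/* Unsigned saturating byte subtract: the per-lane behaviour of psubusb. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Hypothetical model of one byte lane of the loop.
   adjust is F8h for uppercase or D8h for lowercase. */
static char lane_to_hex(uint8_t nibble, uint8_t adjust)
{
    uint8_t c    = (uint8_t)((nibble & 0x0F) + 0x30);  /* paddb with 30h */
    uint8_t mask = (uint8_t)(c > 0x39 ? 0xFF : 0x00);  /* pcmpgtb with 39h */
    uint8_t off  = sub_sat_u8(mask, adjust);           /* FFh-F8h = 07h; 00h saturates to 00h */
    return (char)(c + off);                            /* final paddb */
}
```

With F8h the offset is 07h, which moves 3Ah..3Fh up to 'A'..'F'. With D8h the offset is 27h, which lands on 'a'..'f'. Lanes that are already digits get a zero offset because the subtract saturates.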
Posted on 2002-03-30 17:19:01 by bitRAKE
thanks, again, bitRAKE :)

*scratches head*
all i have to do now is figure out how it works ;) shouldn't be a prob
Posted on 2002-03-30 17:21:12 by jademtech

mxc(<39> )

what is "mxc"? masm doesn't like it. nor does it like movntq
Posted on 2002-03-30 17:24:21 by jademtech
Posted on 2002-03-30 17:28:23 by stryker
mxc is a macro (thanks, stryker).
movq mm0, mxc(<01>) ; move eight 01 bytes to register
movq mm0, mxc(<0123>) ; move four 0123 words to register
movq mm0, mxc(<01234567>) ; move two dwords to register
It is basically a shortcut for defining constants inline with the code, instead of using a bunch of .CODE/.DATA directives. Keeping the data where the code is makes editing easier, IMO.
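
To make the resulting 64-bit values concrete, here is a small C sketch of the replication the comments above describe. It is not the macro's source, which isn't shown in this thread; `replicate64` and its explicit `width_bytes` argument are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: repeat a pattern of width_bytes (1, 2 or 4)
   across all 64 bits, as the mxc comments describe. */
static uint64_t replicate64(uint64_t pattern, unsigned width_bytes)
{
    assert(width_bytes == 1 || width_bytes == 2 || width_bytes == 4);
    unsigned bits = width_bytes * 8;
    uint64_t mask = (1ULL << bits) - 1;      /* keep only the pattern's bits */
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += bits)
        result |= (pattern & mask) << shift; /* copy the pattern into each slot */
    return result;
}
```

Under that model, `replicate64(0x0F, 1)` gives 0F0F0F0F0F0F0F0Fh, the same nibble mask the first routine keeps in BToSC2.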

MOVNTQ only works on the Pentium III/Athlon and later: the instruction stores data to memory, bypassing the cache. Change it to MOVQ for other processors.
Posted on 2002-03-30 17:32:23 by bitRAKE
thanks, bitRAKE and stryker :)
Posted on 2002-03-30 17:33:42 by jademtech