hrm donkey, you say the routine aligns itself before starting? In that case it should never fail, I guess. Could probably test it by using VirtualProtect and stuff, but... oh well ;)
Posted on 2005-07-13 18:57:45 by f0dder

Remember that the scan aligns itself by reading the first few bytes up to a DWORD boundary then it begins the DWORD read so it will always end on a DWORD.

Now that's what I've been missing :lol: in that case you're absolutely right. :)
Posted on 2005-07-13 18:58:38 by QvasiModo
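For reference, here is a minimal C sketch of the kind of aligned scan described above: single-byte reads up to a DWORD boundary, then one aligned DWORD per step. It is not donkey's routine; the name `dword_strlen` and the zero-byte bit test are assumptions chosen for illustration. Reading the whole final DWORD may touch up to three bytes past the terminator, which standard C does not formally allow, so treat it as a model of what the assembly does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a sketch of an aligned DWORD scan, not the thread's code. */
size_t dword_strlen(const char *s)
{
    const char *p = s;

    /* Step 1: read single bytes until p sits on a 4-byte boundary. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    /* Step 2: read one aligned DWORD at a time. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);            /* 4-byte load from an aligned address */
        /* Nonzero exactly when at least one byte of w is zero. */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += 4;
    }

    /* Step 3: find the terminator inside that final DWORD. */
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
```

Because every DWORD read starts on a 4-byte boundary, the last read ends on one too, which is the property the posts above rely on.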
Strings should be aligned on 16-byte blocks.
Or better yet 64 bytes, so we can find the string length with SSE in like 1/10th of the time it takes.

When I get my 64-bit system, I'm thinking about making a heavily optimized String class for large chunks of data. If I code the class in ASM, I'm not quite sure how I'd get it to work with C/C++. I guess it's a linking issue or a dll2lib sort of thing, or would I code the functionality in ASM and then make a wrapper class for those functions in C++?

2cents on whatever the topic of this thread is.
If it's a dynamically created string allocated with VirtualAlloc, then it's aligned on 64-byte memory addressing. I'm not sure about HeapAlloc (I'd guess not the same at all).
Posted on 2005-07-14 11:35:28 by r22

2cents on whatever the topic of this thread is.
If it's a dynamically created string allocated with VirtualAlloc, then it's aligned on 64-byte memory addressing. I'm not sure about HeapAlloc (I'd guess not the same at all).

HeapAlloc blocks are aligned to 16 bytes if I'm not wrong. VirtualAlloc memory is aligned to 4096 bytes since it works directly on pages.
Posted on 2005-07-14 12:17:28 by QvasiModo
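Rather than guessing, the alignment claims above can be checked empirically. This is a small C sketch with a hypothetical helper, `alignment_of`, that reports the largest power of two dividing an address (capped at `cap`). It makes no claim about what HeapAlloc or VirtualAlloc actually guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: largest power of two that divides addr, capped at cap. */
uintptr_t alignment_of(uintptr_t addr, uintptr_t cap)
{
    if (addr == 0)
        return cap;                          /* zero is divisible by everything */
    uintptr_t a = addr & (~addr + 1);        /* isolate the lowest set bit */
    return a < cap ? a : cap;
}
```

Calling it on `(uintptr_t)ptr` for many pointers returned by HeapAlloc or VirtualAlloc and taking the minimum gives the alignment observed on that system, which is evidence rather than a documented guarantee.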

Or better yet 64 bytes, so we can find the string length with SSE in like 1/10th of the time it takes.

That's silly if you're dealing with "normal" string lengths... there's probably so much overhead with the SSE(2) algorithm that you'll need pretty long strings before it gains you any benefits...
Posted on 2005-07-14 12:43:48 by f0dder
I was half joking
BUT for the sake of doing something


align 16
strLenAlign16SSE:
        mov ecx,
        movdqa xmm2,dqword
        lea eax,
        movdqa xmm0,dqword
    .lp:
        movdqa xmm1,xmm0
        pxor xmm0,xmm2    ;xor -1
        paddb xmm1,xmm2    ;sub 1
        movdqa xmm3,  ;used for unroll
        pand xmm0,xmm1
        pmovmskb edx,xmm0
        add eax,16
        test dx,-1 ;1111 1111 1111 1111b
        jnz .unrol
        movdqa xmm1,xmm3
        pxor xmm3,xmm2    ;xor -1
        paddb xmm1,xmm2    ;sub 1
        pand xmm3,xmm1
        movdqa xmm0,  ;back to first roll
        pmovmskb edx,xmm3
        add eax,16
        test dx,-1 ;1111 1111 1111 1111b
        jz .lp
    .unrol:
        add ecx,32
        sub eax,ecx
        xor ecx,ecx
        sub ecx,edx
        and edx,ecx
        CVTSI2SD xmm0,edx
        PEXTRW edx,xmm0,3
        shr dx,4
        add dx,0fc01h
        ;          bsf edx,edx replaced by crazy SSE version
        add eax,edx
        ret 4
align 16
filled dq 0FFFFFFFFFFFFFFFFh,0FFFFFFFFFFFFFFFFh


THAT'S a string len function !
For 16byte aligned strings (the unroll doesn't really improve the speed I just wanted to make the code look BIGGER and MEANER :P).

There's probably an optimization I'm missing; the 64-byte aligned one would give the greatest speedup (reading 64 bytes ahead, and doing the check ANDs and PMOVMSKB at the end of the loop).


----------RANT OVER---------
Posted on 2005-07-14 16:44:15 by r22
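The tail of the routine above replaces `bsf` with a floating-point trick: isolate the lowest set bit of the mask, convert it to a double, and read the bit index out of the exponent field. Here is a hedged C sketch of that idea. The name `lowest_bit_index` is hypothetical, it assumes IEEE 754 doubles, and the mask must be nonzero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name: index of the lowest set bit of a nonzero mask,
   read from the exponent of a double (assumes IEEE 754 binary64). */
unsigned lowest_bit_index(uint32_t mask)
{
    uint32_t lowest = mask & (0u - mask);    /* keep only the lowest set bit */
    double d = (double)lowest;               /* a power of two, so the conversion is exact */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* reinterpret the double's bit pattern */
    /* Exponent field is bits 52..62 with a bias of 1023. */
    return (unsigned)((bits >> 52) & 0x7FFu) - 1023u;
}
```

In the assembly, `sub ecx,edx` / `and edx,ecx` perform the isolation, `CVTSI2SD` the conversion, and `PEXTRW` / `shr dx,4` / `add dx,0fc01h` the exponent extraction and bias removal (0fc01h is -1023 in 16 bits).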
I think that you will find that MMX also works well, but the problem is that it is actually slower for small or misaligned strings. The penalty is quite large, so for small strings I will generally use a DWORD scan. For the rare occasion that I am sure all strings will be greater than 128 bytes, I use MMX...

.code
szLenMMX FRAME pString

mov eax,
nop
nop ; fill in stack frame+mov to 8 bytes

pxor mm0,mm0
nop ; fill pxor to 4 bytes
pxor mm1,mm1
nop ; fill pxor to 4 bytes

: ; this is aligned to 16 bytes
movq mm0,
pcmpeqb mm0,mm1
add eax,8
pmovmskb ecx,mm0
or ecx,ecx
jz <

sub eax,

bsf ecx,ecx
sub eax,8
add eax,ecx

emms


  RET

ENDF
Posted on 2005-07-14 18:10:12 by donkey
Hi
Among other things, this article mentions the heap granularity (8 bytes). This makes Donkey's string length measuring algo completely safe, provided that the string is located in a heap?

http://www.maxpatrol.com/defeating-xpsp2-heap-protection.htm

Biterider
Posted on 2005-07-30 13:31:51 by Biterider
I found in the MSDN a note about the heap granularity. Here it is 16 bytes... ???

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dngenlib/html/msdn_heapmm.asp

Biterider
Posted on 2005-08-09 09:49:55 by Biterider
Not everything is located on the heap :)
Posted on 2005-08-09 12:12:33 by f0dder
Regardless of granularity or the allocation method, the algorithm will not scan past the DWORD in which the last byte (NULL) is written. Since it is impossible to protect memory on a byte-by-byte basis in the 32-bit world, and there is no such thing as "write only" memory, it can never generate a read fault.
Posted on 2005-08-09 19:36:34 by donkey
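The argument above comes down to arithmetic: page protection works in whole pages, and an aligned DWORD can never straddle a page boundary, so the DWORD holding the terminator is on a page that is at least partly readable string data. This C sketch checks that, assuming a 4096-byte page; the names `PAGE_BYTES`, `aligned_dword_start` and `read_stays_in_page` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size (x86 default) */

/* Start address of the aligned DWORD that contains addr. */
uintptr_t aligned_dword_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)3;
}

/* 1 if a read of len bytes (len > 0) starting at start touches only one page. */
int read_stays_in_page(uintptr_t start, size_t len)
{
    return start / PAGE_BYTES == (start + len - 1) / PAGE_BYTES;
}
```

For any address `a`, `read_stays_in_page(aligned_dword_start(a), 4)` holds, while an unaligned 4-byte read such as `read_stays_in_page(4094, 4)` crosses into the next page. That is why the scan aligns itself before it switches to DWORD reads.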