What is the best algo to find the first position, in a big array of bytes, of a following number of 0 ? each byte is 1 or 0.

array = [1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,11,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0]

i want to find the first position of : 0,0,0,0,0,0,0,0,0

Thanks for your help.
Posted on 2002-07-25 06:46:42 by DarkEmpire
Boyer Moore?


but as the block to search is relatively small and all the bytes at are zero, some better solutions may exist...
Posted on 2002-07-25 07:09:46 by JCP

lea edx, array

inc edx
cmp DWORD PTR [edx - 1], 0
jnz @B

add edx, 4
cmp DWORD PTR [edx - 1], 0
jnz @B

add edx, 3
cmp BYTE PTR [edx], 0
jne @B

This is my current thinking... It's not too flexible though, say if you wanted to search for say 12 consecutive zeros you need to re-write it (and it gets silly the amount of code needed for more that 16 consecutive zeros).

This algo will go faster the more quads of zeros are present in the list, I felt that the effort involved in calculating where the first non-zero was would out-weigh a simple method (that and I am a lazy b*stard).

It makes no attempt to error check for the end of the array.


------- fixed code - left in ecx from my notepad scratchings -------
Posted on 2002-07-25 07:12:28 by Mirno
If you are going to regularly search for a long block of zeros in a pattern of this type, I would be inclined to try one of the variant boyer moore algos in MASM32, in particular the one that does not use the bad character shift, it is called a "simplified" boyer moore and may work ok in this situation.

With mixed patterns of 0s and 1s, any boyer moore will be too slow and you may have to look at the type of search algos that are used in dna searches. A good byte scanner will do OK but I am sure there is a faster way to do it but the code may be complex.


Posted on 2002-07-25 08:30:07 by hutch--
I think this is a reasonable solution:

lea edi,Array
@@Top: cmp BYTE PTR [edi],0
jne @@Bottom
mov eax,edi
@@: inc edi
cmp BYTE PTR [edi],0
je @B
@@: dec eax
cmp BYTE PTR [eax],0
je @B
sub eax,edi
neg eax
dec eax
cmp eax,NUM_ZEROES
je @@FoundIt
@@Bottom: add edi,NUM_ZEROES
jmp @@Top
@@FoundIt: ...

Basic idea:
--jump by NUM_ZEROES at a time
--when you find a zero, you might be in the middle of a sequence: so run edi up to the first non-zero byte and run eax down to the first preceding non-zero byte
--the value edi-eax-1 is the length of the run of bytes
--if it isn't the same amount, then we've already moved edi up to the first non-zero byte and we don't need to test it so we just add NUM_ZEROES and go back to the top

Couple of notes:
--you might wanna check that you haven't run off the end of the array (ie., replace the jmp @@Top on the last line with something else
--If you wanna check for *at least* NUM_ZEROES just change the cmp eax,NUM_ZEROES/je @@FoundIt to a jge @@FoundIt


Oh yeah, I forgot. The first byte of the sequence of course is edi-eax
Posted on 2002-07-25 17:36:12 by chorus

INCLUDE \masm32\INCLUDE\windows.inc
INCLUDE \masm32\INCLUDE\kernel32.inc
INCLUDE \masm32\INCLUDE\user32.inc
INCLUDELIB \masm32\lib\user32.lib
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\INCLUDE\masm32.inc
INCLUDELIB \masm32\lib\masm32.lib


TextData DB 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
TextPattern DB 0, 0, 1


buffer DB 16 DUP(?)


bssearch PROC USES ebx esi edi lpSource:DWORD, uSourceLen:DWORD, lpPattern:DWORD, uPatternLen:DWORD

mov eax, uPatternLen
cmp eax, uSourceLen
ja __not_found

xor eax, eax
mov esi, lpSource
dec eax
xor ebx, ebx
mov edi, lpPattern
xor ecx, ecx

push 0


cmp ecx, uPatternLen
je __stop
mov dl, BYTE PTR [edi+ecx]
cmp ebx, uSourceLen
je __stop
cmp dl, BYTE PTR [esi+ebx]
jne __byte_not_equal

test eax, eax
jns __chain_started

mov eax, ebx


test ecx, ecx
jz __move_forward

cmp DWORD PTR [esp], 0
jne __move_forward

mov dl, BYTE PTR [edi]
cmp dl, BYTE PTR [esi+ebx]
jne __move_forward

mov DWORD PTR [esp], ebx


inc ecx
inc ebx
jmp __start


xor eax, eax
cmp DWORD PTR [esp], 0
je __one_step_forward

mov ebx, DWORD PTR [esp]
mov DWORD PTR [esp], eax
jmp __reset


inc ebx


xor ecx, ecx
dec eax
jmp __start


pop edx

cmp ecx, uPatternLen
jne __not_found


xor eax, eax
dec eax


bssearch ENDP


invoke bssearch, OFFSET TextData, SIZEOF TextData, OFFSET TextPattern, SIZEOF TextPattern
invoke dwtoa, eax, OFFSET buffer
invoke MessageBox, 0, OFFSET buffer, 0, 0
invoke ExitProcess,NULL

END START[/size]

lpSource = pointer to the byte array source
uSourceLen = size of byte array source
lpPattern = pointer to the byte array pattern
uPatternLen = size of byte array pattern

Return Values:

-1 = pattern not found
n = nth position of the pattern


This is an idiot proof (I hope) generalized byte scanner search algorithm. This algorithm is for byte sizes only but you can modify it for other array size.

Have Fun!!! :grin:
Posted on 2002-07-25 20:40:39 by stryker
Since your looking for 9 zero bytes, then there is going to be one dword that is all zero. ;) Look for a zero dword and then for your nine zero bytes. It will be four times faster than anything posted thus far.
Posted on 2002-07-25 20:46:36 by bitRAKE
DarkEmpire, Just use the BM Algo and modify it to handle byte size arrays since you said
...in a big array of bytes...
Also I don't assume you only want to search the 0 values don't you? ;)
Posted on 2002-07-25 20:52:27 by stryker
Actually BitRake, I don't think that'd be as fast as the algo I posted, let alone four times faster. I check every 9 bytes or so, your method would check all bytes, but 4 at a time.

Posted on 2002-07-25 21:20:00 by chorus
chorus, sorry - I didn't look close enough at your code. :o
Posted on 2002-07-25 21:37:33 by bitRAKE
Thank you very much :)
Posted on 2002-07-25 22:32:39 by chorus
Also I don't assume you only want to search the 0 values don't you?

yes, i want to search the 0 values, in fact, all others bytes can be different (not only 1), but i only search 0.

Is there a special tip if the array is a size like 100 000 bytes or 1 000 000 bytes ?

Thanks for your help
Posted on 2002-07-26 01:46:08 by DarkEmpire
Bitrake, I checked DWORDs!
I use the DWORD check to check for zeros en-mass!

I also thought of this code:

lea edx, array

mov ecx, 9
inc edx
cmp BYTE PTR [edx - 1], 1
jne first
sbb ecx, 0
jnz @B

; Found

I guess the algo you want depends of the average case you'll be dealing with.
I suppose someone will have to benchmark them, but that isn't going to be me because I am at work!

Posted on 2002-07-26 04:01:06 by Mirno