Hi,
What is the best algo to find the first position, in a big array of bytes, of a following number of 0 ? each byte is 1 or 0.
exemple:
array = [1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,11,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0]
i want to find the first position of : 0,0,0,0,0,0,0,0,0
Thanks for your help.
What is the best algo to find the first position, in a big array of bytes, of a following number of 0 ? each byte is 1 or 0.
exemple:
array = [1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,11,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0]
i want to find the first position of : 0,0,0,0,0,0,0,0,0
Thanks for your help.
Boyer Moore?
http://www.movsd.com
but as the block to search is relatively small and all the bytes at are zero, some better solutions may exist...
http://www.movsd.com
but as the block to search is relatively small and all the bytes at are zero, some better solutions may exist...
lea edx, array
@@:
inc edx
cmp DWORD PTR [edx - 1], 0
jnz @B
add edx, 4
cmp DWORD PTR [edx - 1], 0
jnz @B
add edx, 3
cmp BYTE PTR [edx], 0
jne @B
This is my current thinking... It's not too flexible though, say if you wanted to search for say 12 consecutive zeros you need to re-write it (and it gets silly the amount of code needed for more that 16 consecutive zeros).
This algo will go faster the more quads of zeros are present in the list, I felt that the effort involved in calculating where the first non-zero was would out-weigh a simple method (that and I am a lazy b*stard).
It makes no attempt to error check for the end of the array.
Mirno
------- fixed code - left in ecx from my notepad scratchings -------
If you are going to regularly search for a long block of zeros in a pattern of this type, I would be inclined to try one of the variant boyer moore algos in MASM32, in particular the one that does not use the bad character shift, it is called a "simplified" boyer moore and may work ok in this situation.
With mixed patterns of 0s and 1s, any boyer moore will be too slow and you may have to look at the type of search algos that are used in dna searches. A good byte scanner will do OK but I am sure there is a faster way to do it but the code may be complex.
Regards,
hutch@movsd.com
With mixed patterns of 0s and 1s, any boyer moore will be too slow and you may have to look at the type of search algos that are used in dna searches. A good byte scanner will do OK but I am sure there is a faster way to do it but the code may be complex.
Regards,
hutch@movsd.com
I think this is a reasonable solution:
Basic idea:
--jump by NUM_ZEROES at a time
--when you find a zero, you might be in the middle of a sequence: so run edi up to the first non-zero byte and run eax down to the first preceding non-zero byte
--the value edi-eax-1 is the length of the run of bytes
--if it isn't the same amount, then we've already moved edi up to the first non-zero byte and we don't need to test it so we just add NUM_ZEROES and go back to the top
Couple of notes:
--you might wanna check that you haven't run off the end of the array (ie., replace the jmp @@Top on the last line with something else
--If you wanna check for *at least* NUM_ZEROES just change the cmp eax,NUM_ZEROES/je @@FoundIt to a jge @@FoundIt
--Chorus
-edit-
Oh yeah, I forgot. The first byte of the sequence of course is edi-eax
lea edi,Array
@@Top: cmp BYTE PTR [edi],0
jne @@Bottom
mov eax,edi
@@: inc edi
cmp BYTE PTR [edi],0
je @B
@@: dec eax
cmp BYTE PTR [eax],0
je @B
sub eax,edi
neg eax
dec eax
cmp eax,NUM_ZEROES
je @@FoundIt
@@Bottom: add edi,NUM_ZEROES
jmp @@Top
@@FoundIt: ...
Basic idea:
--jump by NUM_ZEROES at a time
--when you find a zero, you might be in the middle of a sequence: so run edi up to the first non-zero byte and run eax down to the first preceding non-zero byte
--the value edi-eax-1 is the length of the run of bytes
--if it isn't the same amount, then we've already moved edi up to the first non-zero byte and we don't need to test it so we just add NUM_ZEROES and go back to the top
Couple of notes:
--you might wanna check that you haven't run off the end of the array (ie., replace the jmp @@Top on the last line with something else
--If you wanna check for *at least* NUM_ZEROES just change the cmp eax,NUM_ZEROES/je @@FoundIt to a jge @@FoundIt
--Chorus
-edit-
Oh yeah, I forgot. The first byte of the sequence of course is edi-eax
[size=9].686
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
INCLUDE \masm32\INCLUDE\windows.inc
INCLUDE \masm32\INCLUDE\kernel32.inc
INCLUDE \masm32\INCLUDE\user32.inc
INCLUDELIB \masm32\lib\user32.lib
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\INCLUDE\masm32.inc
INCLUDELIB \masm32\lib\masm32.lib
.DATA
TextData DB 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
TextPattern DB 0, 0, 1
.DATA?
buffer DB 16 DUP(?)
.CODE
bssearch PROC USES ebx esi edi lpSource:DWORD, uSourceLen:DWORD, lpPattern:DWORD, uPatternLen:DWORD
mov eax, uPatternLen
cmp eax, uSourceLen
ja __not_found
xor eax, eax
mov esi, lpSource
dec eax
xor ebx, ebx
mov edi, lpPattern
xor ecx, ecx
push 0
__start:
cmp ecx, uPatternLen
je __stop
mov dl, BYTE PTR [edi+ecx]
cmp ebx, uSourceLen
je __stop
cmp dl, BYTE PTR [esi+ebx]
jne __byte_not_equal
test eax, eax
jns __chain_started
mov eax, ebx
__chain_started:
test ecx, ecx
jz __move_forward
cmp DWORD PTR [esp], 0
jne __move_forward
mov dl, BYTE PTR [edi]
cmp dl, BYTE PTR [esi+ebx]
jne __move_forward
mov DWORD PTR [esp], ebx
__move_forward:
inc ecx
inc ebx
jmp __start
__byte_not_equal:
xor eax, eax
cmp DWORD PTR [esp], 0
je __one_step_forward
mov ebx, DWORD PTR [esp]
mov DWORD PTR [esp], eax
jmp __reset
__one_step_forward:
inc ebx
__reset:
xor ecx, ecx
dec eax
jmp __start
__stop:
pop edx
cmp ecx, uPatternLen
jne __not_found
ret
__not_found:
xor eax, eax
dec eax
ret
bssearch ENDP
START:
invoke bssearch, OFFSET TextData, SIZEOF TextData, OFFSET TextPattern, SIZEOF TextPattern
invoke dwtoa, eax, OFFSET buffer
invoke MessageBox, 0, OFFSET buffer, 0, 0
invoke ExitProcess,NULL
END START[/size]
Parameters:
lpSource = pointer to the byte array source
uSourceLen = size of byte array source
lpPattern = pointer to the byte array pattern
uPatternLen = size of byte array pattern
Return Values:
-1 = pattern not found
n = nth position of the pattern
Notes:
This is an idiot proof (I hope) generalized byte scanner search algorithm. This algorithm is for byte sizes only but you can modify it for other array size.
Have Fun!!! :grin:
Since your looking for 9 zero bytes, then there is going to be one dword that is all zero. ;) Look for a zero dword and then for your nine zero bytes. It will be four times faster than anything posted thus far.
DarkEmpire, Just use the BM Algo and modify it to handle byte size arrays since you said
...in a big array of bytes...
Also I don't assume you only want to search the 0 values don't you? ;)Actually BitRake, I don't think that'd be as fast as the algo I posted, let alone four times faster. I check every 9 bytes or so, your method would check all bytes, but 4 at a time.
--Chorus
--Chorus
chorus, sorry - I didn't look close enough at your code. :o
Posted on 2002-07-25 21:37:33 by bitRAKE
Posted on 2002-07-25 21:37:33 by bitRAKE
Thank you very much :)
Also I don't assume you only want to search the 0 values don't you?
yes, i want to search the 0 values, in fact, all others bytes can be different (not only 1), but i only search 0.
Is there a special tip if the array is a size like 100 000 bytes or 1 000 000 bytes ?
Thanks for your help
Bitrake, I checked DWORDs!
I use the DWORD check to check for zeros en-mass!
I also thought of this code:
I guess the algo you want depends of the average case you'll be dealing with.
I suppose someone will have to benchmark them, but that isn't going to be me because I am at work!
Mirno
I use the DWORD check to check for zeros en-mass!
I also thought of this code:
lea edx, array
first:
mov ecx, 9
@@:
inc edx
cmp BYTE PTR [edx - 1], 1
jne first
sbb ecx, 0
jnz @B
; Found
I guess the algo you want depends of the average case you'll be dealing with.
I suppose someone will have to benchmark them, but that isn't going to be me because I am at work!
Mirno