SearchBuffer DB 4096 DUP(?)


invoke SetFilePointer, hFile, 17000h, 0, FILE_BEGIN
invoke ReadFile, hFile, addr SearchBuffer, 4095, addr var1, NULL

mov ecx, 512 ; how many dwords to inspect: 2048 / 4
lea edi, SearchBuffer ; start address of region to search (scasd reads from edi)
mov eax, 50746547h ; test if next dword is 'GetP'

repne scasd
jnz finish

; Detected !
invoke MessageBox, 0, offset szDetected2, offset szMsgCap, MB_OK or MB_ICONINFORMATION

finish:
; Not Detected!

Hm... it does not find "GetP" even though it does exist in the file, within the 2000 (decimal) bytes that follow offset 17000h.
Any ideas?
Posted on 2004-04-25 07:04:24 by [-xor-]
Afternoon, [-xor-].

Using repne scasd means that edi is incremented by 4 bytes each iteration, so only every fourth byte position is tested. You've got to test for the sequence at each byte position.
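
To make the stride problem concrete, here is a small C sketch (not the original assembly). The helper name find4 and its step parameter are made up for illustration: step = 4 mimics what repne scasd does, step = 1 tests every byte position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for the 4-byte pattern in buf, advancing
   `step` bytes between probes. step = 4 behaves like repne scasd,
   step = 1 probes every byte position.
   Returns the offset of the first hit, or -1 if there is none. */
static long find4(const unsigned char *buf, size_t len,
                  const char *pat, size_t step)
{
    for (size_t i = 0; i + 4 <= len; i += step) {
        if (memcmp(buf + i, pat, 4) == 0)
            return (long)i;
    }
    return -1;
}
```

With "GetP" stored at offset 5 of a buffer, the step = 4 scan only looks at offsets 0, 4, 8, ... and walks straight past it, while the step = 1 scan reports offset 5.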

Posted on 2004-04-25 08:20:14 by Scronty
scasb instead of scasd? But then I have to make a lot of such tests if the word I would like to scan for is long...
Posted on 2004-04-25 08:27:36 by [-xor-]
I have a string (SearchBuffer) with data from a file.

Now I want to search for the string "ABCD1234" in this SearchBuffer. I need the offset at which the string is detected in that buffer.

Thx for helping

Note: The buffer can contain ZEROS before the text to find comes...
Posted on 2004-04-25 08:37:42 by [-xor-]

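stringtosearch db "ABCD1234"

mov edi, offset stringtosearch
mov esi, offset datatosearch
movq mm0, [edi] ; the 8 pattern bytes, loaded once
@@:
movq mm1, [esi] ; next 8 data bytes
pcmpeqb mm1, mm0 ; every equal byte becomes 0FFh
packsswb mm1, mm1 ; squeeze the 8 result bytes into the low dword
movd ecx, mm1
cmp ecx, -1 ; all 8 bytes equal?
je theymatch
inc esi
jmp @B
theymatch:
sub esi, offset datatosearch ; esi = offset of the match in the buffer
emms

Forgive me, it's not tested and doesn't handle all possible cases (there is no end-of-buffer check, for one).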
Posted on 2004-04-25 08:53:52 by roticv
"6. Alignment
All data in RAM should be aligned to addresses divisible by 2, 4, 8, or 16 according to this scheme:" by A.Fog

It will be slow (misalignment penalties):

movq mm1, [esi] -> esi should be aligned by 8
but we have:
inc esi -> so esi usually isn't aligned by 8

Posted on 2004-04-25 09:19:05 by lingo12
no mmx please :(
Posted on 2004-04-25 09:29:35 by [-xor-]
The mmx code may be nice as long as the length of the search string is exactly 8 bytes. Otherwise, it would need modifications if the length is shorter and a major rework if the length is longer.

Scanning for dwords can be done but is generally slower than scanning for bytes, except under a few conditions. This would be a suggestion to search for a "GetProc" (7 bytes) string.

searchstring db "GetProc"

mov ecx, 512
lea edi,SearchBuffer
lea esi,searchstring
mov eax, [esi] ;first 4 search characters
mov edx,[esi+3] ;last 4 search characters

loop1:
repne scasd
jnz @F
cmp edx,[edi-1]
jz detected
jmp loop1

@@:
lea edi,SearchBuffer+1
mov ecx, 512
loop2:
repne scasd
jnz @F
cmp edx,[edi-1]
jz detected
jmp loop2

@@:
lea edi,SearchBuffer+2
mov ecx, 512
loop3:
repne scasd
jnz @F
cmp edx,[edi-1]
jz detected
jmp loop3

@@:
lea edi,SearchBuffer+3
mov ecx, 512
loop4:
repne scasd
jnz finish
cmp edx,[edi-1]
jz detected
jmp loop4

detected:
;add edi,3 ;to point to the byte following the 7-byte search string
invoke MessageBox, 0, offset szDetected2, offset szMsgCap, MB_OK or MB_ICONINFORMATION

finish:
; Not Detected!

The above would need to be modified based on the length of the search string. If it is very long, the value of ecx may also need to be corrected to prevent GPFs.
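
For a search string of arbitrary length, the same "scan for the start, then verify the rest" idea can be sketched in C rather than assembly. This is only an illustration, not a drop-in replacement for the code above: the function name find_bytes is made up, and it scans for the first byte (like repne scasb) instead of the first dword. The scan is capped at the last position where the whole string still fits, which is the kind of correction to ecx mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   pat (patlen bytes) in buf (buflen bytes), or -1 if it is not there.
   Lengths are explicit, so zero bytes in the buffer are harmless. */
static long find_bytes(const unsigned char *buf, size_t buflen,
                       const unsigned char *pat, size_t patlen)
{
    if (patlen == 0 || patlen > buflen)
        return -1;

    const unsigned char *p = buf;
    /* last start position where the whole pattern still fits */
    const unsigned char *last = buf + (buflen - patlen);

    while (p <= last) {
        /* fast scan for the first pattern byte */
        p = memchr(p, pat[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return -1;
        /* verify the remaining bytes */
        if (memcmp(p + 1, pat + 1, patlen - 1) == 0)
            return (long)(p - buf);
        p++;
    }
    return -1;
}
```

Because the scan never starts a comparison after `last`, the verify step cannot read past the end of the buffer, whatever the length of the search string.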

Posted on 2004-04-25 10:11:16 by Raymond
I played around and this seems to work:

xor eax,eax
mov c1, eax ; index into SearchBuffer
mov c2, eax ; index into szGPAString
mov c3, 14 ; sizeof "GetProcAddress" (characters still to match)

search_loop:
inc c1
cmp c1, 2030 ; search max
jz finish2 ; quit search

mov ecx, c1 ; Counter #1 --- Buffer
mov bl, byte ptr SearchBuffer[ecx] ; Get the byte
mov ecx, c2 ; Counter #2 --- String
mov al, byte ptr szGPAString[ecx] ; Get the byte
cmp al, bl
jz same ; bytes match
mov c2, 0h ; mismatch: reset counter of search string
mov c3, 14 ; ... and the remaining length
jmp search_loop

same:
; this byte is detected - let's look for the next byte
dec c3
cmp c3, 0 ; last byte ?
jz string_detected
inc c2
jmp search_loop

string_detected:
;invoke MessageBox, 0, offset szDetected2, offset szMsgCap, MB_OK or MB_ICONINFORMATION
Posted on 2004-04-25 18:09:01 by [-xor-]
Maybe the Boyer-Moore algorithm..
Posted on 2004-04-26 17:42:13 by stormix
BM for such a short search string? It's a good algorithm but not the holy grail - BM does best with large search strings and large amounts of data to scan through.
Posted on 2004-04-27 03:15:53 by f0dder
oh ok, how large would they need to be before it's worth using bm?
Posted on 2004-04-27 04:07:37 by stormix
That depends a lot on the data being searched and the search pattern :)


A naive byte-scanner would perform (15*16)/2 = 120 comparisons for each AAAAAAAAAAAAAAB block, which for the above means 360 operations.

While BM has an overhead of computing a table with 256 entries and a table the length of the search string, it only performs 3 comparisons on the example above (a bad character shift in each case), for a total of 256+15+3 = 274 operations.

This is of course a totally fabricated example, and the inner loop of BM is more complicated than that of a byte scanner, so operations do not translate to clock cycles, but it does show that even for fairly short search strings and data BM can potentially be better.
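
To experiment with such counts, here is a rough C sketch that counts byte comparisons for a naive scanner and for the Horspool simplification of BM (only the 256-entry bad character table, no second table). The function names and the cmps counter are made up for illustration. The example data itself is not shown above, so any input you feed it is your own assumption.

```c
#include <stddef.h>
#include <string.h>

/* Naive scan: try every window, compare left to right.
   Returns the match offset or -1; *cmps counts byte comparisons. */
static long naive_search(const unsigned char *t, size_t n,
                         const unsigned char *p, size_t m, size_t *cmps)
{
    *cmps = 0;
    if (m == 0 || m > n) return -1;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            (*cmps)++;
            if (t[i + j] != p[j]) break;
            j++;
        }
        if (j == m) return (long)i;
    }
    return -1;
}

/* Horspool variant: compare right to left, then shift the window by
   the bad character table entry for the window's last byte. */
static long horspool_search(const unsigned char *t, size_t n,
                            const unsigned char *p, size_t m, size_t *cmps)
{
    size_t shift[256];
    *cmps = 0;
    if (m == 0 || m > n) return -1;

    /* bytes not in the pattern allow a full-length shift */
    for (size_t c = 0; c < 256; c++) shift[c] = m;
    /* last byte of the pattern is deliberately left out */
    for (size_t k = 0; k + 1 < m; k++) shift[p[k]] = m - 1 - k;

    size_t i = 0;
    while (i + m <= n) {
        size_t j = m;
        while (j > 0) {
            (*cmps)++;
            if (t[i + j - 1] != p[j - 1]) break;
            j--;
        }
        if (j == 0) return (long)i;
        i += shift[t[i + m - 1]];
    }
    return -1;
}
```

As one assumed input: searching three back-to-back AAAAAAAAAAAAAAB blocks for fourteen A's followed by a C, the Horspool loop makes one comparison per block, 3 in total, before giving up. The naive count depends on how the scanner treats the windows near the end of the data.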

Posted on 2004-04-27 07:21:22 by Jibz

You can use my InString algo too:

Posted on 2004-04-27 16:02:12 by lingo12