Hi,
my old variant of Boyer-Moore CaseSensitiveNext search algo is smaller and runs faster than other posted here on my old Pentium200 MMX. I use my own test progi based on rdtsc instruction and MASM 6.15.8803.
Now I just add startpos ?? for testing purposes only. I don't need it in my programs...
.data
ALIGN 4
l_table dd 256 Dup(0)
.code
BMCaseSNext proc startpos: DWORD, pBuffer: DWORD, lenBuffer: DWORD, pSrchData: DWORD, lenSrchData: DWORD
push edi
mov ecx, lenSrchData
push esi
lea edi, l_table
push ebx
mov esi, pBuffer
xor edx, edx
mov eax, startpos
add esi, eax
sub edx, ecx
mov ebx, pSrchData
mov eax, lenBuffer
push ebp
lea ebx,
lea ebp,
mov eax, 256
lea esi,
mov ecx, edx
Lens:
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
sub eax, 16
jne Lens
DifLen:
mov al,
inc ecx
mov , ecx
jne DifLen
Search:
mov al,
mov ecx,
add ebp, ecx
jl Not_found
sub esi, ecx
test ecx, ecx
mov ecx, edx
jne Search
push ebx
push esi
Compare:
inc ecx
jge Found
mov al,
dec ebx
mov ah,
dec esi
cmp al, ah
je Compare
pop esi
pop ebx
inc esi
xor eax,eax
dec ebp
jge Search
Not_found:
mov eax,-1
jmp Exit
Found:
pop esi
pop ebx
lea eax,
Exit:
pop ebp
pop ebx
pop esi
pop edi
ret 20
BMCaseSNext endp
buliaNaza,
Compliments on the Boyer Moore algorithm you have posted. It shows
that there is a range of difference depending on the hardware that
runs the algorithm type.
I put it as you posted it into the test bench that I used on the
other two and tested it on both the PIII 600 I use and the AMD 550
that I use as a backup box and the times are as follows.
I used a 52 meg file loaded into memory and then clocked it in real
time in milliseconds. The version I posted originally is over 20%
faster on the PIII but about 2 - 3% slower on the AMD 550.
PIII AMD 550
~~~~ ~~~~~~~
BMCaseSNext 295 ms 195 ms
BMBinSearch 230 ms 200 ms
I am sure many people would be pleased if you commented the code so it
was a lot easier to read and so they could understand it.
Regards & Compliments,
hutch@pbq.com.au
This message was edited by hutch--, on 5/24/2001 4:29:45 AMThanks hutch--,
that means that I don't need a new technology
and I will save a lot of money from
buying a new computer...hah..hah..hah..
some raw statistics:
01-02 ; progi
.....
08 10 ; cond&uncond jumps
00 06 ; not pairable
01 10 ; AGI stalls
00 01 ; using variables in loops
.....
.....
buliaNaza,
I share the humour, I enjoyed coding on my old p166 but the disk
died and I had to build a new one. I am interested in the toys
you have used to produce the statistics as I know there is one bad
stall in the version that I posted with the table access and a few
other problems with dependency chains.
When developing it, I had a range of variations that were smaller
and had shorter instruction paths but they were no faster across both
machines.
Regards,
hutch@pbq.com.au
Its been a while since I played with Vtune and I generally found it
less than useful
Thanks hutch-- for your time,
I continue with my row statistics...
Please delete this ugly&commented xor eax,eax ; zero EAX
from MainSearchLoop in your progi..You are a guru and don't need it..
buliaNaza,
You will laugh at this but by removing the xor eax, eax from the
main comparison loop it becomes about 5% faster on the AMD but
about 50% slower on the Intel. I put it in there for pairing
purposes. It also allowed me to use EAX in some of the variations
in the shift calculation.
I have a number of variations that are smaller, have far less
register dependency but are no faster, all of my testing shows
the access to the shift table is the bottleneck and none of the
variations to date have made it any faster.
Regards,
hutch@pbq.com.au
Nice work, maestros.
Maybe it would be worthwhile to post a link
to some academic site concerned with algo
design, if there is any. (I don't mean design
for any particular hardware.) It's an
interesting and important mathematical subject.
Interesting reading on string searches.
Dictionary of Algorithms, Data Structures, and Problems.
EXACT STRING MATCHING ALGORITHMS.
The core of programming is algorithm development - there are sites everywhere with examples of different algorithms. I don't know of any that discuss algorithm development in a general sense - I have learnt by example and developed my own philosophy. Maybe artificial intelligents research sites would have a more general text on the development of algorithm development. :) Or maybe I don't understand what your looking for?
Thx bitRAKE for those links, good ones.
I was curious in a general way about how much
is known, mathematically, about algo and data-
structure efficiency.
disease,
I test a running block of code in real time as it reasonably well
duplicates the condition that it will be run with in a computer.
With string data or similar byte sequence testing I build a test
file long enough to get over 2 or 3 hundred milliseconds duration
and then clock it in millisecond resolution. I prefer to use a sample
big enough to get over a half a second as it further reduces the
percentage error.
I use a couple of macros from MASM32 to do the timing and they are very
simple ones so if you run the code generator Prostart and build a
test file, they are in the macro file for the project.
The advantage of using this technique is that it runs in ring3
access without the need for special test beds or setups.
Regards,
hutch@pbq.com.au
Larry Hammick, Don Knuth has a nice set of books (The Art of Computer Programming, Volumes 1-3 Boxed Set) with mathematical analysis of some algorithms. I have another book which I've been slowly reading: How to Solve It : Modern Heuristics.
Yes, all my money goes to books, and then computer hardware. :)
This message was edited by bitRAKE, on 5/30/2001 1:32:52 PM
I know both books. Polya is (was?) one of world's
foremost mathematicians and his book on heuristic
is unique.
I have a math background and I occasionally relapse into
purely mathematical headspaces. I wondered if the
mathematicians had made progress recently on the abstract
notions of data structures and algorithms. E.g. is there
some definite measure of "compressibility" of data? How
many comparisons (suitably defined) are needed by (eg)
Boyer-Moore in the limit of large string tables? Is there
some symmetry between algos and the data structures for
which they are suitable (for if, say, the data is really
huge but of simple structure, we would do better to look
for a smarter data structure than to look for a smarter
algo). ... That sort of thing ...
Larry,
A maths background will be very useful to you in algorithm design,
a large amount of the available data is in mathematical notation
and it assumes that the reader has it as well.
I come from a logic background a long time ago and I approach an
algorithm from a different perspective which sometimes works for
me and sometimes does not. Among the things to be careful about
is the translation of algorithm theory to a working implementation.
Assembler regularly breaks the rules because it does not directly fit
the mathematical model for some algorithm design theory. A good
case in point is a traditional byte scanner and a Boyer Moore algo
for finding a pattern in a larger text, the BM algo has better logic
which makes it faster in some circumstances. The problem with exact
pattern matching theory is the assumption of constants such as the
time it takes to compare 2 characters.
The model works in part if everyone uses portable ANSI C but when you
start coding the variations in assembler, the rules change. A forward
byte scanner is by no means slow when written correctly and it always
compares more characters than are in the source text.
A "sublinear" algorithm can be slower by a lot if the implementation
has high overhead. General drift is always treat design theory with
a generous amount of practical implementation to get a good result.
Regards,
hutch@pbq.com.au
Hi Hutch,
I hate to have in my ASSEMBLY (not C/C++/Pascal, etc) code
additional rubbish as follow:
- push argument1, push argument2, call proc
and next pop argument2, pop argument1 etc.
I prefer to send the arguments by registers.
- local variables
- save esp in other register and can't use it
- .DO, .WHILE, .IF, invoke, and other HL stuff because I can't control my code in a low level
- preserve Win dependent registers because I use this one and work with MainMessages only!
.data
ALIGN 4
MainMessages DD WM_COMMAND,OnCommand
DD WM_CREATE,OnCreate
DD WM_DESTROY,OnDestroy
DD WM_PAINT, OnPaint
....
....
nMessages DD $-MainMessages
.code
WndProc PROC
mov eax, nMessages
mov ecx, offset MainMessages
shr eax, 3 ;get number of messages
mov edx, ;get uMsg in edx
WndProc_1:
dec eax
jl DefWindowProc ;message not found
cmp ,edx ;is it the correct message?
jnz WndProc_1 ;no
push ESP ;save registers as required by Windows
push EBP
push EBX
push EDI
push ESI
call dword ptr
;call correct procedure for the message
;now =hwnd,=umsg,
;=wparam,=lparam
pop ESI
pop EDI
pop EBX
pop EBP
pop ESP
jc DefWindowProc ;nc=don't call DefWindowProcA eax=exit code
pop edx
pop ecx
pop ebx
pop ecx
pop ebx
push edx
ret ;ret 16
WndProc ENDP ;
So, here is my original BM algo, a little bit complicated but may be faster in your computer...
.386
.model flat, stdcall
BMCaseSNext proto
.data
ALIGN 4
l_table DD 256 Dup(?) ; working table
.code
;Usage:
;esi ->pBuffer ; esi->buffer with bytes to be searched through
;ebp = lenBuffer ; ebp = length of the buffer
;ebx ->pSrchData ; ebx->pointer to data to be searched for
;ecx = lenSrchData ; ecx= length of data to be searched for
;edi ->pl_table ; edi->pointer to working table (must be aligned)
;call BMCaseSNext
BMCaseSNext proc
add esi, ebp ;esi->pointer to the last byte of pBuffer
lea ebx, ;ebx-> pointer to the last byte of pSrchData
neg ecx ;ecx= - lenSrchData
mov edx, ecx ;ecx = edx = - lenSrchData
add ebp, ecx ;sub lenSrchData from lenBuffer
mov eax, 256 ;eax = counter
xor ebp, -1 ;not ebp-> current negative index
Lens:
mov , edx ; filling up 256 DD l_table with - lenSrchData
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
mov , edx
sub eax, 16
jne Lens
DifLen:
mov al, ; filling up with the real negative offset of every
inc ecx ; byte from the pSrchData, starting from the last to
mov , ecx ; the first, at the offset in l_table equal to the
jne DifLen ; ASCII code of the byte, multiplied by 4
Search: ; the main searching loop
mov al, ; get a byte from pBuffer ->esi
Search1:
sub ebp, ; sub negative
buliaNaza,
Thanks for making this code available and particularly for commenting
the code so everyone knows how it works. Complex algorithms like
this are appreciated by many programmers who are interested in writing
advanced code.
I am a bit busy at the moment and I don't have time to set it up
to benchmark the code but I am sure that there are a number of people
who are members of the forum who would be interested in helping to
test the code on different machines.
I like the idea of putting the shift table within the actual code
as it makes the algorithm self contained so that it can be used as an
object.
Regards,
hutch@pbq.com.au