Let's consider algorithms of optimization of the following task.
There is a big byte array (4 Mbytes) arbitrary sequence of the data from 0 up to F (stream)
There is a set from a data series (for example 65536) with values from 0 up to F (patterns) a size up to 32 octad.
It is necessary to find offsets of all patterns and their numbers in stream.
The set patterns varies very seldom. Therefore it is possible to execute anyone of prediction for acceleration of retrieval.

s=0 4 7 D 8 F E B C 2 6 7 4 F E C
p0=F E B
p1=F E
p3=8 F E C
p2=D 8 F E B C

Result:
p0 Offset 5
p1 Offset 5
p2 Offset 3
Posted on 2002-03-31 12:52:00 by Nexo
Not sure what the question is here but *IF* you're asking for an algorithmic method for doing this then you might look into "suffix trees" or the closely related "suffix arrays". They're very useful & quick (linear time) stuctures but tend to get overlooked in classes.

Let's see the reference I like for this one is Gusfield's "Algorithms on string, trees, & sequences" but I'll bet that's a hard find in Russia. Sorry. Other related names are Ukkonen & Weiner. I can spill more but I've only done them in HLL for work a while back so it'll be accessing long dead brain tissue but I can get an electrode to spark the memory if needed.

Am I getting warmer or colder?
Posted on 2002-03-31 20:00:32 by Mutant Slime
I'd try a finite state machine implementation that allows all patterns to be found concurently. It would read in bytes 0 thru F that would indicate the next node in the FSM. If the FSM node is flagged for output then a solution is output. No comparisions need to be made at time of search, so I think it would be fast.
Posted on 2002-03-31 20:38:31 by bitRAKE
If there is only one occurrence of each pattern:
A binary search with a binary tree?

Or just a binary tree?

mov eax, sizeArray
mov ebx, array
shr eax, 1
add ebx, eax
mov ecx,
cmp ecx, MID_1
jb @F
...
@@: cmp ecx, MID_2
jb @F
...
Posted on 2002-03-31 23:14:08 by bdjames
Here's an FSM sample this was snipped from the KMP algo(No gaurantee this is a perfect translation).

<snip>
[size=9]

esp+04h == #of elements of the array source
eax == is initialized to -1
ebx == is a pointer to your array of DWORDS wherein
the number of elements is calculated as
#of elements of the array source + 1
== First element of ebx should be initialize to -1
ecx == is initialized to 0
esi == a pointer to the array source
edi == a pointer to the array pattern

@@GENERATE_TABLE:

cmp ecx, DWORD PTR [esp+04h]
je @@TABLE_FINISHED

@@INNER_TABLE_LOOP:

testeax, eax
js @@CONTINUE_TABLE
mov edx, DWORD PTR [edi+ecx*04h]
cmp edx, DWORD PTR [edi+eax*04h]
je @@CONTINUE_TABLE
mov eax, [ebx][eax*04h]
jmp @@INNER_TABLE_LOOP

@@CONTINUE_TABLE:

inc ecx
inc eax
mov edx, DWORD PTR [edi+ecx*04h]
cmp edx, DWORD PTR [edi+eax*04h]
jne @@SHIFTTABLE_VALUE_IS_EAX
mov edx, [ebx][eax*04h]
mov [ebx][ecx*04h], edx
jmp @@GENERATE_TABLE

@@SHIFTTABLE_VALUE_IS_EAX:

mov [ebx][ecx*04h], eax
jmp @@GENERATE_TABLE[/size]
<snip>
Posted on 2002-04-01 01:18:33 by stryker
BitRAKE, you have offered the best direction of the solution.Other name - a digital tree.
But how you see structures which present node?
Posted on 2002-04-01 09:03:31 by Nexo

BitRAKE, you have offered the best direction of the solution.Other name - a digital tree. But how you see structures which present node?
Well, lets see:

1. How many patterns do we search for at once?
2. Allow for patterns of patterns: FFF, F3F3, etc...
3. Allow for multiple patterns ending: FF3 & F3
4. Allow patterns of length 32

The answer to the first will drive the representation of the structures. An array of 32 x 16 items should be enough, but the answer to the question above will define what item is.
Posted on 2002-04-01 11:05:12 by bitRAKE
I shall try in other words:)
Set patterns up to 65536.
Length patterns from 1 up to 32 bytes.
One byte this value from 0 up to F.
The set patterns can be various length and contents.
In result it is necessary determine what patterns approach for current position in stream.

Why for you items in the two-dimensional array? How can work?
The digital tree is represented so:


Posted on 2002-04-01 12:48:19 by Nexo
Nexo, that clears a lot,
Am I right:
first (great) tetrada of any byte is always 0 ?
Posted on 2002-04-01 13:24:17 by The Svin

Why for you items in the two-dimensional array? How can work?
It wouldn't. :)

How would your example structure deal with the input stream 2210? That tree could get quite large - that is too much data, imho. The leaf space will be almost empty. My array method is very dense, but I haven't worked out the details. I will post some code by the weekend. It is a true FSM, not a tree. The array is just to simplify data access - it is not needed.

Posted on 2002-04-02 01:04:54 by bitRAKE
2 The Svin> Yes.

2 BitRAKE>
Stream=2210
The current node - Root.
Get 2. Check first (Root) node. This hit. Go to second node.
Get 2. Check second node. It is a miss.
That have not found: 10,21,222.

I can use several methods for packing leafs of node. I done not troubled with quantity of memory. Main rate of execution of search.
The method of digital tree musters presence of all patterns in one pass. It is not dependens of quantity on patterns !!!

I yet absolutely fathom your method for search of all patterns with obtaining their numbers. On how many it will be fast?
Posted on 2002-04-02 09:02:52 by Nexo

I yet absolutely fathom your method for search of all patterns with obtaining their numbers. On how many it will be fast?
I have yet to fathom it myself. :)

1. Get 2. --> Start node two
2. Get 2. --> Second two node
3. Get 1. --> Output found pattern 21, Start node one
4. Get 0. --> Output found pattern 10, Node zero

Why doesn't your method find the two patterns? Maybe, it will be the same thing that we are talking about - just looking different? I doubt your method could find all patterns concurrently with having the FSM structure within. I was wrong about the array though - it will not work.
Posted on 2002-04-02 10:33:44 by bitRAKE
It can is necessary to specify, that all patterns will find on a digital tree for the CURRENT position in stream.
Some speculations. It is necessary for me to detour a tree for each position in stream. It is possible to make a rating of productivity of one step in stream (calls to cells). The maximum delay is defined by a maximum size pattern, but does not depend on their quantity. Quantity patterns on much greater lengths.

I start to fathom your approach to solution ;)
Similarity to a digital tree was initially remarked, but here to me not clear a relation of a position in stream to status FSM. FSM should work for any stream. How the same FSM for stream 01022210 will work?
Posted on 2002-04-02 11:30:12 by Nexo
0 No zero start node, no nothing
1 Start node one
0 Output found 10
2 Start node two
2 Second two node
2 Output found node 222, Third two node
1 Output found node 21
0 Output found node 10
Posted on 2002-04-02 11:42:15 by bitRAKE
Now I see, that FSM and search on a digital tree completely identical.
If to expand structure of a tree in case of one leaf from node, it is possible to proceed on direct matching of a tail pattern and stream. It is similar a tail pattern in FSM. For example, (4) - > (2) - > (1) - a tail pattern ..421. It will be possible it is a faster way?
Posted on 2002-04-02 12:37:18 by Nexo
Here is another example picture. I am so visual. Only can follow the arrows from the active node or parent (green). Only can start on parent node. Output on red arrow. If input does not match path from current node or parent then start over.
Posted on 2002-04-03 08:31:43 by bitRAKE
You have not overlooked that it is necessary to gain number and a position of the pattern?
If number is defined by a state, as about two output red arrows in one state?

In one cell of the description of a leaf I assume to place:
NextNode:32
Number:16; flMiddle:1; flLast:1; flNumber:1

If flNumber=1, number (Number) and a position pattern in the table of results is added.
If flLast=1, reset of tree traversal, change current node on RootNode.
If flMiddle=1, change current node on NextNode.
Posted on 2002-04-03 09:45:44 by Nexo
Originally posted by Nexo
You have not overlooked that it is necessary to gain number and a position of the pattern?
This becomes a separate problem. Simple look-up table?
What if patterns: 21, 2121? :(
Originally posted by Nexo
If number is defined by a state, as about two output red arrows in one state?
Yeah, I have overlooked this. If this is a solution, then it would only be faster in some situations? I'm still hammering away at it... :(

Still fun to work on! :)
These are very important algorithms (VIA?) :)
Posted on 2002-04-03 10:30:10 by bitRAKE
I see while one solution on exclusion of a problem of inclusions of patterns each other.
It is necessary to fix only those patterns which start in a current position.

Draft code:

; esi-stream pointer
; ebx-stream size

nextPosition:
mov ebp, ; set current node - RootNode
xor ecx,ecx
nextByte:
movzx eax, ; get stream byte
mov edx,
test edx,MASK flNumber ; is last byte of pattern?
je skipNumber
mov eax,edx
and eax,MASK Number
; eax-number of patterns
; esi-position in stream
call AddResult
skipNumber:
test edx,MASK flLast ; no more NextNode
jne exitOnLast
mov ebp, ; change current node on NextNode
inc ecx
jmp nextByte

exitOnLast:
inc esi
dec ebx
jne nextPosition
ret
Posted on 2002-04-03 11:21:21 by Nexo