Let's consider algorithms of optimization of the following task.

There is a big byte array (4 Mbytes) arbitrary sequence of the data from 0 up to F (stream)

There is a set from a data series (for example 65536) with values from 0 up to F (patterns) a size up to 32 octad.

It is necessary to find offsets of all patterns and their numbers in stream.

The set patterns varies very seldom. Therefore it is possible to execute anyone of prediction for acceleration of retrieval.

s=0 4 7 D 8 F E B C 2 6 7 4 F E C

p0=F E B

p1=F E

p3=8 F E C

p2=D 8 F E B C

Result:

p0 Offset 5

p1 Offset 5

p2 Offset 3

There is a big byte array (4 Mbytes) arbitrary sequence of the data from 0 up to F (stream)

There is a set from a data series (for example 65536) with values from 0 up to F (patterns) a size up to 32 octad.

It is necessary to find offsets of all patterns and their numbers in stream.

The set patterns varies very seldom. Therefore it is possible to execute anyone of prediction for acceleration of retrieval.

s=0 4 7 D 8 F E B C 2 6 7 4 F E C

p0=F E B

p1=F E

p3=8 F E C

p2=D 8 F E B C

Result:

p0 Offset 5

p1 Offset 5

p2 Offset 3

Not sure what the question is here but *IF* you're asking for an algorithmic method for doing this then you might look into "suffix trees" or the closely related "suffix arrays". They're very useful & quick (linear time) stuctures but tend to get overlooked in classes.

Let's see the reference I like for this one is Gusfield's "Algorithms on string, trees, & sequences" but I'll bet that's a hard find in Russia. Sorry. Other related names are Ukkonen & Weiner. I can spill more but I've only done them in HLL for work a while back so it'll be accessing long dead brain tissue but I can get an electrode to spark the memory if needed.

Am I getting warmer or colder?

Let's see the reference I like for this one is Gusfield's "Algorithms on string, trees, & sequences" but I'll bet that's a hard find in Russia. Sorry. Other related names are Ukkonen & Weiner. I can spill more but I've only done them in HLL for work a while back so it'll be accessing long dead brain tissue but I can get an electrode to spark the memory if needed.

Am I getting warmer or colder?

I'd try a finite state machine implementation that allows all patterns to be found concurently. It would read in bytes 0 thru F that would indicate the next node in the FSM. If the FSM node is flagged for output then a solution is output. No comparisions need to be made at time of search, so I think it would be fast.

If there is only one occurrence of each pattern:

A binary search with a binary tree?

Or just a binary tree?

mov eax, sizeArray

mov ebx, array

shr eax, 1

add ebx, eax

mov ecx,

cmp ecx, MID_1

jb @F

...

@@: cmp ecx, MID_2

jb @F

...

A binary search with a binary tree?

Or just a binary tree?

mov eax, sizeArray

mov ebx, array

shr eax, 1

add ebx, eax

mov ecx,

cmp ecx, MID_1

jb @F

...

@@: cmp ecx, MID_2

jb @F

...

Here's an FSM sample this was snipped from the KMP algo(No gaurantee this is a perfect translation).

<snip>

<snip>

```
[size=9]
```

esp+04h == #of elements of the array source

eax == is initialized to -1

ebx == is a pointer to your array of DWORDS wherein

the number of elements is calculated as

#of elements of the array source + 1

== First element of ebx should be initialize to -1

ecx == is initialized to 0

esi == a pointer to the array source

edi == a pointer to the array pattern

@@GENERATE_TABLE:

cmp ecx, DWORD PTR [esp+04h]

je @@TABLE_FINISHED

@@INNER_TABLE_LOOP:

testeax, eax

js @@CONTINUE_TABLE

mov edx, DWORD PTR [edi+ecx*04h]

cmp edx, DWORD PTR [edi+eax*04h]

je @@CONTINUE_TABLE

mov eax, [ebx][eax*04h]

jmp @@INNER_TABLE_LOOP

@@CONTINUE_TABLE:

inc ecx

inc eax

mov edx, DWORD PTR [edi+ecx*04h]

cmp edx, DWORD PTR [edi+eax*04h]

jne @@SHIFTTABLE_VALUE_IS_EAX

mov edx, [ebx][eax*04h]

mov [ebx][ecx*04h], edx

jmp @@GENERATE_TABLE

@@SHIFTTABLE_VALUE_IS_EAX:

mov [ebx][ecx*04h], eax

jmp @@GENERATE_TABLE[/size]

<snip>BitRAKE, you have offered the best direction of the solution.Other name - a digital tree.

But how you see structures which present node?

But how you see structures which present node?

BitRAKE, you have offered the best direction of the solution.Other name - a digital tree. But how you see structures which present node?

1. How many patterns do we search for at once?

2. Allow for patterns of patterns: FFF, F3F3, etc...

3. Allow for multiple patterns ending: FF3 & F3

4. Allow patterns of length 32

The answer to the first will drive the representation of the structures. An array of 32 x 16

*items*should be enough, but the answer to the question above will define what

*item*is.

I shall try in other words:)

Set patterns up to 65536.

Length patterns from 1 up to 32 bytes.

One byte this value from 0 up to F.

The set patterns can be various length and contents.

In result it is necessary determine what patterns approach for current position in stream.

Why for you items in the two-dimensional array? How can work?

The digital tree is represented so:

Set patterns up to 65536.

Length patterns from 1 up to 32 bytes.

One byte this value from 0 up to F.

The set patterns can be various length and contents.

In result it is necessary determine what patterns approach for current position in stream.

Why for you items in the two-dimensional array? How can work?

The digital tree is represented so:

Nexo, that clears a lot,

Am I right:

first (great) tetrada of any byte is always 0 ?

Am I right:

first (great) tetrada of any byte is always 0 ?

Why for you items in the two-dimensional array? How can work?

How would your example structure deal with the input stream 2210? That tree could get quite large - that is too much data, imho. The leaf space will be almost empty. My array method is very dense, but I haven't worked out the details. I will post some code by the weekend. It is a true FSM, not a tree. The array is just to simplify data access - it is not needed.

Posted on 2002-04-02 01:04:54 by bitRAKE

2 The Svin> Yes.

2 BitRAKE>

Stream=2210

The current node - Root.

Get 2. Check first (Root) node. This hit. Go to second node.

Get 2. Check second node. It is a miss.

That have not found: 10,21,222.

I can use several methods for packing leafs of node. I done not troubled with quantity of memory. Main rate of execution of search.

The method of digital tree musters presence of all patterns in one pass. It is not dependens of quantity on patterns !!!

I yet absolutely fathom your method for search of all patterns with obtaining their numbers. On how many it will be fast?

2 BitRAKE>

Stream=2210

The current node - Root.

Get 2. Check first (Root) node. This hit. Go to second node.

Get 2. Check second node. It is a miss.

That have not found: 10,21,222.

I can use several methods for packing leafs of node. I done not troubled with quantity of memory. Main rate of execution of search.

The method of digital tree musters presence of all patterns in one pass. It is not dependens of quantity on patterns !!!

I yet absolutely fathom your method for search of all patterns with obtaining their numbers. On how many it will be fast?

I yet absolutely fathom your method for search of all patterns with obtaining their numbers. On how many it will be fast?

1. Get 2. --> Start node two

2. Get 2. --> Second two node

3. Get 1. --> Output found pattern 21, Start node one

4. Get 0. --> Output found pattern 10, Node zero

Why doesn't your method find the two patterns? Maybe, it will be the same thing that we are talking about - just looking different? I doubt your method could find all patterns concurrently with having the FSM structure within. I was wrong about the array though - it will not work.

It can is necessary to specify, that all patterns will find on a digital tree for the CURRENT position in stream.

Some speculations. It is necessary for me to detour a tree for each position in stream. It is possible to make a rating of productivity of one step in stream (calls to cells). The maximum delay is defined by a maximum size pattern, but does not depend on their quantity. Quantity patterns on much greater lengths.

I start to fathom your approach to solution ;)

Similarity to a digital tree was initially remarked, but here to me not clear a relation of a position in stream to status FSM. FSM should work for any stream. How the same FSM for stream 01022210 will work?

Some speculations. It is necessary for me to detour a tree for each position in stream. It is possible to make a rating of productivity of one step in stream (calls to cells). The maximum delay is defined by a maximum size pattern, but does not depend on their quantity. Quantity patterns on much greater lengths.

I start to fathom your approach to solution ;)

Similarity to a digital tree was initially remarked, but here to me not clear a relation of a position in stream to status FSM. FSM should work for any stream. How the same FSM for stream 01022210 will work?

0 No zero start node, no nothing

1 Start node one

0 Output found 10

2 Start node two

2 Second two node

2 Output found node 222, Third two node

1 Output found node 21

0 Output found node 10

1 Start node one

0 Output found 10

2 Start node two

2 Second two node

2 Output found node 222, Third two node

1 Output found node 21

0 Output found node 10

Now I see, that FSM and search on a digital tree completely identical.

If to expand structure of a tree in case of one leaf from node, it is possible to proceed on direct matching of a tail pattern and stream. It is similar a tail pattern in FSM. For example, (4) - > (2) - > (1) - a tail pattern ..421. It will be possible it is a faster way?

If to expand structure of a tree in case of one leaf from node, it is possible to proceed on direct matching of a tail pattern and stream. It is similar a tail pattern in FSM. For example, (4) - > (2) - > (1) - a tail pattern ..421. It will be possible it is a faster way?

Here is another example picture. I am so visual. Only can follow the arrows from the active node or parent (green). Only can start on parent node. Output on red arrow. If input does not match path from current node or parent then start over.

You have not overlooked that it is necessary to gain number and a position of the pattern?

If number is defined by a state, as about two output red arrows in one state?

In one cell of the description of a leaf I assume to place:

NextNode:32

Number:16; flMiddle:1; flLast:1; flNumber:1

If flNumber=1, number (Number) and a position pattern in the table of results is added.

If flLast=1, reset of tree traversal, change current node on RootNode.

If flMiddle=1, change current node on NextNode.

If number is defined by a state, as about two output red arrows in one state?

In one cell of the description of a leaf I assume to place:

NextNode:32

Number:16; flMiddle:1; flLast:1; flNumber:1

If flNumber=1, number (Number) and a position pattern in the table of results is added.

If flLast=1, reset of tree traversal, change current node on RootNode.

If flMiddle=1, change current node on NextNode.

*Originally posted by Nexo*

**You have not overlooked that it is necessary to gain number and a position of the pattern?**

**This becomes a separate problem. Simple look-up table?**

What if patterns: 21, 2121? :(

What if patterns: 21, 2121? :(

*Originally posted by Nexo*

**If number is defined by a state, as about two output red arrows in one state?**

**Yeah, I have overlooked this. If this is a solution, then it would only be faster in**

Still fun to work on! :)

These are very important algorithms (VIA?) :)*some*situations? I'm still hammering away at it... :(Still fun to work on! :)

These are very important algorithms (VIA?) :)

**
**I see while one solution on exclusion of a problem of inclusions of patterns each other.

It is necessary to fix only those patterns which start in a current position.

Draft code:

; esi-stream pointer

; ebx-stream size

nextPosition:

mov ebp, ; set current node - RootNode

xor ecx,ecx

nextByte:

movzx eax,

It is necessary to fix only those patterns which start in a current position.

Draft code:

; esi-stream pointer

; ebx-stream size

nextPosition:

mov ebp, ; set current node - RootNode

xor ecx,ecx

nextByte:

movzx eax,

**; get stream byte**

mov edx,

test edx,MASK flNumber ; is last byte of pattern?

je skipNumber

mov eax,edx

and eax,MASK Number

; eax-number of patterns

; esi-position in stream

call AddResult

skipNumber:

test edx,MASK flLast ; no more NextNode

jne exitOnLast

mov ebp, ; change current node on NextNode

inc ecx

jmp nextByte

exitOnLast:

inc esi

dec ebx

jne nextPosition

ret

mov edx,

test edx,MASK flNumber ; is last byte of pattern?

je skipNumber

mov eax,edx

and eax,MASK Number

; eax-number of patterns

; esi-position in stream

call AddResult

skipNumber:

test edx,MASK flLast ; no more NextNode

jne exitOnLast

mov ebp, ; change current node on NextNode

inc ecx

jmp nextByte

exitOnLast:

inc esi

dec ebx

jne nextPosition

ret

** **