Often a value needs to be tested against a set of values and our high level brain takes a direct approach:
	cmp	al, 1

je Found
cmp al, 2
je Found
cmp al, 3
je Found
...
cmp al, N
je Found
There are many good things about this method: data cache is not effected, value not in set take a short execution path. No so good: lot of typing (use a macro), the penalty for values being a part of the set increases linearly with the size of the set.

Another method that is less code and quite direct:
	mov	ecx, SIZE OF TABLE - 1

_0: cmp al, [TABLE + ecx]
je Found
dec ecx
jns _0

Not_Found:

; number is NOT part of set
ret

Found:

; number is a member of set
ret
Is this method better? Only in requard to taking less typing, and not impacting the instruction cache (no need to worry about this even - not a problem). There are many additionally bad things about this method: even a longer execution path for all values, data cache is impacted.


And now we come to another method:


mov ecx, 1
mov edx, TABLE_DEPTH

_0: cmp [TABLE + ecx], al
je Found
adc ecx, ecx
dec edx
jne _0

Not_Found:

; number is NOT part of set
ret

Found:

; number is a member of set
ret


; The table has to be sorted into a balanced tree and
; first entry is not used, but makes the table a power
; of two in size.

TABLE_DEPTH EQU 5

TABLE BYTE 0
BYTE 16
BYTE 8, 24
BYTE 4, 12, 20, 28
BYTE 2, 6, 10, 14, 18, 22, 26, 32
BYTE 1,3,5,7, 9,11,13,15, 17,19,21,23, 25,27,31,33
What could be better about this method? Well, it does look more complex and it is - take some time to understand how it works in your favorite debugger. Note how both execution paths are optimized - both values in the set and not in the set take less instruction to determine than in the first method above. Also, all but the final branch is correctly predicted.

It seems like a natual question to ask: Why not do the binary tree search in code only? This has been explored here before, but it basically boils down to branch mis-prediction being a huge penalty on newer processors.
Posted on 2003-07-02 11:00:34 by bitRAKE
- needs a sorted array
- not built for signed numbers but feel free to change some jcc flags
- "divide and conquer" style
- I tested this before and it seems to have no problems. But as usual, I write buggy code, so there... :grin:
- it simply starts at the center of the starting point and ending point, in which these 2 points will vary as the code executes
[size=9].686

.MMX
.XMM
.MODEL FLAT, STDCALL
OPTION SCOPED
OPTION CASEMAP:NONE

INCLUDE C:\masm32\include\windows.inc
INCLUDE C:\masm32\include\kernel32.inc
INCLUDELIB C:\masm32\lib\kernel32.lib
INCLUDE C:\masm32\include\user32.inc
INCLUDELIB C:\masm32\lib\user32.lib

.DATA

; 0 1 2 3 4 5 6 7 8 9

myarray DD 1, 2, 4, 5, 6, 7, 9, 11, 15, 18

.CODE

start:

;let EAX be the register to hold the value we are looking for

mov eax, 3
xor ebx, ebx
mov ecx, LENGTHOF myarray - 1
mov esi, OFFSET myarray

cmp [esi+ecx*4], eax
je __found

__lp:

mov edx, ecx
shr ecx, 1
cmp [esi+ecx*4], eax
ja __elp
je __found
jmp __oilp

__elp:

test ecx, ecx
jnz __lp
jz __not_found

__oilp:

push edx
inc ebx

__ilp:

sub edx, ecx
shr edx, 1
test edx, edx
jz __not_found
add edx, ecx
cmp [esi+edx*4], eax
ja __ilp
je __found

mov ecx, edx
pop edx
dec ebx
jmp __oilp

__found:

invoke MessageBox, 0, 0, 0, 0
jmp __exit

__not_found:

; 0 1 2 3 4 5 6 7 8 9

;myarray DD 1, 2, 4, 5, 6, 7, 9, 11, 15, 18

__exit:

shl ebx, 2
add esp, ebx

invoke ExitProcess, 0
ret

END start[/size]
I don't know how this will fair on larger arrays(~1 million) compared to other methods... I'm pretty sure this is slow on smaller arrays. :)

It seems fast theoritically, but I don't know the actual performance on real world tests.
Posted on 2003-07-02 13:24:57 by arkane
arkane, seems like overkill - I'm doing the same thing with only a couple instructions and those bad branches will kill the speed. :)
Posted on 2003-07-02 13:44:44 by bitRAKE
yep, that's what I was thinking about... :grin: :grin:
Posted on 2003-07-02 13:47:46 by arkane
I'll create an example once I'm back online.
Posted on 2003-07-02 13:59:06 by bitRAKE
your favorite style: recursion :grin: :grin:
[size=9]ssrch:

pop eax
pop ecx
pop edx
push eax
__srch:
mov eax, [esp+4]
cmp eax, [esp+8]
ja __ret0
add eax, [esp+8]
shr eax, 1
cmp edx, [ecx+eax*4]
je __ret1
ja __greater
dec eax
js __ret0
push eax
push [esp+8]
call __srch
jmp __ret
__greater:
inc eax
push [esp+8]
push eax
call __srch
jmp __ret
__ret0:
xor eax, eax
retn 8
__ret1:
xor eax, eax
inc eax
__ret:
retn 8[/size]
Usage:
[size=9]    push    LENGTHOF array - 1  ;# of items in the array - 1

push 0 ;always 0
push 10 ;number to find
push OFFSET array ;pointer to the array
call ssrch

.IF(eax != 0)
invoke MessageBox, 0, 0, 0, 0
.ENDIF[/size]
Posted on 2003-07-02 14:39:45 by arkane
bitRAKE,
adc is nice idea but inc ecx is an error
You can substitude TABLE_DEPTH
with one last row with zeroes too...
and test for zero to end...just an idea
 

_0:
mov eax, edx
sub eax, [TABLE + ecx*4]
adc ecx, ecx
xor eax, edx ; exit if eax=0 or eax=edx
cmp eax, edx
jne _0
...
...
dd 1,3,5,7, 9,11,13,15, 17,19,21,23, 25,27,31,33
dd 4*4*2 Dup(0)


and rewrite the tree too!!!

Regards,
Lingo
Posted on 2003-07-02 15:23:23 by lingo12
lingo12, can you explain how that loop stops when EDX is not a value in the table? I think that two conditional jumps in the loop is the minimum. Yeah, I forgot to remove the INC - I was using another version in my code. :)

Mainly, I wanted to show the technique:
TABLE	BYTE                         0


BYTE 1

BYTE 2, 3


BYTE 4, 5, 6, 7


BYTE 8, 9, 10, 11, 12, 13, 14, 15

BYTE 16,17,18,19, 20,21,22,23, 24,25,26,27, 28,29,30,31
Posted on 2003-07-02 16:08:53 by bitRAKE
"lingo12, can you explain how that loop stops when EDX is not a value in the table?"

Yes

"I think that two conditional jumps in the loop is the minimum."

No
Posted on 2003-07-02 17:15:34 by lingo12
lingo12, :grin: there is no questioning your talent. My loop is still better, imho.

Another option would be:
_0:	cmp	al, [TABLE + ecx]

je Found
adc ecx, ecx
test ecx, SIZE OF TABLE
je _0
Posted on 2003-07-02 17:29:34 by bitRAKE
"lingo12, there is no questioning your talent."

Thanks

"My loop is still better,imho."

Yes, because it is your

"Another option would be:..."

Two conditional jumps again
Posted on 2003-07-02 17:45:02 by lingo12

Two conditional jumps again
The jump in the middle is only taken once and is basically ignored (CPU continues to execute instructions) by modern processors (ie always predicted in the case of number not in set, and taken only once for number in set). Do you find this to be a problem in your tests?
Posted on 2003-07-02 17:56:34 by bitRAKE
[size=9]srch:


mov ecx, [esp+4]
mov eax, [esp+8]
mov edx, [esp+12]

movd MM0, edx
punpcklbw MM0, MM0
punpcklbw MM0, MM0
punpcklbw MM0, MM0

;start a loop here

movq MM1, [ecx]
pcmpeqb MM1, MM0
packsswb MM1, MM1
movd eax, MM1
test eax, eax
jnz __found

;loop around here

;...
;...
;...

__not_found:

xor eax, eax

__found:

retn 12

...
...
...

push 2 ;byte size vaue to search
push LENGTHOF array - 1 ;...
push OFFSET array ;...
call srch

.IF(eax != 0)
invoke MessageBox, 0, 0, 0, 0
.ENDIF[/size]
compares 8 byte values. :)
Posted on 2003-07-02 23:05:52 by arkane
"The jump in the middle is only taken once...."

Two is more than one
Will be better to follow bitRAKE's advice:

"Conditional moves are not much better than conditional jump instructions - better to do without them all together." by bitRAKE

http://www.asmcommunity.net/board/showthread.php?threadid=9989&highlight=not+conditional

Just kidding...I like your ideas

Regards,
Lingo
Posted on 2003-07-03 00:16:27 by lingo12
I think something like this is solely dependant on the situation.

You need to take into account the likelihood that certain values will be tested against the table. Take for example a window procedure. Odds are that many, if not most, of the usual window messages you receive aren't going to be of any use to you. It seems silly then to have to scan a whole table every time you get a message you're not going to handle. I would suggest in that case, your priority is to first identify (at least some of) the values that are not in the table, so execution can quickly pass through.

Also, as is the case with the example table here, notice that all numbers are in the range 1-31. For numerically consecutive values it would seem both faster and easier to do a simple range test. Of course the tables aren't always going to look like that, but it might be worth considering how many adjacent values are in your table, as opposed to scattered values. Or really any sort of pattern you can find will work to your advantage.

But as a general purpose algorithm this would seem a lot faster than the usual way. I like the use of the tree. Good job. :alright:
Posted on 2003-07-03 00:47:33 by iblis
There are two versions of the TABLE posted above. The first is numbered in accending order of table item value - think of the number in terms of being the index into a sorted array of the table values. The second is numbers by TABLE index value - the byte number is used to better understand what the algorithm is doing.
Posted on 2003-07-03 08:33:52 by bitRAKE
oh and before I forgot, that MMX versions doesn't need a sorted array. :)
Posted on 2003-07-03 13:13:58 by arkane

"The jump in the middle is only taken once...."

Two is more than one
Will be better to follow bitRAKE's advice:

"Conditional moves are not much better than conditional jump instructions - better to do without them all together." by bitRAKE

http://www.asmcommunity.net/board/showthread.php?threadid=9989&highlight=not+conditional

Just kidding...I like your ideas

Regards,
Lingo
I guess you missed the sweet unroll potential:
	cmp	[TABLE + 1], al	; ECX always one.

rcl ecx, 1 ; doesn't effect Z flag
je Found
cmp [TABLE + ecx], al
rcl ecx, 1
je Found
cmp [TABLE + ecx], al
rcl ecx, 1
je Found
cmp [TABLE + ecx], al
rcl ecx, 1
je Found
cmp [TABLE + ecx], al
je Found
Like I said, it is better to do without. :grin:

Can someone post something that is faster on random data? I'm not concidering this for WndProc handling - it is for the general case. It works for WORDs or DWORDs just by changing the multiplier on memory indexing.

MMX isn't effective except on small sets of values [ O(n/{8,4,2}) verses O(log2 n) ].
Posted on 2003-07-08 08:43:53 by bitRAKE
bitRake,
I've read first post in the thread and
would like to extent one of methods you placed
your code contra.
I mean this method you described:
-------------------------------------------------
Another method that is less code and quite direct:




movecx, SIZE OF TABLE - 1
_0: cmp al, [TABLE + ecx]
jeFound
dec ecx
jns _0
Not_Found:
; number is NOT part of set
ret
Found:
; number is a member of set
ret


Is this method better?

-------- end of your desciption -----------

A year or more ago I posted algo logic called
"False point algo"

Scaning is used usually for to things,
1. to find a place of scanned value
2. to find if gotten value belongs to some subset. (as in your post)
Or for both.

Yet we know - if we knew for sure that
value is in table of subset values
- we could make a scaning loop faster.
'Cause we don't need to check if current address
is out of table addresses space.

For example many different asciiz zero point scanners
assumed that string has zero for sure - end this
allows to scan for it much faster then if assume
otherwize (meaning - considering possibility that there
is no zero).

Actually it can be done even if we don't for sure
if there is that value.
All we need - to include after end of the table
a place for data that serve as "false pont"
Then we change the algo this way
1. Before start of scaning loop we place
recieved value into the "false point" var at
the end of the table.
2. Then we arrange scaning loop, but now without
code that checks if there end of table, 'cause
we know now for sure - if there is no value in the
table - our scanner yet will find it - 'cause we
placed it ourself in memory next to table.
3. So the question "belongs or not recieved value to
the subset" for us is now "is address of found value
belongs to false point".
If it is - then recieved value does not belong to
the subset in table. Otherwise (address is less then
address of false point) it belongs to it.
Scanner loop and data type might be different.
I just to illustrate my point changing your code:



.data
TABLE db .,.,.,., some values subset
false point db ?
.code
mov ecx, offset of TABLE
mov false point,al
;very short loop:
_0: cmp al, [ecx]
lea ecx,[ecx][1]
jne _0
;alternative ecx,offs table-1 ;inc ecx;cmp al,[ecx];jne _0
cmp ecx,offset falsepoint+1
jne notfound
;found:
.......
;not found


It can be used for any type and scanner loop can be changed.
But it now don't need check for the end of table in loop itself
anyway.

It's not about your "contra" algo.
Just addition to you revision of possible algos.
Posted on 2003-07-08 16:04:59 by The Svin
Svin, that method is nice for small or dynamic tables, but is still O(n). Try it with a table of primes less than 256 verses the method I post above.


2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251
...54 prime numbers. My method takes less than <18 instuctions to determine if a byte is prime, and only one mis-predicted branch. It is hard to say that about any other method.
Posted on 2003-07-08 18:03:05 by bitRAKE