I have an array of structures that are freed and allocated many times. I'm using a bit lookup table to quickly find the first free structure (faster than a free list and uses less memory than pointer table). Each array also has a free count - so I can quickly skip the array if there are no free structures. My question is one of optimization - I'd like to find the fastest and smallest way to use the bit lookup table to retreive the structure index in the array.
; EDX is the start of the bit array

xor eax, eax
lea ecx, [edx - 4]
@@:
; NOTE: exit is forced, bit is always set
or eax, [edx]
lea edx, [edx + 4]
je @B

lea ebx, [edx - 4]

sub edx, ecx ; byte offset
bsf ecx, eax ; first bit set
shl edx, 3 ; bit offset
btr eax, ecx ; clear bit = structure is used
or edx, ecx
mov [ebx], eax
; EDX in the index number into the structure array
Before this code executes a test has already been made to insure one free structure exists (one bit in the array is set). Hope this explains what I'm after. I just thought it would be a good problem for people to grind their teeth on. :)

Thank you for the help!

p.s. I am aware that the bit index <> index into structure array due to the lower three bits being reversed. The important thing is just to have a one to one corespondents between the bit flag and the availiblity of a structure.

:) Smallest code might be:
	xor	eax, eax

@@: bt [edx], eax
inc eax
jnc @B
dec eax
btr [edx], eax ; structure array index
But what about the smallest code checking 32 bits at once?
Posted on 2003-03-26 22:58:17 by bitRAKE
I don't mind to grind my teeth on this. Maybe I get it wrong, but is all you need is to find first bit setted? Then maybe rotate it
	mov eax, [edx]

@@: rcr eax, 1
inc ecx
jnc @B
; ecx here contain 1-based position


If I do get wrong, please, could you explain it in detail. How long will be the bit field, or it's length may vary
Posted on 2003-03-27 02:27:42 by masquer
The problem can be thought of in three pieces:

1. Find first set bit in array of bytes.
2. Clear set bit found in 1.
3. Get index value of bit

masquer, your code is a solution to part one only if the array is four bytes long. It is not that small. Let me compose a better example with some fake data:

BitArray DWORD \

00000000000000000000000000000000y, \
00000000000000000000001000000000y, \
00001000010001000000010010100000y, \
00001000001000100000001000000000y

StrucArray MySTRUCT \
(8 * SIZEOF BitArray) dup <>
Each bit in the BitArray represents a structure in the StrucArray -- if the bit is set then the structure is not in use. The following code completes all three goals:
	lea	edx, BitArray

xor eax, eax
@@: ; find set bit in array
bt [edx], eax
; INC doesn't effect carry flag
inc eax
jnc @B
dec eax

; clear bit because structure now in use
btr [edx], eax

; offset into structure array
imul eax, SIZEOF MySTRUCT
; can now use [eax + StrucArray] to access structure
Hope the example makes the problem clearer. The code should find a free structure, mark it as unavailible, and provide the address to the structure for use. There will also be a complementary algorithm to free a structure by setting the bit.
Posted on 2003-03-27 10:34:02 by bitRAKE
Have you thought about this : every time you free a structure, you move the last allocated structure in place of the freed structure. This way there is no hole in your array and the index of the next free structure is equal to the number of allocated structures.
Posted on 2003-03-27 10:58:48 by Dr. Manhattan
Dr. Manhattan, I can't move the structures around as there are pointers to them in other structures, but that is a good idea! Wouldn't the allocation have to equal the number of freed items?
Posted on 2003-03-27 11:07:28 by bitRAKE
bitRAKE, what about a divide and conquer approach using some kind of binary tree lookup for finding free nodes?
For example, say you got 8 dwords with each bit identifying the availability of a structure (8x32=256 bits). Finding a free spot would mean searching on average 8/2=4 dwords and then scanning one of them for the free bit index.
You could add a tree with a number of leaves equal to the number of dwords (depth log(n)+1):


a
/ \
/ \
/ \
/ \
/ \
b c
/ \ / \
/ \ / \
d e f g
/ \ / \ / \ / \
h i j k l m n o
[size=9]ph33r my ascii art sk1llz :)[/SIZE]

Each node in the tree holds a value, either true or false (could be stored in one bit). They tell whether a 0 bit in a specific part of the dword array is present:


a = 1 if a 0 bit is present in any of the 8 dwords (0..7)
b = 1 if a 0 bit is present in the first half of the dwords (0..3)
c = 1 if a 0 bit is present in the second half of the dwords (4..7)
d = 1 if 0 bit .. in (0..1)
e = 1 if 0 bit .. in (2..3)
f = 1 if 0 bit .. in (4..5)
g = 1 if 0 bit .. in (6..7)
h = 1 if 0 bit .. in (0)
i = 1 if 0 bit .. in (1)
j = 1 if 0 bit .. in (2)
k = 1 if 0 bit .. in (3)
l = 1 if 0 bit .. in (4)
m = 1 if 0 bit .. in (5)
n = 1 if 0 bit .. in (6)
o = 1 if 0 bit .. in (7)


The nodes could be stored in a heap-like manner:


0 1
index: 012345678901234
value: abcdefghijklmno

child1_index = parent_index * 2 + 1
child2_index = parent_index * 2 + 2

Now you can find the first dword with a 0-bit in 2*log(number_of_dwords) steps (both childs need to be checked, although on average only one). Naturally it's more efficient with more dwords.

Thomas
Posted on 2003-03-27 12:45:52 by Thomas
Sweet idea Thomas! :alright: Wonder if it can be coded with one branch? ...and what the overhead of managing the bit-tree can be minimized to?
Posted on 2003-03-27 13:33:31 by bitRAKE


mov ecx, -4
@@:
add ecx, 4
bsf eax, [edx + ecx]
jnz @B

btr [edx + ecx], eax
lea ecx, [ecx*8 + eax]


Probably not the smallest, or the fastest, but a good combination of the two I think.

Mirno
Posted on 2003-03-27 13:35:25 by Mirno

Sweet idea Thomas! :alright: Wonder if it can be coded with one branch? ...and what the overhead of managing the bit-tree can be minimized to?


I think the algorithm can be optimized pretty well, a couple of cmov instructions can reduce the number of branches but I don't know if one will be enough.
The overhead for managing the tree is not much but a bit more than looking up. However it only needs to be updated when a dword gets full, so on average only once in 32 allocations (that's not bad :) )

Thomas
Posted on 2003-03-27 13:51:05 by Thomas
The pseudo code for looking up is:


bool table[15];
// Searches the tree starting at node root,
// assuming a free entry is present!
int getFreeEntry(int root, int level)
{
if (level>2) return root - 7;
int child1 = root * 2 + 1;
int child2 = root * 2 + 2;
return getFreeEntry(table[child1] ? child1 : child2, level+1);
}

This is a recursive version but its end-recursive so it could be converted to an iterative version and then unrolled to speed things up. I'll think a bit about it and post my findings here.

Thomas

edit: there is another property of the tree that we can exploit since we assume the subtree currently working on *always* contains at least 1 free bit.


a
/ \
/ \
/ \
/ \
/ \
b c
/ \ / \
/ \ / \
d e f g
/ \ / \ / \ / \
h i j k l m n o

For example, working on a, you need to check both b and c to see in which part the empty bit is. However if you find that b is false, c *must* be true because the free bit has to be somewhere. Therefore we can skip the check for c and immediately proceed to f (and then g).

Thinking about this, the tree could be simplified. Assuming at least one bit is free, we can give each node a new meaning:
false: at least one free bit is present in the left subtree (possibly in the right tree too but we don't know that)
true: at least one free bit is present in the right subtree (but not in the left subtree).

This leads to a tree with one layer less, but one that is harder to manage.
Posted on 2003-03-27 14:10:04 by Thomas
	xor	eax, eax

mov ecx, depth
@@: bt [edx], eax
rcl eax, 1
inc eax
dec ecx
jne @B

;; ??
btr [edx], eax
Certainly becomes O(log n/32) taking two bits per structure!
Posted on 2003-03-27 14:16:40 by bitRAKE

	xor	eax, eax

mov ecx, depth
@@: bt [edx], eax
rcl eax, 1
inc eax
dec ecx
jne @B

;; ??
btr [edx], eax
Certainly becomes O(log n/32) taking two bits per structure!

Very nice trick :alright:. Could be unrolled pretty well too. I think the more simple version of the tree won't be any faster since it takes more to maintain it.. So this is probably better.

Thomas
Posted on 2003-03-27 14:27:38 by Thomas
Originally posted by Thomas
there is another property of the tree that we can exploit since we assume the subtree currently working on *always* contains at least 1 free bit.
[...]
For example, working on a, you need to check both b and c to see in which part the empty bit is. However if you find that b is false, c *must* be true because the free bit has to be somewhere. Therefore we can skip the check for c and immediately proceed to f (and then g).

Hmm, actually this also holds for the left subtree, it's just doing stuff that would have been done in the next iteration otherwise so this isn't an optimization (I was thinking wrongly).

What about the updating of the tree? Having a dword at index i that gets full, the full path to that bit needs to be updated, starting by setting the leaf node for index i to 0, then propagating the value to the top of the tree, setting each node to 1 if at least one of its childs is 1 (0 otherwise).
Posted on 2003-03-27 14:37:08 by Thomas

What about the updating of the tree? Having a dword at index i that gets full, the full path to that bit needs to be updated, starting by setting the leaf node for index i to 0, then propagating the value to the top of the tree, setting each node to 1 if at least one of its childs is 1 (0 otherwise).
I thought about this a little - best I could think of is O(n-m). (m) being the full block -- maybe less in practice?

Using a free list - similar to Dr. Manhattan's suggestion seems very quick, but uses a dword of memory for each structure -- not very practicle for small structures. Could combine the two methods: a free list pointing to dwords in the bit array. This means two bits per structure - allocating and freeing structs should be fast.

Building a test is in order... :)
Posted on 2003-03-27 21:32:26 by bitRAKE
Just in case of speed (bt and similar are very slow) here is my solution. It is rather straightforward but it do the job :)

	lea edi, BitArray

xor eax, eax
xor ecx, ecx
dec ecx ; or length of bitfiled
repe scasd
mov eax, [edi]
not ecx
shl ecx, 5 ; multiply by 5
mov edx, ecx ; save them
xor ecx, ecx
@@: rcr eax, 1
inc ecx
jnc @B
shl eax, cl ; restore with unset bit
mov [edi], eax ; save bitfield
add edx, ecx ; edx holds 1-based index
Posted on 2003-03-28 09:20:49 by masquer