I have an array of structures that are freed and allocated many times. I'm using a bit lookup table to quickly find the first free structure (faster than a free list and uses less memory than pointer table). Each array also has a free count - so I can quickly skip the array if there are no free structures. My question is one of optimization - I'd like to find the fastest and smallest way to use the bit lookup table to retreive the structure index in the array.

Thank you for the help!

p.s. I am aware that the bit index <> index into structure array due to the lower three bits being reversed. The important thing is just to have a one to one corespondents between the bit flag and the availiblity of a structure.

:) Smallest code might be:

```
; EDX is the start of the bit array
```

xor eax, eax

lea ecx, [edx - 4]

@@:

; NOTE: exit is forced, bit is always set

or eax, [edx]

lea edx, [edx + 4]

je @B

lea ebx, [edx - 4]

sub edx, ecx ; byte offset

bsf ecx, eax ; first bit set

shl edx, 3 ; bit offset

btr eax, ecx ; clear bit = structure is used

or edx, ecx

mov [ebx], eax

; EDX in the index number into the structure array

Before this code executes a test has already been made to insure one free structure exists (one bit in the array is set). Hope this explains what I'm after. I just thought it would be a good problem for people to grind their teeth on. :)
Thank you for the help!

p.s. I am aware that the bit index <> index into structure array due to the lower three bits being reversed. The important thing is just to have a one to one corespondents between the bit flag and the availiblity of a structure.

:) Smallest code might be:

```
xor eax, eax
```

@@: bt [edx], eax

inc eax

jnc @B

dec eax

btr [edx], eax ; structure array index

But what about the smallest code checking 32 bits at once?I don't mind to grind my teeth on this. Maybe I get it wrong, but is all you need is to find first bit setted? Then maybe rotate it

If I do get wrong, please, could you explain it in detail. How long will be the bit field, or it's length may vary

```
mov eax, [edx]
```

@@: rcr eax, 1

inc ecx

jnc @B

; ecx here contain 1-based position

If I do get wrong, please, could you explain it in detail. How long will be the bit field, or it's length may vary

The problem can be thought of in three pieces:

1. Find first set bit in array of bytes.

2. Clear set bit found in 1.

3. Get index value of bit

1. Find first set bit in array of bytes.

2. Clear set bit found in 1.

3. Get index value of bit

**masquer**, your code is a solution to part one only if the array is four bytes long. It is not that small. Let me compose a better example with some fake data:```
BitArray DWORD \
```

00000000000000000000000000000000y, \

00000000000000000000001000000000y, \

00001000010001000000010010100000y, \

00001000001000100000001000000000y

StrucArray MySTRUCT \

(8 * SIZEOF BitArray) dup <>

Each bit in the BitArray represents a structure in the StrucArray -- if the bit is set then the structure is not in use. The following code completes all three goals:```
lea edx, BitArray
```

xor eax, eax

@@: ; find set bit in array

bt [edx], eax

; INC doesn't effect carry flag

inc eax

jnc @B

dec eax

; clear bit because structure now in use

btr [edx], eax

; offset into structure array

imul eax, SIZEOF MySTRUCT

; can now use [eax + StrucArray] to access structure

Hope the example makes the problem clearer. The code should find a free structure, mark it as unavailible, and provide the address to the structure for use. There will also be a complementary algorithm to free a structure by setting the bit.Have you thought about this : every time you free a structure, you move the last allocated structure in place of the freed structure. This way there is no hole in your array and the index of the next free structure is equal to the number of allocated structures.

**Dr. Manhattan**, I can't move the structures around as there are pointers to them in other structures, but that is a good idea! Wouldn't the allocation have to equal the number of freed items?

**bitRAKE**, what about a divide and conquer approach using some kind of binary tree lookup for finding free nodes?

For example, say you got 8 dwords with each bit identifying the availability of a structure (8x32=256 bits). Finding a free spot would mean searching on average 8/2=4 dwords and then scanning one of them for the free bit index.

You could add a tree with a number of leaves equal to the number of dwords (depth log(n)+1):

```
```

a

/ \

/ \

/ \

/ \

/ \

b c

/ \ / \

/ \ / \

d e f g

/ \ / \ / \ / \

h i j k l m n o

[size=9]ph33r my ascii art sk1llz :)[/SIZE]

Each node in the tree holds a value, either true or false (could be stored in one bit). They tell whether a 0 bit in a specific part of the dword array is present:

```
```

a = 1 if a 0 bit is present in any of the 8 dwords (0..7)

b = 1 if a 0 bit is present in the first half of the dwords (0..3)

c = 1 if a 0 bit is present in the second half of the dwords (4..7)

d = 1 if 0 bit .. in (0..1)

e = 1 if 0 bit .. in (2..3)

f = 1 if 0 bit .. in (4..5)

g = 1 if 0 bit .. in (6..7)

h = 1 if 0 bit .. in (0)

i = 1 if 0 bit .. in (1)

j = 1 if 0 bit .. in (2)

k = 1 if 0 bit .. in (3)

l = 1 if 0 bit .. in (4)

m = 1 if 0 bit .. in (5)

n = 1 if 0 bit .. in (6)

o = 1 if 0 bit .. in (7)

The nodes could be stored in a heap-like manner:

```
```

0 1

index: 012345678901234

value: abcdefghijklmno

child1_index = parent_index * 2 + 1

child2_index = parent_index * 2 + 2

Now you can find the first dword with a 0-bit in 2*log(number_of_dwords) steps (both childs need to be checked, although on average only one). Naturally it's more efficient with more dwords.

Thomas

Sweet idea

**Thomas**! :alright: Wonder if it can be coded with one branch? ...and what the overhead of managing the bit-tree can be minimized to?```
```

mov ecx, -4

@@:

add ecx, 4

bsf eax, [edx + ecx]

jnz @B

btr [edx + ecx], eax

lea ecx, [ecx*8 + eax]

Probably not the smallest, or the fastest, but a good combination of the two I think.

Mirno

Sweet idea

**Thomas**! :alright: Wonder if it can be coded with one branch? ...and what the overhead of managing the bit-tree can be minimized to?

I think the algorithm can be optimized pretty well, a couple of cmov instructions can reduce the number of branches but I don't know if one will be enough.

The overhead for managing the tree is not much but a bit more than looking up. However it only needs to be updated when a dword gets full, so on average only once in 32 allocations (that's not bad :) )

Thomas

The pseudo code for looking up is:

This is a recursive version but its end-recursive so it could be converted to an iterative version and then unrolled to speed things up. I'll think a bit about it and post my findings here.

Thomas

For example, working on a, you need to check both b and c to see in which part the empty bit is. However if you find that b is false, c *must* be true because the free bit has to be somewhere. Therefore we can skip the check for c and immediately proceed to f (and then g).

Thinking about this, the tree could be simplified. Assuming at least one bit is free, we can give each node a new meaning:

false: at least one free bit is present in the left subtree (possibly in the right tree too but we don't know that)

true: at least one free bit is present in the right subtree (but not in the left subtree).

This leads to a tree with one layer less, but one that is harder to manage.

```
```

bool table[15];

// Searches the tree starting at node root,

// assuming a free entry is present!

int getFreeEntry(int root, int level)

{

if (level>2) return root - 7;

int child1 = root * 2 + 1;

int child2 = root * 2 + 2;

return getFreeEntry(table[child1] ? child1 : child2, level+1);

}

This is a recursive version but its end-recursive so it could be converted to an iterative version and then unrolled to speed things up. I'll think a bit about it and post my findings here.

Thomas

**edit:**there is another property of the tree that we can exploit since we assume the subtree currently working on *always* contains at least 1 free bit.```
```

a

/ \

/ \

/ \

/ \

/ \

b c

/ \ / \

/ \ / \

d e f g

/ \ / \ / \ / \

h i j k l m n o

For example, working on a, you need to check both b and c to see in which part the empty bit is. However if you find that b is false, c *must* be true because the free bit has to be somewhere. Therefore we can skip the check for c and immediately proceed to f (and then g).

Thinking about this, the tree could be simplified. Assuming at least one bit is free, we can give each node a new meaning:

false: at least one free bit is present in the left subtree (possibly in the right tree too but we don't know that)

true: at least one free bit is present in the right subtree (but not in the left subtree).

This leads to a tree with one layer less, but one that is harder to manage.

```
xor eax, eax
```

mov ecx, depth

@@: bt [edx], eax

rcl eax, 1

inc eax

dec ecx

jne @B

;; ??

btr [edx], eax

Certainly becomes O(log n/32) taking two bits per structure!```
xor eax, eax
```

mov ecx, depth

@@: bt [edx], eax

rcl eax, 1

inc eax

dec ecx

jne @B

;; ??

btr [edx], eax

Certainly becomes O(log n/32) taking two bits per structure! Very nice trick :alright:. Could be unrolled pretty well too. I think the more simple version of the tree won't be any faster since it takes more to maintain it.. So this is probably better.

Thomas

*Originally posted by Thomas*

there is another property of the tree that we can exploit since we assume the subtree currently working on *always* contains at least 1 free bit.

[...]

For example, working on a, you need to check both b and c to see in which part the empty bit is. However if you find that b is false, c *must* be true because the free bit has to be somewhere. Therefore we can skip the check for c and immediately proceed to f (and then g).

Hmm, actually this also holds for the left subtree, it's just doing stuff that would have been done in the next iteration otherwise so this isn't an optimization (I was thinking wrongly).

What about the updating of the tree? Having a dword at index i that gets full, the full path to that bit needs to be updated, starting by setting the leaf node for index i to 0, then propagating the value to the top of the tree, setting each node to 1 if at least one of its childs is 1 (0 otherwise).

What about the updating of the tree? Having a dword at index i that gets full, the full path to that bit needs to be updated, starting by setting the leaf node for index i to 0, then propagating the value to the top of the tree, setting each node to 1 if at least one of its childs is 1 (0 otherwise).

Using a free list - similar to

**Dr. Manhattan**'s suggestion seems very quick, but uses a dword of memory for each structure -- not very practicle for small structures. Could combine the two methods: a free list pointing to dwords in the bit array. This means two bits per structure - allocating and freeing structs should be fast.

Building a test is in order... :)

Just in case of speed (bt and similar are very slow) here is my solution. It is rather straightforward but it do the job :)

```
lea edi, BitArray
```

xor eax, eax

xor ecx, ecx

dec ecx ; or length of bitfiled

repe scasd

mov eax, [edi]

not ecx

shl ecx, 5 ; multiply by 5

mov edx, ecx ; save them

xor ecx, ecx

@@: rcr eax, 1

inc ecx

jnc @B

shl eax, cl ; restore with unset bit

mov [edi], eax ; save bitfield

add edx, ecx ; edx holds 1-based index