I want to create a hash table to use with a string search procedure.
Imagine it as a table with 3 columns.
The first is a 32 bit hash of the second column, and is the primary key of the table
The second is a string from 2 to 8 chars wide
The third is another string.

I don't want to know how to create this table but I want to know what is the best hashing algorithm
to use. No problem if it is not very fast but it should be not too difficult to code.
The table has about 5000 lines
Posted on 2003-11-07 13:45:57 by greenant
If you're building a lookup table, the hash is usually an index into the hash table, not a search key.

I personally like to use hash with chaining.
Hash_search: // returns p = pointer to matching table entry, NULL if not found


index := hash(string)
p := PointerTable[index]
while p != NULL
if p->key = string then break
p := p->next
end while
return p

Add_entry:

p := new Entry(string, ...)
index := hash(string)
p->next := PointerTable[index]
PointerTable[index] := p
Posted on 2003-11-08 18:50:10 by tenkey
Forget my first structure.

I have an array of 32bit numbers. These numbers are the hashes of my strings
Then I have an array of 32bit pointers. These are pointers to a table of strings.

I read a file, with a structure like ini file

name=description

Name is a string from 2 to 8 chars wide. The hashes are hashes of these strings
Description is a string that is placed in a table.

I read this file once, and then I create the structure in memory.
Then I have to retrieve a description associated to a string.

I need 2 functions: CreateTable and InsertItem.
I don't need DeleteItem or UpdateItem

CreateTable looks like this


.data
HashBase dd pointer to the hashes array
PtrBase dd pointer to the pointer array
StrBase dd pointer to the string table

HashNum dd number of elements in the hash array


HashPtr dd first free cell in the hashes array
StrBase dd " " " " the string table

.code

.while there are elements
invoke hash, addr StringToHash
mov [HashPtr], eax ;update hash array

mov [(HashPtr-HashBase) + PtrBase], StrBase ;update pointer array

invoke lstrcat, StrBase, addr DescriptionString ;update string table
invoke lstrlen, addr DescriptionString
add StrBase, eax
.endw


The search function looks like this


.data
HashBase dd pointer to the hashes array
PtrBase dd pointer to the pointer array
StrBase dd pointer to the string table

HashNum dd number of elements in the hash array

.code

mov ecx, HashNum
mov edi, HashBase

invoke hash, StringToFind

repne scasd
jnz StringNotFound

mov eax, [edi - 4 - HashBase + PtrBase] ;eax == pointer to the string

invoke MessageBox, hWnd, eax, eax, MB_OK ;Print the description


This is the pseudo code. Some instruction are impossible (mov mem, mem) and other are to be optimized (lstrcat, lstrlen)

I need an hashing routine
Posted on 2003-11-09 04:51:34 by greenant
It looks like you are looking for a function for a perfect hash. I don't believe there is a general formula for such a thing. Each such function will need to be tailored to the specific list of search strings.

Otherwise, you need to take into account that two different strings may hash to the same value.
Posted on 2003-11-09 17:54:31 by tenkey
I could use MD5 but is too big (128 bit). Maybe I can modify MD5 algorithm to get a 32 bit hash and not a 128 bit hash.
U will try
Posted on 2003-11-10 01:57:42 by greenant
You could try a couple simple formulas and see if they work for all your strings if they are static. Here is one to get you started:

xor edx, edx
mov ecx, strlen
mov esi, string
mov eax, ecx
@@:
xor dl,
mul edx
dec ecx
jne @B
; EAX is hash

Since the string is only eight bytes you could use the full string as the hash. :rolleyes: I used this method for an assembler and only had to handle a couple collisions - the last four bytes of the instruction string was the hash.
Posted on 2003-11-10 02:34:14 by bitRAKE
The string is at most 8 bytes. If I want to use it as an hash I have to pad all the string to 8 bytes. Bute the problem is that pentium doesn't have a fast singli instruction to compare an 8byte register with a 8byte memory operand.
I want to use a 32bit hash algorithm because I want a 32 bit hash
Posted on 2003-11-10 04:41:47 by greenant
There is good small hash function in FASM compiler.
Look in:
'parcer.inc' , at 'find_label:' label.

Maybe you have to make some editing.

BTW: if you need fast searching in the table, why not to think about sorting table by hash value (or even by strings without hash) and use fast binary search.

Regards.
Posted on 2003-11-10 11:50:08 by JohnFound