I've just alpha'ed a DLL I plan to use in a larger project. It's a very primitive hash table implementation... very straightforward. It's going to be used with bucket loads of smallish strings that will not be DWORD aligned upon input (think parsing an ascii file).

I suspect that this is currently bozo-ware & I would appreciate is some constructive criticism on the code as is so that I can correct any obvious gaffs. You don't have to be a coding guru to contribute, brand spankin newbies are welcome too. Just be specific & honest... but don't make me cry ;) Expect some dialog tho.

I've only done very light testing & I'm not asking you to do the bug hunt for me (that I can do); I'm not even asking you to try & run the DLL. Remember that this is an alpha & it's going to morph lots & soon. Here are some of the issues that I'm concerned about... to get the ball rolling:

* Are there any coding no-nos that jump out at you?
* Are there coding tricks that I missed?
* Is there any way I can avoid the cache thrashing that's likely to result?
* I learned on small instruction set computers so efficient use of the instruction set concerns me.
* Is the coding style easy to read?
* Is it commented enuf? too much?
* I didn't align the code... may be I should have.
You get the idea.

Let me know what you think... operators are standing by.
Posted on 2001-10-03 23:15:22 by rafe
rafe,

I just had a quick look through and while my head is currently in other stuff, I saw something that looks wrong,

.WHILE (edi <= edx)
mov cl,
mov bl,
mov cl,
cmp bl,
jne strNE
inc esi
inc edi
.ENDW

Inside the loop the first instance of writing the address in EDI to CL is overwritten with the address in ESI 2 instructions further down without having used the byte at the address in EDI.

I may not have digested the code properly but it looked unusual.

Regards,

hutch@pbq.com.au
Posted on 2001-10-04 03:00:35 by hutch--
I don't know what you mean? The byte at the EDI address is moved into CL which is then used as an index for CaseTab. In C++ it would be:

while (edi <= edx)
{
if(CaseTab[*edi] != CaseTab[*esi]) goto strNE;
esi++;
edi++;
}

I just think that C++ code is simplier to read, that's why I wrote it here.
Posted on 2001-10-04 05:01:21 by gliptic
Gliptic analysis is correct. This may not be an efficient means of doing a case insensitive compare ... a separate issue & where I'm wide open to suggestions.
------------------------------------------
One ringer I threw in there was the div instead of a mask on the hashing function... I fully expected to be beaten with a stick for doing that. Thanks for having look-see I didn't want anybody to make a career out of looking at the code, then again, that's what comments are for... do they help or hinder?
-----------------------------------------------
Notice that there are no Deletes & no Compresses. Using them would require "handles" to the structures which would translate to returning pointers to pointers to stucts back to the consumer rather than pointers to structs. For fixed length structs this is totally silly but these are variable lenght structs & it maybe it's a good idea here?

Thanks again
Posted on 2001-10-04 10:23:49 by rafe
rafe,

over time I have seen various methods of case insensitive comparison but simplicity seems to go for case conversion of both members then compare them. Case conversion is a fast process so it should not impinge on the speed of the algo that much.

If you can isolate the two strings, there are ways of doing the conversion on the fly as you do the comparison but it may not be any joy to code.

Regards,

hutch@pbq.com.au
Posted on 2001-10-04 20:04:53 by hutch--
Conversion is happening to both strings on the fly, so that's one vote for leaving that particular code segment the way it is. However, b/c I think we're talking cross purposes it appears that the commenting & documentation of the code is severly lacking. This is unhappy but necessary info & I'll make some changes to thecommenting in the beta.

Thanks Hutch.
Posted on 2001-10-05 10:39:29 by rafe
rafe,

Sorry if I have been a bit slow but I have been grinding away at sorting algorithms for a while now as time has allowed and I am a bit brain tired for it.

I can usually digest straight mnemonics but combined macro/mnemonics mean a bit more comprehension that I have had over the last week or so.

Regards,

hutch@pbq.com.au
Posted on 2001-10-06 03:00:25 by hutch--
Hutch, Thanks again! I know you're busy & I really appreciate the feedback. Sort of energizing to take that 1st step of putting code into the ether & to get some feedback.
-----------------------------
Here's where I'm at on this little DLL project. The beta's going to be more full featured. Altho I have my own selfish reasons for writing it (cartoon villain dreams of world domination:)), I think it should get a face lift in the beta so that it can be a realistic alternative in a broader range of apps... like spell checkers or perl-like hashes etc. Or at least the source could help others write their own.
------------------------------
Anyone else is welcome to comment too.

Thanks to All & especially Hutch
Posted on 2001-10-06 12:21:00 by rafe
I was just thinking that instead of having a 256 byte table at the start of the file for the case conversions, that sort of thing can be calculated easily enough.



mov bl,[edi]
.IF bl >="A" && bl <="Z"
add bl,"a" - "A"
.ENDIF


I think i had the "A"'s & "a"'s in the right places for a upper to lower case convertion. The only significat thing about this code is the .IF hides the one or two conditional jumps that will be perfromed. The plus is no mem acess, no 256 byte table and also frees up bl or cl
Posted on 2001-10-23 04:07:01 by huh
huh, i actually coded your way the 1st time around. Now i have to get off of my lazy fat ass & test the two alternatives. for me, the criterion will be speed over size (tho size is important too). I'll post the results & test code.
thanks
Posted on 2001-10-23 10:04:03 by rafe
Here you go, the best of both worlds. Size 28 Bytes, no memory table access, no (conditional) jumps, upper to lower case convertor.



cmp cl,"A" - 1
seta bl
shl bl,6
cmp cl,"Z" + 1
sbb bh,bh
xor bh,bl
inc bh
shr bh,2
and bh,20h
add cl,bh


In is CL with the char to be converted.
Modifys BL & BH only.
Can easily be changed to lower to upper case convertor by changing the "A" to an "a", the "Z" to a "z" and the add to a sub.
Posted on 2001-10-28 06:18:09 by huh
TOUPPER ASCII conversion.

These are some solutions I came with:


On P6 class CPUs:


LEA ECX,[EAX-32]
LEA EBX,[EAX-'a']
CMP EBX,'z'+1-'a'
CMOVC EAX,ECX



On P5 class CPUs:


LEA EAX,[EAX-'a']
CMP AL,'z'+1-'a'
SBB EBX,EBX
AND EBX,-32
LEA EAX,[EAX+EBX+'a']


bitRAKE (or others): do you have an idea why the above CMP would stall very bad on Athlon if instead of AL one uses EAX?
Posted on 2002-03-28 17:01:24 by Maverick
Maverick,

The stall is pretty obvious, its the partial read of AL after the LEA EAX.

Try SUB EAX, EAX or XOR EAX, EAX before the LEA. This is an Intel specific optimisation but it may help.

Regards,

hutch@movsd.com
Posted on 2002-03-28 17:25:48 by hutch--

Maverick,

The stall is pretty obvious, its the partial read of AL after the LEA EAX.

Try SUB EAX, EAX or XOR EAX, EAX before the LEA. This is an Intel specific optimisation but it may help.

Regards,

hutch@movsd.com
Hi Hutch. Perhaps you got it in reverse: CMP EAX shouldn't stall, instead stalls. CMP AL should stall (not on a Athlon AFAIK), instead doesn't stall. Or maybe I didn't get something?

If you re-read my posted code, that is the version that does NOT stall. Ain't it weird? I'm not much into x86 architecture but it really looks weird to me.
Posted on 2002-03-28 17:31:13 by Maverick

CMP EAX shouldn't stall, instead stalls. CMP AL should stall (not on a Athlon AFAIK), instead doesn't stall.
Yeah, the internals of the Athlon are very mysterious. I've been trying to state this for a long time. My test don't equate. The processor even optimizes some loops after some itterations. This has something to do with the internal RISC conversion on instructions. It is certainly non-linear in some cases. I still think the Athlon kicks Intel ass. :)
Posted on 2002-03-28 18:02:30 by bitRAKE
That BT ,reg thing is particularly weird. I'll come on it later, when I have some time.

Anyway.. the above CMP AL code that runs so fine on Athlons should anyway cause partial register stalls on PPro / Pentium II / PentiumIII, right? Which other CPUs?
Posted on 2002-03-28 18:11:34 by Maverick

Yeah, the internals of the Athlon are very mysterious. I've been trying to state this for a long time. My test don't equate. The processor even optimizes some loops after some itterations. This has something to do with the internal RISC conversion on instructions. It is certainly non-linear in some cases. I still think the Athlon kicks Intel ass. :)

bitRAKE: I discovered something very interesting.

As you already know, NASM by default doesn't generate short forms of instructions, while MASM (IIRC) does.

So:


CMP ECX,10


Will assemble differently on these two assemblers. MASM should have an advantage here, producing shortest code by default.

However, the Athlon sometimes stalls on these short forms.. or vice-versa it stalls on long forms!

The weird thing I reported last time, i.e. being (after a LEA EAX,..) CMP AL faster than CMP EAX, makes CMP EAX quick too if one uses the form CMP EAX,BYTE value .. which forces NASM to use the short form of the instruction.
Other times using the short form will stall the CPU.

Weird.. but good to know.
Posted on 2002-03-29 19:02:42 by Maverick
Maverick, what about the instruction alignment of the one that is stalling?
Posted on 2002-03-29 19:05:38 by bitRAKE
Interesting observation.

As I mentioned in a previous post, when we talk about instruction decoding, the Athlon is extremely sensitive to odd/even alignment (not so sensitive instead to 16 byte alignment).

It's true that changing alignment offset of 1 byte will cause those stalls, but it's also true that those stalls happen on the instruction regardless if it was starting on a even or odd address.

So it seems still a bit tricky.
Posted on 2002-03-29 19:23:30 by Maverick

So it seems still a bit tricky.
Yes, very - would make a good thread. I'll collect some of the examples I have here and create a thread regaurding Athlon decode/execution dynamics.
Posted on 2002-03-29 20:03:19 by bitRAKE