I've got an idea how to make code in my first post in the thread 3 bytes shorter:


lea ecx,[edx-30h]
cmp ecx,10
salc
lea ecx,[edx-'A']
cmp ecx ,('G'-'A')
adc al,al
lea ecx,[edx-'a']
cmp ecx,('g'-'a')
adc al,al
je @notvalid
Posted on 2002-03-28 02:37:14 by The Svin
A little bit different - size the same but less dependences:


lea ecx,[edx-30h]
mov ah,6
cmp ecx,10
lea ecx,[edx-'A']
db 0d6h
cmp cl,ah
lea ecx,[edx-'a']
adc al,al
cmp cl,ah
adc al,al
je @notvalid
Posted on 2002-03-28 04:06:12 by The Svin
Spent the last 30 minutes reading in detail the arguements :)

I have just one thing to say...

Maverick: Nice Work...
BitRake: Nice Work...
Svin: Nice Work...
Jens: You suck... but I like your StrLen example :alright:
:) :) :)

I think we should all take it a bit easier on each other...
Tempers rise from time to time, but remember it is always for the greater good... Remember that before you post something

Sliver (aka. The keeper of peace)

ps. Jens you did great work too... :)
Posted on 2002-03-28 04:15:14 by Sliver
I would hope that we can learn to communicate well, for there is much that we should learn from each other. There is nothing more important for me!
I'm really glad to hear that.. but frankly I'm quite sick to have to reply to pedantic notes some times, as if one couldn't add a bounds checking routine by himself to my public domain routine, instead of saying that I was cheating something. I didn't attach copyrights and such, I showed some code and, unlike others, I gave precise step by step explanations of how and why it works and what to do with it and how to modify it, etc.. I'm not here to make competitions, although I'd have no problems to do them as well. But for me competitions are a job thing.. I see this place as a friendly forum instead.

I was very surprised of the BT mem,reg execution as well. The manual states a latency of 8 cycles. I can execute nine instructions before the BT without them having any additional cost, but none after without it costing more cycles! BT mem,reg takes 5 cycles to complete. This is on a Athlon TB - I know you have an XP. Of the 5 cycles, the first three can pair with other instructions, but the latter two execute alone.
I think it was me (my lawyer suggests me to add this disclaimer: "I did not say me exclusively" ;) ) who said "do not delete any routine or method, you'll have a use for it someday", so..

The Svin: just a remark (if you wish to read):

Why wait 11 instructions to detect the error, when you can do it much sooner?
You could temporarily patch/hook the routine to build a statistic about, when the routine detects "out of range", what is the range that causes this, and then sort what are the most frequent ranges.
Then insert various branch instructions instead of ADC, so to jump out sooner. Check first for the most probable "out of range" conditions, and so on with the others.

Algorithm will perform equal in the worst case, and better depending on statistics.
Algorithm/routine/implementation will anyway perform even better, since instruction count will become shorter (adc -> jxx, so no final jxx).
Try to use EAX/AL when using CMP, because it results in shorter code (SALC that you used is another idea. For who doesn't know this instruction, it's undocumented but to my best knowledge it is supported by all modern and semi-modern Intel and non-Intel CPU's, and it's equivalent to SBB AL,AL, but shorter, and doesn't update flags (not that that would count in this application)).

BTW: SBB reg,reg can be expressed as a virtual instruction that I would call "MOVX reg,CF" (i.e. copy carry flag to all the bits of the specified register). We may also call it "MOV reg,CF". bitRAKE: what about a macro to implement this and other "virtual instructions"? It would help many beginners when they want to set all bits of a reg depending on the carry flag.. easier to remember MOV reg,CF or MOVX reg,CF than SBB reg,reg.


The Svin: to return to our discussion, try this possible implementation (applies the statistic method I mentioned above (i.e. reorders ranges to exploit statistics)):


;ISHEX? Maverick's non-bitmap-register-lookup method (which as I said wasn't the best way
;to implement a ISHEX? algorithm, here is a better method, given the rules of this test)
CMP AL,'g' ; AL = char
JNC .not_a_hex_digit
CMP AL,'0'
JC .not_a_hex_digit
LEA EAX,[EAX-'G']
CMP AL,'a'-'G'
JC .not_a_hex_digit
LEA EAX,[EAX-('9'+1-'G')]
CMP AL,'A'-'9'-1
JC .not_a_hex_digit


Faster, shorter, etc.. etc..

On my Athlon-XP it runs at (average for all 256 possible cases that cover the whole ASCII characters map) 1.64453125 cycles, vs 4.0000000 cycles of your last routines (both perform the same).

On other CPU's (Intel maybe) you may encounter serious stalls using my above routine, so change all CMP's "AL" istances to "EAX": speed is 1.73046875 cycles on my Athlon-XP.. and should be the same speed on those CPU's.

I have to stress: make your own executable format, dynamically loaded/linked, and make automatic load of the right routine depending on the host CPU. There's no reason to use the 1.73046875 cycles routine on a Athlon-XP.. why should we squeeze our brains on optimizations when we then waste cycles in such a stupid way? Standard tools are a prison.

As I said in my previous posts, I thought it was unfair to force me to use my bitmap register lookup method for HEX validation, since (although it was by far the fastest proposed one, in any case) I'd do it in a different way. My last routine should perform very well also on Pentiums (and is Pentium compatible), also, the same identical routine (changing only some operands) can validate also for alphanumeric, not just HEX.
Mine are just discussions of algorithms and techniques, I do not imply (disclaimer that I wish I didn't have to write.. but you never know now) that things in real applications couldn't be done better, or in a different way, I was the first one who stressed here the difference between theory and real world applications. Given the test rules, I wrote this code. I myself wouldn't even always use my ISHEX? routine, if you want to know. I hate to limit myself to a standard solution, in my programming language I even use normally a custom character set which is not ASCII, but is my custom one (I have ASCII support for import/export, though). I didn't like the rules you've set.
I use to seek for the most proper routine/solution _only_ after given a specific application (which is, probably, do things in a way that doesn't require any validation check at all).

I have nothing to prove, you could have added the bounds checking to my bitmap register lookup method by yourself instead of accusing me of unfair comparisons, mine were just advices. I proved my affirmations this time, I think you will agree I know how to code in assembly as well as how to find optimal algorithms, next time though I may not bother even if you say I talk and don't prove (as in the nazi thread, you recall?). I have limited time for such things, but I feel I have nothing to prove to anybody, also because when I say something it's never to look smarter than I am, sometimes I don't even care about the opposite. I don't bother, I believe in honesty (not for religion, I'm atheist.. but for self-esteem) and I am fine with myself, that's all I really care about.

I hope the polemic ends here.. I tryed to support you with some of my work, I hope you finally appreciate that.


PS: just for more completeness:

ISALPHANUMERIC? (uppercase and lowercase supported, as quick as my ISHEX? routine.. besides slight branch misprediction differences, which in short gives me on this standard ASCII-range test 1.80078125 cycles on average per call)

Again, change all AL to EAX for CPU's that suffer from eventual stalls. Things are always more tricky than CPU datasheets want to make you believe.. expecially on very modern CPU's. Don't believe to data sheets blindly.. PROFILE, PROFILE, PROFILE! That may make you discover a new thing every time, and anyway give you sureness that your code is ok as you think.


CMP AL,'z'+1 ; AL = char
JNC .not_alphanumeric
CMP AL,'0'
JC .not_alphanumeric
LEA EAX,[EAX-'Z'-1]
CMP AL,'a'-'Z'-1
JC .not_alphanumeric
LEA EAX,[EAX-'9'+'Z']
CMP AL,'A'-'9'-1
JC .not_alphanumeric


Mine was originally meant as a quick (and hopefully appreciated) contribution to the 64 bit bitmap register lookup method only, but all of this useless polemic made me use much more time on it. Worse than all, none of the routines showed until now can be considered ideal nor definitive if the specific application isn't taken into serious account. Like to say, it was all useless work.

I wish I could contribute more, but I'll be off for some time. Too much job engagements to finish. :mad:
Posted on 2002-03-28 07:48:19 by Maverick