Continuetion our talks of checking ranges.
Complex statements to check numerous ranges with 1 jcc.
Example:
If valid hex digit.


.if (dl>="0" && dl<="9") || (dl>="A" && dl<="F") || (dl>="a" && dl<="f")

xor eax,eax
lea ecx,[edx-30h]
cmp ecx,9
adc eax,eax ;byte shorter than adc eax,0
lea ecx,[edx-'A']
cmp ecx ,('F'-'A')
adc eax,eax
lea ecx,[edx-'a']
cmp ecx,('f'-'a')
adc eax,eax
je @notvalid


Any practical idea is wellcome.
Posted on 2002-03-25 05:46:31 by The Svin
hi!

I tried your code, with 'F' as string and it say, it's invalid ...
So it should be 'G'-'A' instead of 'F'-'A'.
the same with cmp eax, 9 ... that should be cmp eax, 10

Also, since you test only DL, but use EDX in your calculation, you should remove the upper-bytes.

But the rest is nice done :)

So, your code should be :



and edx, 0FFh
xor eax,eax
lea ecx,[edx-30h]
cmp ecx,10
adc eax,eax ;byte shorter than adc eax,0
lea ecx,[edx-'A']
cmp ecx ,('G'-'A')
adc eax,eax
lea ecx,[edx-'a']
cmp ecx,('g'-'a')
adc eax,eax
je @notvalid


Cu, JNS
----
http://www.emucheater.com
http://cyberpad.psxemu.com
Posted on 2002-03-25 09:14:08 by Jens Duttke
You are right, thanks.
As too dl - edx, your right of course, but we may put in HLL explonative statemet edx instead of dl, I was absent minded - forgot to change dl to edx :)
Posted on 2002-03-25 09:59:09 by The Svin
hi!

I've just played a bit with some code, since I am a bit bored now. :grin:

... and found another solution, to solve the problem of validating hex-values, which is a bit shorter than The Svins :)



sub cl, 48
xor edx, edx
mov eax, 1
shld edx, eax, cl
shl eax, cl
and eax, 007E03FFh
and edx, 007E0000h
or eax, edx
jz @invalid


The char need to be in cl.

Cu, Jens
----
http://www.emucheater.com
http://cyberpad.psxemu.com
Posted on 2002-03-25 11:50:03 by Jens Duttke
Nice code,Jens.
I wish you being bored as long as possible to play with some code for us.
:)
There are cople thing I want to say.
1. It is shorter, all right.
3 bytes shorter. But you've changed character. Assume you get it as WM_CHAR
and need validate it without change. Then you need at least 2 more bytes.

2. Let think of it as just an example or how to check if value is in one of acceptable
ranges and we need to check it with little possible jcc.
I gave my code as just an example of one universal possible solution.
It checks 3 ranges but can check as many as you need.
Logic simple if edx at least in one of acceptable ranges in the checking
CF will be set and eax will be > 0 at the end.

Now I'm asking you about two things.
- Please comment your method
- Tell us if the method may be used with variable number of ranges,
and if it is, tute us how to use this approach.
If not, may be you have some different interesting general ideas how to
check numerous ranges with little jcc.
Posted on 2002-03-25 12:43:56 by The Svin

But you've changed character. Assume you get it as WM_CHAR
and need validate it without change. Then you need at least 2 more bytes.


What do you mean ? which character do i change ? you mean the sub cl, 48 ? ... you could simply change that to lea ecx, like you do it in your code.


- Please comment your method
- Tell us if the method may be used with variable number of ranges,
and if it is, tute us how to use this approach.
If not, may be you have some different interesting general ideas how to
check numerous ranges with little jcc.


hehe, ok

1. It's like a range-in-range test

You have a range of 64 bytes.

These 64 bytes are represented by the "bitMap" :

007E0000007E03FFh

binary it's this :

1111110000000000000000000000000011111100000001111111111

like you see ... we have 3 ranges ... the first range (the 6 1's) are the lowercase letters, then the uppercase letters and then the numbers.

now, let's assume the "input char" is '4' = 52d

1. we subtract 48 -> the result is 4
2. Now we set byte number 4 to 1 ... this is done with the shifts
3. and now, you AND our "bitMap" with the number, where bit number 4 is set
4. If the result is != 0 it matched and the number is valid ... if not it's invalid.

Sorry, my english is not that good, and I don't know how to explain such "complex" things correctly.

I hope you understand what i mean.

So, this method is useful, if you have only a small "overall-range"
but with many "matched" values and many "unmatched" values.

for example you could also check for

1010010101010110101010100100000111100101010101011011101110101110

that would be with normal jumps

cmp dl, 0
jz @invalid
cmp dl, 4
jz @invalid
cmp dl, 6
jz @invalid
cmp dl, 10
jz @invalid
cmp dl, 14
jz @invalid
... and so on ...

While my code has always the same length ... only the 2 lines
and eax, 007E03FFh
and edx, 007E0000h
need to be changed

Maybe someone, who is better in english and understand what i mean, can comment the code. :)

Cu, Jens
----
http://www.emucheater.com
http://cyberpad.psxemu.com
Posted on 2002-03-25 13:22:15 by Jens Duttke

What do you mean ? which character do i change ? you mean the sub cl, 48 ? ... you could simply change that to lea ecx, like you do it in your code.

In my code the character is in edx and I don't change it
In your code:
The char need to be in cl.

And you change it.
To avoid it you need one more register or more instructions.

Thanks for comments.
Posted on 2002-03-25 13:51:16 by The Svin
Very creative approach Jens Duttke! I like it, and will be using something similar in the future. Eliminates two-state small table lookups. :)
Posted on 2002-03-25 14:11:53 by bitRAKE
Jens,
BTW, Thomas, is keeping public snippet lib on his site

I'm sure many asm programmers would be happy to see your work there.

I want to say the same to Nexo, bullyNaza and all new things in asm creaters.
Posted on 2002-03-25 14:18:00 by The Svin
Jens, I've analyzed your method.
Man... You are very talanted!
I'm glad I asked you for comments.
Posted on 2002-03-25 17:02:55 by The Svin

small table lookups. :)
ahh ... that's the word which was missing ... look-up-table.
That explains it the best. :)

Cu, Jens
Posted on 2002-03-25 18:49:57 by Jens Duttke
Yeah, I'd call it a 'bit register lookup' :grin:
Sounds like a good name for the technique?
What would that be in German?
Posted on 2002-03-25 19:02:59 by bitRAKE
Extending the concept, here are other possible schemes:



ebx,ecx,edx,esi = up to 4 (5 if we use EBP, 6 if we use ESP) indexes we want to check
eax = temporary register
edi = scratch register (must be set to zero initially)


1) if ALL of the indexes are INSIDE of bounds, jump label

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) && (ecx>=LOWER2 && ecx<UPPER2) && (edx>=LOWER3 && edx<UPPER3) && (esi>=LOWER4 && esi<UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-UPPER1]
cmp eax,LOWER1-UPPER1
adc edi,edi
;
lea eax,[ecx-UPPER2]
cmp eax,LOWER2-UPPER2
adc edi,edi
;
lea eax,[edx-UPPER3]
cmp eax,LOWER3-UPPER3
adc edi,edi
;
lea eax,[esi-UPPER4]
cmp eax,LOWER4-UPPER4
adc edi,edi
;
jz label


---

2) if ALL of the indexes are OUTSIDE of bounds, jump label

PseudoCode: IF (ebx<LOWER1 && ebx>=UPPER1) && (ecx<LOWER2 && ecx>=UPPER2) && (edx<LOWER3 && edx>=UPPER3) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
;
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
;
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
;
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
;
jz label


---

3) if ANY of the indexes are INSIDE of bounds, jump label

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) || (ecx>=LOWER2 && ecx<UPPER2) || (edx>=LOWER3 && edx<UPPER3) || (esi>=LOWER4 && esi<UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
;
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
;
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
;
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
;
jnz label



---

4) if ANY of the indexes are OUTSIDE of bounds, jump label

PseudoCode: IF (ebx<LOWER1 && ebx>=UPPER1) || (ecx<LOWER2 && ecx>=UPPER2) || (edx<LOWER3 && edx>=UPPER3) || (esi<LOWER4 && esi>=UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-UPPER1]
cmp eax,LOWER1-UPPER1
adc edi,edi
;
lea eax,[ecx-UPPER2]
cmp eax,LOWER2-UPPER2
adc edi,edi
;
lea eax,[edx-UPPER3]
cmp eax,LOWER3-UPPER3
adc edi,edi
;
lea eax,[esi-UPPER4]
cmp eax,LOWER4-UPPER4
adc edi,edi
;
jnz label


---

As you easily noted, if you invert ANY with ALL (or viceversa) and you also invert INSIDE with OUTSIDE (or viceversa) and the final branch instruction, then the code is perfectly the same.
That's why 3) and 4) are the same, and 1) and 4 are the same as well (of course with inverted branch).
This is exactly like to say that:
a OR b OR c
is perfectly equivalent to:
NOT ( (NOT a) AND (NOT b) AND (NOT c) )

---

Some remarks and general rules to help understanding, so this moves from "black magic" to something very intuitive, and expecially simple and easy to use:

a) the LEA,CMP trick simply sets the carry flag if we're inside the bounds, and clears it if we're outside the bounds.

b) said that, then we can use ADC reg,0 to accumulate in reg that single carry value, but we can also perform other operations, like RCL (more on this below). Note that instead of the 0 constant we can use a register, which makes the code smaller and thus faster (at least for cache considerations).

c) now think again about the a) point. We can not invert the behaviour of CMP (i.e. the result it gives), but we could exploit the CMC instruction where it matters to complement the carry flag. For example:

LEA EAX,
CMP EAX,UPPER1-LOWER1
CMC ; this inverts the carry flag, and thus the result
ADC EDI,0 ; where 0 will be in a register, preferably

But CMC may not be the fastest solution.
Having multiple indexes to check, we can use RCL:

LEA EAX,
CMP EAX,UPPER1-LOWER1
RCL EDI,1
many times, and then apply a XOR mask to our final EDI register result, so that after the XOR we get a zero result if the bit mask is the one we wanted, in order to satisfy all the conditions we want in order to perform the final branch.

But we don't even need that.. yes, we CAN invert the behaviour of CMP. How? Simply inverting the range:
LEA EAX,
CMP EAX,UPPER1-LOWER1
becomes:
LEA EAX,
CMP EAX,LOWER1-UPPER1

So, to sum it all:

---

we can produce a carry flag depending on if an index is in range or not (i.e. carry=1 if inside bounds):

LEA EAX,
CMP EAX,UPPER1-LOWER1

or, if the lower range is 0, then we can use just:

CMP EBX,UPPER1

we can get inverted CMP behaviour (i.e. carry=1 if outside bounds) by simply doing:

LEA EAX,
CMP EAX,LOWER1-UPPER1

or, if the lower range is 0, then we can use:

CMP EBX,UPPER1
CMC ;beware as CMC may get slow if abused, in some CPU's

---

Now that we've the power to produce a carry, inverted or not, depending on a lower..upper (or 0..upper) range, we can "accumulate" these carry flag results, one for each condition/index we want to check, using one of the following methods:

ADC reg,0 ; instead of 0 we can use a register which is known to be zero)
the above ADC will increment reg every time a carry flag was found set. A typical use is in the "OR" constructs.. i.e. if the final result of reg is not 0, then at least one ADC had a carry to add, and thus at least one condition was true (or false, depending if we produced the carry flag inverted or not).

we could set up reg e.g. to 4, and then SBB (subtract if the carry flag is found set) at each index/condition check: if our initial reg (which was set to the value 4) then reachs 0, we're sure that all 4 conditions were true. But we don't need this extra overhead: we can invert the conditions and use ADC.. thus use less instructions (remember the OR -> AND tranform of above?).

also, we could collect all carry flags via RCL reg,1 .. then XOR the final result with a mask, so that after the XOR we get a zero result if the bit mask is the one we wanted, in order to satisfy all the conditions we want to perform the final branch. But, again, we can instead always use ADC and simply invert the CMP result each specific time that we want so. This way we save the extra XOR.

Finally, we can accumulate a result by doing ADC reg,reg caring only that the reg was 0 *initially*. This because by being able to exploit the "inverted CMP" trick I exposed, we only care that the final result of all those ADC's will be 0 or not (to then either JZ label or JNZ label).

Note that ADC reg,reg doesn't take advantage of reg being eax (ADC reg,0 instead does). Neither LEA or XOR do take advantage of reg being eax. However, CMP does.. so I'm using EAX as temporary register in my lower..upper checks.

---

With the above explanation I tryed to move the discussion from black magic to something easy and intuitive to control.
Now there's something important to add.

After years of experience, I designed my programming language's compiler in a way that privileges the optimizer to the compiler.. i.e. the compiler doesn't try to be _excessively_ smart, while the optimizer does.

So a sequence like:

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) || (ecx>=LOWER2 && ecx<UPPER2) || (edx>=LOWER3 && edx<UPPER3) || (esi>=LOWER4 && esi<UPPER4) THEN GOTO label

gets initially compiled to:


lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
jc label
lea eax,[ebx-LOWER2]
cmp eax,UPPER2-LOWER2
jc label
lea eax,[ecx-LOWER3]
cmp eax,UPPER3-LOWER3
jc label
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
jc label


and then the optimizer (for how it is designed) could recognize this typical scheme (also aided by some side info), seek for another free register (it chooses, if present, one that incidentally is already known to be zeroed), and change the above code sequence in part or completely to:



;xor edi,edi (only if necessary)
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
jne label


But I didn't even bother to implement this.

Why? Because if we really want to exploit the power of assembly, then we should always use our brain, not just use others' code or a technique blindly. Note that in case all of our conditions are related by "OR", all of the above is absolutely stupid, since it will check all of 4 conditions when maybe even just the first was sufficient to exit this big "IF" construct.

In my programming language, I can set for each high level instruction (in this case "IF") an optimizer setting which then the editor normally hides. This optimizer setting, in this case, would be "optimize or not", and to which extent.

Extensive use of nearly totally AUTOMATIC profiling then fine-tunes these specific instruction optimizer's settings.

I try to keep things as much automatic as possible, although of course I'm here to advocate the use of hand written assembly. Yet being myself such a great fan of assembly has made me write a very efficient optimizer for my LLL/MLL/HLL language's compiler.

You should also be aware that if e.g. you have 4 conditions and "if any of them is valid you will jump", then you better put the most probable one as the first one to test (and so on), so that _statistically_ your routine will perform better. In my language I've a set of tools that help me also on these issues.. of course the goal is to produce extremely efficient code in less time than if I had to write it directly in hand optimized assembly. Honestly I'm very, very happy with the results.. and that's why I've always adviced you all to spend no less than half of your development time on your very own development tools.. that's the most intelligent and productive thing you could do.. at the end it pays with the interests, more than you may imagine. And it will feel "yours" more than anything else.


---


Fortunately this contribution wasn't NASM or MASM or anything specific ;)
Posted on 2002-03-25 19:18:40 by Maverick

hi!

I've just played a bit with some code, since I am a bit bored now. :grin:

... and found another solution, to solve the problem of validating hex-values, which is a bit shorter than The Svins :)



sub cl, 48
xor edx, edx
mov eax, 1
shld edx, eax, cl
shl eax, cl
and eax, 007E03FFh
and edx, 007E0000h
or eax, edx
jz @invalid


The char need to be in cl.

Cu, Jens
----
Hi Jens :)

For 32 bits, you can use this "register look up" technique instead:



LEA EAX,[ECX-STARTVALUE]
ROL EAX,CL ;EAX = 1111 1111 1100 0000 0111 1110 ....
JS .label

Just remember that the mask must be reversed in this case.
Posted on 2002-03-25 20:31:31 by Maverick
Sorry, there was a typo. Correct code:



LEA ECX,[ECX-STARTVALUE]
ROL EAX,CL ;EAX = 1111 1111 1100 0000 0111 1110 ....
JS .label
Posted on 2002-03-25 20:33:40 by Maverick
The Intel Manual states the the destination is undefined if the shift count is greaterthan the size of the destination of SHLD/SHRD! If this method is used then the shift count should be restricted to 0-31 for this instruction.

I offer this alternative:
bts eax,ecx    ; works for all ecx

ror ecx,6 ; 0-31 32-63 ;also: bt ecx,5
mov edx,eax ; 1 1
sbb ecx,ecx ; 0 1
xor eax,ecx ; 1 0
xor edx,eax ; 0 1
and eax,MASK1
and edx,MASK2
or eax,edx
je INVLD
But ecx needs to be in range 0-63. :(
Posted on 2002-03-25 21:45:50 by bitRAKE
I think this is much better... :)
	sub ecx,min    ; zero base

mov edx,MASK1 ; load flags
xor eax,eax ; initialize result store
bt edx,ecx ; test
mov edx,MASK2 ; load flags
rcl eax,1 ; store result
bt edx,ecx ; test
rcl eax,1 ; store result
; more masks...:)
shr ecx,5 ; which bit?
bt eax,ecx ; test proper bit of results
jc INVLD ;5 cycles + branch on Athlon
Limited to 1024 values tested - quite enough for characters. No memory access and no branch mis-predict is a big plus. And the macro:
; register look-up

reglu MACRO INVLD,min,masks:VARARG
LOCAL y,msk

IF min NE 0
sub ecx,min ; zero base
ENDIF
y=0
FOR msk, <&masks>
IF y EQ 0
mov edx,msk ; load flags
xor eax,eax ; initialize results
bt edx,ecx ; test
ELSE
mov edx,msk ; load flags
rcl eax,1 ; store result
bt edx,ecx ; test
ENDIF
y = y + 1
ENDM
rcl eax,1 ; store result
shr ecx,5 ; which bit?
bt eax,ecx ; test proper bit of results
jnc INVLD ;
ENDM
And the above goal is accomplished with:
	reglu @Invalid,30h,007E0000h,007E03FFh
Sorry, but I get carried away:
_HEXADECIMAL EQU <30h,007E0000h,007E03FFh>

_ALPHANUMERIC EQU <30h,000007FFh,0FFFE07FFh,0FFFE03FFh>

reglu @Invalid, _HEXADECIMAL
reglu @Invalid, _ALPHANUMERIC
Not bad, if I do say so myself. :)

Note to Athlon Programmers: BT is a vector path instruction!!
Posted on 2002-03-25 22:34:18 by bitRAKE
hi!


Yeah, I'd call it a 'bit register lookup' :grin:
Sounds like a good name for the technique?
What would that be in German?
German translations of stuff like this sucks.

But i would call it 'bitmap register lookup' :grin:

Cu, Jens
Posted on 2002-03-26 01:30:05 by Jens Duttke
Maverick,
You wrote very good demonstrative article on topic.
Also manipulation with carry and adc with the range teq. can lead us to calculate right label to jump at the end. It's especially powerfull when condition ranges is unproportional to cases.
So if cond 1,2,3 is met jmp to proc1 but if cond 2,5,8 is met jump to proc2 and if only 2,3 is met jump to proc3.
In real code the end looks just as:

jmp dpt

where reg is the reg that previously used in manipulation with
cf through adc and additional intruction to generate right case value. We often use the aproach in our database query coding.
Posted on 2002-03-26 05:23:34 by The Svin

Maverick,
You wrote very good demonstrative article on topic.
Also manipulation with carry and adc with the range teq. can lead us to calculate right label to jump at the end. It's especially powerfull when condition ranges is unproportional to cases.
So if cond 1,2,3 is met jmp to proc1 but if cond 2,5,8 is met jump to proc2 and if only 2,3 is met jump to proc3.
In real code the end looks just as:

jmp dpt

where reg is the reg that previously used in manipulation with
cf through adc and additional intruction to generate right case value. We often use the aproach in our database query coding.
Yup, I use the XOR mask technique in my compiler for complex constructs where AND and OR (even XOR) are mixed.. but that was a "trade secret" and I didn't feel to mention it. :grin:

I feel a bit ashamed about that.. but I'm a jealous guy. :tongue:
Posted on 2002-03-26 05:51:41 by Maverick