I needed a project to do...and since I like writing containers... I've started work on a safe-access container. That allocates and will eventually allow r/w and copying with bounds checking. The current code is meant to work with DWORD data.


rws: ;write
mov ecx,[SSobject] ;Address of object
mov eax,[SSval] ;Write value
mov ebx,[SSindex] ;Write index

and ebx, 0fffffffch ;index divisable by 4/free sign check!
js @F ;Less than start? Negative index.

mov edx, ebx

mov edx, [ecx+12] ;More than last? Address of last element
mov esi, edx ;*
sub esi, ebx ;*ebx=index
cmp esi, edx ;*
ja @F

mov ecx, [ecx+8] ;get current pointer to buffer and write
mov [ecx][ebx],eax

ret
@@:
mov eax, -1
ret

Because it is a safe-container and checks access pointers, there is a little more overhead. I would like to make it as little as needed. The three memory values at the start, are like a useage '?snapshot?'. If you keep using the same object, you will not need to update SSobject for every call to the methods stored in the object. I've got the 'is the index<the start' down to 2 instructions, and , js @F. If the index is less than 0, it is obviously out of range. CAn anyone do better with is the 'index greater than total range***' part?
Posted on 2002-07-19 11:04:07 by ThoughtCriminal
The general equation for quick bounds
(with being non-destructive to test value):
; EAX is the value to test:

lea edx,[eax-HighValue-1]
IF LowValue NE 0
sub eax,LowValue
ENDIF
xor edx,eax
jns @BadValue
This checks for inclusive.
LowValue is zero in your case, so the SUB is not needed.


Object STRUCT
d1 DWORD ? ; ?
d2 DWORD ? ; ?
Buff DWORD ? ; ptr
Size DWORD ? ; bytes in buffer
Object ENDS

; can only write to [0, (size-granularity)]
rws: ;write
mov ecx,[SSobject] ;Address of object
mov eax,[SSindex] ;Write index
mov edx,eax
sub eax, [ecx].Object.Size ;More than last? Address of last element
sub eax, 4-1 ; ensure whole dword lies in buffer (granularity of buffer -1)
xor eax,edx
jns @F ;Less than start? Negative index.
mov eax,[SSval] ;Write value
add edx, [ecx].Object.Buff ;get current pointer to buffer and write
mov [edx],eax
ret
@@: or eax, -1
ret
This is how I would do it.
Posted on 2002-07-19 12:03:34 by bitRAKE
:grin:



mov eax,[SSval] ; value
mov ebx,[SSindex] ; index
mov ecx,[SObject] ; object
bound ebx,[ecx+Object.Limits] ; check index bounds
add ebx,[ecx+Object.Buffer] ; add buffer pointer
mov [ebx],eax
Posted on 2002-07-19 13:18:46 by Nexo
Nexo, don't forget the exception handler. :grin:
Posted on 2002-07-19 13:23:06 by bitRAKE
Don't entirely understand this part:


sub eax, [ecx].Object.Size ;More than last? Address of last element

index=index-size


sub eax, 4-1 ; ensure whole dword lies in buffer (granularity of buffer -1)

???4-1??? You mean 3 right??


xor eax,edx

Uhhh.....
At least....


sub eax, [ecx].Object.Size

I learned a better way to express a register offset
Posted on 2002-07-19 13:23:44 by ThoughtCriminal
ThoughtCriminal, there are three possiblities: low, good, high.
; sign bits				; low	good	high

lea edx,[eax-HighValue-1] ; 1 1 0
sub eax,LowValue ; 1 0 0
xor edx,eax ; 0 1 0
jns @BadValue ;
If we were moving just bytes then the valid range would be [0,(size-1)]. But with dwords the valid range is [0,(size-4)]. Put those values in the algorithm above (LowValue = 0, HighValue = (size-4)) and you will see how the code came to be.
Posted on 2002-07-19 13:40:21 by bitRAKE

Nexo, don't forget the exception handler. :grin:

Yes, of course. As it will be not so fast to work. But sometimes it looks nice :)

Posted on 2002-07-19 15:38:57 by Nexo
Nexo, sorry, but I do not understand your question?
Or, maybe I have not noticed of what you speak?
Posted on 2002-07-19 15:42:52 by bitRAKE
I am speak about squared bracket.
You never saw Paradigm or Turbo assembler?
Posted on 2002-07-19 16:15:26 by Nexo
No, never seen Paradigm. I used TASM very briefly (couple months), read Tom Swan's book then some others, but I used square brankets with NASM and now FASM. They are easy to type as other letters. ;)
Posted on 2002-07-19 18:44:52 by bitRAKE

I needed a project to do...and since I like writing containers... I've started work on a safe-access container. That allocates and will eventually allow r/w and copying with bounds checking. The current code is meant to work with DWORD data.

[..]

Because it is a safe-container and checks access pointers, there is a little more overhead. I would like to make it as little as needed. The three memory values at the start, are like a useage '?snapshot?'. If you keep using the same object, you will not need to update SSobject for every call to the methods stored in the object. I've got the 'is the index<the start' down to 2 instructions, and , js @F. If the index is less than 0, it is obviously out of range. CAn anyone do better with is the 'index greater than total range***' part?


These should be optimal on a wide range of CPU's:



CMP REG,Upper ; REG is the register we want to test
JAE out_of_bounds ; or you can use JB inside_bounds (inverted condition)

and:


LEA REG2,[REG-Lower] ; REG is the register we want to test
CMP REG2,Upper-Lower ; REG2 is a temporary, scratch register
JAE out_of_bounds ; or you can use JB inside_bounds (inverted condition)


NOTE: in the above Lower..Upper ranges by "convention" (of a lot of programming languages) I consider Lower included and Upper excluded.

NOTE2: since we're talking about ~optimizations here, we should give to branch prediction the importance that it deserves.
Remember that uncached (i.e. "new") branches are considered by default "taken" if unconditional or if backward-targeted, and are considered "untaken" if indirect or forward-targeted.

This means that the above code is ok if statistically you believe that most of the times the code will go through with no branch happening, AND the eventual branch target label is forward.

If you believe that a branch is likely to happen often instead, then (for best optimization) you should place your target label backward, xor invert the branch condition (thus inverting again the branch statistics).




I already wrote with much detail about this subject, which I report here because I don't know how to put links of posts:




Extending the concept, here are other possible schemes:



ebx,ecx,edx,esi = up to 4 (5 if we use EBP, 6 if we use ESP) indexes we want to check
eax = temporary register
edi = scratch register (must be set to zero initially)


1) if ALL of the indexes are INSIDE of bounds, jump label

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) && (ecx>=LOWER2 && ecx<UPPER2) && (edx>=LOWER3 && edx<UPPER3) && (esi>=LOWER4 && esi<UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-UPPER1]
cmp eax,LOWER1-UPPER1
adc edi,edi
;
lea eax,[ecx-UPPER2]
cmp eax,LOWER2-UPPER2
adc edi,edi
;
lea eax,[edx-UPPER3]
cmp eax,LOWER3-UPPER3
adc edi,edi
;
lea eax,[esi-UPPER4]
cmp eax,LOWER4-UPPER4
adc edi,edi
;
jz label


---

2) if ALL of the indexes are OUTSIDE of bounds, jump label

PseudoCode: IF (ebx<LOWER1 && ebx>=UPPER1) && (ecx<LOWER2 && ecx>=UPPER2) && (edx<LOWER3 && edx>=UPPER3) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
;
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
;
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
;
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
;
jz label


---

3) if ANY of the indexes are INSIDE of bounds, jump label

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) || (ecx>=LOWER2 && ecx<UPPER2) || (edx>=LOWER3 && edx<UPPER3) || (esi>=LOWER4 && esi<UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
;
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
;
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
;
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
;
jnz label



---

4) if ANY of the indexes are OUTSIDE of bounds, jump label

PseudoCode: IF (ebx<LOWER1 && ebx>=UPPER1) || (ecx<LOWER2 && ecx>=UPPER2) || (edx<LOWER3 && edx>=UPPER3) || (esi<LOWER4 && esi>=UPPER4) THEN GOTO label



xor edi,edi
;
lea eax,[ebx-UPPER1]
cmp eax,LOWER1-UPPER1
adc edi,edi
;
lea eax,[ecx-UPPER2]
cmp eax,LOWER2-UPPER2
adc edi,edi
;
lea eax,[edx-UPPER3]
cmp eax,LOWER3-UPPER3
adc edi,edi
;
lea eax,[esi-UPPER4]
cmp eax,LOWER4-UPPER4
adc edi,edi
;
jnz label


---

As you easily noted, if you invert ANY with ALL (or viceversa) and you also invert INSIDE with OUTSIDE (or viceversa) and the final branch instruction, then the code is perfectly the same.
That's why 3) and 4) are the same, and 1) and 4 are the same as well (of course with inverted branch).
This is exactly like to say that:
a OR b OR c
is perfectly equivalent to:
NOT ( (NOT a) AND (NOT b) AND (NOT c) )

---

Some remarks and general rules to help understanding, so this moves from "black magic" to something very intuitive, and expecially simple and easy to use:

a) the LEA,CMP trick simply sets the carry flag if we're inside the bounds, and clears it if we're outside the bounds.

b) said that, then we can use ADC reg,0 to accumulate in reg that single carry value, but we can also perform other operations, like RCL (more on this below). Note that instead of the 0 constant we can use a register, which makes the code smaller and thus faster (at least for cache considerations).

c) now think again about the a) point. We can not invert the behaviour of CMP (i.e. the result it gives), but we could exploit the CMC instruction where it matters to complement the carry flag. For example:

LEA EAX,
CMP EAX,UPPER1-LOWER1
CMC ; this inverts the carry flag, and thus the result
ADC EDI,0 ; where 0 will be in a register, preferably

But CMC may not be the fastest solution.
Having multiple indexes to check, we can use RCL:

LEA EAX,
CMP EAX,UPPER1-LOWER1
RCL EDI,1
many times, and then apply a XOR mask to our final EDI register result, so that after the XOR we get a zero result if the bit mask is the one we wanted, in order to satisfy all the conditions we want in order to perform the final branch.

But we don't even need that.. yes, we CAN invert the behaviour of CMP. How? Simply inverting the range:
LEA EAX,
CMP EAX,UPPER1-LOWER1
becomes:
LEA EAX,
CMP EAX,LOWER1-UPPER1

So, to sum it all:

---

we can produce a carry flag depending on if an index is in range or not (i.e. carry=1 if inside bounds):

LEA EAX,
CMP EAX,UPPER1-LOWER1

or, if the lower range is 0, then we can use just:

CMP EBX,UPPER1

we can get inverted CMP behaviour (i.e. carry=1 if outside bounds) by simply doing:

LEA EAX,
CMP EAX,LOWER1-UPPER1

or, if the lower range is 0, then we can use:

CMP EBX,UPPER1
CMC ;beware as CMC may get slow if abused, in some CPU's

---

Now that we've the power to produce a carry, inverted or not, depending on a lower..upper (or 0..upper) range, we can "accumulate" these carry flag results, one for each condition/index we want to check, using one of the following methods:

ADC reg,0 ; instead of 0 we can use a register which is known to be zero)
the above ADC will increment reg every time a carry flag was found set. A typical use is in the "OR" constructs.. i.e. if the final result of reg is not 0, then at least one ADC had a carry to add, and thus at least one condition was true (or false, depending if we produced the carry flag inverted or not).

we could set up reg e.g. to 4, and then SBB (subtract if the carry flag is found set) at each index/condition check: if our initial reg (which was set to the value 4) then reachs 0, we're sure that all 4 conditions were true. But we don't need this extra overhead: we can invert the conditions and use ADC.. thus use less instructions (remember the OR -> AND tranform of above?).

also, we could collect all carry flags via RCL reg,1 .. then XOR the final result with a mask, so that after the XOR we get a zero result if the bit mask is the one we wanted, in order to satisfy all the conditions we want to perform the final branch. But, again, we can instead always use ADC and simply invert the CMP result each specific time that we want so. This way we save the extra XOR.

Finally, we can accumulate a result by doing ADC reg,reg caring only that the reg was 0 *initially*. This because by being able to exploit the "inverted CMP" trick I exposed, we only care that the final result of all those ADC's will be 0 or not (to then either JZ label or JNZ label).

Note that ADC reg,reg doesn't take advantage of reg being eax (ADC reg,0 instead does). Neither LEA or XOR do take advantage of reg being eax. However, CMP does.. so I'm using EAX as temporary register in my lower..upper checks.

---

With the above explanation I tryed to move the discussion from black magic to something easy and intuitive to control.
Now there's something important to add.

After years of experience, I designed my programming language's compiler in a way that privileges the optimizer to the compiler.. i.e. the compiler doesn't try to be _excessively_ smart, while the optimizer does.

So a sequence like:

PseudoCode: IF (ebx>=LOWER1 && ebx<UPPER1) || (ecx>=LOWER2 && ecx<UPPER2) || (edx>=LOWER3 && edx<UPPER3) || (esi>=LOWER4 && esi<UPPER4) THEN GOTO label

gets initially compiled to:


lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
jc label
lea eax,[ebx-LOWER2]
cmp eax,UPPER2-LOWER2
jc label
lea eax,[ecx-LOWER3]
cmp eax,UPPER3-LOWER3
jc label
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
jc label


and then the optimizer (for how it is designed) could recognize this typical scheme (also aided by some side info), seek for another free register (it chooses, if present, one that incidentally is already known to be zeroed), and change the above code sequence in part or completely to:



;xor edi,edi (only if necessary)
lea eax,[ebx-LOWER1]
cmp eax,UPPER1-LOWER1
adc edi,edi
lea eax,[ecx-LOWER2]
cmp eax,UPPER2-LOWER2
adc edi,edi
lea eax,[edx-LOWER3]
cmp eax,UPPER3-LOWER3
adc edi,edi
lea eax,[esi-LOWER4]
cmp eax,UPPER4-LOWER4
adc edi,edi
jne label


But I didn't even bother to implement this.

Why? Because if we really want to exploit the power of assembly, then we should always use our brain, not just use others' code or a technique blindly. Note that in case all of our conditions are related by "OR", all of the above is absolutely stupid, since it will check all of 4 conditions when maybe even just the first was sufficient to exit this big "IF" construct.

In my programming language, I can set for each high level instruction (in this case "IF") an optimizer setting which then the editor normally hides. This optimizer setting, in this case, would be "optimize or not", and to which extent.

Extensive use of nearly totally AUTOMATIC profiling then fine-tunes these specific instruction optimizer's settings.

I try to keep things as much automatic as possible, although of course I'm here to advocate the use of hand written assembly. Yet being myself such a great fan of assembly has made me write a very efficient optimizer for my LLL/MLL/HLL language's compiler.

You should also be aware that if e.g. you have 4 conditions and "if any of them is valid you will jump", then you better put the most probable one as the first one to test (and so on), so that _statistically_ your routine will perform better. In my language I've a set of tools that help me also on these issues.. of course the goal is to produce extremely efficient code in less time than if I had to write it directly in hand optimized assembly. Honestly I'm very, very happy with the results.. and that's why I've always adviced you all to spend no less than half of your development time on your very own development tools.. that's the most intelligent and productive thing you could do.. at the end it pays with the interests, more than you may imagine. And it will feel "yours" more than anything else.


---


I'd like to extend what I wrote some posts above about advanced uses of the
carry flag. When you check for a bound, e.g. from 0 (included) to 32 (excluded),
you can do, as we saw in that post, a simple:


CMP ECX,32

now the carry flag will be set if ECX is within 0 (included) to 32 (excluded).
(note that the "from included to excluded" is what is typically used in bound
checkings, for example for arrays).

We can now exploit another trick:


SBB EAX,EAX

regardless of what was in EAX before, now EAX will be all zeros if the carry
flag was set by CMP (which is like to say "if we were inside the bounds", but
EAX will be all ones (i.e. FFFFFFFF hex) if the carry was clear (i.e. if we
were out of bounds).

Now we can use the EAX mask we obtained to do many interesting tricks, using
for example the AND, OR and XOR instructions.

To remain in topic with this thread, we may want to select one of two 32 bit
values depending on if ECX is within the bounds or not.


On P6 and higher:


MOV EBX,MASK1
MOV EAX,MASK2
CMP ECX,32
CMOVC EAX,EBX

Now EAX will contain MASK1 or MASK2, depending if ECX is within bounds or not.


For CPU's <= P5 we can do instead:


MOV EAX,MASK2
CMP ECX,32
SBB EBX,EBX
AND EBX,MASK1^MASK2 ; MASK1 xor MASK2
XOR EAX,EBX

How does it work? It's simple.. and maybe it's about time to remove the magic
also from the tricks that use another very useful instruction: XOR.

So, we already know that, after SBB, EBX will be either all 0's or all 1's,
depending if ECX is within the bounds we've specified or not.
Now, for example, do you recall the formula to implement linear interpolation?

It's just a:

result = (A*(1-alpha))+(B*alpha) ; where alpha is from 0.0 to 1.0

but the above can be rewritten this way, which is perfectly equivalent (but simply more optimized):

result = A+(alpha*(B-A))

i.e. we start from A and then account only for the difference between B and A,
"to fill the gap".

We can use the same technique to set a value, and eventually fix it later as
much as needed to transform it into another value. We could have written it
also this way:


MOV EAX,MASK2
CMP ECX,32
SBB EBX,EBX
AND EBX,MASK1-MASK2 ; difference between MASKs
ADD EAX,EBX

It's just that I prefer to use logical instructions instead of arithmetic
ones, when possible (there are good reasons to do so, but I do not want to
make this post too long).

---

So now we've a P6 (using CMOV) or P5 method to select the right mask.. and
we can finally test it:


BT EAX,ECX
JNC .notvalid

But BT may be a slow instruction in some CPU's .. so here's an alternative
method:


SHR EAX,CL
JC .notvalid

Just remember two things if you use the above:
1) ECX must not be higher than 31 .. because otherwise unfortunately Intel does
not *guarantee* a correct carry flag :( So place a AND ECX,31 before the SHR.
2) You must increment of 1 unit CL to obtain the same result obtained with BT.
You may account for this in the LEA instruction, or rotate the masks, etc..

---

To sum it all this is my code for "bitmap register lookup". I've to admit
that my own programming language's compiler/optimizer gave me some hints,
because I was too lazy to think about some of the above things myself. ;)


MOV EBX,MASK1
MOV EAX,MASK2
CMP ECX,32 ; ECX is a 0..63 index
CMOVC EAX,EBX
BT EAX,ECX
JNC .notvalid

Maybe it can be improved furtherly.. but I've no time right now. I will try
later perhaps, since I'd like to send this stuff to Thomas' snippets section;
together with the profile code (BTW, grv575, since I'd like to give you proper
credits for your MASM translation, please could you give me your full name,
or anyway the name you prefer to be credited with?); and maybe also send the
stuff about CMP and the carry flag. Later I may add other contributions,
since I'm trying to be less ****ole and maybe share the little I know. Also,
I proven myself my ignorance on the x86 by the ROL wrong assumptions.. I've
really to clean my mind a bit about the 68000 and start doing some serious
x86 assembly instead.. like learning flags after all opcodes, etc.. lazyness
sucks. :)


ISHEX? (lowercase and uppercase supported)
;input: AL = character to test


LEA ECX,[EAX-'0']
MOV EBX,007E03FFh ; MASK1
MOV EAX,007E0000h ; MASK2
CMP CL,32
CMOVC EAX,EBX
BT EAX,ECX
JNC .not_a_hex_number


Finally, to know if a character is numeric, we can use the other technique
I exposed:

ISNUMERIC?
;input: CL = character to test


LEA EAX,[ECX-'0']
CMP AL,10
JNC .not_a_number


Really gotta go now.. have a nice day.

---

PS: the above ISHEX? routine was just an applicative example.. if you specifically look for a ISHEX? routine there are better solutions:


;ISHEX? (both uppercase and lowercase are supported)
CMP AL,'g' ; AL = char
JNC .not_a_hex_digit
CMP AL,'0'
JC .not_a_hex_digit
LEA EAX,[EAX-'G']
CMP AL,'a'-'G'
JC .not_a_hex_digit
LEA EAX,[EAX-('9'+1-'G')]
CMP AL,'A'-'9'-1
JC .not_a_hex_digit

and:


;ISALPHANUMERIC? (both uppercase and lowercase are supported)
CMP AL,'z'+1 ; AL = char
JNC .not_alphanumeric
CMP AL,'0'
JC .not_alphanumeric
LEA EAX,[EAX-'Z'-1]
CMP AL,'a'-'Z'-1
JC .not_alphanumeric
LEA EAX,[EAX-'9'+'Z']
CMP AL,'A'-'9'-1
JC .not_alphanumeric

others:


;TOUPPER ( >= P6)
LEA EBX,[EAX-'a']
LEA ECX,[EAX-32]
CMP EBX,BYTE 'z'+1-'a'
CMOVC EAX,ECX

;TOUPPER ( <= P5)
LEA EAX,[EAX-'a']
CMP EAX,BYTE 'z'+1-'a'
SBB EBX,EBX
AND EBX,BYTE -32
LEA EAX,[EAX+EBX+'a']
Posted on 2002-07-20 05:09:18 by Maverick
a) the LEA,CMP trick simply sets the carry flag if we're inside the bounds, and clears it if we're outside the bounds.


Shoot, I looked at using lea, but the opcode halp in MASM32 sadi lea didn't affect flags. Now I'll have to try this.

Maverick, thanks for the very in-depth post. It reeks of FAQ :grin:
Posted on 2002-07-20 08:36:33 by ThoughtCriminal
In fact LEA doesn't affect flags, but the CMP that follows it, does. ;)

You're wellcome about the other thing. :)
Posted on 2002-07-20 13:10:57 by Maverick