Maverick, in your code you need to specify that
LOWLEVEL = least acceptable number in a range
UPPERLEVEL = greatest acceptable number in a range +1
Posted on 2002-03-26 05:53:47 by The Svin
but that was a "trade secret" and I didn't feel like mentioning it.

It's OK, I'm also under pressure not to reveal anything which belongs to the core of my job. And sometimes I feel uneasy.
Posted on 2002-03-26 05:59:20 by The Svin

Sorry, there was a typo. Correct code:



LEA ECX,[ECX-STARTVALUE]
ROL EAX,CL ;EAX = 1111 1111 1100 0000 0111 1110 ....
JS .label



Argh.. I'm still too used to the 680x0.. I forgot that on x86 ROL doesn't update the sign flag. :(
SHL can be used in its place, then. ;)

So, a very quick 32bit register "bitmap lookup table" is as easy as:



SHL EAX,CL
JS .label


Where EAX contains the mask (bit positions reversed: index i is tested by bit 31-i), and CL the index.

Note that the index cannot be 0 (because a shift count of 0 leaves the flags unchanged, so the S flag is not updated).

Damn complexness (spelling?) and non-linearity of the Intel instruction set. :mad:
Posted on 2002-03-26 06:13:39 by Maverick
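A minimal C model of the SHL/JS lookup from the post above, for anyone who wants to check the reversed bit numbering. It only mirrors the flag behaviour described there (count masked to 5 bits, flags untouched for a count of 0); the function names and the prior_sf parameter are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mark index i (1..31) as valid in a "reversed" mask.
   SHL by i moves bit (31 - i) into bit 31, which is what JS tests. */
uint32_t reversed_mask_set(uint32_t mask, unsigned index)
{
    return mask | (UINT32_C(1) << (31u - index));
}

/* Model of:  SHL EAX,CL / JS .label
   Returns true when the jump would be taken.
   prior_sf is the sign flag left behind by earlier instructions. */
bool shl_js_taken(uint32_t mask, unsigned index, bool prior_sf)
{
    unsigned count = index & 31u;          /* x86 masks the count to 5 bits */
    if (count == 0)
        return prior_sf;                   /* count 0: flags are not updated */
    return ((uint32_t)(mask << count) >> 31) != 0;   /* SF = bit 31 of result */
}
```

With indices 1 and 5 marked, shl_js_taken reports a jump for 1 and 5 and none for 2. For index 0 the outcome depends entirely on prior_sf, which is why the index cannot be 0.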

Maverick, in your code you need to specify that
LOWLEVEL = least acceptable number in a range
UPPERLEVEL = greatest acceptable number in a range +1
I did, implicitly, in the pseudocode (>=LOWER && <UPPER, and reversed condition for out of bounds).
Posted on 2002-03-26 06:14:47 by Maverick


quote:
--------------------------------------------------------------------------------
but that was a "trade secret" and I didn't feel like mentioning it.
--------------------------------------------------------------------------------


It's OK, I'm also under pressure not to reveal anything which belongs to the core of my job. And sometimes I feel uneasy.
Yes, but often I'm really maniacal about that.. :( I had a lot of ideas which became obsolete because I never had the time to exploit them.. nor were others allowed to. I'm really ill about this topic, should really change someday.
Posted on 2002-03-26 06:25:41 by Maverick

hi!

I've just played a bit with some code, since I am a bit bored now. :grin:

... and found another solution to the problem of validating hex values, which is a bit shorter than The Svin's :)



sub cl, 48
xor edx, edx
mov eax, 1
shld edx, eax, cl
shl eax, cl
and eax, 007E03FFh
and edx, 007E0000h
or eax, edx
jz @invalid


The char needs to be in cl.

Cu, Jens
----


mov eax,1
shld edx,eax,whatever
edx = 0 always
Posted on 2002-03-26 07:46:15 by The Svin

i would call it 'bitmap register lookup' :grin:
That is the better name.
Posted on 2002-03-26 07:53:30 by bitRAKE
Jens,
here is what your code will accept even though these are not hex symbols:
from f9h to f0h,
from e6h to e1h,
from d9h to d0h,
from c6h to c1h,
from b9h to b0h,
from a6h to a1h,
from 99h to 90h,

shall I continue?
You've got a good idea, but the wrong implementation.
Posted on 2002-03-26 08:28:14 by The Svin
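The two objections above (EDX is always 0, and characters such as f0h are accepted) can be checked with a small C model of the posted routine. This is only a sketch that assumes the usual SHLD/SHL behaviour with the count taken modulo 32; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* SHLD dst, src, count: shift dst left, filling from the top bits of src. */
uint32_t shld32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31u;                         /* count is taken modulo 32 */
    if (count == 0)
        return dst;                       /* nothing is shifted */
    return (uint32_t)(dst << count) | (src >> (32u - count));
}

/* True when EDX ends up 0 for every possible CL, given EDX = 0 and EAX = 1. */
bool shld_of_one_is_always_zero(void)
{
    for (unsigned cl = 0; cl < 256; ++cl)
        if (shld32(0, 1, cl) != 0)
            return false;
    return true;
}

/* Model of the posted routine; true when JZ @invalid is NOT taken. */
bool first_version_accepts(uint8_t ch)
{
    uint8_t  cl  = (uint8_t)(ch - 48);        /* sub cl, 48 (8-bit wraparound) */
    uint32_t edx = 0;                         /* xor edx, edx */
    uint32_t eax = 1;                         /* mov eax, 1 */
    edx = shld32(edx, eax, cl);               /* shld edx, eax, cl */
    eax = (uint32_t)(eax << (cl & 31u));      /* shl eax, cl */
    eax &= UINT32_C(0x007E03FF);              /* and eax, 007E03FFh */
    edx &= UINT32_C(0x007E0000);              /* and edx, 007E0000h */
    return (eax | edx) != 0;                  /* or eax, edx / jz @invalid */
}
```

Because only the low 5 bits of CL matter, every character whose (ch - 48) has the same low 5 bits as a valid digit is accepted, which is where ranges like f0h..f9h and e1h..e6h come from.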
hi!

uppps ... sorry ... I misunderstood the description of the SHLD instruction ... I normally don't use this instruction, and I thought it would let me use two 32-bit registers like one 64-bit register, just without saving the low dword.
Still good that I posted this code, now I've learned that shld is crap, and how to use it correctly. :grin:

Cu, Jens
Posted on 2002-03-26 08:53:34 by Jens Duttke
hi!

Ok, I've rewritten my code (and tested it, so I am 100% sure now, that it works :) )

The bad side: The "bitmap" needs to be defined in the .data section.
The good side: Theoretically it allows you to make a map with an "endless" number of items ... not only 64.



.data

bitmap dq 007E0000007E03FFh
LOWER equ 48 ; 0
UPPER equ 102 ; f

.code

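lea ecx, [edx - LOWER]             ; ecx = index relative to LOWER
cmp ecx, UPPER - LOWER
ja @invalid                        ; unsigned compare also rejects values below LOWER
mov eax, ecx
and ecx, 1Fh                       ; ecx = bit number within the dword
shr eax, 3                         ; eax = byte offset of the bit
and al, 0FCh                       ; round down to a dword boundary
bt dword ptr [bitmap + eax], ecx   ; CF = selected bit
jnc @invalid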

The value needs to be in edx, and it will NOT be modified. :)

Cu, Jens
----
http://www.emucheater.com
http://cyberpad.psxemu.com
Posted on 2002-03-26 09:27:19 by Jens Duttke
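For reference, here is a C sketch of what the routine above computes, step by step. It assumes the dq is stored little-endian (low dword first), as it is on x86; the function name is_hex_digit is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOWER 48u    /* '0' */
#define UPPER 102u   /* 'f' */

/* bitmap dq 007E0000007E03FFh, viewed as two little-endian dwords */
const uint32_t bitmap[2] = { UINT32_C(0x007E03FF), UINT32_C(0x007E0000) };

bool is_hex_digit(uint32_t edx)
{
    uint32_t ecx = edx - LOWER;                 /* lea ecx, [edx - LOWER] */
    if (ecx > UPPER - LOWER)                    /* cmp / ja: unsigned range check */
        return false;
    uint32_t eax = (ecx >> 3) & ~UINT32_C(3);   /* shr eax, 3 / and al, 0FCh */
    ecx &= 0x1Fu;                               /* and ecx, 1Fh */
    return ((bitmap[eax / 4u] >> ecx) & 1u) != 0;   /* bt dword ptr [bitmap + eax], ecx */
}
```

Bits 0..9 of the first dword cover '0'..'9', bits 17..22 cover 'A'..'F', and bits 17..22 of the second dword (indices 49..54) cover 'a'..'f'.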
Jens, try this - it should work too. :)
lea ecx, [edx - LOWER]
cmp ecx, UPPER - LOWER
ja @invalid
bt dword ptr [bitmap], ecx
jnc @invalid


For memory bit strings, this immediate field gives only the bit offset within a word or doubleword. Immediate bit offsets larger than 31 are supported by using the immediate bit offset field in combination with the displacement field of the memory operand. The low-order 3 to 5 bits of the immediate bit offset are stored in the immediate bit offset field, and the high-order 27 to 29 bits are shifted and combined with the byte displacement in the addressing mode. When accessing a bit in memory, the 80386 may access four bytes starting from the memory address given by:
Effective Address + (4 * (BitOffset DIV 32))
for a 32-bit operand size.
Posted on 2002-03-26 09:50:28 by bitRAKE
hi!

bitRAKE: About your macro, I've checked it now, and it's nicely done.

There is only one thing, which I "don't understand".

You say it has "No memory access" (compared to my last routine that's true, if you define memory access as access to a .data section), but for example, if I want to check 100 values with your macro,
it would generate 4 times the

mov edx,msk
xor eax,eax
bt edx,ecx

That is 6 bytes of opcodes + 4 bytes for the mask = 10 bytes for each dword map.

So for 100 items, that will be 40 bytes in memory.

My code uses memory access, that's true ... but
1. it needs only 16 bytes in memory, not 40
2. it does only one dword access, so it needs to read only 4 bytes instead of 40

Now my question: your code needs to read 40 bytes of code from memory, while my code reads only 4 bytes of data from memory. So isn't it faster to do that with "memory", which is much smaller, instead of trying to do everything with code, which produces much more data in the end?

I currently don't know much about branch misprediction, data caching in the processor and all this optimization stuff.
I just think it can't be possible that the processor can read/handle 40 bytes of code faster than 4 bytes of data.
But if I am wrong, it would be nice if you could explain why the processor can handle code faster than data.



Jens, try this - it should work too. :)
lea ecx, [edx - LOWER]
cmp ecx, UPPER - LOWER
ja @invalid
bt dword ptr [bitmap], ecx
jnc @invalid
But not if the bitmap is larger than 32 bits :)
Those shifts and ANDs are there to handle bitmaps which are larger than 32 bits.

Cu, Jens
Posted on 2002-03-26 10:00:08 by Jens Duttke
Newer processors will predict the code memory to load, but have a hard time predicting the data memory to load. Also, your figure of 40 bytes is wrong - there are not 4 dword masks. Add your code size to your data size to compare to mine based on size! The speed of execution will depend on whether your data is in the cache or not. I will test non-cached versions of both when I get home.
Posted on 2002-03-26 10:10:09 by bitRAKE

But not if the bitmap is larger than 32 bits :)
Again you are wrong! I have used this same method in compression algos in 16-bit code - accessing 32K of memory just by changing the bit number! I will test the 32-bit code when I get home - this would be a huge inconsistency in the x86 architecture if it didn't work!
Posted on 2002-03-26 10:13:32 by bitRAKE
hi!

Here is the result with 4 dwords:

The code produced by your macro: 54 bytes
My code + data: 44 bytes

With 6 dwords:

The code produced by your macro: 74 bytes
My code + data: 52 bytes

With 12 dwords:

The code produced by your macro: 134 bytes
My code + data: 76 bytes

As you can see, the difference gets larger and larger.

Cu, Jens
Posted on 2002-03-26 10:20:35 by Jens Duttke
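The three measurements above happen to fit two straight lines: 10 bytes per dword for the macro and 4 bytes per dword for the table version. The constant parts used below (14 and 28 bytes) are inferred from those three data points and are not stated anywhere in the thread, so treat this C sketch as a rough extrapolation only. The function names are made up.

```c
#include <stdint.h>

/* Linear fits to the quoted sizes (inferred, not measured here). */
unsigned macro_bytes(unsigned dwords) { return 14u + 10u * dwords; }
unsigned table_bytes(unsigned dwords) { return 28u + 4u * dwords; }

/* Smallest dword count at which the table version is strictly smaller. */
unsigned break_even_dwords(void)
{
    unsigned n = 1;
    while (table_bytes(n) >= macro_bytes(n))
        ++n;
    return n;
}
```

Under these fits the table version becomes the smaller one from 3 dwords upward, and the gap then grows by 6 bytes per additional dword.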
hi!


Again you are wrong! I have used this same method in compression algos in 16-bit code - accessing 32K of memory just by changing the bit number! I will test the 32-bit code when I get home - this would be a huge inconsistency in the x86 architecture if it didn't work!

Maybe I misunderstand you, or you misunderstand me, but


lea ecx, [edx - LOWER]
cmp ecx, UPPER - LOWER
ja @invalid
bt dword ptr [bitmap], ecx
jnc @invalid

will only check the first dword (32 items), while my code checks an endless number of bytes (an endless number of items).

Cu, Jens
Posted on 2002-03-26 10:24:24 by Jens Duttke
Jens, yes - as the number of masks increases your method uses less memory, but will it be in the cache? Will it pollute the cache with data that isn't going to be used again? Your method is smaller.
Posted on 2002-03-26 10:26:42 by bitRAKE

Maybe I misunderstand you, or you misunderstand me
No misunderstanding. Read the quote above from the Intel Manual. Memory lookups are different than register lookups.

bt mem, count

Address calculation is:

<effective address> = mem + 4 * (count/32)

I have used this on 16-bit code where the address is:

<effective address> = mem + 2 * (count/16)

Then the actual bit test is:

bt <effective address>, (count MOD 32) ; 32 bit
bt <effective address>, (count MOD 16) ; 16 bit

Please, read the Intel Manual. :tongue:
Posted on 2002-03-26 10:31:42 by bitRAKE
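The address calculation described above can be written out in C to see that a single bt with a memory operand already does the dword selection that the longer routine performs with shifts and ANDs. This sketch models non-negative bit offsets only (the real instruction treats a register offset as signed); the function and array names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The hex-digit bitmap from the thread, as 32-bit and as 16-bit units. */
const uint32_t demo_bitmap32[2] = { UINT32_C(0x007E03FF), UINT32_C(0x007E0000) };
const uint16_t demo_bitmap16[4] = { 0x03FF, 0x007E, 0x0000, 0x007E };

/* bt dword ptr [mem], count: dword at mem + 4*(count/32), bit count MOD 32 */
bool bt_mem32(const uint32_t *mem, uint32_t count)
{
    return ((mem[count / 32u] >> (count % 32u)) & 1u) != 0;
}

/* bt word ptr [mem], count: word at mem + 2*(count/16), bit count MOD 16 */
bool bt_mem16(const uint16_t *mem, uint32_t count)
{
    return ((mem[count / 16u] >> (count % 16u)) & 1u) != 0;
}

/* The explicit version: byte offset via shr 3 / and 0FCh, bit via and 1Fh. */
bool bt_manual(const uint32_t *mem, uint32_t count)
{
    uint32_t byte_off = (count >> 3) & ~UINT32_C(3);
    return ((mem[byte_off / 4u] >> (count & 31u)) & 1u) != 0;
}
```

For any index inside the bitmap, bt_mem32 and bt_manual select the same dword and the same bit, so both give the same answer.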
hi!


Jens, yes - as the number of masks increases your method uses less memory, but will it be in the cache? Will it pollute the cache with data that isn't going to be used again?


Will your code be cached more than mine ?

That was my question, since i don't have a clue about this caching stuff, I asked you to answer this. ;)
I don't know which stuff will be cached and which not.

Cu, Jens
Posted on 2002-03-26 10:33:18 by Jens Duttke

Will your code be cached more than mine?
From my experience, I can only answer that question for my Athlon. Answer is yes, my code will be in the cache - but I do want to test to ensure my sanity. :)
Posted on 2002-03-26 10:39:29 by bitRAKE