Hi,

I would like to know if there is something better than the bsf instruction to search the fisrt occurence of the bit 1 in a byte / Word / DWord... Because it is slow !!!!

Are there algos / instructions / tips wich is very faster ??

Please Help me
Posted on 2002-02-21 16:53:24 by DarkEmpire
I posted here one method invented by my freind Nick Poroshin not
so long ago - search it here (~6 cloks). Based on FPU.
For other things, it depends on what you need it for.
For example I did some flag analyze with invented method
to get last flag, not ordering number but value.
Posted on 2002-02-21 17:24:03 by The Svin
Can you give me an exemple / explain me about your flag analyze ?
Posted on 2002-02-21 18:02:29 by DarkEmpire
BSF / BSR are very fast on the P6 core, they are only slow on the older cores.

There is:


mov ecx, eax
neg ecx
and eax, ecx

This will mask out all but the least significant set bit.

Mirno
Posted on 2002-02-21 18:03:28 by Mirno
Yes, but this piece of code don't give me the position on the set bit ...

Or there is an other tip i have not found...

Can you help me ?
Posted on 2002-02-22 02:54:15 by DarkEmpire


[Replace Bit scan instructions] [Assembler][/][80386][+FPU]

BSF and BSR are the poorest optimized instructions on the Pentium, taking
approximately 11 + 2*n clock cycles, where n is the number of zeros
skipped. (on later processors it takes only 1 or 2) The following code
emulates BSR ECX,EAX:

test eax,eax
jz short bs1
mov [dword ptr temp],eax
mov [dword ptr temp+4],0
fild [qword ptr temp]
fstp [qword ptr temp]
wait ; WAIT only needed for compatibility with earlier processors
mov ecx,[dword ptr temp+4]
shr ecx,20
sub ecx,3ffh
test eax,eax ; clear zero flag
bs1:

The following code emulates BSF ECX,EAX:

test eax,eax
jz short bs2
xor ECX,ECX
mov [dword ptr temp+4],ecx
sub ecx,eax
and eax,ecx
mov [dword ptr temp],eax
fild [qword ptr temp]
fstp [qword ptr temp]
wait ; WAIT only needed for compatibility with earlier processors
mov ecx,[dword ptr temp+4]
shr ecx,20
sub ecx,3ffh
test eax,eax ; clear zero flag
bs2:

This gem comes from Agner Fog's "How to optimize for Pentium(tm)
microprocessor". This manual is highly recommended.
Gem writer: Agner Fog
last updated: 1998-03-16
Posted on 2002-02-22 09:26:19 by The Svin
"... (on later processors it takes only 1 or 2) ... "

Do you know after which processor bsf takes only 1 or 2 cycles ?

Can i consider that on PIII And Athlon bsf takes 1 or 2 cycle ?

Because, i don't know how i would test the number of cycle the bsf takes...

So, must i really optimize bsf with the piece of code you give me ?? or must i keep bsf ??
Posted on 2002-02-22 13:41:20 by DarkEmpire
Here is a piece of code with the rdtsc (an instruction i just discovered):

__asm
{
rdtsc
mov dwLow3, eax
mov dwHigh3, edx

rdtsc
mov dwLow4, eax
mov dwHigh4, edx

rdtsc
mov dwLow1, eax
mov dwHigh1, edx


bsf eax, nbToScan

mov BitSetPos,eax

rdtsc
mov dwLow2, eax
mov dwHigh2, edx
}

t3 = ( (__int64)dwHigh3 << 32 ) | (__int64)dwLow3;
t4 = ( (__int64)dwHigh4 << 32 ) | (__int64)dwLow4;
t1 = ( (__int64)dwHigh1 << 32 ) | (__int64)dwLow1;
t2 = ( (__int64)dwHigh2 << 32 ) | (__int64)dwLow2;
b = t2 - t1 - (t4 - t3);

b is the number of cycles to do 3 mov and bsf, if i understood what rdtsc...

The result was b = 2 cycles (with b = 4 and b = 2^31)

when i tried the piece of code you proposed, the result was more than more than 6000 cycles...

Have i done an error when i tried what you proposed me ?

In, fact i was surprised to see the result with the bsf method, because, when i read articles about bsf, all explained that bsf is slow (between 6 and 42 cycles) on PC 586 and more.
Posted on 2002-02-22 14:51:04 by DarkEmpire
From Agner... a bit down the page. It was a reading mistake I made too.

These emulation codes should not be used on the PPro, PII and PIII, where the bit scan instructions take only 1 or 2 clocks, and where the emulation codes shown above have two partial memory stalls.


So use them all you want unless your on an older machine.
Posted on 2002-02-22 15:11:03 by Mutant Slime
Hi,

bsf and bsr are intructions for intel processors... Are they useable with AMD processors too ? Or is there an equivalent instruction ?

Thanks for your help
Posted on 2002-02-23 13:02:09 by DarkEmpire
Check out www.amd.com they have manuals for the processors just like Intel does. A quick look at the athlon optimization manual shows that athlon has the instruction. Is it efficient on the Athlon? Well, I'll let YOU read the book ;)

Full link to manual stuff
http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739,00.html
Posted on 2002-02-25 09:44:10 by Mutant Slime