Please help me with this question.
It is said that only one ASM instruction is needded to do that. I couldn't figure out the trick.
Posted on 2007-05-02 10:08:25 by bugdeveloper
xlat

For a byte, it does a lookup at ebx + al (zero extended to eax)
and returns the value from that location.

If a byte table has the count of one bits stored
for each byte value then xlat which uses al as the index
will return the count.
Posted on 2007-05-02 13:31:57 by dsouza123
popcnt is a SSE4 instruction that also will count the number of bits set (to 1).
Posted on 2007-05-02 13:39:53 by dsouza123
In general if you want to search for this, the search term is "population count" - but dsouza123already more than hinted this :). You should be able to find a lot of references to this. I have some vague memory of it being related to chess algorithm optimization?
Posted on 2007-05-02 18:07:45 by f0dder
I included a popcount in convertlib....

popcount FRAME dwValue
mov ecx,

mov eax,ecx
shr eax,1
and eax,055555555h
sub ecx,eax
mov eax,ecx
shr eax,2
and ecx,033333333h
and eax,033333333h
add ecx,eax
mov eax,ecx
shr eax,4
add eax,ecx
and eax,00F0F0F0Fh

mov ecx,eax
shr ecx,8
add eax,ecx
mov ecx,eax
shr ecx,16
add eax,ecx
and eax,00000003Fh

RET
ENDF


Pretty fast considering though I expect there are some much faster ones out there.

Donkey
Posted on 2007-05-03 01:13:37 by donkey
You could probably set up a couple of tables and then use

repnz scasw

to check for an ax match

A lot of work to set it up tho...you could set it up for bytes....al instead of ax..yikes

Slightly easier with a rotate loop  :D

mov ax,0F00   ;number to chek
mov bx,0        ;final bitcount store
mov cx,10
xxx
rcr ax,1
adc bx,0
loop xxx

Posted on 2007-05-03 06:15:03 by eek
eek: not good if you want it to be fast, though :)
Posted on 2007-05-03 07:00:35 by f0dder
how about this? (FASM syntax)


;EAX = number of bits in EBX
popcount:
and eax, 0 ;at least i am original :]
repeat 32
shr ebx, 1
adc eax, 0
end repeat
retn


can be further optimized by loading constants to registers

but donkey's version is probably faster...
Posted on 2007-05-03 10:49:56 by vid
Thanks for all you guys, I found two impl may be faster
1. use lookup table, each table is 256 choices, and the contents of each of them are the number of 1 bits in a byte.
2. use ( sorry, I'm not expert in ASM )

    while (var)  {
        var &= (var-1);
        cnt++;
    }

    cnt contains to number of 1 bits in var
Posted on 2007-05-07 10:24:14 by bugdeveloper
Hi,

I can't see that being too fast, I would assume that the cache would be affected and it requires 4 passes for a DWORD. A quick test shows mine at around 14 clocks (1417564 clocks for 100000 iterations) independent of the data and I am still assuming that there are faster ones out there.

Donkey

EDIT:

It doesn't help to make a statement without showing the test parameters, simple timing test...

mov edi,100000
mov ecx,edi
rdtsc
mov esi,eax
looper:

mov eax,ecx
shr eax,1
and eax,055555555h
sub ecx,eax
mov eax,ecx
shr eax,2
and ecx,033333333h
and eax,033333333h
add ecx,eax
mov eax,ecx
shr eax,4
add eax,ecx
and eax,00F0F0F0Fh

mov ecx,eax
shr ecx,8
add eax,ecx
mov ecx,eax
shr ecx,16
add eax,ecx
and eax,00000003Fh

mov ecx,edi
dec edi
jnz <looper

rdtsc
sub eax,esi
PrintDec(eax)
Posted on 2007-05-07 12:56:28 by donkey