hi all,

let say I have : bitbuffer=101100111010b

can some one show me how to count how many bits are on(1) and how many bits are off(0)??

I am stuck on this when it come to the part with registers, since the minimum register is one 4 bits,
I am looking for something like this:

bit=bit ptr
while not endof bitbuffer
if bit==0 than inc OnCounter
if bit==1 than inc OffCounter
inc pbuf

I hope this will help you easier to understand my question :(

thank you for the help in adv,

Posted on 2003-08-08 02:09:51 by Yanda
bt instruction

;eax = bitbuffer
xor edx, edx ; edx = OnCounter
mov ecx, 31
bt eax, ecx
adc edx, 0
dec ecx
jns @B
;OffCounter = 32 - edx

*note: untested, but I think you should get the idea.
made a mistake with the jcc
Posted on 2003-08-08 02:30:15 by roticv
Posted on 2003-08-08 02:32:05 by Yanda
I use Norbert Juffa's popcount, might be faster, never checked:
(Sorry about the first post it was formatted for GoAsm this is the masm version)
; Pass the value in ESI

; The set bit count is returned in EDX
mov edx,esi
shr edx,1
and edx,055555555h
sub esi,edx
mov edx,esi
shr edx,2
and esi,033333333h
and edx,033333333h
add esi,edx
mov edx,esi
shr edx,4
add edx,esi
and edx,00F0F0F0Fh

mov esi,edx
shr esi,8
add edx,esi
mov esi,edx
shr esi,16
add edx,esi
and edx,00000003Fh
Posted on 2003-08-08 02:57:51 by donkey
thank you all!

I know that you did a bt eax, ecx for testing bit,
but I just not understand the use of adc edx, 0 instruction.

could you explain to me please?

Posted on 2003-08-08 04:01:28 by Yanda
adc is just like add but if the carry flag is set, it is the same as add then inc.

For example, adc eax, edx

if CF set{
add eax, edx
inc eax
add eax, edx

Something like that if I am not wrong.

In my code, adc edx, 0

means that if carry flag is not set, then nothing is done to edx. However if carry flag is set, it would be the "same" as inc edx. (I just do not want to end up doing branching)
Posted on 2003-08-08 04:06:47 by roticv
thank you very much roticv!

now I got it!

thankx for the help!

Posted on 2003-08-08 04:24:10 by Yanda
Hi roticv.
I think the instruction js @B need be replaced to jns @B.
Posted on 2003-08-08 07:30:39 by MikDay
Hello, this is like the one roticv did ( not as optimized as roticv?s ) but no need for BT ( slower ).

EAX < REG32 to count bits from.
EDX > Count of bits that were 1.
ECX > 0

mov ecx,32
xor edx,edx
BitShiftLoop: shr eax,1
adc edx,0
dec ecx
jnz BitShiftLoop

Bye :D
Posted on 2003-08-08 07:50:49 by sheroc

Hi roticv.
I think the instruction js @B need be replaced to jns @B.

hehe.. I was confused about that part. I go change it. :/
Posted on 2003-08-08 08:04:15 by roticv
I still like Juffa's, I did some simple speed tests using Agner Fog's testing example and it came out at ~20 tics compared to around 100 for the other two. 5 times faster is quite a bit of savings for a function that is used often. I should note that I made the jns @B change to roticv's

roticv's algo = 98 tics
sheroc's = 105 tics
Juffa's POPCOUNT = 21 tics.

Average for 100,000 iterations
Posted on 2003-08-08 08:11:07 by donkey
Yeah, that bittrick is cool, although a bit hard to understand, but heh 5 times faster, worth a deeper look into it :D
Posted on 2003-08-08 09:15:58 by sheroc
Posted on 2003-08-08 09:18:57 by bitRAKE
thank you Donkey!

thank you all for the help!
Posted on 2003-08-08 12:38:55 by Yanda
Avoiding loops is of course better, but I wonder if avoiding shifts in the loop version would bring any improvement. We could exploit that N &= (N-1) removes the lowest set bit in N, and count the bits using something like (disclaimer: not tested, not optimised, early in the morning, etc. :grin: ):

    ; eax = bits

xor edx, edx ; counter = 0
test eax, eax ; if zero, return 0
jz done ;
mov ecx, eax ; temp = val
add edx, 1 ; counter++
sub ecx, 1 ; temp--;
and eax, ecx ; eax &= temp;
jnz @B ; continue if more bits left
; edx = count

The loop runs as many times as there are 1 bits in the value.
Posted on 2003-08-12 03:35:47 by Jibz
This is the sort of thing where the patterns you are dealing with can give you the algorithm to choose. If certain bits are more likely to be set than others, then you can select a better algorithm depending on which bits are likely to be set.
If the distribution is uniform across the whole 32 bit range, then Juffa's code will be the best choice, but if there are likely to be few bits set, using "sh(l/r)" and "bs(f/r)" could remove a large portion of the work. Alternatively a simple loop which doesn't count all bits, but stops when a register is zero could work well if the top, or low end is likely to have bits set.

mov eax, my_number
xor ecx, ecx

shr eax, 1
adc ecx, 0
shr eax, 1
jnz @B

adc eax, ecx
; result in eax

This will range from 5 to 75 clocks on a P3 (or it did for me at least), depending on the most significant set bit. Change the shr's to shl's and it will depend on the least significant set bit.

If you use a bsf/bsr to decide how many to shift, and the population of bits is sparce, then the saving will outweigh the cost of the bit-scan op.

I was also pleased by the above code because I only use 2 registers, so if it's in a macro (or inlined code) the register usage can be a factor in time consumption (although not enough to dent the 50 ticks for its worst case vs. Juffa's).

Posted on 2003-08-12 13:48:49 by Mirno