Is there a trick to reverse the bits of a dword?
like 000001b -> 100000b
The highest number I need to reverse is 14-bit (16383)
Otherwise I'd have to make a 32kB lookup table and get the values with
movzx eax,ReversedBits
I need it for an FFT algo
Posted on 2004-08-30 08:21:54 by Ultrano
Check how NaN did it :)
Posted on 2004-08-30 08:27:23 by JimmyClif
Use "ROLL"

Posted on 2004-09-02 07:56:42 by mrgone
:) I don't think it'll reverse bits
Posted on 2004-09-02 08:39:03 by Ultrano
Rotate with Carry :)

Posted on 2004-09-02 09:16:08 by wizzra
...more like this

; ecx is the number to reverse
xor eax,eax
mov ebx,14; NumBits
shl eax,1
mov edx,ecx
and edx,1
or eax,edx
shr ecx,1
dec ebx
jnz @B
Posted on 2004-09-02 11:03:33 by Ultrano
I'd do it this way:

push 32
pop edx
shr ecx,1
rcl eax,1
dec edx
jnz @B

In an application that is to run at high speed, one might instead do it like this:

xchg ch,cl
rol ecx,16
xchg ch,cl
mov eax,ecx
and eax,0xf0f0f0f
xor ecx,eax
rol ecx,8
xor ecx,eax
mov eax,ecx
and eax,0x33333333
xor ecx,eax
ror eax,4
xor ecx,eax
mov eax,ecx
and eax,0x55555555
xor ecx,eax
rol ecx,2
xor ecx,eax
ror ecx,1
Posted on 2004-09-02 16:54:39 by Sephiroth3
i was wondering if you really need it. i know something about FFT, but i can't see why you would ever need it...
Posted on 2004-09-03 10:27:50 by lifewire
Thanks, Sephiroth3 - both algos are cool. Won't "mov edx,32" be faster than push/pop ? Btw just completed Final Fantasy 7 :)
Unfortunately it's not faster than the lookup-table :(

The algo I use for FFT is this:
Posted on 2004-09-03 12:16:04 by Ultrano
i think the mov is faster indeed, but the push imm8/pop reg is 3 bytes, while the mov reg, imm32 is 5 :)
Posted on 2004-09-03 12:42:12 by lifewire
Sephiroth3's second algorithm (the fast one) is based on a good idea, however, that particular code doesn't work. I've spent some time trying to prove that it does work, but upon failing, I decided to just test the code directly. The results are really strange. So I've fixed it up, and I've written it to be more broken down logically so that it is easier to show that it works.

First, you need this macro:


; This macro considers EAX to be a set of divisions of DivisionSize bits
; in size. For each such division, this macro swaps the low
; DivisionSize/2 bits of the division with the high DivisionSize/2 bits
; of the division. The result is (as a side-effect) rotated to the left
; by DivisionSize/2 bits.

BitMask = ( 1 SHL (DivisionSize / 2) ) - 1
BitMask = BitMask OR ( BitMask SHL DivisionSize )

mov ecx, eax
and ecx, BitMask
xor eax, ecx
rol ecx, DivisionSize
xor eax, ecx


Then, a simple way to do the full swap is like this:

mov eax, [Pattern]

SUBSWAP 32 ; Result is ROL'ed by 16
SUBSWAP 16 ; Result is ROL'ed by 16+8
SUBSWAP 8 ; Result is ROL'ed by 16+8+4
SUBSWAP 4 ; Result is ROL'ed by 16+8+4+2
SUBSWAP 2 ; Result is ROL'ed by 16+8+4+2+1

ror eax, 16+8+4+2+1

mov [Pattern], eax

However, to speed things up, you can do the SUBSWAP 32 and SUBSWAP 16 in a more conventional way:

mov eax, [Pattern]

ror ax, 8
ror eax, 16
ror ax, 8
SUBSWAP 8 ; Result is ROL'ed by 4
SUBSWAP 4 ; Result is ROL'ed by 4+2
SUBSWAP 2 ; Result is ROL'ed by 4+2+1

ror eax, 4+2+1

mov [Pattern], eax

Above code is in TASM Ideal mode. I've tested it and it works.
Posted on 2004-09-05 22:05:40 by yessopotamus
This might be faster:

mov eax, [Pattern]

ror ax, 8
ror al, 4
ror ah, 4
ror eax, 16
ror ax, 8
ror al, 4
ror ah, 4
SUBSWAP 4 ; Result is ROL'ed by 2
SUBSWAP 2 ; Result is ROL'ed by 2+1

ror eax, 2+1

mov [Pattern], eax
Posted on 2004-09-05 22:20:38 by yessopotamus