Is there a trick to reverse the bits of a dword?

like 000001b -> 100000b

The highest number I need to reverse is 14-bit (16383)

Otherwise I'd have to make a 32kB lookup table and get the values with

movzx eax,ReversedBits

I need it for an FFT algo

like 000001b -> 100000b

The highest number I need to reverse is 14-bit (16383)

Otherwise I'd have to make a 32kB lookup table and get the values with

movzx eax,ReversedBits

I need it for an FFT algo

Check how NaN did it :)

http://www.asmcommunity.net/board/viewtopic.php?t=13298&highlight=

http://www.asmcommunity.net/board/viewtopic.php?t=13298&highlight=

Use "ROLL"

ROL EAX,1

ROL EAX,1

:) I don't think it'll reverse bits

Rotate with Carry :)

RCL

RCL

...more like this

```
```

; ecx is the number to reverse

xor eax,eax

mov ebx,14; NumBits

@@:

shl eax,1

mov edx,ecx

and edx,1

or eax,edx

shr ecx,1

dec ebx

jnz @B

I'd do it this way:

In an application that is to run at high speed, one might instead do it like this:

```
```

push 32

pop edx

@@B:

shr ecx,1

rcl eax,1

dec edx

jnz @B

In an application that is to run at high speed, one might instead do it like this:

```
```

xchg ch,cl

rol ecx,16

xchg ch,cl

mov eax,ecx

and eax,0xf0f0f0f

xor ecx,eax

rol ecx,8

xor ecx,eax

mov eax,ecx

and eax,0x33333333

xor ecx,eax

ror eax,4

xor ecx,eax

mov eax,ecx

and eax,0x55555555

xor ecx,eax

rol ecx,2

xor ecx,eax

ror ecx,1

i was wondering if you really need it. i know something about FFT, but i can't see why you would ever need it...

Thanks, Sephiroth3 - both algos are cool. Won't "mov edx,32" be faster than push/pop ? Btw just completed Final Fantasy 7 :)

Unfortunately it's not faster than the lookup-table :(

The algo I use for FFT is this:

Posted on 2004-09-03 12:16:04 by Ultrano

Unfortunately it's not faster than the lookup-table :(

The algo I use for FFT is this:

Posted on 2004-09-03 12:16:04 by Ultrano

i think the mov is faster indeed, but the push imm8/pop reg is 3 bytes, while the mov reg, imm32 is 5 :)

Sephiroth3's second algorithm (the fast one) is based on a good idea, however, that particular code doesn't work. I've spent some time trying to prove that it does work, but upon failing, I decided to just test the code directly. The results are really strange. So I've fixed it up, and I've written it to be more broken down logically so that it is easier to show that it works.

First, you need this macro:

Then, a simple way to do the full swap is like this:

However, to speed things up, you can do the SUBSWAP 32 and SUBSWAP 16 in a more conventional way:

Above code is in TASM Ideal mode. I've tested it and it works.

First, you need this macro:

```
```

MACRO SUBSWAP DivisionSize

; This macro considers EAX to be a set of divisions of DivisionSize bits

; in size. For each such division, this macro swaps the low

; DivisionSize/2 bits of the division with the high DivisionSize/2 bits

; of the division. The result is (as a side-effect) rotated to the left

; by DivisionSize/2 bits.

BitMask = ( 1 SHL (DivisionSize / 2) ) - 1

REPT 15

BitMask = BitMask OR ( BitMask SHL DivisionSize )

ENDM

mov ecx, eax

and ecx, BitMask

xor eax, ecx

rol ecx, DivisionSize

xor eax, ecx

ENDM SUBSWAP

Then, a simple way to do the full swap is like this:

```
```

mov eax, [Pattern]

SUBSWAP 32 ; Result is ROL'ed by 16

SUBSWAP 16 ; Result is ROL'ed by 16+8

SUBSWAP 8 ; Result is ROL'ed by 16+8+4

SUBSWAP 4 ; Result is ROL'ed by 16+8+4+2

SUBSWAP 2 ; Result is ROL'ed by 16+8+4+2+1

ror eax, 16+8+4+2+1

mov [Pattern], eax

However, to speed things up, you can do the SUBSWAP 32 and SUBSWAP 16 in a more conventional way:

```
```

mov eax, [Pattern]

ror ax, 8

ror eax, 16

ror ax, 8

SUBSWAP 8 ; Result is ROL'ed by 4

SUBSWAP 4 ; Result is ROL'ed by 4+2

SUBSWAP 2 ; Result is ROL'ed by 4+2+1

ror eax, 4+2+1

mov [Pattern], eax

Above code is in TASM Ideal mode. I've tested it and it works.

This might be faster:

```
```

mov eax, [Pattern]

ror ax, 8

ror al, 4

ror ah, 4

ror eax, 16

ror ax, 8

ror al, 4

ror ah, 4

SUBSWAP 4 ; Result is ROL'ed by 2

SUBSWAP 2 ; Result is ROL'ed by 2+1

ror eax, 2+1

mov [Pattern], eax