hi,

i often see people who do maths with "SHR, SHL, AND" and some other stuff instead of really multiplying or doing division. could anybody please explain me the system behind that?

the only thing i know (hopefully i remember that well) is:

division by 2 is the same as SHR value,2, right?? :grin:

hehe.. thanks

i often see people who do maths with "SHR, SHL, AND" and some other stuff instead of really multiplying or doing division. could anybody please explain me the system behind that?

the only thing i know (hopefully i remember that well) is:

division by 2 is the same as SHR value,2, right?? :grin:

hehe.. thanks

To explain it in more simple terms I'll use decimal....

When we shift left, we are basically adding a 0 to the front of our number.

This is effectivly multiplying by 10.

When we shift right, we remove the first "character".

This is basically dividing by 10, and removing the remainder.

In binary the same is true, except we are not multiplying by 10 we are multiplying by 2.

Shifting left one place is like multiplying by 2^1.

Shifting left two places is like multiplying by 2^2

Shifting left three places is like multiplying by 2^3

And so on.

The reason its fast is because it is a simple binary operation (no complex pipelineing).

So in fact a fast divide by two is to shift right by 1 place.

Mirno

When we shift left, we are basically adding a 0 to the front of our number.

This is effectivly multiplying by 10.

When we shift right, we remove the first "character".

This is basically dividing by 10, and removing the remainder.

In binary the same is true, except we are not multiplying by 10 we are multiplying by 2.

Shifting left one place is like multiplying by 2^1.

Shifting left two places is like multiplying by 2^2

Shifting left three places is like multiplying by 2^3

And so on.

The reason its fast is because it is a simple binary operation (no complex pipelineing).

So in fact a fast divide by two is to shift right by 1 place.

Mirno

Yeah what Mirno said,

however I would shr or shl by 33 and not 1.

however I would shr or shl by 33 and not 1.

Eh why? What's the advantage?

Apparently this is a well known issue regarding SHR / SAR / SHL.

On 486's, there is an anomaly in the instructions.

(SHR / SAR / SHL) reg, 1 = latency 3

(SHR / SAR / SHL) reg, imm8 = latency 2

Based on the fact that all x86 processors since the 186 mask

shift counts modulo 32, this is an easy solution since imm8

can hold shift counts of 0 through 255. Basically we can code a

shift count of 33 to get an effective shift of 1 improving speed

ever so much. Every bit counts!

On 486's, there is an anomaly in the instructions.

(SHR / SAR / SHL) reg, 1 = latency 3

(SHR / SAR / SHL) reg, imm8 = latency 2

Based on the fact that all x86 processors since the 186 mask

shift counts modulo 32, this is an easy solution since imm8

can hold shift counts of 0 through 255. Basically we can code a

shift count of 33 to get an effective shift of 1 improving speed

ever so much. Every bit counts!

On the AMD Athlon, the only instruction this effects is RCL/RCR with the single shift versions being faster - latency of 1 (register) or 4 (memory) verses 5 or 6:

```
; latency
```

RCL reg, 1 ; 1 direct path

RCL mem, 1 ; 4 direct path

RCL reg, imm8/CL ; 5 direct path

RCL mem, imm8/CL ; 6 vector path

That's... weird. Duly noted, thanks guys. :)

Although I will never understand why shifting a x_bit register by x+ bits doesn't zero the whole register. :boggle:

Although I will never understand why shifting a x_bit register by x+ bits doesn't zero the whole register. :boggle:

The count is always masked: AND count, 1Fh (so says the Intel manual).

so is there a SAR/SHR/SHL version of

mov eax,2

mov ecx,12

mul ecx

???

can this be done faster? cause i need this to get values from a structure:

mov myvar.x,5

for example. but it is not possible that i use

mov myvar.x,5

it just works with 8 but not with 12... :confused:

bye

mov eax,2

mov ecx,12

mul ecx

???

can this be done faster? cause i need this to get values from a structure:

mov myvar.x,5

for example. but it is not possible that i use

mov myvar.x,5

it just works with 8 but not with 12... :confused:

bye

```
mov eax, 3 ;EAX == 3
```

shl eax, 3 ;EAX == EAX * 2 ^ 3

3 * 2 ^ 3 <==> 2 * 12 <==> 24 :)and how to use that in a loop? if eax is incremented from zero to .. erm.. 50 for example? and eax should always be multiplied by 12? is that possible?

For example, when we say to multiply a number by 12. First, find the largest number you can use with shl. Since the largest possible number is 3. What is 2 ^ 3? It's 8. What's the difference of 12 and 8? It's 4. This 4 is the number of times you are going to add the original value to the "shifted value".

Sorry for not explaining this a little bit clearer. I'll post a code later, if you still want it. Right now, I have to get some sleep, it's

P.S. Just remember multiply is just add, and division is just subtraction. ;)

Very Shagadelic. - Austin Powers :grin:

```
mov eax, 4
```

mov ecx, eax

shl eax, 3

add eax, ecx

add eax, ecx

add eax, ecx

add eax, ecx

Now assuming we want to multiply 4 by 12, all we have to do is find the largest value we can shift, and that is three and since the difference of 12 and 2 ^ 3 is 4, this is how many times we will add inside the loop ...
Sorry for not explaining this a little bit clearer. I'll post a code later, if you still want it. Right now, I have to get some sleep, it's

**2:45 A.M.**:)P.S. Just remember multiply is just add, and division is just subtraction. ;)

Very Shagadelic. - Austin Powers :grin:

hehe, it's about 12 P.M. here.. :grin:

yes, i would really appreceate some code! but: is the piece of code you just posted faster than multiplying?

yes, i would really appreceate some code! but: is the piece of code you just posted faster than multiplying?

What about the "and-math", have seen it to, please an explanation.

and:

P.S. Just remember multiply is just add, and division is just subtraction

?? or have I missed something ??

x+x+x = 3x <- ok

x-x-x = -2x != x/3 <- ...

x-x-x = -2x != 3/x <- ...

and:

P.S. Just remember multiply is just add, and division is just subtraction

?? or have I missed something ??

x+x+x = 3x <- ok

x-x-x = -2x != x/3 <- ...

x-x-x = -2x != 3/x <- ...

yeah, in my first post here i asked for "AND"-maths, too.. thought that it wasn't true, cuz there was no answer on it..

i agree with scientica (with the division= substraction stuff :grin: )

i agree with scientica (with the division= substraction stuff :grin: )

Hi,

First Off, AND is a logical instruction. It's not arithmetic, like MUL. That's why the guys on the previous page didn't explained AND because it has nothing to do with math, maybe just a little(maybe in some special cases, rare ones). Of course you can change bits using AND but in practicality, AND is rarely used during mathematical solutions.

SHL ... is a shift instruction. It's not also arithmetic but since there is a binary pattern on the powers of 2 when using this instruction, that's why It was used.

This is how an AND works.

TRUE AND FALSE == FALSE

TRUE AND TRUE == TRUE

FALSE AND FALSE == FALSE

If I have a binary sequence like this.

0111 (7 in decimal) and I will AND 0111 with

0010 (2 in decimal) I will get

____

0010 (2 in decimal)

You see there is no mathematical use in most cases.

As for the question, "which is faster SHL-ADD or MUL"? I can't really say which one is faster but in some cases SHL-ADD is faster, sometimes MUL is faster.

First Off, AND is a logical instruction. It's not arithmetic, like MUL. That's why the guys on the previous page didn't explained AND because it has nothing to do with math, maybe just a little(maybe in some special cases, rare ones). Of course you can change bits using AND but in practicality, AND is rarely used during mathematical solutions.

SHL ... is a shift instruction. It's not also arithmetic but since there is a binary pattern on the powers of 2 when using this instruction, that's why It was used.

This is how an AND works.

TRUE AND FALSE == FALSE

TRUE AND TRUE == TRUE

FALSE AND FALSE == FALSE

If I have a binary sequence like this.

0111 (7 in decimal) and I will AND 0111 with

0010 (2 in decimal) I will get

____

0010 (2 in decimal)

You see there is no mathematical use in most cases.

As for the question, "which is faster SHL-ADD or MUL"? I can't really say which one is faster but in some cases SHL-ADD is faster, sometimes MUL is faster.

```
[size=9].686
```

.MODEL FLAT, STDCALL

OPTION CASEMAP:NONE

INCLUDE \masm32\INCLUDE\windows.inc

INCLUDE \masm32\INCLUDE\kernel32.inc

INCLUDE \masm32\INCLUDE\user32.inc

INCLUDELIB \masm32\lib\user32.lib

INCLUDELIB \masm32\lib\kernel32.lib

INCLUDE \masm32\INCLUDE\masm32.inc

INCLUDELIB \masm32\lib\masm32.lib

.DATA?

g_ddTime1 DD ?

g_ddTime2 DD ?

g_dbBuffer DB 16 DUP(?)

.CODE

START:

;===========================================================================

;SHL-ADD (Original)

;===========================================================================

invoke QueryPerformanceCounter, OFFSET g_ddTime1

xor ebx, ebx

@@:

mov eax, ebx

mov ecx, ebx

shl eax, 3

add eax, ecx

add eax, ecx

add eax, ecx

add eax, ecx

inc ebx

cmp ebx, 50

nop

nop

jbe @B

invoke QueryPerformanceCounter, OFFSET g_ddTime2

mov eax, g_ddTime2

sub eax, g_ddTime1

invoke dwtoa, eax, OFFSET g_dbBuffer

invoke MessageBox, 0, OFFSET g_dbBuffer, 0, 0

;===========================================================================

;MUL

;===========================================================================

invoke QueryPerformanceCounter, OFFSET g_ddTime1

xor ebx, ebx

mov ecx, 12

@@:

mov eax, ebx

mul ecx

inc ebx

cmp ebx, 50

nop

nop

jbe @B

invoke QueryPerformanceCounter, OFFSET g_ddTime2

mov eax, g_ddTime2

sub eax, g_ddTime1

invoke dwtoa, eax, OFFSET g_dbBuffer

invoke MessageBox, 0, OFFSET g_dbBuffer, 0, 0

invoke ExitProcess,NULL

END START[/size]

As you can see, I used QueryPerformanceCounter, I'm not sure if the outputs are correct but I'll try maverick's profiler later. :)My approach is a little different than Stryker's.

To mul a number by twelve, first you've gotta think about the bit representation of 12. This is 1100b. I.E., 12=8+4. So why is this helpful?

Well, for all x

12*x = 8*x + 4*x

so if you wanna multiply eax by 12 you're gonna need to add 8*x and 4*x. We do this:

I dunno about division being subtraction though... and multiplication being addition is a bit of a stretch too.

BTW, there's also multiplication using the LEA instruction

LEA does some shifting

ex.

so to mul by twelve we use 12=3*4

i.e., first we mul the value by 4 by shifting by 2. Then we use lea to mul the resultant by 3

Simple, eh?

:)

And to answer your question about speed... shifting is almost always faster. So do it whenever you can.

As for "and" math... I'm not sure what you mean. There are some tricks using "and" and "test" and "xor".

Ex. cmp eax,0 can be replaced with test eax,eax

mov eax,0 can be replaced with xor eax,eax

etc

--Chorus

To mul a number by twelve, first you've gotta think about the bit representation of 12. This is 1100b. I.E., 12=8+4. So why is this helpful?

Well, for all x

12*x = 8*x + 4*x

so if you wanna multiply eax by 12 you're gonna need to add 8*x and 4*x. We do this:

```
```

mov edx,eax ;use edx as a temp value

shl eax,3 ;mul eax by 8

shl edx,2 ;mul edx (the original x) by 4

add eax,edx ;add 8*x and 4*x to get 12*x in eax

I dunno about division being subtraction though... and multiplication being addition is a bit of a stretch too.

BTW, there's also multiplication using the LEA instruction

LEA does some shifting

**and**addition at the same time. However, you can't do more than shl r,3ex.

```
```

LEA eax,[8*eax] == shl eax,3

LEA eax,[4*eax+eax] == shl eax,2 / add eax all at once i.e., mul eax,5

so to mul by twelve we use 12=3*4

```
```

shl eax,2

lea eax,[2*eax+eax]

i.e., first we mul the value by 4 by shifting by 2. Then we use lea to mul the resultant by 3

Simple, eh?

:)

And to answer your question about speed... shifting is almost always faster. So do it whenever you can.

As for "and" math... I'm not sure what you mean. There are some tricks using "and" and "test" and "xor".

Ex. cmp eax,0 can be replaced with test eax,eax

mov eax,0 can be replaced with xor eax,eax

etc

--Chorus

**scientica**,

As for DIV being SUB.

let's use a test case of 10/3. The answer to 10/3 is 3.3333.... (Non-Terminating)

Now look at the relationship between DIV and SUB

10 - 3 == 7

7 - 3 == 4

4 - 3 == 1

If we subtract 4 from 1 we will get a negative, so we stop. Now count how many times we subtracted from 10 by 3. That is 3 times, right? this is our quotient and the remaining value (1) will be our remainder.

Which is true if I used it this way.

```
mov eax, 10
```

mov ecx, 3

xor edx, edx

div ecx

You will get 3 in EAX and 1 in EDX.
But this isn't the right answer. The remainder is suppose to be a non-terminating 3. How do we fix this?

(Sorry for the wrong math terms Used - I seem to forgot most of the terms, I hope you'll get the idea)

Now let us look at the remainder first If you multiply that by 10 you'll get 10. Now do the same thing we did above and I guarantee you you'll get 3.3333... (Non-Terminating in the long run).

**P.S. Don't mind the NOPs used on my code above, that was just a teaser for NOP-erator.**:grin:

Very Shagadelic!!! ;)

OffTopic:

Stryker, did you see GoldMember last night?? It was fricken hilarious :D

--Chorus

Stryker, did you see GoldMember last night?? It was fricken hilarious :D

--Chorus

I planned to watch but I was very busy with other projects. I'll try to watch this weekend(Sunday). :grin:

Yeah! Baby Yeah! :grin:

Yeah! Baby Yeah! :grin: