Hi guys,

I was just wondering: what exactly does the MUL instruction do inside the processor?

How does it do the calculation once it has the operands?

Posted on 2002-12-31 03:21:04 by processingspeed
mul src

It's the unsigned multiplication instruction. It multiplies eax by "src" and stores the 64-bit result in edx:eax. If "src" is 16-bit, then ax is multiplied by "src" and the result is stored in dx:ax (why the result isn't stored in eax I don't know, possibly because eax didn't exist on the first CPUs). And if "src" is 8-bit, then al is multiplied by "src" and the result is stored in ah:al (ax).
The edx:eax might be a bit confusing since there is no register named edx:eax; it means that edx and eax together are treated as a 64-bit pseudo-register, where edx is the high 32 bits and eax the low 32 bits.

mov eax, 10
mov ebx, 10
mul ebx
; edx:eax is now 100
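
To make the register pairs concrete, here is a rough C model of the three operand sizes described above. It is only a sketch of the results MUL leaves in the registers, not of how the CPU computes them; the names mul8, mul16, mul32 and the result structs are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical result types: each holds the high and low half of the product. */
typedef struct { uint16_t dx;  uint16_t ax;  } mul16_result;
typedef struct { uint32_t edx; uint32_t eax; } mul32_result;

/* 8-bit form: al * src, 16-bit product lands in ax (ah = high byte, al = low byte). */
uint16_t mul8(uint8_t al, uint8_t src)
{
    return (uint16_t)((unsigned)al * (unsigned)src);
}

/* 16-bit form: ax * src, 32-bit product is split across dx:ax. */
mul16_result mul16(uint16_t ax, uint16_t src)
{
    /* Widen first so the multiply is done in unsigned 32-bit arithmetic. */
    uint32_t product = (uint32_t)ax * (uint32_t)src;
    mul16_result r = { (uint16_t)(product >> 16), (uint16_t)product };
    return r;
}

/* 32-bit form: eax * src, 64-bit product is split across edx:eax. */
mul32_result mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * (uint64_t)src;
    mul32_result r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}
```

With this model, mul32(10, 10) gives edx = 0 and eax = 100, matching the assembly above, while mul32(0xFFFFFFFF, 2) gives edx = 1 and eax = 0xFFFFFFFE, which shows why the high half is needed.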

How the instruction multiplies, I don't know; it might vary between different CPU models.
Posted on 2002-12-31 06:27:01 by scientica
Thanks anyway, but it seems you didn't get my question, so let me repeat it.

How exactly does a microprocessor perform the multiplication of two numbers?
Posted on 2002-12-31 06:41:46 by processingspeed
In terms of hardware, it's a series of shifts and adds.
Usually there will be a library which contains multipliers of various sizes; these are provided for a particular fab facility's process (such as TSMC's 0.15 LV process, or UMC's 0.13 standard process). As things such as multipliers tend to be used quite a lot and are fairly chunky, they need to be optimised. The fabs also provide IO cell libraries (which is where the real money is made, I believe).

In terms of actual hardware, multipliers are a series of shifts and additions. This is why they tend to be slow: the logic path needed to do anything more than a 2 or 3 bit multiply tends to become the critical path, so it has to be pipelined (and hence takes several clocks to produce a result). Dividers are worse though!
The pipelining in microprocessors killed performance, because the instruction couldn't complete until the final stage of the pipeline had executed. It is this and memory latency that pushed Intel and others towards the out-of-order execution engine that we see on the P6 core. While the multiply is working its way through the pipeline for several clocks, other unrelated calculations can continue on other parts of the core.

Take this example:

5 (operand A) x 10 (operand B)

A = 0101
B = 1010

Op A bit | Op B shifted
    1    | 0001010
    0    | 0010100
    1    | 0101000
    0    | 1010000

Use Op A's bits (lowest bit first) to decide whether or not to
add each shifted copy of Op B to the total

A x B =   0001010
        + 0000000
        + 0101000
        + 0000000

      =   0001010
        + 0101000

      =   0110010
      =   50 decimal
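
The same shift-and-add procedure can be sketched in C. This is a software model of the idea only: the loop handles one bit of Op A per iteration, whereas a hardware multiplier would typically generate and sum the partial products in parallel. The function name shift_add_mul is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: multiply a by b using only shifts and adds. */
uint64_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint64_t total = 0;
    uint64_t shifted_b = b;       /* Op B, shifted left once per bit of Op A */

    while (a != 0) {
        if (a & 1u) {
            total += shifted_b;   /* this bit of Op A is 1: add the shifted Op B */
        }
        shifted_b <<= 1;          /* next partial product is worth twice as much */
        a >>= 1;                  /* move on to the next bit of Op A */
    }
    return total;
}
```

Calling shift_add_mul(5, 10) walks through the bits 1, 0, 1, 0 of Op A, adds 0001010 and 0101000, and returns 50, the same as the worked example.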

Hope that helps

Posted on 2002-12-31 07:50:18 by Mirno