This seems to be a really easy one, but I just wanted to make sure that I am not missing some important detail. For the following pseudocode,
$t0 = ($s1 - $s0/$s2) * $s4

we need efficient assembly language code (again, meaning the fewest clock cycles). Here is what I have:

div $s0,$s0,8
mul $s1,$s1,2
add $t1,$s1,$s2
sub $t0,$s0,$t1

I actually couldn't come up with anything else. What do you guys think?
Posted on 2012-01-16 14:01:16 by vasiqshair
Well, the div by 8 could be a shr 3, and the mul by 2 a shl 1.
Posted on 2012-01-16 14:31:22 by evlncrn8
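For reference, shr and shl are x86 mnemonics; the MIPS counterparts are srl/sra and sll. Below is a rough sketch, assuming signed 32-bit values and using $t1 and $t2 as scratch registers picked only for illustration. Note that a bare sra rounds toward negative infinity while div truncates toward zero, so a negative dividend needs a small bias first:

```asm
# $s0 / 8 with shifts, matching signed div for negative values too
sra  $t1, $s0, 31     # $t1 = -1 if $s0 < 0, else 0
srl  $t1, $t1, 29     # $t1 = 7 if $s0 < 0, else 0
addu $t1, $s0, $t1    # add the bias only to negative dividends
sra  $t1, $t1, 3      # $t1 = $s0 / 8, truncated toward zero

# $s1 * 2 with a shift
sll  $t2, $s1, 1      # $t2 = $s1 * 2
```

If $s0 is known to be non-negative, a single sra (or srl) by 3 is enough.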
You seem to need to have the following code transformed into MIPS Assembly taking the fewest cycles possible:
$t0 = ($s1 - $s0/$s2) * $s4
The Assembly code you have instead computes the following, which is a different expression:
$t0 = $s0/8 - ($s1*2 + $s2)
Posted on 2012-01-16 16:17:17 by LocoDelAssembly
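A straightforward MIPS sequence for the pseudocode as written might look like the sketch below. It assumes signed operands, a non-zero $s2, and an assembler that accepts the three-operand mul (a real instruction in MIPS32, a pseudo-instruction on older cores). Since the divisor is a register rather than a constant, the div cannot be swapped for a shift unless $s2 is known to be a power of two.

```asm
div  $s0, $s2         # LO = $s0 / $s2 (signed), HI = $s0 % $s2
mflo $t0              # $t0 = quotient
sub  $t0, $s1, $t0    # $t0 = $s1 - $s0/$s2 (sub traps on overflow; subu does not)
mul  $t0, $t0, $s4    # $t0 = ($s1 - $s0/$s2) * $s4, low 32 bits
```

The div is almost certainly the dominant cost here, so any cycle savings would have to come from knowing something extra about $s2.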
I agree: the divide by 8 (a power of 2) can be replaced with a shift instruction, which is way faster than a div.
Posted on 2012-01-16 22:59:48 by Homer
mul 2 as shl 1

A shift instruction may not always be the most efficient choice on some processors. When doing shl 1 on a register, the guaranteed fastest choice should be adding the register to itself.
Posted on 2012-01-18 21:43:11 by Raymond
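On MIPS, the two forms being compared would look like the sketch below ($t2 is just a scratch register chosen for illustration). As far as I know, both are ordinary single-cycle ALU operations on the classic MIPS pipelines, so the choice matters more on other architectures:

```asm
sll  $t2, $s1, 1      # $t2 = $s1 * 2 via a left shift
addu $t2, $s1, $s1    # $t2 = $s1 * 2 via add-with-itself (addu never traps on overflow)
```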
Traditionally, in terms of digital electronic engineering in general, a single-bit shift would be faster than an add or subtract, because it's performed within a dedicated shift register and does not care about overflow, so there's no wait. In fact, it should take one clock cycle.
However, it would not surprise me if the add and sub operators are faster on specific hardware, particularly given that binary operators are considered deprecated in the modern world, which is really silly imho.
Posted on 2012-01-20 02:59:26 by Homer
Well, in general, a shift is as fast or faster than an add. Generally they are both single-cycle instructions.
That is, a single-bit shift. Some CPUs can only shift one bit at a time, so shifting more than one bit may take more cycles. But that is generally on very old CPUs.
Modern CPUs usually have what is called a 'barrel shift' circuit, which can perform shift or rotate operations of any kind in a single cycle.

The Pentium 4 is a special case however. Where its predecessors had such a barrel shifter, the Pentium 4 did not, for some reason. It had an iterative implementation. As a result, a shift or rotate would take 2 to 4 cycles (yes, early out, whee).
To make matters worse, it had double-pumped ALUs, which could perform simple operations like add, sub, and, or, and xor in 0.5 cycles.
So as a result, on a P4 you generally want to use add instead of shift if possible.
shl eax, 1 will take 2 cycles, where add eax, eax will only take half a cycle.
Posted on 2012-01-20 03:52:57 by Scali