Which is faster, please?



movq ([esi],mm1);
pcmpeqd (mm1, mm0); // 00s if not equal


or


pcmpeqd ([esi], mm0); // 00s if not equal
Posted on 2003-05-15 07:39:31 by V Coder
Depends on the surrounding code and the processor. The first form gives the programmer more control: other instructions can be placed between the load and the compare to cover their latencies and dependencies. The latter form leaves the optimization to the processor and saves a register; newer processors do better with this code.
Posted on 2003-05-15 08:33:36 by bitRAKE
Does "newer processors" include the Pentium III, or just the Athlon Barton/Thoroughbred and Pentium 4?
Posted on 2003-05-15 15:20:45 by V Coder
Does not include P3.
Posted on 2003-05-15 17:42:21 by bitRAKE
Thanks. Too bad though, I'm testing on a P3. (And I used the latter form in a few cases... Hmm, is that one reason for the slowdown?)

Can't you just make a pronouncement that P3 is a newer processor so it could speed up? :rolleyes:
Posted on 2003-05-15 18:22:13 by V Coder
A rose by any other name is still a ...

I'm not saying the latter form is always slower on older processors, but in most cases it is. :)
Posted on 2003-05-15 19:13:16 by bitRAKE
It will be a matter of latency & throughput for different processors, then?



movq mm3, [esi]
movq mm2, [esi+8]
pcmpeqd mm0, mm2
pcmpeqd mm1, mm3


should be faster than



pcmpeqd mm1, [esi]
pcmpeqd mm0, [esi+8]


on newer processors, older processors or both?
Posted on 2003-05-24 18:17:21 by V Coder
This is too little code to say exactly, but assuming the data is in the cache, newer processors will do better on the latter. It would be best not to use the results right after the instructions. Let me explain each so you can see how both are good in different situations. :)

movq mm3, [esi]
movq mm2, [esi+8]

; do something here to hide the load latency (couple cycles)

pcmpeqd mm0, mm2
pcmpeqd mm1, mm3

; results CAN be used in the next instructions!


pcmpeqd mm1, [esi]
pcmpeqd mm0, [esi+8]

; can NOT use the results in mm0/1 for a couple cycles
The surrounding code must be taken into account before choosing a method.
Posted on 2003-05-24 19:55:47 by bitRAKE
Thanks.

How much is a couple? Depends? No worry. I'll check further.

Thanks.
Posted on 2003-05-24 23:54:42 by V Coder
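
For readers unfamiliar with the instruction that both forms in this thread rely on, here is a minimal sketch in portable C of what pcmpeqd computes on one 64-bit MMX register. It is written in C rather than assembly only so it can be compiled anywhere. The names pcmpeqd_model and pcmpeqd_examples are hypothetical helpers, not from the thread or any library. The sketch models the result only and says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): models PCMPEQD on one 64-bit
   MMX register. Each 32-bit lane of the result is all 1s where the lanes
   of dst and src are equal, and all 0s where they differ. */
uint64_t pcmpeqd_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (uint32_t)(dst >> (32 * lane));
        uint32_t b = (uint32_t)(src >> (32 * lane));
        if (a == b)
            result |= (uint64_t)0xFFFFFFFFu << (32 * lane);
    }
    return result;
}

/* Worked examples; call this from main() to check the model. */
void pcmpeqd_examples(void)
{
    /* Low lanes match (2 == 2), high lanes differ (1 != 9). */
    assert(pcmpeqd_model(0x0000000100000002ULL, 0x0000000900000002ULL)
           == 0x00000000FFFFFFFFULL);
    /* Both lanes match: every bit of the result is set. */
    assert(pcmpeqd_model(0x0000000700000003ULL, 0x0000000700000003ULL)
           == 0xFFFFFFFFFFFFFFFFULL);
    /* Neither lane matches: the result is all 0s. */
    assert(pcmpeqd_model(0, 0x0000000100000001ULL) == 0);
}
```

The register-operand and memory-operand forms discussed above produce the same value; they differ only in how the load is scheduled.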