Hey everyone, just a quick question about RDTSC on the Pentium 4. In all of my tests (one desktop P4, two mobile P4s), the value returned by RDTSC is always a multiple of 4. What is the reason for this? The same tests on my P3 and Athlon work correctly, giving single-cycle resolution, and I can't figure out what would cause this multiple-of-4 behavior. The code I use is as follows:

xor eax, eax
cpuid                         ; serialize so earlier instructions have retired
rdtsc                         ; start count in edx:eax
mov dword ptr time, eax
mov dword ptr time[4], edx
xor eax, eax

... code to time...

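xor eax, eax
cpuid                         ; serialize again before the second read
rdtsc                         ; end count in edx:eax
; overhead previously calculated
sub eax, overhead
; take care of a borrow in eax
sbb edx, 0

sub eax, dword ptr time       ; low dword: sub, so a stale carry is not reused
sbb edx, dword ptr time[4]    ; high dword: sbb consumes the borrow from the sub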
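
For anyone checking the arithmetic: a 64-bit difference on a 32-bit machine is a SUB on the low dwords followed by an SBB on the high dwords, so the borrow carries across. Here is a rough C model of that, just a sketch. The names u64_pair, sub_pair and elapsed are made up for illustration and are not from the code above, and it subtracts the start count before the overhead, which gives the same result as doing it the other way round.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit count split into two 32-bit halves,
   the way RDTSC hands it back in EDX:EAX. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* a - b done as a SUB on the low halves followed by an SBB on the high halves. */
u64_pair sub_pair(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t borrow = (a.lo < b.lo) ? 1u : 0u; /* carry flag left by SUB */
    r.lo = a.lo - b.lo;                        /* sub eax, b.lo */
    r.hi = a.hi - b.hi - borrow;               /* sbb edx, b.hi */
    return r;
}

/* Elapsed count = (end - start) - overhead, joined back into one 64-bit value. */
uint64_t elapsed(u64_pair start, u64_pair end, uint32_t overhead)
{
    u64_pair ov  = { overhead, 0u };
    u64_pair raw = sub_pair(end, start);
    /* Assumption: the overhead is never larger than the raw interval. */
    assert(raw.hi != 0u || raw.lo >= overhead);
    u64_pair r = sub_pair(raw, ov);
    return ((uint64_t)r.hi << 32) | r.lo;
}
```

With a start of 0x1FFFFFFF0 and an end of 0x200000010 the low dwords wrap, and the borrow is what keeps the high dword from coming out one too large.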

Thanks for any help with this.
Posted on 2004-01-05 22:06:07 by AlexEiffel
Try inserting a nop in the "code to time"?
Posted on 2004-01-05 22:09:40 by comrade

There was a note about the P4 and RDTSC a while back on sandpile.org, where RDTSC returned 0 in the LSB of EAX. It was also noted in another discussion elsewhere: I compared some RDTSC code on a P3 and all was fine, while another person ran the same code on a P4 and found that the value in bits 0-3 of EAX was always 0 or 4.

I'm not sure, but this might explain what you are seeing. Here's the text and replies from the more technical of the discussions, not sure if it solves the puzzle though:


I'm not sure if I want to ask anything or just state what I ran into
recently, but in any case, feel free to comment. The thing is that RDTSC
seems to return a constant 0 in the LSB (in EAX). I assume it's due to the
microarchitecture of the P4 (something inside runs at double speed and
RDTSC somehow always executes on the same clock modulo 2), but it is
not documented or even mentioned anywhere in the Intel manuals. Where it
matters is that there are pseudo-random number generators that rely on
sampling the lower bits of the TSC for entropy (among others). Obviously
the above behaviour means that there's one bit less than one would
normally assume, although it's probably not enough to break anything.
I'd be interested in learning how other CPUs behave, especially the newer
architectures like Opteron or those with Hyperthreading.
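
One way to check how many low bits are affected on a given machine is to collect a batch of TSC readings and see which low-order bits never come up as 1. A minimal sketch in C, assuming the readings have already been gathered into an array. The function names stuck_low_bits and usable_values are hypothetical, and the check only detects bits stuck at 0, which is the case described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of low-order bits that are zero in every sample.
   Returns 64 if every sample is zero. */
unsigned stuck_low_bits(const uint64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    uint64_t seen = 0;
    for (size_t i = 0; i < n; i++)
        seen |= samples[i];            /* collect every bit that was ever 1 */
    unsigned k = 0;
    while (k < 64 && ((seen >> k) & 1u) == 0)
        k++;                           /* count trailing bits never seen as 1 */
    return k;
}

/* Distinct values the low `bits` bits can take once `stuck` of them
   are known to be always zero. */
uint64_t usable_values(unsigned bits, unsigned stuck)
{
    assert(bits < 64);
    if (stuck >= bits)
        return 1;
    return (uint64_t)1 << (bits - stuck);
}
```

If every reading is a multiple of 4, stuck_low_bits reports 2, and sampling the low 8 bits then yields only 64 distinct values instead of 256.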

If you are dealing with a processor that can adjust its core clock
frequency, then you may face additional issues. (In other words: a
chip with SpeedStep, PowerNow!, LongHaul, or LongRun.)

For example, if the core clock is removed and then reapplied, then
there is the PLL relock time. Will your processor account for that
lost time in its TSC, or not?

Furthermore, the processor may increment the TSC value at the rate
of the current core clock frequency... or at the rate of a nominal
core clock frequency.
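
To put numbers on that: turning a tick count into time means dividing by a frequency, and the answer is only right if the counter really ran at that frequency for the whole interval. A small C sketch, where ticks_to_ns is a hypothetical helper and the frequency values are only examples:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to nanoseconds, assuming the counter ran at a
   constant `hz` for the whole interval. The intermediate product stays
   in range for frequencies up to roughly 18 GHz. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t hz)
{
    assert(hz > 0);
    uint64_t secs = ticks / hz;        /* whole seconds */
    uint64_t rem  = ticks % hz;        /* leftover ticks, always < hz */
    return secs * 1000000000u + (rem * 1000000000u) / hz;
}
```

The same 2,400,000,000 ticks come out as one second at 2.4 GHz but two seconds at 1.2 GHz, so if the core clock changed during the measurement and you don't know which rate the TSC followed, the result can be off by that factor.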

In other words: don't rely on the TSC for entropy or randomness.

I guess that anything beyond C1 and C2 power management (HLT and STPCLK)
poses a problem, too. No clock, no TSC...

Posted on 2004-01-05 23:58:47 by Kayaker