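push ebx        ; cpuid overwrites ebx, which Win32 callers expect preserved
xor eax,eax     ; request cpuid leaf 0 both times, so cpuid does the same work
cpuid           ; serialize: everything before this completes first
rdtsc
mov timer,eax   ; low 32 bits of the start count
nop
nop
nop
xor eax,eax
cpuid           ; wait for the nops to complete before reading the TSC again
rdtsc
sub eax,timer   ; elapsed ticks, including the cost of cpuid itself
pop ebx
ret

.data
timer dd 0      ; keep the variable out of the (read-only) code section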
Timing 3 NOPs.
I get a different EAX each time. Why? What have I missed?
Cache.
"mov timer,eax" could take 0.1 cycles, but also could take 500.
That's why people talk about "warming up the caches" when benchmarking or profiling code.
Also, "cpuid" before "rdtsc" is not necessary.
Also, there's a quirk of paging: memory isn't actually physically allocated until you first access it.
So either time a loop (running it 100,000 to 100,000,000 times, your choice), or first make sure you've pre-accessed all of the memory you need.
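A minimal sketch of both ideas in C, using POSIX clock_gettime(CLOCK_MONOTONIC) as a portable stand-in for RDTSC (on Windows you'd read the TSC or QueryPerformanceCounter instead). The 4096-byte page size, the workload and the helper names are assumptions for illustration only.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SIZE_GUESS 4096   /* assumption: typical x86 page size */

/* Monotonic time in nanoseconds (portable stand-in for rdtsc). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Allocate a buffer and write one byte per page, so the OS has actually
   committed every page before any timing starts. */
static unsigned char *alloc_pretouched(size_t len)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < len; i += PAGE_SIZE_GUESS)
        buf[i] = 0;
    return buf;
}

/* Placeholder workload: fill the buffer. */
static void workload(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)i;
}

/* Average nanoseconds per call over `iterations` (> 0) calls,
   after one untimed call to warm up the caches and TLB. */
static double ns_per_call(unsigned char *buf, size_t len, unsigned iterations)
{
    workload(buf, len);
    uint64_t t0 = now_ns();
    for (unsigned i = 0; i < iterations; i++)
        workload(buf, len);
    uint64_t t1 = now_ns();
    return (double)(t1 - t0) / iterations;
}
```

Dividing one long measurement by the iteration count spreads the one-off costs (page faults, cold caches, the timer reads themselves) across all iterations instead of charging them to a single run.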
cpuid is a serializing instruction; it is necessary to prevent out-of-order execution of rdtsc on P6-series CPUs. Also, so you stand less chance of a context switch during the test, you should raise the process and thread priority:
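invoke GetCurrentProcess        ; pseudo-handle for this process, returned in eax
invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
invoke GetCurrentThread         ; pseudo-handle for this thread, returned in eax
invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL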
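Here's roughly what the serialized read looks like from C, assuming GCC or Clang on x86, where <cpuid.h> provides the __cpuid macro and <x86intrin.h> provides __rdtsc(). The function names are made up for this sketch, and the priority calls are Win32-only, so they're left out.

```c
#include <cpuid.h>      /* GCC/Clang: __cpuid() macro */
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang: __rdtsc() */

/* cpuid is serializing: every earlier instruction completes before it does,
   and the rdtsc after it cannot start early. Leaf 0 is requested every time
   so cpuid does the same work on each call. */
static uint64_t serialized_rdtsc(void)
{
    unsigned int a, b, c, d;
    __cpuid(0, a, b, c, d);
    (void)a; (void)b; (void)c; (void)d;
    return __rdtsc();
}

/* Ticks around three nops, including the cost of cpuid itself. */
static uint64_t three_nops_ticks(void)
{
    uint64_t t0 = serialized_rdtsc();
    __asm__ volatile("nop\n\tnop\n\tnop");
    uint64_t t1 = serialized_rdtsc();
    return t1 - t0;
}

/* Ticks for an empty interval: the measurement overhead itself. */
static uint64_t empty_ticks(void)
{
    uint64_t t0 = serialized_rdtsc();
    uint64_t t1 = serialized_rdtsc();
    return t1 - t0;
}
```

Subtracting empty_ticks() from three_nops_ticks() (ideally taking the minimum of many runs of each) gives a rough idea of what the nops cost; a single run will be dominated by the cpuid overhead and noise.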
Also, things like Intel SpeedStep or AMD Cool'n'Quiet could be lowering your CPU frequency; keep that in mind as well, and do a little CPU-intensive "warm-up" before profiling.
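One way to do that warm-up, sketched in C: spin on integer work until a deadline passes. The xorshift mixing is just arbitrary busy work, and how long the spin has to last depends on the OS power policy, so the duration is an assumption you'd tune for your machine.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static volatile uint64_t warmup_sink;  /* keeps the busy work from being optimized away */

/* Busy-spin on integer work for at least `ms` milliseconds, giving a
   power-managed CPU time to switch up to its full clock. Returns the number
   of outer rounds performed (always at least 1). */
static uint64_t cpu_warmup(unsigned ms)
{
    const uint64_t deadline = now_ns() + (uint64_t)ms * 1000000u;
    uint64_t x = 88172645463325252u;  /* arbitrary nonzero xorshift64 seed */
    uint64_t rounds = 0;
    do {
        for (int i = 0; i < 1000; i++) {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
        }
        rounds++;
    } while (now_ns() < deadline);
    warmup_sink = x;
    return rounds;
}
```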
Also, set thread affinity to work around the RDTSC bugs in (dual-core) AMD CPUs.
And only use rdtsc for profiling, never for timing in production code.
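On Windows that's SetThreadAffinityMask(); as a Linux stand-in, this sketch uses sched_setaffinity() to pin the calling thread to the lowest-numbered CPU it's currently allowed to use, so every counter read comes from the same core. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to the first CPU in its current affinity mask.
   Returns that CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Picking from the current mask rather than hard-coding CPU 0 avoids failing when the process has already been restricted to other CPUs.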
Why, exactly? Because of today's variable MHz?
I think there's some code "out there" that uses the time-stamp counter for timing...
Would it be broken now that things are this way?
Creepy :shock:
So what should we use instead? Win32 timers?
Yes: variable MHz, and unsynchronized TSC values (on dual-core). With the first problem you could wrongly measure some proc as slower than another (because it ran before the higher clock kicked in) and make your app pick the version that's actually slower. With the second problem, you can get a negative difference between time0 and time1 from RDTSC.
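If a negative difference can't be ruled out, the timing code can at least refuse to report one. A small sketch in C; safe_tick_delta and monotonic_clock are hypothetical names. Note this only hides the symptom: the measured interval is still wrong, so affinity or synchronized counters remain the real fix.

```c
#include <stdint.h>

/* Elapsed ticks between two counter reads that may have come from different
   cores. With unsynchronized counters `end` can be below `start`; report
   zero instead of letting the unsigned subtraction wrap to a huge value. */
static uint64_t safe_tick_delta(uint64_t start, uint64_t end)
{
    return end > start ? end - start : 0;
}

/* Hypothetical wrapper that never lets a timestamp go backwards:
   every reading is at least as large as the previous one. */
typedef struct {
    uint64_t last;
} monotonic_clock;

static uint64_t monotonic_read(monotonic_clock *clk, uint64_t raw)
{
    if (raw > clk->last)
        clk->last = raw;
    return clk->last;
}
```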
There was a discussion of this on VirtualDub's forums. Using the mm (multimedia) timer seems to be best practice (when timing audio and video streams), though it takes about 1000 cycles, since the 32768 Hz real-time clock is queried. I don't recall exactly whether the problems were limited to some laptops with awful hardware/BIOS. Supposedly, MS fixed it all with the relevant OS updates (except on those laptops); search for MSDN articles about it, too (I read about it in the Knowledge Base section, IIRC).
Btw, GetTickCount() simply returns a cached value that the thread scheduler updates in response to the timer interrupt (16.6 ms granularity on my system, for instance).
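The granularity of a clock like that can be measured by polling it until its value changes. A sketch in C, using POSIX clocks as a stand-in for GetTickCount(); the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock in nanoseconds. */
static uint64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Estimate how often a clock actually updates: busy-wait until its value
   changes, `samples` (> 0) times, and keep the smallest jump seen. For a
   clock that is only updated on the timer interrupt, the jump equals the
   tick period. */
static uint64_t observed_step_ns(clockid_t id, int samples)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < samples; i++) {
        uint64_t before = read_clock_ns(id);
        uint64_t after;
        do {
            after = read_clock_ns(id);
        } while (after == before);
        if (after - before < best)
            best = after - before;
    }
    return best;
}
```

On Linux, passing the Linux-specific CLOCK_MONOTONIC_COARSE should show a step of one kernel tick (a few milliseconds), while CLOCK_MONOTONIC typically shows a much smaller step.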
The negative TSC difference on dual-cores is exactly why all Unreal engine games crash on the AMD64 X2, bitching about a negative time delta :)
Also, on dual-core AMD machines QueryPerformanceCounter seems to use RDTSC; at least it exhibits the same problems as using RDTSC directly.
AMD released a fix driver that periodically synchronizes the TSCs (are those writable through MSRs? how messed up is that? >_<), and they call it a "processor optimization driver" instead of labelling it a bugfix...
On the Intel machines I've tested (I haven't tried my new quad-core box yet), QueryPerformanceCounter didn't behave like RDTSC timing, but more like a timer with 1000 Hz accuracy. PIT? APIC timer? Something else?
Oh, the agony! PCs aren't custom fixed-hardware consoles anymore... were they ever?
And software outlives hardware generations...
Soon hardware won't be designed around efficiency, but around best matching the existing codebase... Core 2 is already just that.
Hmm, I think Core 2 is more than just a "best codebase match"; it seems pretty darn nice overall, and the new SSE stuff it adds certainly isn't for existing codebases :)
But OK, if we look beyond x86, I think we could have much more efficient CPUs, but that just isn't going to happen, ever... x86-64 ruined that daydream :)
Yes... I guess you're right...
As a side note, have you people heard about configware? I just stumbled upon it on Wikipedia the other night and it blew my mind... so much potential...
Edit: I'm going to create a topic in the Heap just for that.
So is it safe to say that truly accurate timing can't be done? :shock:
Oh, it can be done really accurately: on my Sempron (single-core, no variable clock) :D, and on systems like that.
Unless you trigger SMM? ;)
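Interrupts, context switches and SMIs can only make a run slower, never faster, so a common trick against them is to repeat the measurement many times and keep the minimum. A sketch in C with clock_gettime() standing in for RDTSC; code_under_test is a placeholder and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (portable stand-in for rdtsc). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static volatile uint64_t sink;  /* stops the compiler from discarding the work */

/* Placeholder for the code being measured. */
static void code_under_test(void)
{
    uint64_t s = 0;
    for (uint64_t i = 0; i < 1000; i++)
        s += i * i;
    sink = s;
}

/* Time `fn` `trials` times and return the fastest run in nanoseconds
   (UINT64_MAX if trials <= 0). Interruptions only ever add time, so the
   minimum is the run they disturbed least. */
static uint64_t min_of_trials_ns(void (*fn)(void), int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = now_ns();
        fn();
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

This doesn't help against a lowered clock from power management (the minimum just reports the slow clock consistently), which is why the warm-up still matters.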