You are saying that QPF might or might not use the CPU's frequency.

Specifically, it might be one of the following:
- RDTSC - mostly on low- to mid-end AMD systems, in my experience
- 8254 PIT with a resolution of about 1 ms. It's set up via an I/O port, so it's kinda slow to use. If you see that your QPF/QPC is noticeably slower than on other systems, you most probably have the 8254 PIT.
- APIC timer - good, fast, synchronized across all cores. It may pause during power-saving modes, so it can't be used to reliably measure time.
- PM Clock - an upgraded APIC timer. It doesn't pause during power-saving modes.
- HPET - "the standard" in all modern systems. ~20% faster to use than the PM clock.

Older Windows versions (XP, specifically) need a /usepmtimer switch to make them use the PM timer instead of the legacy sh*t. Modern Windows versions use the PM timer or HPET, whichever is available (64-bit HPET is available on all modern motherboards; make sure to enable it in the BIOS config).
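
For what it's worth, the frequency that QPF reports is usually enough to guess which of the sources above you ended up with. A minimal sketch in C (the values in the comments are typical ones seen in practice, not anything the API guarantees):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq;
        if (!QueryPerformanceFrequency(&freq)) {
            printf("No high-performance counter available\n");
            return 1;
        }
        /* Typical (not guaranteed) values:          */
        /*   ~1,193,182 Hz  -> 8254 PIT              */
        /*   ~3,579,545 Hz  -> ACPI PM timer         */
        /*  ~14,318,180 Hz  -> HPET                  */
        /*  roughly the CPU clock -> TSC             */
        printf("QPF reports %lld Hz\n", (long long)freq.QuadPart);
        return 0;
    }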

Getting to know how much real-world time has passed between one point in your code and another.

True, but even that isn't reliable in some scenarios. I prefer timeGetTime to get reliable, even if less precise, timing. It has a resolution of 1 ms and a precision of about 5 ms, so it's OK in most situations.
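
For reference, a minimal timeGetTime sketch; calling timeBeginPeriod(1) is what usually gets you the 1 ms resolution, and you need to link against winmm.lib:

    #include <windows.h>
    #include <stdio.h>
    #pragma comment(lib, "winmm.lib")

    int main(void)
    {
        timeBeginPeriod(1);                  /* request 1 ms timer resolution */

        DWORD start = timeGetTime();
        Sleep(100);                          /* stand-in for the code being timed */
        DWORD elapsed = timeGetTime() - start;

        timeEndPeriod(1);                    /* undo the resolution request */
        printf("elapsed: %lu ms\n", (unsigned long)elapsed);
        return 0;
    }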
Posted on 2011-05-19 02:46:12 by ti_mo_n

You are saying that QPF might or might not use the CPU's frequency.
Whereas I was assuming it would always use the CPU's frequency as the timer, when it might actually use another device at certain times.


Yes, I said that literally a few times, and also quoted MSDN saying exactly the same. Not quite sure how you could have missed it.


Then here is a question: why should QPF/QPC be used as a timer for finding algorithm bottlenecks when it doesn't directly relate to CPU performance?


See HeLLoWoRLD's answer. QueryPerformanceCounter/Frequency only guarantee that you are querying a 'high-performance counter' - basically the highest-resolution counter available in the system. They don't make any guarantees about WHAT counter this is, or how high the resolution actually is. But since you can query the frequency, they don't have to.
You can still use it to time your code... Even GetTickCount() can be used for that. Just run the code in a loop enough times to compensate for the lack of resolution of the timer. You just can't time it in exact cycles... But that's what rdtsc is for.
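
A rough sketch of that loop idea; some_function_under_test and the iteration count are placeholders of mine - pick a count large enough that the total run time dwarfs the timer's resolution:

    #include <windows.h>
    #include <stdio.h>

    void some_function_under_test(void) { /* hypothetical code being timed */ }

    int main(void)
    {
        const int iterations = 1000000;
        DWORD start = GetTickCount();

        for (int i = 0; i < iterations; ++i)
            some_function_under_test();

        DWORD total = GetTickCount() - start;
        /* The coarse tick resolution is spread over a million calls,
           so the per-call average is still meaningful. */
        printf("avg: %f ms per call\n", (double)total / iterations);
        return 0;
    }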
Posted on 2011-05-19 02:56:35 by Scali

True, but even that isn't reliable in some scenarios. I prefer timeGetTime to get reliable, even if less precise, timing. It has a resolution of 1 ms and a precision of about 5 ms, so it's OK in most situations.


I do QPF, and use it if I see a reasonable frequency reported, and fall back to timeGetTime when QPF seems suspect.
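
Something along these lines - just a sketch of that fallback as a little helper you'd call from your own code; the 'reasonable frequency' cut-off is an arbitrary choice of mine (timeGetTime needs winmm.lib):

    #include <windows.h>

    /* Milliseconds since the first call; QPC when QPF looks sane,
       timeGetTime otherwise. */
    double elapsed_ms(void)
    {
        static LARGE_INTEGER freq, start;
        static DWORD start_tgt;
        static int use_qpc = -1;

        if (use_qpc < 0) {
            /* Treat anything below 1 MHz as suspect (arbitrary cut-off). */
            use_qpc = QueryPerformanceFrequency(&freq) && freq.QuadPart >= 1000000;
            if (use_qpc) QueryPerformanceCounter(&start);
            else         start_tgt = timeGetTime();
        }

        if (use_qpc) {
            LARGE_INTEGER now;
            QueryPerformanceCounter(&now);
            return (now.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
        }
        return (double)(timeGetTime() - start_tgt);
    }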
Posted on 2011-05-19 03:08:15 by Scali

Then again:
Multicore made rdtsc potentially buggy when the process is moved from one core to another. Then frequency throttling made it fuzzy and questionably trustworthy. And now there may be even more advanced execution sophistication that interferes.


I'd like to add that Intel especially has fumbled the TSC in recent years, so that it no longer does what it was supposed to do, but it does make all the broken software in the world run.
Technically, the TSC was supposed to be updated on every cycle of the core. With multicore, yes, it would be rather obvious that each core could have a different TSC value: they may not have been initialized at exactly the same time, and in recent years, with CPU throttling, power saving and so on, not all cores may run at the same clock speed all the time. Which shouldn't matter, if you view a multicore CPU as a multi-CPU system on a chip: if you have physically different CPUs in a system, clearly they will each have their own TSC as well, so you need to be careful about which CPU you are using at the time.
However, since apparently most programmers don't understand how to write proper multithreaded code, this messed up most programs using the TSC as an actual timer.

Intel decided to make the TSC a global counter for multicore CPUs. So on an Intel CPU, calling RDTSC will yield the same value, regardless of what core you are using. They also decided to make the TSC run at a fixed speed (not all cores may be running at the same speed at all times anyway, so in a way it makes sense... otherwise, which core's clockspeed would you take?).
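
For what it's worth, this fixed-rate behaviour is advertised through CPUID: leaf 0x80000007, EDX bit 8, the 'invariant TSC' flag. A small sketch using the MSVC __cpuid intrinsic:

    #include <intrin.h>
    #include <stdio.h>

    int main(void)
    {
        int regs[4];    /* EAX, EBX, ECX, EDX */

        __cpuid(regs, 0x80000000);
        if ((unsigned)regs[0] < 0x80000007) {
            printf("CPUID leaf 0x80000007 not supported\n");
            return 0;
        }

        __cpuid(regs, 0x80000007);
        /* EDX bit 8: TSC runs at a constant rate regardless of
           P-/C-/T-state, so it behaves like a wall-clock counter. */
        printf("Invariant TSC: %s\n", (regs[3] & (1 << 8)) ? "yes" : "no");
        return 0;
    }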
However, AMD apparently forgot to read the Intel specs, so AMD's multicore CPUs were not compatible with Intel in this respect. Despite everyone claiming that AMD had native multicore CPUs while Intel didn't, in reality AMD just copy-pasted the logic of multiple single-core Athlons onto a single die. The result: they still had individual TSCs, which were not synced, unlike Intel's.

AMD decided to release a software fix (which I believe will just re-sync the TSCs of all cores every once in a while) so that their CPUs now had the same behaviour as Intel again, and all the broken software would work. Bonus points for the marketing of this fix: They called it the Dual-Core Optimizer(tm):
http://support.amd.com/us/Pages/dynamicDetails.aspx?ListID=c5cd2c08-1432-4756-aafa-4d9dc646342f&ItemID=153
AMD always 'optimizes' stuff when their hardware is flawed. Just like they now have a 'tessellation optimizer' in their Radeon drivers, which artificially reduces tessellation workloads, since their hardware can't handle high amplification counts, teehee.

So, everyone is happy again, since Intel/AMD have now basically turned RDTSC into what QPC has been doing all along: a high-performance, fixed-rate timer. So all the broken software will now work, teehee!
Well, except for actual multi-CPU systems. I don't think the CPUs will sync their TSCs.
Posted on 2011-05-19 03:45:39 by Scali
which I believe will just re-sync the TSCs of all cores every once in a while


Hmm. That can't work.
One day or another a process will jump cores at exactly the wrong time and everything will blow up.

ti_mo_n :
Damn. I don't know where you get your information from, but your contribution shows how valuable a few knowledgeable words can be; I didn't even think it was possible to get that information. But then again, silly me, there must be someone somewhere who writes the Windows libraries; I don't know if that's you (or if you disassembled them).
This forum doesn't disappoint me :)
Posted on 2011-05-19 11:55:56 by HeLLoWorld
Besides that, 5 ms of uncertainty is not sufficient for the only thing that matters to everyone in the world:

Timing part or all of a framebuffer generation.
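
To put a number on that: a 60 Hz frame is only ~16.7 ms, so a 5 ms error can eat a third of the budget. A QPC-based sketch for timing one slice of a frame (render_scene is a hypothetical placeholder):

    #include <windows.h>
    #include <stdio.h>

    void render_scene(void) { /* hypothetical: the part of the frame being timed */ }

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);

        QueryPerformanceCounter(&t0);
        render_scene();
        QueryPerformanceCounter(&t1);

        double us = (t1.QuadPart - t0.QuadPart) * 1000000.0 / freq.QuadPart;
        printf("render_scene: %.1f us\n", us);    /* microsecond-level detail */
        return 0;
    }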

Posted on 2011-05-19 12:01:17 by HeLLoWorld

which I believe will just re-sync the TSCs of all cores every once in a while


Hmm. That can't work.
One day or another a process will jump cores at exactly the wrong time and everything will blow up.


No, it's not perfect. But that's AMD for ya (just like their large page addressing bug in the original Athlon... never fixed, just a less-than-perfect software workaround: disable large pages in the Windows registry: http://support.microsoft.com/kb/270715).
It should be good enough though, things don't drift too far apart over short periods. Besides, it's mostly for games and other high-performance apps, where generally all cores are running at max speed all the time, so that's not the most problematic scenario.
Posted on 2011-05-19 12:04:53 by Scali
Let's say the counters are synchronized, they get out of sync because one core throttles a bit, then before the next re-sync some code does two rdtsc reads on two different cores because of a task switch, gets a negative difference and blows up.


When talking about synchronization, deadlocks and impossible one-in-a-million scenarios, a teacher once told us stories about people who won huge lotteries with their first ticket.

There are gazillions of processors executing gazillions of instructions each second, for years and years.

There's no such thing as luck; shit happens and things will blow up. :)

Posted on 2011-05-19 12:57:54 by HeLLoWorld

Let's say the counters are synchronized, they get out of sync because one core throttles a bit, then before the next re-sync some code does two rdtsc reads on two different cores because of a task switch, gets a negative difference and blows up.


Sure, but as I say, in games, the CPU will run at full speed, and at the time AMD didn't have throttling for overheating either (in fact, I don't know if they do now).
So it's 'good enough' in most cases.

The real problem is not the CPU though, it's the software (although obviously it's nasty when AMD makes Intel clones that don't do exactly what the Intels do).
The software should either not rely on RDTSC directly, or be smarter about how it uses it. If you set thread affinity, then at least you avoid the core-jumping problem. And you could probably avoid cores clocking down by creating your own 'idle' threads that keep them busy, just to be safe. Or just tell the user to disable Cool'n'Quiet.
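
A sketch of the affinity idea: pin the timing thread to a single core so that both rdtsc reads come from the same TSC (core 0 is an arbitrary choice here):

    #include <windows.h>
    #include <intrin.h>
    #include <stdio.h>

    int main(void)
    {
        /* Pin this thread to core 0 for the duration of the measurement. */
        DWORD_PTR old_mask = SetThreadAffinityMask(GetCurrentThread(), 1);

        unsigned __int64 t0 = __rdtsc();
        /* ... code being measured ... */
        unsigned __int64 t1 = __rdtsc();

        SetThreadAffinityMask(GetCurrentThread(), old_mask);   /* restore */

        printf("elapsed: %llu TSC ticks\n", (unsigned long long)(t1 - t0));
        return 0;
    }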
Posted on 2011-05-19 13:06:07 by Scali
I believe Scali is correct wrt. how AMD "fixed" their TSC differences - I used to own an AMD64x2. Before AMD released the "fix", I used to have to manually limit the thread affinity of some games to a single CPU, because the stupid game designers thought it was a good idea to base their game-loop timing around RDTSC... and I kept doing that, since I didn't feel like having an app constantly running with the ring-0 privileges needed to WRMSR.
Posted on 2011-05-20 14:15:47 by f0dder
Well, the description on the site actually says that too:
The AMD Dual-Core Optimizer can help improve some PC gaming video performance by compensating for those applications that bypass the Windows API for timing by directly using the RDTSC (Read Time Stamp Counter) instruction. Applications that rely on RDTSC do not benefit from the logic in the operating system to properly account for the affect of power management mechanisms on the rate at which a processor core's Time Stamp Counter (TSC) is incremented. The AMD Dual-Core Optimizer helps to correct the resulting video performance effects or other incorrect timing effects that these applications may experience on dual-core processor systems, by periodically adjusting the core time-stamp-counters, so that they are synchronized.


It doesn't say how large this 'period' is, but I doubt they do it at EVERY context switch.
At the same time it also implies that 'bypassing the Windows API' (i.e. QPC/QPF) for timing logic and using RDTSC directly is a bad thing, since the Windows API has code to avoid problems with core switching, throttling and all that.
Posted on 2011-05-20 15:01:13 by Scali
In general QPC is useless; the results are unreliable, as the switch to kernel mode in order to execute RDPMC has so much overhead that RDTSC is probably more accurate. No idea what the huge security issue was in making this a privileged instruction, but for whatever reason you cannot execute it from user mode. It would be nice if MS/Intel would give it to us though; a low-granularity timer doesn't seem to threaten anything except Intel's extremely expensive VTune analyzer.
Posted on 2011-05-24 01:04:50 by donkey
I'm struggling to think of a case where timing in video games is so critical that it requires anything like RDTSC.
Posted on 2011-05-24 06:06:14 by Homer

I'm struggling to think of a case where timing in video games is so critical that it requires anything like RDTSC.


In video games maybe not, but in performance analyzers, high-resolution timers that can be accessed quickly are important.
Posted on 2011-05-24 08:01:58 by donkey