Hi, I've been investigating how to get the CPU speed, with no luck. I have seen many implementations, but they all rely on a library in some way (using sleep or QueryPerformanceCounter). I am currently working with a kernel provided by a professor, which gives me no such functions at all.
From what I have analyzed, the correct way to go would be RDTSC, but I have no idea how to measure a fixed time interval.

Any ideas or suggestions?

EDIT: I'm coding in C and assembler btw :D
Posted on 2011-05-05 23:35:52 by cronos89
Well, your kernel will need to provide some sort of timing function, else you have no reference for RDTSC.
An alternative is to use the CPUID instruction.
You can get the CPU model identifier from that, which includes the clockspeed in most cases.
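As a rough sketch of the CPUID route in C: the processor brand string (CPUID leaves 0x80000002 to 0x80000004) often ends in something like "@ 3.00GHz", which can be parsed without any library. The helper names `cpuid_leaf`, `read_brand_string` and `parse_brand_mhz` are made up for illustration, the inline assembly assumes GCC or Clang on x86, and the code assumes CPUID itself is available. Not every CPU puts a frequency in the string, so a result of 0 should be treated as "unknown".

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the nominal clock speed in MHz from a brand string such as
 * "Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz". Returns 0 when no number
 * followed by "MHz" or "GHz" is found. Integer-only, no library calls. */
uint32_t parse_brand_mhz(const char *brand)
{
    for (size_t i = 0; brand[i] != '\0'; i++) {
        /* Look for the unit suffix "MHz" or "GHz". */
        if ((brand[i] != 'M' && brand[i] != 'G') ||
            brand[i + 1] != 'H' || brand[i + 2] != 'z')
            continue;

        /* Walk backwards over the digits and decimal point before it. */
        size_t start = i;
        while (start > 0 &&
               ((brand[start - 1] >= '0' && brand[start - 1] <= '9') ||
                brand[start - 1] == '.'))
            start--;
        if (start == i)
            continue;               /* suffix with no number in front */

        /* Accumulate the value in thousandths of the unit. */
        uint32_t whole = 0, frac = 0, frac_scale = 1;
        int seen_dot = 0;
        for (size_t j = start; j < i; j++) {
            if (brand[j] == '.') {
                seen_dot = 1;
                continue;
            }
            uint32_t d = (uint32_t)(brand[j] - '0');
            if (!seen_dot) {
                whole = whole * 10 + d;
            } else if (frac_scale < 1000) {
                frac = frac * 10 + d;
                frac_scale *= 10;
            }
        }
        uint32_t thousandths = whole * 1000 + frac * (1000 / frac_scale);

        /* Thousandths of a GHz are MHz; for MHz drop the fraction. */
        return brand[i] == 'G' ? thousandths : thousandths / 1000;
    }
    return 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Execute CPUID for one leaf; r receives EAX, EBX, ECX, EDX. */
void cpuid_leaf(uint32_t leaf, uint32_t r[4])
{
    __asm__ volatile("cpuid"
                     : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
                     : "a"(leaf), "c"(0));
}

/* Copy the 48-byte brand string into out and NUL-terminate it.
 * Returns 0 if the CPU does not report the brand string leaves. */
int read_brand_string(char out[49])
{
    uint32_t r[4];
    cpuid_leaf(0x80000000u, r);
    if (r[0] < 0x80000004u)
        return 0;
    for (uint32_t leaf = 0; leaf < 3; leaf++) {
        cpuid_leaf(0x80000002u + leaf, r);
        for (int k = 0; k < 16; k++)
            out[leaf * 16 + k] = (char)(r[k / 4] >> (8 * (k % 4)));
    }
    out[48] = '\0';
    return 1;
}
#endif
```

Calling `read_brand_string(buf)` and then `parse_brand_mhz(buf)` gives the marketed speed, not a measured one, so it is only a fallback when no timer is available.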
Posted on 2011-05-06 02:20:53 by Scali
Posted on 2011-05-06 10:17:32 by SpooK
Some information about the CPU can be read from the registry under "HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\0".
Posted on 2011-05-07 18:22:31 by MikDay

Straightforward code. I would not go the registry way though.
Posted on 2011-05-07 21:17:23 by JimmyClif

include \masm32\include\masm32rt.inc
.686                        ; enable the CPUID and RDTSC mnemonics

.code

start:

    call CpuClockSpeed
    .IF eax
      ; Move the result from ST(0) onto the stack as a REAL8 for printf.
      push  eax
      push  eax
      fstp  QWORD PTR [esp]
      pop   eax
      pop   edx
      invoke crt_printf,chr$("%.2f MHz%c"),edx::eax,10
    .ENDIF

    inkey "Press any key to exit..."
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; This proc determines the CPU clock speed in MHz by counting TSC
; cycles over a one-second interval timed with the high-resolution
; performance counter. If the processor supports CPUID and RDTSC
; and the system supports a high-resolution performance counter,
; the clock speed is left on the FPU stack in ST(0) and the return
; value is non-zero. Otherwise, the return value is zero.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

CpuClockSpeed proc uses ebx edi esi    ; CPUID overwrites EBX, so preserve it

    LOCAL pcFreq  :QWORD
    LOCAL pcCount :QWORD

    ; CPUID supported if can set/clear ID flag (EFLAGS bit 21).

    pushfd
    pop   edx               ; EDX = original EFLAGS
    pushfd
    pop   eax
    xor   eax, 200000h      ; flip ID flag
    push  eax
    popfd
    pushfd
    pop   eax               ; EAX = EFLAGS after the attempted change
    xor   eax, edx          ; non-zero if the ID flag could be changed
    jz    fail

    ; TSC supported if CPUID function 1 returns with
    ; bit 4 of EDX set.

    mov   eax, 1
    cpuid
    and   edx, 10h
    jz    fail

    invoke QueryPerformanceFrequency, ADDR pcFreq
    or    eax, eax
    jz    fail
    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, HIGH_PRIORITY_CLASS

    ; Sync with performance counter and get start count.

    invoke QueryPerformanceCounter, ADDR pcCount
    mov   edi, DWORD PTR pcCount
  @@:
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   edi, DWORD PTR pcCount
    je    @B

    rdtsc                   ; starting cycle count in EDX:EAX
    push  edx
    push  eax

    ; Calc terminal count for 1 second delay.

    mov   edi, DWORD PTR pcCount
    mov   esi, DWORD PTR pcCount + 4
    add   edi, DWORD PTR pcFreq
    adc   esi, DWORD PTR pcFreq + 4

    ; Loop until PC count exceeds terminal count.
    ; Cannot check low-order dword for equality
    ; because PC cannot be depended on to always
    ; increment count by one.
  @@:
    invoke QueryPerformanceCounter, ADDR pcCount
    cmp   DWORD PTR pcCount+4, esi
    jne   @B
    cmp   DWORD PTR pcCount, edi
    jb    @B

    rdtsc                   ; ending cycle count in EDX:EAX
    pop   ecx
    sub   eax, ecx          ; EDX:EAX = elapsed cycles
    pop   ecx
    sbb   edx, ecx

    push  edx
    push  eax
    fild  QWORD PTR [esp]   ; ST(0) = elapsed cycles
    add   esp, 8            ; rebalance the stack before the epilogue
    fld8  1000000.0
    fdiv                    ; ST(0) = cycles / 1000000 = MHz

    invoke GetCurrentProcess
    invoke SetPriorityClass, eax, NORMAL_PRIORITY_CLASS

    return 1

  fail:

    return 0

CpuClockSpeed endp

end start

Posted on 2011-05-08 08:49:37 by skywalker
The problem is that I don't have any such function, like QueryPerformanceCounter. Plus, the kernel is a VERY simple kernel, so I am the one who has to implement the wait or sleep function. I was thinking of using the fact that the timer tick fires 18.2 times a second... but my readings will be imprecise, because I can't measure exactly 1 second, for example.

Once I have implemented a wait (or sleep) function, the rest is easy (or sort of).

Any ideas?


P.S. I have already carefully read the OSDev wiki page, but the problem still persists (mainly because of the wait/sleep function).
Posted on 2011-05-08 23:27:44 by cronos89
Why would your readings be imprecise?
If you know the timer tick is 18.2 Hz, that's all you need to know, right?
You can just do RDTSC, then wait for X ticks, and call RDTSC again. The difference between RDTSCs is Y clock cycles.
Which means you have Y cycles per X/18.2 seconds.
From there it's trivial to work out how many cycles you have per second.
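A minimal C sketch of that calculation, under a few assumptions: the PIT runs at its default rate (input clock 1193182 Hz, reload value 65536, which is where the roughly 18.2 Hz comes from), and the kernel's IRQ0 handler increments a tick counter that a function like `read_ticks` can return. The names `cycles_per_second`, `measure_cpu_hz`, `read_ticks` and `rdtsc_now` are hypothetical, and on a 32-bit build the 64-bit division needs the compiler's helper routine to be linked in.

```c
#include <stdint.h>

/* PIT input clock in Hz and the default reload value:
 * 1193182 / 65536 is about 18.2 ticks per second. */
#define PIT_INPUT_HZ 1193182ULL
#define PIT_RELOAD   65536ULL

/* `cycles` TSC cycles elapsed during `ticks` timer ticks.
 * seconds = ticks * 65536 / 1193182, so
 * Hz      = cycles * 1193182 / (ticks * 65536).
 * The product stays below 2^64 for intervals up to roughly an hour at 3 GHz. */
uint64_t cycles_per_second(uint64_t cycles, uint32_t ticks)
{
    if (ticks == 0)
        return 0;
    return cycles * PIT_INPUT_HZ / ((uint64_t)ticks * PIT_RELOAD);
}

typedef uint64_t (*tsc_reader)(void);   /* returns the current TSC value   */
typedef uint32_t (*tick_reader)(void);  /* returns the kernel's tick count */

/* RDTSC, wait for `wait_ticks` ticks, RDTSC again, then scale to Hz.
 * Interrupts must be enabled so that the tick count advances. */
uint64_t measure_cpu_hz(tsc_reader read_tsc, tick_reader read_ticks,
                        uint32_t wait_ticks)
{
    /* Wait for the next tick so the interval starts on a tick boundary. */
    uint32_t t0 = read_ticks();
    uint32_t t1;
    while ((t1 = read_ticks()) == t0) {
    }

    uint64_t c0 = read_tsc();
    /* Unsigned subtraction keeps this correct if the counter wraps. */
    while ((uint32_t)(read_ticks() - t1) < wait_ticks) {
    }
    uint64_t c1 = read_tsc();

    return cycles_per_second(c1 - c0, wait_ticks);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Read the time stamp counter with GCC/Clang inline assembly. */
uint64_t rdtsc_now(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

Something like `measure_cpu_hz(rdtsc_now, read_ticks, 18)` measures over about one second. A longer wait reduces the relative error from not knowing exactly where inside a tick the reads happen.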
Posted on 2011-05-09 04:11:36 by Scali
QueryPerformanceCounter is not always a good performance indicator, especially on modern CPUs. On CPUs that come with Turbo Boost technology, the frequency changes depending on how busy the CPU is. Another issue is that hyper-threaded and multi-core CPUs do out-of-order execution, so multiple instructions may be executed at the same time.

Actually, it is hard to do really accurate and precise benchmarking in Windows at all. The reason is all the background services: you don't know when the kernel will take over and context-switch from your benchmarking process to some background process Windows decides to run.

Of course, there are ways to compensate and get somewhat more accurate and precise results. One technique is to set your process to the highest possible priority (real-time) so that context switches happen as rarely as possible. Another is to use serializing instructions like CPUID so that the CPU does not execute out of order around the measurement.

You can try modifying Agner Fog's benchmarking code. The author has both x86 and x64, C/C++ and assembly code there, as well as code for multi-threaded programs. You can get it from this link: http://www.agner.org/optimize/testp.zip Or you can check his site for more: http://www.agner.org/optimize/

Although your results will not be the same on every run, you can take an average, or apply some more complex statistical filtering, to remove some of the error due to Windows context switches or other factors. And benchmarking on Windows is only useful when you are comparing two programs that do the same thing with different algorithms. So even if you don't get exact numbers, you can still tell which program is doing better.
Posted on 2011-05-16 13:02:21 by banzemanga
QueryPerformanceCounter queries the performance counter, which is a high-precision counter that has been integrated in chipsets since the Pentium era. It's not the clockspeed, nor does it depend on clockspeed/powersaving.

Anyway, it doesn't apply to him, since he doesn't use Windows.
Posted on 2011-05-16 18:58:18 by Scali
Yes, you are right Scali.

QueryPerformanceFrequency is the one that tells us the CPU frequency. Since cronos89 mentioned QueryPerformanceCounter, I was assuming he was doing benchmarking to find algorithm bottlenecks.

The last time I used QueryPerformanceCounter, it gave me wildly inconsistent results from run to run. That is why I went looking for assembly solutions for benchmarking, and that is how I found Agner Fog's site, which gave me results with little error compared to QueryPerformanceCounter.

Edit: Let me see if I get it right. What cronos89 is really trying to do is check the CPU's busy time.

Edit2: Sorry, I keep babbling about unrelated stuff. So the task is: assuming we don't know the CPU speed, we are writing a piece of code for our own OS to determine it.
Posted on 2011-05-17 12:48:42 by banzemanga
"QueryPerformanceFrequency is the one that tells us the CPU frequency."

No it doesn't. Have you ever tried calling it?
QPF gives you the frequency of the performance counter, not of the CPU.
On my PC it reports something like 135 MHz (my CPU is 3 GHz). The frequency may vary from one chipset to the next, but it is NOT the CPU frequency. The performance counter is a special counter in the chipset, as I said (the CPU has its own cycle counter, known as the TimeStamp Counter, which you can read with RDTSC. Together with another timer, you can work out how many cycles there are in a second. QPC/QPF can help you there).

QPC/QPF can have very poor results if you don't have the proper chipset drivers installed. As it is a chipset function, the 'legacy' chipset drivers that come with Windows will not use the hardware properly, and go into some kind of emulation. With the proper drivers for your chipset installed, the counter should have much better stability (although there can still be glitches for various reasons).
Posted on 2011-05-17 13:46:06 by Scali
Hmmm... I just tested it right now and it works just like I described.
QPF reports 1688 MHz, which is close enough to my CPU's actual speed of 1.7 GHz.

I always keep my drivers updated to the latest versions, so I believe that should never have been the problem.
You can check around and you will find that what I described is true.
QPC does indeed give me crazy results on every run on many modern processors.

I remember it started when I tried my first code on my Core 2 Duo laptop and the results were different on every run.
Then I ran the program on my old custom AMD (single-core) tower and the results were always the same.
I started googling the reason for it, and that is why I went after the assembly implementation.
I have never had a multi-core AMD processor, so I can't say anything about that.

Edit: I tried the same program on my Core 2 Duo laptop, and QPF reports 1.46 GHz against the marketed speed of 1.5 GHz.
This means that either the value from QPF has some error or the marketed value is rounded.
I can't try the program on the old tower yet, since I need to fix it first.
Posted on 2011-05-17 19:15:18 by banzemanga
banzemanga: It is not true.
Read MSDN: http://msdn.microsoft.com/en-us/library/ms644905(v=vs.85).aspx
Nowhere does it say that it is the CPU clockspeed.
It also specifically says "The frequency cannot change while the system is running."
In fact, read this bit: http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx
"1. Use QueryPerformanceCounter and QueryPerformanceFrequency instead of RDTSC. These APIs may make use of RDTSC, but might instead make use of a timing devices on the motherboard or some other system services that provide high-quality high-resolution timing information."

Bottom line is: QPF is NOT the correct way to check for CPU frequency. You may not assume that the performance counter is the TSC. It could be, but then again it could not be.
Posted on 2011-05-18 04:44:38 by Scali
The article was made back in 2005. I have to say that it needs some updating.

First, write a small program using QPF and try it on different machines.
You will see that it does indeed get the frequency in its own way.

I checked the MSDN site too before writing my QPF code.
To tell you the truth, the MSDN article just reminds me how lousy their documentation can be at times.
There have been many times where I had to check an outside source to get an answer about the Windows API.

Here is the line right after your quote:
"While RDTSC is much faster than QueryPerformanceCounter, since the latter is an API call, it is an API that can be called several hundred times per frame without any noticeable impact."

But as your benchmarking code gets large, the accuracy gets thrown out of the window.

All of the sources I could give you about how inaccurate QPC is are from outside Microsoft.
Since you won't believe anything but what Microsoft says, I would like to stop, but here are some sources.

QPF is indeed the CPU frequency. How do I prove it? How do you get the time elapsed by your program using QPF and QPC?
By this formula:
TimeElapsed = QPC/QPF = Ticks/TicksPerSecond = constant/frequency = constant/(1/second) = constant*second

Of course, QPF/QPC is still better than Windows' TSC. I never said they are the same thing.
However, just as QPF/QPC is better than TSC, RDTSC is still better than QPF/QPC.
I remember doing that comparison between QPF/QPC and RDTSC when I modified Agner Fog's code for my own use.

Of course, QPF/QPC is not the only thing to blame. As I said in my first post, Windows itself is part of the problem.
Even RDTSC gives me inconsistencies, which Agner Fog tones down by using serializing instructions and setting the process priority to the highest.
However, if you write two benchmarks, one using QPF/QPC and the other using RDTSC, you will find that one is more accurate than the other.

"QPF is NOT the correct way to check for CPU frequency."

I agree with you. But it is not bad when you need a quick way to do it, right?
Posted on 2011-05-18 15:44:49 by banzemanga
Scali is correct that QPF isn't (not even close!) the correct way to check CPU frequency.

On some systems, the "performance counter" will basically end up being RDTSC, and (assuming no throttling/boost) then yeah - on that system, you get the CPU frequency.

On other systems, it's a totally separate timer (APIC timer? Been a while since I checked up on it). I've seen systems reporting a 1000 Hz frequency, and obviously not on a 1 kHz CPU ;)
Posted on 2011-05-18 17:25:09 by f0dder

"The article was made back in 2005. I have to say that it needs some updating."

It doesn't need updating, the QPC/QPF API hasn't changed.

"First, write a small program using QPF and try it on different machines.
You will see that it does indeed get the frequency in its own way."

As I said, sometimes it does, sometimes it doesn't.
This is a logical fallacy. You assume that QPF gives you the clockspeed, because you see that its result gives you the clockspeed.
The API spec says something different. You should take the API spec as the truth, and apply logic to that, rather than taking your interpretation of what the API does, based on your limited observations, and making generalizations which are partly in conflict with what the API specs say.

"To tell you the truth, the MSDN article just reminds me how lousy their documentation can be at times.
There have been many times where I had to check an outside source to get an answer about the Windows API."

The Windows API is arguably the best-documented API on the planet. I see nothing wrong with the QPC/QPF documentation either. They are very clear about what it does, how it differs from RDTSC and when you should or should not use it.

"But as your benchmarking code gets large, the accuracy gets thrown out of the window."

It's meant to be used as a timer, not as a cycle counter. RDTSC is meant to count cycles (although Intel more or less threw that out of the window when they decoupled the TSC from the actual clockspeed).

Just give it up, I am not impressed by Microsoft-bashing.

"I agree with you. But it is not bad when you need a quick way to do it, right?"

Yes it is. You cannot assume that QPF returns the clockspeed. As MSDN says, it might use another hardware timer instead, located on the motherboard (as I said, one of my machines always returns 135 MHz, that's a chipset timer, not the clockspeed, it's nowhere near the clockspeed). NEVER USE QPF TO GET THE CLOCKSPEED.
In Windows, a better quick-and-dirty way is to read it from the registry: HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\0\~MHz
This value is updated at every boot.

The accuracy (or lack thereof) of QPC when performing timing is another matter altogether, and doesn't relate directly to obtaining the clockspeed.
Posted on 2011-05-18 17:27:49 by Scali
Ok... I see where my fallacy is. Thanks.

You are saying that QPF might or might not use the CPU's frequency,
whereas I was assuming it will always use the CPU's frequency as the timer, when it might use another device in some cases.

Then here is a question: why should QPF/QPC be used as a timer for finding algorithm bottlenecks when it doesn't directly relate to CPU performance?
Posted on 2011-05-18 19:07:24 by banzemanga

Oh my, this again :)

Short answer: before throttling/multicore, yes, one could argue rdtsc was the ultimate timer as well as cycle counter, if you did it yourself correctly.
Before that you could program the 8253 or the CMOS RTC to generate a handful of IRQs per second, then maybe the Pentium APIC internal timers. Then rdtsc was the silver bullet.

Then again:
Multicore made rdtsc potentially buggy when the process is moved from one core to another. Then frequency throttling made it fuzzy and of questionable trustworthiness. And now there may be even more advanced execution sophistication that interferes.

That's why QPC's goal is to provide a standardized way of having a decent timer. On modern systems it's arguably the most robust way. It may not be enough to count cycles. I thought it was OK to assume it ran at dozens of MHz at least, but given the above contributions it seems not.
Posted on 2011-05-18 22:45:44 by HeLLoWorld
I'm not sure you understand that QPC/QPF has one and only one use:

Getting to know how much real-world time has passed between one point in your code and another.

All use cases derive from this one.

Frequency of QPF is only relevant for knowing what best precision you *might* get from QPC.
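In C terms, that one use comes down to the following sketch. It works the same whatever the counter's frequency happens to be, which is why the frequency itself says nothing about the CPU. On Windows the inputs would come from QueryPerformanceCounter and QueryPerformanceFrequency; the function names `elapsed_seconds` and `resolution_ns` are made up for illustration.

```c
#include <stdint.h>

/* Seconds elapsed between two readings of a counter that advances
 * `freq` counts per second (the QPC/QPF relationship). */
double elapsed_seconds(uint64_t start, uint64_t end, uint64_t freq)
{
    if (freq == 0)
        return 0.0;             /* no usable counter */
    return (double)(end - start) / (double)freq;
}

/* Best-case resolution in nanoseconds: one count lasts 1/freq seconds.
 * The real precision can be worse than this, never better. */
double resolution_ns(uint64_t freq)
{
    if (freq == 0)
        return 0.0;
    return 1e9 / (double)freq;
}
```

With a counter reporting 1000 counts per second, for example, `resolution_ns` gives 1000000 ns, so nothing shorter than a millisecond can be resolved no matter how fast the CPU is.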
Posted on 2011-05-18 22:50:24 by HeLLoWorld