Just take it that I forgot about the entry point :)


.386
.model flat, stdcall
option casemap:none

include /masm32/include/windows.inc
include /masm32/include/kernel32.inc
include /masm32/include/user32.inc
includelib /masm32/lib/kernel32.lib
includelib /masm32/lib/user32.lib
.data
format db "%d %d clocks",0
.data?
buffer db 64 dup (?)
hOutput dd ?
written dd ?

.code
start:
invoke GetStdHandle, STD_OUTPUT_HANDLE
mov hOutput, eax
rdtsc ; First measure of time
; rdtsc takes 13 cycles on Pentium MMX,
push edx ; I store it on the stack.
push eax ; You can store the 64 bit number wherever you like

; routine to test
nop
nop
nop
nop
; end routine

rdtsc ; Second measure of time
sub eax, [esp] ; subtract first from second
sbb edx, [esp+4] ; result in EDX:EAX

; sub eax, 0eh ; Optional compensation for the rdtsc and 2 pushes
; sbb edx, 0 ; 14 cycles on a Pentium MMX
; 9 cycles on a K6-2

add esp,8 ; remove edx, eax from stack
invoke wsprintf, offset buffer, offset format, edx, eax
invoke WriteConsole, hOutput, offset buffer, eax, offset written, 0 ; eax = string length returned by wsprintf
invoke ExitProcess, 0
end start
Posted on 2003-04-30 07:25:40 by roticv
Well, yes. For GetTickCount only, the OS could rdtsc at OS start (I think the "since the system was started" means Windows start, not CPU power-on - might be wrong). And if it determines the CPU speed, it could convert clock cycles to milliseconds.

That doesn't address the problem with power saving modes, though.

The PIT is already used anyway, probably with higher-than-millisecond resolution, so it can add to a counter, which can later be returned much in the same manner that you'd convert an rdtsc reading to milliseconds.

my "constantly polling RDTSC would be slow" remark is nonsense - I was thinking of preemptive scheduling, but of course you can't do that without an interrupt timer, so there's no way "RDTSC polling" would enter the picture. :stupid:
Posted on 2003-04-30 07:33:53 by f0dder
V Coder wrote
I wonder... Wouldn't the OS routine that determines ticks only poll RDTSC when it is called, not constantly?


f0dder,

Actually, I was thinking maybe Bill could come and tell us exactly how GetTickCount works. I'm realizing that if it used RDTSC it would need to measure the clock speed first (either at boot, in which case the figure is stored somewhere, or else inside the GetTickCount routine, in which case it would add lots of overhead). I'm just speculating here anyway. But how do you measure clock speed anyway? Wait, let me check yodel...

Enjoy.


roticv,

Thanks. Well now I'm late for work... I'll check it when I return.
Posted on 2003-04-30 07:43:44 by V Coder
V,

===============================
hutch,

I agree that there is tremendous variability in the times offered by RDTSC, especially for routines that should take, let's say, 200 cycles... I am getting wide variation. Still in the testing phase now though... I hope to get a fix on the ideal number of iterations to mask that effect.
===============================

This is pretty much what I found using it, and even though it's easy enough to use as an instruction, the results varied too much.

I use the simple GetTickCount with large samples because it's easy to use and can deliver the accuracy I require when used that way.

With RDTSC, there used to be a trick, after you set the priority higher, which was to use a serialising instruction before it - from memory, CPUID - to stall both pipelines and empty them out.

Logic is like this.

CPUID
CPUID

RDTSC

run the code to time

RDTSC

Then compare the results of the first and second RDTSC for the timing.

I think if your algo is small enough, the technique that Bitrake designed is a good one if you run it often enough and only take the minimum values. This is a good technique in the development stage, but I would still suggest doing your final evaluation in real time as it better emulates how it will work in a program.

Something I found out years ago with my old PIII was that if I had a net connection running, which used CPU cycles, it affected the timings I got in an interesting way. Algorithms that depended on loop code suffered the most from the interference, whereas algorithms that depended on better logic had less interference.

Regards,

hutch@movsd.com
Posted on 2003-04-30 08:02:00 by hutch--
hutch, a possible explanation for the net connection stuff is that most decent NICs will issue a hardware interrupt when there's data arriving, and when it's ready to send; thus, context switching is hard to avoid, even if you raise process priority to realtime-ish.
Posted on 2003-04-30 08:06:46 by f0dder
Are you folks up to speed on the ins and outs of the RDTSC instruction as described by this site? Ratch
http://cedar.intel.com/software/idap/media/pdf/rdtscpm1.pdf
Posted on 2003-05-01 19:27:08 by Ratch
Nice doc, ratch.
Posted on 2003-05-02 02:01:19 by f0dder