; ---------------------------------------------------------------------
; These two macros perform the grunt work involved in measuring the
; processor clock cycle count for a block of code. They must be used
; in pairs, with the block of code placed between the counter_begin
; and counter_end macro calls. The counter_end macro returns the clock
; cycle count for a single pass through the block of code, corrected
; for the test loop overhead, in EAX.
;
; These macros require a .586 or higher processor directive, because
; they use the CPUID and RDTSC instructions.
;
; If your code uses MMX instructions and does not execute an EMMS at
; the end of each MMX instruction sequence, defining the symbol _EMMS
; will cause the counter_end macro to insert an EMMS in front of its
; FPU instructions.
;
; The loopcount parameter should be set to a relatively high value to
; produce repeatable results.
;
; Note that setting the priority parameter to REALTIME_PRIORITY_CLASS
; involves some risk, as it will cause your process to preempt *all*
; other processes, including critical Windows processes. Setting the
; priority parameter to HIGH_PRIORITY_CLASS instead will significantly
; reduce the risk, and in most cases will produce the same cycle count.
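;
; A minimal usage sketch follows. It assumes that counter_end takes no
; arguments and that the MASM32 print and str$ macros are available.
; Neither macro is part of this file. The loop count of 1000000 and
; the two timed instructions are arbitrary illustrative values:
;
;   counter_begin 1000000, HIGH_PRIORITY_CLASS
;     mov  eax, ecx             ; block of code being timed
;     imul eax, ecx
;   counter_end                 ; cycles per pass returned in EAX
;   print str$(eax), " cycles", 13, 10
;
; Because the count is returned in EAX, save or print it before any
; other code overwrites that register. The priority argument is
; optional. When it is omitted, counter_begin does not call
; SetPriorityClass.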
; ---------------------------------------------------------------------

counter_begin MACRO loopcount:REQ, priority
    LOCAL label

    IFNDEF __counter__stuff__defined__
      __counter__stuff__defined__ equ <1>
      .data
      ALIGN 8             ;; Optimal alignment for QWORD
        __counter__qword__count__  dq 0
        __counter__loop__count__   dd 0
        __counter__loop__counter__ dd 0
      .code
    ENDIF

    mov __counter__loop__count__, loopcount

    IFNB <priority>
      invoke GetCurrentProcess
      invoke SetPriorityClass, eax, priority
    ENDIF

    xor eax, eax        ;; Use same CPUID input value for each call
    cpuid               ;; Flush pipe & wait for pending ops to finish
    rdtsc               ;; Read Time Stamp Counter

    push edx            ;; Preserve high-order 32 bits of start count
    push eax            ;; Preserve low-order 32 bits of start count
    mov __counter__loop__counter__, loopcount
    xor eax, eax
    cpuid               ;; Make sure loop setup instructions finish
  ALIGN 16              ;; Optimal loop alignment for P6
  @@:                   ;; Start an empty reference loop
    sub __counter__loop__counter__, 1
    jnz @B

    xor eax, eax
    cpuid               ;; Make sure loop instructions finish
    rdtsc               ;; Read end count
    pop ecx             ;; Recover low-order 32 bits of start count
    sub eax, ecx        ;; Low-order 32 bits of overhead count in EAX
    pop ecx             ;; Recover high-order 32 bits of start count
    sbb edx, ecx        ;; High-order 32 bits of overhead count in EDX
    push edx            ;; Preserve high-order 32 bits of overhead count
    push eax            ;; Preserve low-order 32 bits of overhead count

    xor eax, eax
    cpuid
    rdtsc
    push edx            ;; Preserve high-order 32 bits of start count
    push eax            ;; Preserve low-order 32 bits of start count
    mov __counter__loop__counter__, loopcount
    xor eax, eax
    cpuid               ;; Make sure loop setup instructions finish
  ALIGN 16              ;; Optimal loop alignment for P6
  label:                ;; Start test loop
    __counter__loop__label__ equ <label>