Hi all,
Being interested in profiling I searched the board for it, and amongst others (like VTune and CodeAnalist) I came across Maverick's PROFILEr. He also released version 2.0 of it.
In the link below (dated 17/18 september 2002, so it's an old thread) there is a statement by Maverick that Profile v2.0 had some secondary issues to be fixed at that time.
http://www.asmcommunity.net/board/index.php?topic=7996.0
His statement was:
I couldn't find the answers on the following questions:
- The secondary issues of v2: Have they been dealt with since?
- Can version 2 be used safely at the moment?
- If not, can one still use the older version? Are there any drawbacks?
- As I saw in the threads regarding the older version of PROFILE: There were problems translating the NASM version into Masm. What is the most reliable MASM translation at the moment? (I hope it is still being used)
Can anyone shape some light in the darkness?
Friendly regards,
mdevries.
Being interested in profiling I searched the board for it, and amongst others (like VTune and CodeAnalist) I came across Maverick's PROFILEr. He also released version 2.0 of it.
In the link below (dated 17/18 september 2002, so it's an old thread) there is a statement by Maverick that Profile v2.0 had some secondary issues to be fixed at that time.
http://www.asmcommunity.net/board/index.php?topic=7996.0
His statement was:
I still haven't had the time (and expecially the attitude) to fix some secondary issues of v2.. So people may prefer to use the older one.
I couldn't find the answers on the following questions:
- The secondary issues of v2: Have they been dealt with since?
- Can version 2 be used safely at the moment?
- If not, can one still use the older version? Are there any drawbacks?
- As I saw in the threads regarding the older version of PROFILE: There were problems translating the NASM version into Masm. What is the most reliable MASM translation at the moment? (I hope it is still being used)
Can anyone shape some light in the darkness?
Friendly regards,
mdevries.
Hi,
Hi all,
Being interested in profiling I searched the board for it, and amongst others (like VTune and CodeAnalist) I came across Maverick's PROFILEr. He also released version 2.0 of it.
In the link below (dated 17/18 september 2002, so it's an old thread) there is a statement by Maverick that Profile v2.0 had some secondary issues to be fixed at that time.
http://www.asmcommunity.net/board/index.php?topic=7996.0
His statement was:
I still haven't had the time (and expecially the attitude) to fix some secondary issues of v2.. So people may prefer to use the older one.
I couldn't find the answers on the following questions:
- The secondary issues of v2: Have they been dealt with since?
It's been a very long time. No, I don't think I worked more on it, sorry.
- Can version 2 be used safely at the moment?
If you test it on your system (CPUs that didn't exist at the time) and it appears to work correctly (expecially not giving negative results when you profile nothing or maybe a NOP) then I assume it's safe.
- If not, can one still use the older version? Are there any drawbacks?
Same as above I think.
- As I saw in the threads regarding the older version of PROFILE: There were problems translating the NASM version into Masm. What is the most reliable MASM translation at the moment? (I hope it is still being used)
Sorry, I've never been a MASM user, and I'm not a NASM user anymore (FASM rocks anytime!).
Can anyone shape some light in the darkness?
I am very sorry I cannot shed more light than that. It's been a long time and a lot of bits passed through my systems since then. I haven't really done much x86 asm since those times by the way.. I moved to other CPUs to get much more fun.
Friendly regards,
mdevries.
Greets,
Maverick
PS: please do not hate me :)
Thanks for stepping by and answering, anyway :)
Are any/all of the versions available ?
Multiple searches revealed posts without any attachments.
Such as:
http://www.asmcommunity.net/board/index.php?topic=7510
Multiple searches revealed posts without any attachments.
Such as:
http://www.asmcommunity.net/board/index.php?topic=7510
Hi,
Thanks Maverick for your response.
As you suggested I just tried both versions.
I profiled 2 types of functions with the older version of PROFILE.
- a function without parameters
- a function with 1 parameter
The first time the only thing the function contained was a RET.
The next time a NOP and a RET.
Then 2 NOP's and a RET,
Then 3 NOP's and a RET, etc.
After having done this I profiled the same functions with the newer version of PROFILE. These were the results on my Intel Pentium M notebook, running under Windows XP:
There was a stack unbalance after profiling the function with the parameter.
ESP before profiling: 0012FFC4h
ESP after profiling: 00130000h
So, there was a difference of 60 bytes, too much popped off the stack.
The stack unbalance doesn't appear after profiling the function without parameters.
This is the code I found and used for the older version of PROFILE. I hope this was the definite MASM version, that solved the alignment problems.
This is my little test function:
Note that PROFILE_ALIGN_CODE and PROFILE_ENDS_CODE are NaN's macros to start and end a page aligned new code segment.
I used the test function for both the older and the newer version of PROFILE.
This is the way I called the test function, using the older version of PROFILE:
I am using BiteRider's ObjAsm32's debug system (DebugCenter) to show the results.
Now the code for PROFILE v2.0, partly borrowed from MArtial_Code (January 18, 2003):
This is an include file that includes the binary code from the Profile.bin file.
The binary code was supplied by Maverick to make PROFILE v2.0 assembler independent:
This is the way I called my test function, using PROFILE v2.0:
If I did everything right, my conclusion is:
Both the older and the newer versions of PROFILE don't produce the correct amount of cycles anymore, on nowadays processors.
But I hope I did something wrong, for that would mean, my conclusion is wrong, and that could mean: PROFILE is still alive!!
If some of you are still using PROFILE succesfully on the newer processors, or if you can find something wrong in the code, please respond. For I would like PROFILE to be doing it's job again.
Friendly regards,
mdevries.
Thanks Maverick for your response.
If you test it on your system (CPUs that didn't exist at the time) and it appears to work correctly (expecially not giving negative results when you profile nothing or maybe a NOP) then I assume it's safe.
As you suggested I just tried both versions.
I profiled 2 types of functions with the older version of PROFILE.
- a function without parameters
- a function with 1 parameter
The first time the only thing the function contained was a RET.
The next time a NOP and a RET.
Then 2 NOP's and a RET,
Then 3 NOP's and a RET, etc.
After having done this I profiled the same functions with the newer version of PROFILE. These were the results on my Intel Pentium M notebook, running under Windows XP:
PROFILE PROFILE v2.0
0 Param 1 Param 0 Param 1 Param
0 nop ret 0 -1 0 0
1 nop ret -4 2 0 -3
2 nop ret -4 1 0 2
3 nop ret -3 -2 0 1
4 nop ret -5 4 1 -2
5 nop ret 0 0 0 -3
6 nop ret -2 -2 0 2
7 nop ret -8 0 2 4
8 nop ret -4 -2 2 3
9 nop ret -2 4 1 6
10 nop ret -1 3 -3 6
11 nop ret -1 2 -2 4
12 nop ret -5 5 3 3
13 nop ret -6 7 -1 3
14 nop ret -1 7 -1 5
15 nop ret -4 7 4 5
16 nop ret 1 6 4 7
17 nop ret -4 5 5 8
18 nop ret 4 7 5 9
19 nop ret 2 6 5 8
20 nop ret 3 7 7 6
21 nop ret 3 10 2 8
22 nop ret 3 8 7 11
23 nop ret 4 7 8 12
24 nop ret 4 15 7 12
There was a stack unbalance after profiling the function with the parameter.
ESP before profiling: 0012FFC4h
ESP after profiling: 00130000h
So, there was a difference of 60 bytes, too much popped off the stack.
The stack unbalance doesn't appear after profiling the function without parameters.
This is the code I found and used for the older version of PROFILE. I hope this was the definite MASM version, that solved the alignment problems.
.686p
; ---------------------------------------------------------------------------
PROFILE MACRO ROUTINE
MOV ,ROUTINE
CALL _PROFILE
ENDM
; ---------------------------------------------------------------------------
PROFILE_ALIGN_CODE MACRO
PROFILE_ENDS_CODE MACRO
_TEXT$1 ENDS
ENDM
_TEXT$1 SEGMENT PAGE READONLY PUBLIC
ENDM
; ---------------------------------------------------------------------------
PROFILE_ALIGN_BSS MACRO
PROFILE_ENDS_BSS MACRO
_BSS$1 ENDS
ENDM
_BSS$1 SEGMENT PAGE PUBLIC
ENDM
; ---------------------------------------------------------------------------
PROFILE_ALIGN_DATA MACRO
PROFILE_ENDS_DATA MACRO
_DATA$1 ENDS
ENDM
_DATA$1 SEGMENT PAGE PUBLIC
ENDM
; ---------------------------------------------------------------------------
; -------------------------------------
PROFILE_ALIGN_BSS ; -------------------------------------------------------
ALIGN 64 ; align to a cache entry on all CPU's
PROFILEROUTINE DWORD ?
PROFILECYCLES DWORD ?
DWORD ?
_PROFILEEMPTY DWORD ?
DWORD ?
_PROFILERETURN DWORD ?
_PROFILEINEAX DWORD ?
_PROFILEINEBX DWORD ?
_PROFILEINECX DWORD ?
_PROFILEINEDX DWORD ?
_PROFILEINESI DWORD ?
_PROFILEINEDI DWORD ?
_PROFILEINEBP DWORD ?
_PROFILEINEFL DWORD ?
_PROFILEOUTEAX DWORD ?
_PROFILEOUTEBX DWORD ?
_PROFILEOUTECX DWORD ?
_PROFILEOUTEDX DWORD ?
_PROFILEOUTEFL DWORD ?
_PROFILERETADDR DWORD ?
PROFILE_ENDS_BSS ; -------------------------------------------------------
; -------------------------------------
PROFILE_ALIGN_CODE ; -------------------------------------------------------
ALIGN 64 ;16 ; align to a cache entry on all CPU's
_PROFILE:
MOV [_PROFILEINEAX],EAX ; saves INPUT EAX (will be trashed by CPUID)
MOV [_PROFILEINEBX],EBX ; saves INPUT EBX (will be trashed by CPUID)
MOV [_PROFILEINECX],ECX ; saves INPUT ECX (will be trashed by CPUID)
MOV [_PROFILEINEDX],EDX ; saves INPUT EDX (will be trashed by CPUID)
MOV [_PROFILEINESI],ESI ; saves INPUT ESI (will be trashed by the routine to be tested, which will be called multiple times)
MOV [_PROFILEINEDI],EDI ; saves INPUT EDE (will be trashed by the routine to be tested, which will be called multiple times)
MOV [_PROFILEINEBP],EBP ; saves INPUT EBP (will be trashed by the routine to be tested, which will be called multiple times)
PUSHFD
POP [_PROFILEINEFL] ; saves INPUT CPU EFLAGS
POP [_PROFILERETURN] ; saves return address
PUSH ; saves requested PROFILEROUTINE
MOV ,@empty ; first we'll profile a simple RET
MOV [_PROFILERETADDR],@ret1
JMP @profile ; make sure it gets cached
@ret1: MOV [_PROFILERETADDR],@ret2
JMP @profile ; profile for real (well, let it set up)
@ret2: MOV [_PROFILERETADDR],@ret3
JMP @profile ; profile for real (well, let it set up again)
@ret3: MOV [_PROFILERETADDR],@ret4
JMP @profile ; profile for real (well, let it set up one final time)
@ret4: MOV [_PROFILERETADDR],@ret5
JMP @profile ; profile for real
@ret5: MOV EAX,
MOV EDX,
MOV [_PROFILEEMPTY+0],EAX ; saves RET cycles count
MOV [_PROFILEEMPTY+4],EDX
;
POP ; restores requested PROFILEROUTINE
MOV [_PROFILERETADDR],@ret6
JMP @profile ; make sure it gets cached
@ret6: MOV [_PROFILERETADDR],@ret7
JMP @profile ; profile for real
@ret7: MOV [_PROFILERETADDR],@ret8
JMP @profile ; make sure it gets cached
@ret8: MOV [_PROFILERETADDR],@ret9
JMP @profile ; profile for real
@ret9: MOV [_PROFILERETADDR],@ret10
JMP @profile ; make sure it gets cached
@ret10: MOV EAX,[_PROFILEEMPTY+0] ; subtracts simple RET overhead
MOV EDX,[_PROFILEEMPTY+4]
SUB ,EAX ; saves cycles, low 32bit
SBB ,EDX ; saves cycles, high 32bit
MOV EAX,[_PROFILEOUTEAX] ; gives OUTPUT EAX
MOV EBX,[_PROFILEOUTEBX] ; gives OUTPUT EBX
MOV ECX,[_PROFILEOUTECX] ; gives OUTPUT ECX
MOV EDX,[_PROFILEOUTEDX] ; gives OUTPUT EDX
PUSH [_PROFILEOUTEFL]
POPFD ; gives CPU EFLAGS
JMP [_PROFILERETURN] ; returns to caller
@profile:
;WBINVD (if you want to test uncached data/code) ; beware, WBINVD is only for ring0
MOV EAX, ; touches caches
MOV EDX,
MOV EAX,[_PROFILEINEAX]
MOV EBX,[_PROFILEINEBX]
MOV ECX,[_PROFILEINECX]
MOV EDX,[_PROFILEINEDX]
MOV ESI,[_PROFILEINESI]
MOV EDI,[_PROFILEINEDI]
MOV EBP,[_PROFILEINEBP]
MOV EAX,[_PROFILEINEFL]
MOV EAX,[_PROFILEOUTEAX]
MOV EBX,[_PROFILEOUTEBX]
MOV ECX,[_PROFILEOUTECX]
MOV EDX,[_PROFILEOUTEDX]
MOV EAX,[_PROFILEOUTEFL]
MOV EAX,
MOV ECX,32
@@stack: PUSH EAX ; touches stack
LOOP @@stack
ADD esp,128
XOR EAX,EAX
CPUID ; flush pipelines
RDTSC
MOV ,EAX ; saves TSC, low 32bit
MOV ,EDX ; saves TSC, high 32bit
XOR EAX,EAX
CPUID ; flush pipelines
MOV EAX,[_PROFILEINEAX] ; restores INPUT EAX
MOV EBX,[_PROFILEINEBX] ; restores INPUT EBX
MOV ECX,[_PROFILEINECX] ; restores INPUT ECX
MOV EDX,[_PROFILEINEDX] ; restores INPUT EDX
MOV ESI,[_PROFILEINESI]
MOV EDI,[_PROFILEINEDI]
MOV EBP,[_PROFILEINEBP]
PUSH [_PROFILEINEFL]
POPFD ; restores CPU EFLAGS
CALL ; calls routine to be tested
MOV [_PROFILEOUTEAX],EAX ; saves OUTPUT EAX
MOV [_PROFILEOUTEBX],EBX ; saves OUTPUT EBX
MOV [_PROFILEOUTECX],ECX ; saves OUTPUT ECX
MOV [_PROFILEOUTEDX],EDX ; saves OUTPUT EDX
PUSHFD
POP [_PROFILEOUTEFL] ; saves OUTPUT CPU EFLAGS
XOR EAX,EAX
CPUID ; flush pipelines
RDTSC
XCHG ,EAX
XCHG ,EDX
SUB ,EAX ; saves TSC, low 32bit
SBB ,EDX ; saves TSC, high 32bit
JMP [_PROFILERETADDR]
ALIGN 64 ; 16 ; align to a cache entry on all CPU's
@empty: RET
PROFILE_ENDS_CODE ; -------------------------------------------------------
This is my little test function:
Note that PROFILE_ALIGN_CODE and PROFILE_ENDS_CODE are NaN's macros to start and end a page aligned new code segment.
I used the test function for both the older and the newer version of PROFILE.
PROFILE_ALIGN_CODE
align 64
testfunc proc ;Param1:dword ;Uncomment Param1 to have 1 parameter
nop ;Add more nops here, and restart profiling.
ret
testfunc endp
PROFILE_ENDS_CODE
This is the way I called the test function, using the older version of PROFILE:
I am using BiteRider's ObjAsm32's debug system (DebugCenter) to show the results.
mov eax,esp
DbgHex eax,"esp before profiling"
mov eax,offset _PROFILE
DbgHex eax,"Profile base address"
;push eax ;uncomment this to have 1 parameter
PROFILE testfunc
DbgHex qword ptr , "Cycles after profiling"
mov eax,esp
DbgHex eax,"esp after profiling"
Now the code for PROFILE v2.0, partly borrowed from MArtial_Code (January 18, 2003):
This is an include file that includes the binary code from the Profile.bin file.
The binary code was supplied by Maverick to make PROFILE v2.0 assembler independent:
PROFILERSIZE equ 660
PROFILERINITOFFSET equ 559
PROFILERRESULTOFFSET equ 4096
PROFILERSTARTOFFSET equ 1
_PROF1 segment dword 'PROFILER' ;separate section
;Profiler proc
Profiler:
db 195,137,45,56,0,0,0,189,0,0,0,0,156,143,69,60
db 137,125,52,137,117,48,137,85,44,137,77,40,137,93,36,137
db 69,32,143,69,16,139,69,8,11,69,12,117,72,199,5,193
db 1,0,0,0,0,0,0,129,45,193,1,0,0,197,1,0
db 0,199,69,92,0,0,0,0,199,69,88,0,0,0,0,199
db 69,20,91,0,0,0,233,246,0,0,0,232,168,0,0,0
db 255,69,88,131,125,88,16,114,230,139,69,0,139,85,4,137
db 69,8,137,85,12,88,163,193,1,0,0,129,45,193,1,0
db 0,197,1,0,0,199,69,92,0,0,0,0,199,69,88,0
db 0,0,0,199,69,20,159,0,0,0,233,178,0,0,0,232
db 100,0,0,0,255,69,88,131,125,88,16,114,230,139,133,0
db 1,0,0,49,219,185,1,0,0,0,57,132,141,0,1,0
db 0,118,9,139,132,141,0,1,0,0,137,203,65,59,77,92
db 114,232,139,132,221,128,0,0,0,139,148,221,132,0,0,0
db 43,69,8,27,85,12,137,69,0,137,85,4,139,69,64,139
db 93,68,139,77,72,139,85,76,139,109,80,255,53,84,0,0
db 0,157,255,37,16,0,0,0,139,69,0,139,85,4,185,0
db 0,0,0,57,132,205,128,0,0,0,117,18,57,148,205,132
db 0,0,0,117,9,255,132,141,0,1,0,0,235,34,65,59
db 77,92,114,223,137,132,205,128,0,0,0,137,148,205,132,0
db 0,0,199,132,141,0,1,0,0,1,0,0,0,255,69,92
db 195,139,125,52,139,117,48,139,69,0,139,69,32,139,69,64
db 185,4,0,0,0,131,236,32,139,4,36,73,117,247,129,196
db 128,0,0,0,255,117,60,157,155,235,5,144,144,144,144,144
db 15,162,15,49,137,69,0,137,85,4,139,85,44,139,77,40
db 139,93,36,139,69,32,139,109,56,199,68,36,252,192,1,0
db 0,255,100,36,252,144,144,144,144,144,144,144,144,144,144,144
db 144,144,144,144,144,144,144,144,144,144,144,144,144,144,144,144
db 232,0,0,0,0,199,68,36,252,0,2,0,0,255,100,36
db 252,144,144,144,144,144,144,144,144,144,144,144,144,144,144,144
db 144,144,144,144,144,144,144,144,144,144,144,144,144,144,144,144
db 144,144,144,144,144,144,144,144,144,144,144,144,144,144,144,144
db 137,45,80,0,0,0,189,0,0,0,0,137,85,76,137,77
db 72,137,93,68,137,69,64,156,143,69,84,155,15,162,15,49
db 43,69,0,27,85,4,137,69,0,137,85,4,255,101,20,86
db 87,232,90,0,0,0,129,231,0,240,255,255,141,183,0,16
db 0,0,1,127,47,1,127,51,1,127,57,1,127,61,1,127
db 82,1,127,119,1,127,125,1,191,129,0,0,0,1,191,150
db 0,0,0,1,191,157,1,0,0,1,191,201,1,0,0,1
db 119,3,1,119,8,1,183,253,0,0,0,1,183,4,1,0
db 0,1,183,2,2,0,0,1,183,7,2,0,0,95,94,195
db 139,60,36,195
db 4096*2-PROFILERSIZE dup (0) ;Combined size of section is 8192 bytes
_PROF1 ends
This is the way I called my test function, using PROFILE v2.0:
mov eax,esp
DbgHex eax,"esp before profiling"
invoke VirtualAlloc,0,8192,MEM_COMMIT,PAGE_EXECUTE_READWRITE
mov MyAlloc,eax
DbgHex eax,"Alloc base address"
;copy Maverick's binary code to the allocated memory
cld
mov esi,offset Profiler
mov edi,MyAlloc
mov ecx,8192/4
rep movsd
mov esi,MyAlloc
add esi,PROFILERINITOFFSET
call esi
mov esi,MyAlloc
add esi,PROFILERSTARTOFFSET
;push eax ;uncomment this to have 1 parameter
push offset testfunc
call esi
mov esi,MyAlloc
DbgHex qword ptr
mov eax,esp
DbgHex eax,"esp after profiling"
invoke VirtualFree,MyAlloc,0,MEM_RELEASE
If I did everything right, my conclusion is:
Both the older and the newer versions of PROFILE don't produce the correct amount of cycles anymore, on nowadays processors.
But I hope I did something wrong, for that would mean, my conclusion is wrong, and that could mean: PROFILE is still alive!!
If some of you are still using PROFILE succesfully on the newer processors, or if you can find something wrong in the code, please respond. For I would like PROFILE to be doing it's job again.
Friendly regards,
mdevries.
Unfortunately on these newer CPUs there are additional factors that should be counted in. I stopped myself at branch cache (simulating a flush) but, as you may imagine, on a Pentium M there's also trace cache issues and "God" only knows how many other little, subtle stuff to "sync". Sorry, I can't help because I don't have any Pentium M (let away the time to do it, honestly), but you or others may do it, all you need to know is that the processor must be "reset" in a clean BUT I/D-cached (to avoid memory stalls polluting the results) state before executing the code under test. Every CPU has its one cache line sizes, pipeline quircks, architecture, etc.. so at this point I think it's better to write one PROFILE routine at least per CPU family..
Hi Maverick,.
Thanks for your reply.
You did a wonderful job with PROFILE 5 years ago, although it can't be used on modern cpu's anymore, since many things in the cpu have changed since.
I think PROFILE must be considered a great achievement anyway.
Friendly regards,
mdevries.
Thanks for your reply.
You did a wonderful job with PROFILE 5 years ago, although it can't be used on modern cpu's anymore, since many things in the cpu have changed since.
I think PROFILE must be considered a great achievement anyway.
Friendly regards,
mdevries.