If you set the segment alignment to PAGE, align 64 *is* allowed.
_DATA$1 SEGMENT PAGE
blah db 30h
ALIGN 64
blah2 db 40h
_DATA$1 ENDS
blah2 will be 64-byte aligned. The same probably works for _CODE..
Thomas
That is probably because of the (documented) fact that the cost of the RET is automatically (and intentionally) removed by the routine. So profiling a routine that is just a RET results in 0 cycles. One or two NOPs plus a RET result in 1 cycle, etc..
As Microsoft would say, "that is a feature, not a bug".. but this time it's true. ;)
I just ran xor eax,eax/xor edx,edx (pairing) & THAT'S what gave me 0 cycles. Also testing further I added another xor eax,eax & got what appeared to be a 64bit -1 back. Me scratches my head b/c we're going back in time.
So:
0 cycles for xor eax,eax
0 cycles for xor eax,eax/ xor edx,edx
-1 cycles for xor e?x,e?x * 3
1 cycle for the exchange test
mult test was 32 cycles
Again, I like the routine & will be using it a lot so let me look into it more when I get home. This could be a Mutant problem or a masm vs nasm problem but having you try & debug remotely is pointless & I'm not here to hurl slings & arrows.
Now to google up a nasm assembler for win.
Maybe...
MyAlign64 MACRO
local BB
BB = 64 - ($ MOD 64)
repeat BB
nop
endm
endm
I think you would get an error using that macro.
At least, I tried it some time ago and got an error, and I tried your macro just now and got the error too.
I use a slightly different method, which includes yours.
In the module where data or code first appears for a given section,
I put a label right after the section directive, for example:
.data
somedata db ..
(or maybe use startdata equ somedata, but that didn't work when I tried)
...
Then in the macro:
MyAlign64 MACRO
local BB
BB = 64 - (relate MOD 64)
repeat BB
nop
endm
endm
Before using the macro I always put
relate = $-somedata
MyAlign64
MASM can evaluate the difference of two relocatable addresses, but
it reports errors when $ is used any other way.
I know it doesn't look very convenient, but I couldn't find another way to tell MASM.
Instead of .code use this:
.486
.model flat,stdcall
.data
testdata db 4
_TEXT$1 SEGMENT PAGE
start:
mov eax, ecx
ALIGN 64
mov edx, eax
_TEXT$1 ENDS
end start
Works perfectly:
.00401000: 8BC1 mov eax,ecx
.00401002: 8DA42400000000 lea esp,[esp][000000000]
.00401009: 8DA42400000000 lea esp,[esp][000000000]
.00401010: 8DA42400000000 lea esp,[esp][000000000]
.00401017: 8DA42400000000 lea esp,[esp][000000000]
.0040101E: 8DA42400000000 lea esp,[esp][000000000]
.00401025: 8DA42400000000 lea esp,[esp][000000000]
.0040102C: 8DA42400000000 lea esp,[esp][000000000]
.00401033: 8DA42400000000 lea esp,[esp][000000000]
.0040103A: 8D9B00000000 lea ebx,[ebx][000000000]
.00401040: 8BD0 mov edx,eax
.00401042: 0000 add [eax],al
.00401044: 0000 add [eax],al
.00401046: 0000 add [eax],al
.00401048: 0000 add [eax],al
.0040104A: 0000 add [eax],al
.0040104C: 0000 add [eax],al
.0040104E: 0000 add [eax],al
Thomas
bitRAKE, I would change my example macro (and maybe you would want to change yours) to this logic:
MyAlign64 MACRO
local BB
IF (relate MOD 64) GT 0 ;avoid 64 nops if already aligned ;)
BB = 64 - (relate MOD 64)
repeat BB
nop
endm
endif
endm
Thomas, thanks for the tip.
Here is a modified MASM version of profile (based on Thomas' lead):
I added three pairs of macros to it to make the code more flexible between PARA alignment (the default, which allows ALIGN up to 16 bytes) and PAGE alignment (256 bytes):
PROFILE_ALIGN_CODE used as .code
PROFILE_ENDS_CODE used as ".code ENDS"
PROFILE_ALIGN_BSS used as .data?
PROFILE_ENDS_BSS used as ".data? ENDS"
PROFILE_ALIGN_DATA used as .data
PROFILE_ENDS_DATA used as ".data ENDS"
I also used them within the Profile.inc code, so the alignment is now on 64-byte boundaries for the .data? section as well as in the include code itself.
When profiling in MASM you can also use them to keep the "test procs" and all relevant data segments isolated and scoped under this PAGE alignment.
Example:
.code
start:
invoke nseed, 1234565
nop
nop
nop
PROFILE simple_test
PrintDword PROFILECYCLES
PrintDword PROFILECYCLES+4
invoke nseed, 1234565
nop
PROFILE simple_test2
PrintDword PROFILECYCLES
PrintDword PROFILECYCLES+4
invoke ExitProcess,0
; Left out cause it has no profiling significance...
nseed proc TheSeed:DWORD
.data
nrandom_seed dd 12345678
.code
mov eax, TheSeed
mov nrandom_seed, eax
ret
nseed endp
PROFILE_ALIGN_CODE
; -------------------------------------------
align 64
simple_test proc
invoke nrandom, 10
ret
simple_test endp
align 64
simple_test2 proc
...
simple_test2 endp
align 64
nrandom PROC base:DWORD
...
nrandom endp
align 64
mrandom PROC base:DWORD
...
mrandom endp
; -------------------------------------------
PROFILE_ENDS_CODE
end start
Anywho, I like this way of getting the alignment because I can still have the .code section as well (which makes me feel all warm and fuzzy :grin: ). The macro names *could* be shortened, but I'm pretty unimaginative in that respect.
Hope you like..
:alright:
NaN
Rename this to .inc (the board doesn't like .inc's, apparently).
Great work pals.. I'm glad it's fixed (waiting for some reports now ;) ).
I'm thinking about extending PROFILE to also give precise results about pipelines, stalls, etc.. but I only have an Athlon right here.. that will be no problem though, if I work on it seriously and you give me some help.
bitRAKE (or any other Athlon expert here): I haven't read any AMD manual yet (normally I'm lazy, but this time I was busy :grin: ). Does the Athlon have performance monitoring counters? Can you point me to a specific AMD manual covering this?
PS: I'd also like to add the option to properly test data and/or code uncached.. all in ring3 and in the best possible way (which is tricky). I'll find the time.. but maybe it won't be too soon (1 week? dunno).
I haven't read any AMD manual yet (normally I'm lazy, but this time I was busy :grin: ). Does the Athlon have performance monitoring counters? Can you point me to a specific AMD manual covering this?
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf (new version 02/28/02)
I use this for my ASM code:
.686
;.MMX
;.K3D
.XMM
OPTION CASEMAP:NONE,LANGUAGE:STDCALL,DOTNAME
; Set Default Segment order and options
_TEXT SEGMENT READONLY PAGE PUBLIC USE32 'CODE'
_TEXT ENDS
CONST SEGMENT READONLY PAGE PUBLIC USE32 'CONST'
CONST ENDS
_DATA SEGMENT PAGE PUBLIC USE32 'DATA'
_DATA ENDS
_BSS SEGMENT PAGE PUBLIC USE32 'BSS'
_BSS ENDS
ASSUME CS: FLAT, DS:FLAT, SS:FLAT, ES:FLAT
Look, no MODEL directive. :)
I don't like or use the simple segment directives. Once you define the segments, just use the name and MASM will use the previously defined options. Like:
_TEXT SEGMENT
ALIGN 64
Silly PROC y:DWORD, x:DWORD
_DATA SEGMENT
ALIGN 64
temp dd 123
_DATA ENDS
mov eax,y
add eax,x
mov temp,eax
ret
Silly ENDP
_TEXT ENDS
You can also write macros that do different things depending on what segment you're in. I'll throw this in here, in case you haven't seen it before:
cDATA MACRO y:VARARG
LOCAL sym
CONST segment
IFIDNI <y>,<>
.ERR "cDATA!"
ELSE
sym y
ENDIF
CONST ends
EXITM <OFFSET sym>
ENDM
; Use like...
movzx eax, [cDATA(db 6,2,2, 5,0,3, 5,1,4, 6,2,4) + ecx - 1]
This is such a fun tool... :)
Ya, this is the first time I've ventured into segment declarations.
I have to say they are interesting to work with. After reading up on them, I think I can use them to help our object model out by providing private segments, public segments, etc.
I'm looking forward to tinkering with them more, but I've got to survive this month first :(
:alright:
NaN
bitRAKE, when MASM compiles your declarations, is everything laid out in the executable in the order you wrote it? I always had my doubts, because MASM does multiple passes and I never know what it looks like in the executable... A debugger only shows it the way it is set up... so I don't know how it is really laid out in the executable.
'CODE'
'CONST'
'DATA'
'BSS'
and are your macros about seeing to that? If so, I now understand why you love macros so much. I never really got it before.
Maverick, can I still use PROFILE with an old 386? And thanks for getting me to look deeper into the workings of the assemblers and processors. I know you worked very hard, but you always make things seem so easy. I can't wait until the bottom line comes about... understanding segment declarations will be ......... ... WoW
when MASM compiles your declarations, is everything laid out in the executable in the order you wrote it? I always had my doubts, because MASM does multiple passes and I never know what it looks like in the executable... A debugger only shows it the way it is set up... so I don't know how it is really laid out in the executable.
The macros are more about creating fluidity and cohesion between bits of code. At the same time making code changes easier. Segment changes are a part of that.
Hi cmax :)
Maverick, can I still use PROFILE with an old 386?
Sorry, no: the 386/486 and some old Cyrix 586 clones don't have a Time Stamp Counter (i.e. "TSC", the 64-bit register available since the Pentium, which is automatically incremented by the CPU at every clock cycle and which one reads via the RDTSC instruction). So on CPUs that don't have a TSC you should use another routine.. the best option is to base it on the PIT. I have such a routine for Dos somewhere (in WatcomC), but some years ago I made some modifications which (lame me..) I didn't test properly at the time, and now the routine seems broken in some rare circumstances. :tongue: Tell me if you're interested in Dos programming, and I'll bugfight it and post it here when I can squeeze in some free time.
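For reference, here is a minimal sketch of how one might test for the TSC before relying on it, and then read it. It is not taken from PROFILE: the procedure name ReadTsc and the variables tscLo/tscHi are made up for illustration, and it assumes the CPU already supports the CPUID instruction (a real 386 does not, so it would need an earlier check).

```asm
.586                            ; lets MASM accept CPUID and RDTSC
.model flat, stdcall

.data?
tscLo   dd ?                    ; hypothetical storage for the low 32 bits
tscHi   dd ?                    ; hypothetical storage for the high 32 bits

.code
; Returns EAX = 1 and fills tscHi:tscLo if a TSC exists, else EAX = 0.
ReadTsc proc uses ebx           ; CPUID clobbers EBX, so preserve it
        mov     eax, 1          ; CPUID leaf 1: feature flags
        cpuid
        test    edx, 10h        ; EDX bit 4 set = TSC present
        jz      no_tsc
        rdtsc                   ; EDX:EAX = 64-bit cycle counter
        mov     tscLo, eax
        mov     tscHi, edx
        mov     eax, 1
        ret
no_tsc:
        xor     eax, eax        ; caller should fall back to a PIT-based timer
        ret
ReadTsc endp
end
```

Two such readings taken around a routine, subtracted as a 64-bit value (sub on the low halves, sbb on the high halves), give the raw cycle count before any overhead correction.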
And thanks for getting me to look deeper into the workings of the assemblers and processors. I know you worked very hard, but you always make things seem so easy.
Things in reality are always much easier and simpler than I manage to make them seem (although for me that is a primary goal, especially when talking to others.. since I can sort out the mess in my mind anyway).. it's just that we humans have the habit of thinking things are, or making them seem, more complex than they are. ;) That's why I stress << do not learn blindly.. understand all the "how"'s and "why"'s.. ask yourself as many questions as possible and answer all of them, make the techniques and why they work become an intimate part of yourself >>. Things should always look so simple that they can't be made any simpler.. if this doesn't happen then there's surely a bad problem somewhere.
---
hi bitRAKE, thanks for the hint on the manual.. I've downloaded it and other manuals, and will read it/them when I can. Thanks again.
I could never do Dos; that's why I did not like programming back then, I guess. Well, I guess I'd better go ahead and move up to Pentium programming and forget about the old 386, or I'll never get ahead, PLUS have the fun that you guys are having.
Thanks Maverick
Hi cmax, just a note: when I say Dos I mean 32 bit protected mode code, not 16 bit code.
Anyway, I don't think you could run Windows95 acceptably on a 386. Maybe Windows 3.1, but that's like saying Dos, or even worse. ;)
PS: also, it would be appropriate to add an SFENCE instruction to my PROFILEr. Does anybody know if all the P6-style CPUs support SFENCE? Does the Pentium PRO or the Pentium II, for example, support it?
Hi Maverick your PROFILE macro doesn't work with my FindString procedure..
This gives an access violation
push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
PROFILE FindString
This works
push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
call FindString
This works
PROFILE simple_test
I've used FindString extensively so I know it works. PROFILE works on simple_test but that procedure is as simple as they come... Any ideas as to why PROFILE FindString does not work?
Hi MArtial_Code,
I've already explained this issue somewhere, anyway:
PROFILE calls your routine_to_be_tested 5 times, but it takes care of restoring the original register contents each time it calls it, so there's no problem if you use a register to store a counter or a pointer. But if you use a memory location (which includes the stack), then you'll have to set up the pointers manually at the beginning of your routine_to_be_tested.
In short, here's a solution to your problem:
Instead of this:
push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
PROFILE FindString;
Use this:
PROFILE TestFindString
...
TestFindString:
push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
jmp FindString
Make sure that FindString balances the stack on exit; otherwise replace the jmp to it with a call, then balance the stack and ret.
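A rough sketch of that second variant, for the case where FindString does not remove its own arguments. It is only a fragment reusing the names from the post above, and it assumes the 7 pushes are all DWORD-sized, so the wrapper discards 7*4 bytes after the call:

```asm
TestFindString:
        push    " "
        push    2
        push    dwStringlen
        push    offset szString
        push    dwBytesread
        push    dwpMem
        push    0
        call    FindString
        add     esp, 7*4        ; assumption: callee left its 7 DWORD arguments on the stack
        ret                     ; returns to PROFILE with the stack balanced
```

Either way the arguments are pushed afresh on each of the 5 runs, which is what the direct "PROFILE FindString" version was missing.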
If you want to know exactly how many cycles FindString alone took, subtract the number of CPU cycles that those 7 push + 1 jmp instructions take from the number of cycles that PROFILE returns. You can make a separate test for that.
Finally, by logically applying the knowledge I expressed at the beginning of this post, you will deduce that another possible future problem may be profiling a routine which reads and writes memory, and depends on it. For example, a routine that finds the length of a string and stores it in a cache variable will do all of its work only in the first of the five tests that PROFILE does (and needs to do).. and will behave differently in the other four tests. As I've already written, PROFILE takes care of saving and restoring the CPU registers to avoid problems across any of these 5 calls, but for memory it's up to you, for evident reasons (should we save/restore the state of the whole PC? ;) ).
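The same wrapper trick can cover the memory case. This is just a sketch with made-up names (cachedLen, CachedStrLen and TestCachedStrLen are hypothetical, not part of PROFILE): the wrapper puts the memory state back before every run, so all five runs do the same work.

```asm
.data
cachedLen dd 0                  ; hypothetical cache written by the routine under test

.code
TestCachedStrLen:
        mov     cachedLen, 0    ; undo the side effect left by the previous run
        jmp     CachedStrLen    ; hypothetical routine; its RET returns to PROFILE

; elsewhere:
;       PROFILE TestCachedStrLen
```

As with the push example, the extra mov and jmp are included in the reported cycle count, so they would have to be measured separately and subtracted.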