If you set the segment alignment to PAGE, align 64 *is* allowed.



_DATA$1 SEGMENT PAGE
blah db 30h
ALIGN 64
blah2 db 40h
_DATA$1 ENDS


blah2 will be 64-byte aligned. The same probably works for _CODE..
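
One way to confirm the result is to test the low bits of the final address at run time. This is only a sketch built on the segment above: the labels start, aligned_ok and not_aligned are made up for illustration, and it assumes the assembler and linker accept the segment exactly as shown.

```asm
; Sketch: check at run time that blah2 landed on a 64-byte boundary.
.486
.model flat, stdcall

_DATA$1 SEGMENT PAGE
blah    db 30h
ALIGN 64
blah2   db 40h
_DATA$1 ENDS

.code
start:
    mov  eax, offset blah2      ; final address, after linking and loading
    test eax, 63                ; low six bits are zero on a 64-byte boundary
    jnz  not_aligned
aligned_ok:
    xor  eax, eax               ; EAX = 0: aligned
    ret
not_aligned:
    mov  eax, 1                 ; EAX = 1: not aligned
    ret
end start
```

Checking the loaded address rather than the offset inside the segment also covers whatever placement the linker chose.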

Thomas
Posted on 2002-04-01 07:13:01 by Thomas

That is probably because of the (documented) fact that the cost of the RET is automatically (and intentionally) removed by the routine. So profiling a routine that is just a RET results in 0 cycles. One or two NOPs plus a RET result in 1 cycle, etc..

As Microsoft would say, "that is a feature, not a bug".. but this time it's true. ;)
I'm not communicating effectively, sorry.

I just ran xor eax,eax/xor edx,edx (pairing) & THAT'S what gave me 0 cycles. Also testing further I added another xor eax,eax & got what appeared to be a 64bit -1 back. Me scratches my head b/c we're going back in time.

So:
0 cycles for xor eax,eax
0 cycles for xor eax,eax/ xor edx,edx
-1 cycles for xor e?x,e?x * 3
1 cycle for the exchange test
mult test was 32 cycles

Again, I like the routine & will be using it a lot so let me look into it more when I get home. This could be a Mutant problem or a masm vs nasm problem but having you try & debug remotely is pointless & I'm not here to hurl slings & arrows.

Now to google up a nasm assembler for win.
Posted on 2002-04-01 10:03:04 by Mutant Slime
Maybe...
MyAlign64 MACRO

local BB

BB = 64 - ($ MOD 64)
repeat BB
nop
endm
endm
Posted on 2002-04-01 10:41:19 by bitRAKE
I think you would get an error using that macro.
At least, I tried this some time ago and got an error; I tried your macro just now and got the error too.
I use a slightly different method which builds on yours.
In the module where data or code first appears for a given section,
I put a label right after the section directive, for example
.data
somedata db ..
(or maybe use startdata equ somedata, but that didn't work when I tried it)
...
Then the macro is:
MyAlign64 MACRO
local BB
BB = 64 - (relate MOD 64)
repeat BB
nop
endm
endm
Each time, before using the macro, I put
relate = $-somedata
MyAlign64

MASM accepts the difference of two relative values like this, but
reports errors when they are used any other way.

I know it doesn't look very convenient, but I couldn't find another way to tell MASM this.
Posted on 2002-04-01 12:31:12 by The Svin
Instead of .code use this:


.486
.model flat,stdcall

.data
testdata db 4

_TEXT$1 SEGMENT PAGE

start:
mov eax, ecx
ALIGN 64
mov edx, eax

_TEXT$1 ENDS

end start


Works perfectly:


.00401000: 8BC1 mov eax,ecx
.00401002: 8DA42400000000 lea esp,[esp][000000000]
.00401009: 8DA42400000000 lea esp,[esp][000000000]
.00401010: 8DA42400000000 lea esp,[esp][000000000]
.00401017: 8DA42400000000 lea esp,[esp][000000000]
.0040101E: 8DA42400000000 lea esp,[esp][000000000]
.00401025: 8DA42400000000 lea esp,[esp][000000000]
.0040102C: 8DA42400000000 lea esp,[esp][000000000]
.00401033: 8DA42400000000 lea esp,[esp][000000000]
.0040103A: 8D9B00000000 lea ebx,[ebx][000000000]
.00401040: 8BD0 mov edx,eax
.00401042: 0000 add [eax],al
.00401044: 0000 add [eax],al
.00401046: 0000 add [eax],al
.00401048: 0000 add [eax],al
.0040104A: 0000 add [eax],al
.0040104C: 0000 add [eax],al
.0040104E: 0000 add [eax],al


Thomas
Posted on 2002-04-01 12:43:54 by Thomas
bitRAKE, I would change my example macro (and maybe you would want to change yours) to this logic:
MyAlign64 MACRO
local BB
IF (relate MOD 64) GT 0 ;avoid 64 nops if already aligned ;)
BB = 64 - (relate MOD 64)
repeat BB
nop
endm
endif
endm
Thomas, thanks for the tip.
Posted on 2002-04-01 13:19:47 by The Svin
Here is a modified MASM version of profile (based on Thomas' lead):

I added three pairs of macros to it to help make the code more flexible between PARA alignment (default, max = 16 bytes) and PAGE alignment (256 bytes).

PROFILE_ALIGN_CODE used as .code
PROFILE_ENDS_CODE used as ".code ENDS"

PROFILE_ALIGN_BSS used as .data?
PROFILE_ENDS_BSS used as ".data? ENDS"

PROFILE_ALIGN_DATA used as .data
PROFILE_ENDS_DATA used as ".data ENDS"

I also used them within the Profile.inc code, so the alignment is now on 64-byte boundaries for the .data? section as well as in the include code itself.

When profiling in MASM you can also use them to have the "test procs" and all relevant data segments isolated and scoped under this PAGE alignment.

Example:


.code
start:
invoke nseed, 1234565
nop
nop
nop
PROFILE simple_test
PrintDword PROFILECYCLES
PrintDword PROFILECYCLES+4

invoke nseed, 1234565
nop
PROFILE simple_test2
PrintDword PROFILECYCLES
PrintDword PROFILECYCLES+4

invoke ExitProcess,0

; Left out cause it has no profiling significance...
nseed proc TheSeed:DWORD
.data
nrandom_seed dd 12345678
.code
mov eax, TheSeed
mov nrandom_seed, eax
ret
nseed endp

PROFILE_ALIGN_CODE
; -------------------------------------------
align 64
simple_test proc
invoke nrandom, 10
ret
simple_test endp

align 64
simple_test2 proc
...
simple_test2 endp

align 64
nrandom PROC base:DWORD
...
nrandom endp

align 64
mrandom PROC base:DWORD
...
mrandom endp

; -------------------------------------------
PROFILE_ENDS_CODE

end start


Anywho, I like this way of getting the alignment because I can still have the .code section as well (which makes me feel all warm and fuzzy :grin: ). The macro names *could* be shortened, but I'm pretty unimaginative in such respects.

Hope you like..
:alright:
NaN

Rename this to .inc (the board doesn't like .inc files, apparently).
Posted on 2002-04-01 15:54:27 by NaN
Great work pals.. I'm glad it's fixed (waiting for some reports now ;) ).

I'm thinking about extending PROFILE to also give precise results about pipelines, stalls, etc.. but I only have an Athlon right here.. that will be no problem though if I work on it seriously and you give me some help.

bitRAKE (or any other Athlon expert here): I haven't read any AMD manual yet (normally I'm lazy, but this time I was busy :grin: ). Does the Athlon have performance monitoring counters? Can you point me at a specific AMD manual covering this issue?
Posted on 2002-04-01 17:51:11 by Maverick
PS: also I'd like to add the option to properly test data and/or code uncached.. all in ring3 and in the best possible way (which is tricky). I'll find the time.. but maybe won't be too soon (1 week? dunno).
Posted on 2002-04-01 17:54:05 by Maverick

I haven't read any AMD manual yet (normally I'm lazy, but this time I was busy :grin: ). Does the Athlon have performance monitoring counters? Can you point me at a specific AMD manual covering this issue?
Document 22007, get it at www.AMD.com :) READ!
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf (new version 02/28/02)
Posted on 2002-04-01 22:37:10 by bitRAKE
I use this for my ASM code:
.686

;.MMX
;.K3D
.XMM
OPTION CASEMAP:NONE,LANGUAGE:STDCALL,DOTNAME

; Set Default Segment order and options
_TEXT SEGMENT READONLY PAGE PUBLIC USE32 'CODE'
_TEXT ENDS
CONST SEGMENT READONLY PAGE PUBLIC USE32 'CONST'
CONST ENDS
_DATA SEGMENT PAGE PUBLIC USE32 'DATA'
_DATA ENDS
_BSS SEGMENT PAGE PUBLIC USE32 'BSS'
_BSS ENDS
ASSUME CS: FLAT, DS:FLAT, SS:FLAT, ES:FLAT
Look, no MODEL directive. :)

I don't like or use the simple segment directives. Once you define the segments, just use the name and MASM will use the previously defined options. Like:
_TEXT SEGMENT

ALIGN 64

Silly PROC y:DWORD, x:DWORD
_DATA SEGMENT
ALIGN 64

temp dd 123
_DATA ENDS

mov eax,y
add eax,x
mov temp,eax
ret
Silly ENDP
_TEXT ENDS
You can also write macros that do different things depending on what segment you're in. I'll throw this in here, in case you haven't seen it before:
cDATA MACRO y:VARARG

LOCAL sym
CONST segment
IFIDNI <y>,<>
.ERR "cDATA!"
ELSE
sym y
ENDIF
CONST ends
EXITM <OFFSET sym>
ENDM

; Use like...
movzx eax, [cDATA(db 6,2,2, 5,0,3, 5,1,4, 6,2,4) + ecx - 1]
This is such a fun tool... :)
Posted on 2002-04-02 00:14:17 by bitRAKE
Ya, this is the first time I've ventured into segment declarations.
I have to say they are interesting to work with. After reading up on them, I think I can get them to help our object model out by providing private segments, public segments, etc.

I'm looking forward to tinkering with them more, but I gotta survive this month first :(

:alright:
NaN
Posted on 2002-04-02 01:07:42 by NaN
bitRAKE, when MASM compiles your declarations, is everything laid out in the executable in the order you write it? I always had my doubts, because MASM does multiple passes and I never know what it looks like in the executable... A debugger only reads it the way it is set up... so I don't know how it is really laid out in the executable.

'CODE'

'CONST'

'DATA'

'BSS'
And are your macros about seeing to that? If so, I now understand why you love macros so much. I never really got it before.


Maverick, can I still use PROFILE with an old 386? And thanks for getting me to look deeper into the workings of the assemblers and processors. I know you worked very hard, but you always make things seem so easy. I can't wait until the bottom line comes about... understanding segment declarations will be ......... ... WoW
Posted on 2002-04-02 02:04:20 by cmax

when MASM compiles your declarations, is everything laid out in the executable in the order you write it? I always had my doubts, because MASM does multiple passes and I never know what it looks like in the executable... A debugger only reads it the way it is set up... so I don't know how it is really laid out in the executable.
The thing to remember is that MASM just creates the object file; the linker creates the EXE. You can help MASM order the segments, but that control has its limitations. The linker can order the segments based on the segment names, and combine segments from multiple object files. It would be good to read the documentation for LINK.EXE, and the chapter in the MASM manual about segments, if this is an area of concern for you.

The macros are more about creating fluidity and cohesion between bits of code. At the same time making code changes easier. Segment changes are a part of that.
Posted on 2002-04-02 02:24:04 by bitRAKE
Hi cmax :)

Maverick, can I still use PROFILE with an old 386?
Sorry, no: the 386/486 and some old Cyrix 586 clones won't work, because they don't have a Time Stamp Counter (i.e. "TSC", the 64-bit register available since the Pentium, which is automatically incremented by the CPU at every clock cycle and which one reads via the RDTSC instruction). So on CPUs that don't have a TSC you should use another routine; the best option is to base it on the PIT. I have such a routine for Dos somewhere (in WatcomC), but some years ago I made some modifications which (lame me..) I didn't test properly at the time, and now the routine seems broken in some rare circumstances. :tongue: Tell me if you're interested in Dos programming; I'll bugfight it and post it here when I squeeze in some free time.
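
As an illustration of what reading the TSC looks like, here is a minimal sketch. It is not the code inside PROFILE, only the bare RDTSC pattern; the label start and the variables t0 and cycles are made-up names, and the nop stands in for whatever is being measured.

```asm
; Sketch: time a stretch of code with two RDTSC reads.
.586                                ; RDTSC needs a Pentium-class target
.model flat, stdcall

.data?
t0      dq ?                        ; TSC value before the test, low dword first
cycles  dq ?                        ; 64-bit difference, low dword first

.code
start:
    rdtsc                           ; EDX:EAX = TSC before
    mov  dword ptr [t0], eax
    mov  dword ptr [t0+4], edx

    nop                             ; <- code under test goes here

    rdtsc                           ; EDX:EAX = TSC after
    sub  eax, dword ptr [t0]        ; subtract the low dwords
    sbb  edx, dword ptr [t0+4]      ; subtract the high dwords with borrow
    mov  dword ptr [cycles], eax
    mov  dword ptr [cycles+4], edx

    xor  eax, eax
    ret
end start
```

The raw difference still includes the cost of the timing instructions themselves, which is the kind of fixed overhead a profiling routine has to subtract. Assemblers that predate the mnemonic can emit the opcode directly as db 0Fh, 31h.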

And thanks for getting me to look deeper into the workings of the assemblers and processors. I know you worked very hard, but you always make things seem so easy.
Things in reality are always much easier and simpler than I manage to make them seem (although for me that is a primary goal, especially when talking to others.. since I can sort out the mess in my mind anyway).. it's just that we humans have the habit of thinking of things as, or making them seem, more complex than they are. ;) That's why I stress << do not learn blindly.. understand all the "how"'s and "why"'s.. ask yourself as many questions as possible and answer all of them, make the techniques and why they work become an intimate part of yourself >>. Things should always look so simple that they can't be made any simpler.. if this doesn't happen then there's surely a bad problem somewhere.

---

hi bitRAKE, thanks for the hint on the manual.. I've downloaded it and other manuals, and will read it/them when I can. Thanks again.
Posted on 2002-04-02 07:07:04 by Maverick
I could never do DOS; that's why I did not like programming back then, I guess. Well, I guess I'd better go ahead and move up to Pentium programming and forget about the old 386, or I'll never get ahead, PLUS have the fun that you guys are having.

Thanks Maverick
Posted on 2002-04-02 07:29:21 by cmax
Hi cmax, just a note: when I say Dos I mean 32 bit protected mode code, not 16 bit code.

Anyway, I don't think you could run Windows 95 acceptably on a 386. Maybe Windows 3.1, but that's like saying Dos, or even worse. ;)
Posted on 2002-04-02 07:35:45 by Maverick
PS: also, it would be worth adding an SFENCE instruction to my PROFILEr. Does anybody know if all the P6-style CPUs support SFENCE? Does the Pentium Pro or the Pentium II, for example, support it?
Posted on 2002-04-02 07:48:01 by Maverick
Hi Maverick, your PROFILE macro doesn't work with my FindString procedure..

This gives an access violation


push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
PROFILE FindString;


This works


push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
call FindString;


This works
PROFILE simple_test


I've used FindString extensively so I know it works. PROFILE works on simple_test but that procedure is as simple as they come... Any ideas as to why PROFILE FindString does not work?
Posted on 2002-07-03 08:56:00 by MArtial_Code
Hi MArtial_Code,

I've already explained this issue somewhere, anyway:

PROFILE calls your routine_to_be_tested 5 times, but it takes care of restoring the original register contents each time it calls it, so there are no problems if you use a register to store a counter or a pointer. But if you use a memory location (which includes the stack), then you'll have to manually set up the pointers at the beginning of your routine_to_be_tested.

In short, here's a solution to your problem:

Instead of this:


push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
PROFILE FindString;


Use this:


PROFILE TestFindString
...
TestFindString:
push " "
push 2
push dwStringlen
push offset szString
push dwBytesread
push dwpMem
push 0
call FindString
ret


FindString is reached with a call so that its return address sits on top of its arguments. Make sure that FindString removes its arguments from the stack on exit (as a stdcall routine does); otherwise balance the stack after the call and before the ret.

If you want to know exactly how many cycles the lone FindString took, subtract the amount of CPU cycles that those 7 push instructions plus the extra call/ret pair take from the number of cycles that PROFILE returns. You can make a separate test for that.

Finally, by logically applying the knowledge I expressed at the beginning of this post, you will deduce that another possible future problem is profiling a routine which reads and writes memory, and depends on it. For example, a routine that finds the length of a string and stores it in a cache variable will do all of its work only in the first of the five tests that PROFILE does (and needs to do), and will behave differently in the other four. As I've already written, PROFILE takes care of saving and restoring the CPU registers to avoid problems across any of these 5 calls, but for memory it's up to you, for evident reasons (should we save/restore the state of the whole PC? ;) ).
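
A sketch of that situation, with hypothetical names: StrLenCached, cachedLen, szText and TestStrLenCached are all made up for illustration, and only the PROFILE invocation comes from this thread. The wrapper puts the cache variable back to its first-run state, so that each of the five runs does the full scan. It is a fragment that assumes the .model setup and Profile.inc of the surrounding program.

```asm
; Hypothetical example: none of these names come from Profile.inc.
.data
szText      db "some text to measure", 0
cachedLen   dd 0                    ; 0 means "not computed yet"

.code
; Returns the length of szText in EAX, caching it in cachedLen.
StrLenCached PROC
    mov  eax, cachedLen
    test eax, eax
    jnz  finished                   ; cached value found: the scan is skipped
    mov  edx, offset szText
scan_next:
    cmp  byte ptr [edx+eax], 0
    je   store_len
    inc  eax
    jmp  scan_next
store_len:
    mov  cachedLen, eax             ; memory side effect that survives the call
finished:
    ret
StrLenCached ENDP

; Wrapper handed to PROFILE instead of StrLenCached itself.
TestStrLenCached:
    mov  cachedLen, 0               ; put memory back to its first-run state
    call StrLenCached
    ret

; In the main code path:
;   PROFILE TestStrLenCached
```

The store that resets cachedLen is measured too, so its cost would have to be subtracted separately if the exact figure for the routine alone matters.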
Posted on 2002-07-03 11:20:08 by Maverick