Hi,

  What is "Align" or "Alignment"?
  I see many examples in the forum with different values, like
  align 16
  align 8

  How do we know which number to use, like 8 or 16, and where is it useful?
  Please guide me with an example.
  Thanks in advance.

Posted on 2006-10-11 13:13:20 by AssemblyBeginner
ALIGN aligns :) the output location (of the next instruction or piece of data) to the boundary you specify, inserting padding bytes as needed. This is usually done for optimization reasons and has to do with caching... Agner Fog does a much better job of explaining it than I do, though.

Notice that you don't align only for optimization purposes: on Windows NT, some APIs will fail if you pass them structures that aren't aligned to at least 4 bytes, and some SSEx instructions will fault if used on a memory operand that isn't 16-byte aligned.
Posted on 2006-10-11 16:51:55 by f0dder
Note that on 32-bit XP, a handler catches these exceptions from unaligned MMX instructions and changes their opcodes in memory.

This doesn't happen anymore on Vista, so you need to have even the stack aligned to 16 bytes... hell for asm coders. I believe it's because this was interfering with PatchGuard, but that's just my theory.
Posted on 2006-10-11 17:02:37 by vid

This doesn't happen anymore on Vista, so you need to have even the stack aligned to 16 bytes... hell for asm coders.

Shouldn't be that bad to handle via proc macros - I would think fasm is powerful enough to handle this?


I believe it's because this was interfering with PatchGuard, but that's just my theory.

Or perhaps just because it was wrong to do such a thing, which could lead to baffled developers scratching their beard, worrying about bad performance, instead of getting an exception and realizing they had some alignment to fix :P
Posted on 2006-10-11 17:07:22 by f0dder
Just to expand a bit on why and where alignment is important, consider the following.

The x86-32 family of processors reads memory in DWORD-sized chunks; it is effectively the smallest (and only) size it can read, and other sizes such as WORDs and BYTEs are actually read as a DWORD and then masked/shifted to get the required data size. When the processor reads a DWORD it always reads on a 4-byte boundary, that is, at address 0, 4, 8, 12 etc., regardless of the address you specify. So if you ask it to read a byte at address 2, it will read the DWORD at 0, shift it right 16 bits and mask off the low byte. This is not exactly what happens, but it is a useful way to visualize it.

With BYTEs, alignment is not much of an issue from a read point of view; WORDs and especially DWORDs, however, present a problem. If you were to read a DWORD at address 2, the processor would read the DWORD at address 0, take its two most significant bytes and save them, then read the DWORD at address 4 and combine its two least significant bytes with the saved bytes to "build" the DWORD you actually wanted. This "double read" eats clock cycles and slows down your application.

Another case where alignment is important was addressed by the others who replied: some newer MMX and especially SSE instructions require the data to be aligned in a certain way and will throw an exception if it is not. Also, on NT-based systems some API functions require a specific alignment for the structures that you pass them (usually 16 bytes).

As a rule of thumb, always try to keep like data grouped together in your data section (DWORDs with DWORDs, text with text, etc.) and use ALIGN between the groups. Data should be aligned on a boundary equal to its size: BYTEs (1 or no alignment), WORDs (2-byte alignment), DWORDs (4-byte alignment), QWORDs (8-byte alignment), etc.

Donkey
Posted on 2006-10-11 22:28:28 by donkey
A reasonable and detailed answer.
Posted on 2006-10-12 01:06:13 by dcskm4200
Also remember that a processor never reads/writes *just* a DWORD; it works with "cache lines". If you need really high-speed stuff, do read Agner Fog's document.

Also: never mix code and data. On some processors this can cause extreme penalties... I can't remember the specifics (i.e., whether it is only with written data or with read data as well), but some encryption code Herbert Kleebauer posted to alt.lang.asm suffered extreme penalties because his code and data were "too close together".
Posted on 2006-10-12 02:00:45 by f0dder

Also remember that a processor never reads/writes *just* a DWORD; it works with "cache lines". If you need really high-speed stuff, do read Agner Fog's document.


The same goes for reading cache lines of code, so you sometimes see ALIGN used before an inner loop. It makes sure the loop code fits inside one cache line instead of straddling two, which can slow the code down if the CPU has to read in a second cache line before it can continue execution.
Posted on 2006-10-12 03:33:50 by daydreamer
f0dder,

    What is meant by mixing code and data?  If I write .DATA followed by some data declarations and then .CODE followed by some instructions, then another .DATA and declarations followed by .CODE with instructions, doesn't the linker group all the data blocks into a single data segment and the code blocks into a code segment?  And what about a jump table within a .DATA block?  Does an indirect jump via a data segment play havoc with performance?  Can you give a quick example or description of what is bad?  Inquiring and pedantic minds would like to know.  Ratch
Posted on 2006-10-12 11:22:20 by Ratch
Thank you all for the great information; I'll go through the Agner Fog document. Thanks :)
Posted on 2006-10-12 11:34:34 by AssemblyBeginner
Ratch, by mixing I mean putting them in the same output section - doing .code then .data etc. is just fine, since MASM outputs them the way you're saying.

A jump table in a data block is fine, since it's still data and not code.

I think the problem is only with (modified) data in a code block, or actually *near* a code block (IIRC, empirical tests showed that you need at least 1 KB between modified data and code).

But I've forgotten the details (I don't mix data and code so I don't need to keep it memorized :)), and I'm sure the Agner Fog manuals explain it...
Posted on 2006-10-12 16:05:52 by f0dder
f0dder,

    Thanks for the clarification.  Ratch
Posted on 2006-10-12 19:51:43 by Ratch


This doesn't happen anymore on Vista, so you need to have even the stack aligned to 16 bytes... hell for asm coders.

Shouldn't be that bad to handle via proc macros - I would think fasm is powerful enough to handle this?

You can do it in 2 ways:
1. Use only ESP and keep track of what you have on the stack. This way it's hard, and sometimes impossible, to use the stack for your own purposes, like pushing things, etc.
2. Use EBP; this way you have to align the stack every time you call the OS. This aligning is quite nasty, but you can push/pop this way.
FASM has/had macros for both cases.

Or perhaps just because it was wrong to do such a thing, which could lead to baffled developers scratching their beard, worrying about bad performance, instead of getting an exception and realizing they had some alignment to fix :P

How do you tell some VB programmer that he has to align the stack in his callback procedure?
Posted on 2006-10-14 09:38:40 by vid

The x86-32 family of processors read DWORD size chunks of memory, it is actually the smallest (and only) size it can read, other sizes like WORDs and BYTEs are actually read as a DWORD then masked/shifted to get the required data size.

It's a bit more complex than that - the cache causes burst reads of several DWORDs to fill a cache line. And in the case of Pentium-class processors, the physical data bus is 64 bits wide, meaning the processor will read QWORDs from the memory modules.

I haven't looked at the low-level internal architecture of these processors, so I can't say whether the data paths from the cache are designed as 32-bit-only or not.

Whatever the case, if you align on a "natural" boundary for the data type, it will prevent extra read and write cycles.
Posted on 2006-10-16 10:36:43 by tenkey
Hi donkey,

I couldn't understand what you meant in your last paragraph:

"As a rule of thumb, always try to keep like data grouped together in your data section (DWORDs with DWORDs, text with text etc...) and use ALIGN between them. Data should be aligned at a boundary equivalent to its size, BYTEs (1 or no alignment) WORDs (2 byte alignment) DWORDs (4 byte alignment) QWORDs (8 byte alignment) etc..."

Can you expand your explanation and give some examples?

Thanks.
Posted on 2006-11-08 12:24:20 by hakand
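Meaning that instead of this:

.data                   ; offsets below are from the start of this block
long1 dd ?              ; offset 0
char1 db ?              ; offset 4
long2 dd ?              ; offset 5  - misaligned DWORD
string1 db 32 dup (?)   ; offset 9
long3 dd ?              ; offset 41 - misaligned DWORD
short1 dw ?             ; offset 45 - misaligned WORD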
.code

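you should arrange them like this:

.data                   ; offsets below are from the start of this block
string1 db 32 dup (?)   ; offset 0  - text grouped together
char1 db ?              ; offset 32

align 4                 ; pads offset 33 up to 36
long1 dd ?              ; offset 36
long2 dd ?              ; offset 40
long3 dd ?              ; offset 44

short1 dw ?             ; offset 48 - still 2-byte aligned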
Posted on 2006-11-08 16:11:30 by Ultrano