What does "ALIGN 4" mean? Where should I use it? Thanks :stupid:
Posted on 2004-06-15 19:41:49 by Marginais
"align 4" means to pad with 0-3 bytes, so the next address where you put something will be on a 4-byte boundary (i.e., evenly divisible by four).

You generally need to align various STRUCTs, if you don't want problems on NT. Also, keeping data aligned gives better performance.
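
The 0-3 byte padding rule can be sketched in C. This is only an illustration of the arithmetic, not what any particular assembler does internally; the names align_up and padding_for are made up for this example, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
 * Assumption: alignment is a power of two (4, 8, 16, ...). */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes needed at addr to reach that boundary. */
uint32_t padding_for(uint32_t addr, uint32_t alignment)
{
    return align_up(addr, alignment) - addr;
}
```

For example, padding_for(0x00402001, 4) is 3, and an address that is already on a 4-byte boundary needs no padding at all.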
Posted on 2004-06-15 19:50:31 by f0dder
An example might be in order. This code:



.data
dbvar db 1
ALIGN 4
ddvar dd 2
dbvar2 db 3
ALIGN 8
string db "tjulahop"


Produces something like


.data:00402000 dbvar db 1 ;
.data:00402001 db 0 ;
.data:00402002 db 0 ;
.data:00402003 db 0 ;
.data:00402004 ddvar dd 2
.data:00402008 dbvar2 db 3 ;
.data:00402009 db 0 ;
.data:0040200A db 0 ;
.data:0040200B db 0 ;
.data:0040200C db 0 ;
.data:0040200D db 0 ;
.data:0040200E db 0 ;
.data:0040200F db 0 ;
.data:00402010 string db 'tjulahop',0
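
For comparison, roughly the same layout can be expressed in C11 with alignas. This is a sketch under the assumption of a typical compiler where uint8_t has alignment 1; the struct name data_section is hypothetical, and the offsets are relative to the start of the struct rather than to 00402000.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the .data listing: offsets 0, 4, 8 and 16. */
struct data_section {
    uint8_t dbvar;                 /* offset 0, then 3 padding bytes  */
    alignas(4) uint32_t ddvar;     /* offset 4  (like ALIGN 4)        */
    uint8_t dbvar2;                /* offset 8, then 7 padding bytes  */
    alignas(8) char string[8];     /* offset 16 (like ALIGN 8)        */
};

/* Compile-time checks that the compiler inserted the expected padding. */
static_assert(offsetof(struct data_section, ddvar) == 4, "ddvar not 4-aligned");
static_assert(offsetof(struct data_section, string) == 16, "string not 8-aligned");
```

The compiler inserts the padding bytes here, just as the assembler does for ALIGN.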
Posted on 2004-06-15 19:56:15 by f0dder

You generally need to align various STRUCTs, if you don't want problems on NT.


Why? What problems? Is this officially documented somewhere?
Posted on 2004-06-16 05:39:37 by Janne
Yes. It is more inherent to the x86 architecture than an OS-specific issue. The x86 pipelines work most efficiently when the memory being accessed is aligned on 4-byte boundaries. The CPU can always fetch a byte from any address, but a dword access will stall the pipelines briefly if the address is not 4-byte aligned (the last hex digit of the address isn't 0h, 4h, 8h, or Ch). My understanding is that this is because it internally fetches twice to assemble a 32-bit value starting at the specified address.

This is just my understanding. I could be 100% wrong about how it works, but I do know it's due to the chip architecture...

Regards,
:NaN:
Posted on 2004-06-16 16:14:37 by NaN
Yes, x86 can actually do this automatically: it reads two words (or dwords in this case), extracts the proper bits from both, and combines them into the resulting value. That is basically twice as slow. Some other CPUs cannot read from non-aligned memory addresses at all, and if you must (this should not happen very often, since it's very easy to keep all data aligned), you have to do it manually, at even higher cost than on x86.
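
The manual version (read two aligned dwords, extract the bits, combine them) might look something like this in C. It is a minimal sketch, not how any CPU implements it: the function name load_u32_unaligned is invented here, byte numbering within a dword is assumed to be little-endian as on x86, and the caller is assumed to guarantee that the second dword exists when the offset is not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit value that starts byte_offset bytes into an array of
 * aligned dwords, using only aligned dword reads.
 * Assumption: little-endian byte order within each dword. */
uint32_t load_u32_unaligned(const uint32_t *aligned_words, size_t byte_offset)
{
    assert(aligned_words != NULL);

    size_t index = byte_offset / 4;
    unsigned shift = (unsigned)(byte_offset % 4) * 8;

    uint32_t lo = aligned_words[index];
    if (shift == 0)
        return lo;                      /* already aligned: one read */

    uint32_t hi = aligned_words[index + 1];
    /* Low bytes come from the top of lo, high bytes from the bottom of hi. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With the dwords 0x44332211 and 0x88776655, reading at byte offset 1 gives 0x55443322, at the cost of two reads plus the shifts.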
Posted on 2004-06-16 16:24:02 by Scali
Bottom line,

reading anything larger than a byte from an address that is not a multiple of its size (e.g. a dword from an address not divisible by 4) can cause 2 memory fetches, so reading the variable can take up to 2x the time.

If you automatically keep your variables/structs on aligned addresses, you're good :)
'align x' is a good way to do it, but it will waste a couple of bytes to keep things evenly addressed.
Posted on 2004-06-16 18:12:02 by wizzra

Why? What problems? Is this officially documented somewhere?

I don't think it's documented, just like I haven't found documentation on register preservation, or the fact that your stack must be 4-aligned... but sooner or later you'll run into trouble if you violate any of these.

Also, larger-than-4 alignment can improve speed at other times (depending on cache-line length and such), plus SSE variables _must_ be 16-byte aligned (unless you want to use the very slow non-aligned moves).
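
A quick way to check the 16-byte requirement from C is sketched below. The helper is_aligned and the variable sse_vector are hypothetical names for this example, and the pointer-to-integer test relies on the flat address model of ordinary x86 compilers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if ptr is a multiple of alignment (a power of two). */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical SSE operand: four floats forced onto a 16-byte boundary. */
alignas(16) float sse_vector[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
```

Here is_aligned(sse_vector, 16) holds, while the address 4 bytes further on is only 4-byte aligned.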
Posted on 2004-06-17 05:21:30 by f0dder
f0dder,

It is documented in the IA-32 books:

Volume 1: Basic Architecture
chapter 4-2
Posted on 2004-06-17 05:49:22 by wizzra