Alignment
Writing efficient code is an art. Although hand optimizations can squeeze the juice out of the microprocessor, a little alertness and a few precautions here and there while coding can also save you fortunes.
Misalignment of data is one of the problems that you need to take care of when writing efficient code. The CPU "feels" better when data is aligned on 4-BYTE boundaries or, in some cases, 16-BYTE boundaries. Proper alignment makes code run faster and, in some cases, the data must be aligned in order to use certain CPU features.
The processor is unable to access misaligned data in a way "natural" to it. Misaligned data is data located at an address that the processor cannot access efficiently. A 32-bit microprocessor "naturally" accesses data positioned at address boundaries evenly divisible by 4. Also, some operating systems require alignments of some structures to DWORD boundaries.
Whether a piece of data is aligned depends not only on the address where it's located, but also on its size. 1-BYTE data is always aligned, 2-BYTE (WORD) data is aligned when located at addresses evenly divisible by 2, and 4-BYTE (DWORD) data is aligned when located at addresses evenly divisible by 4. This is called natural alignment.
A Simple Example
Boundaries are evenly divisible memory addresses. For example, an address that is aligned on a 4-BYTE (DWORD) boundary is evenly divisible by 4. The processor will always get its data from DWORD boundaries and in DWORD sizes. So, if you had the following:
1111 2222 3333 4444
...and you wanted to get the second DWORD, the processor would find it on an address divisible by 4 (a boundary) and get it in one fetch. However, if the data was misaligned like this:
1122 2233 3344 4400
...and you wanted the same DWORD, the processor would:
- Get the first DWORD (FETCH 1)
- Chop off the leftmost 2 bytes
- Get the second DWORD (FETCH 2)
- Chop off the rightmost 2 bytes
- Then put them both together
In some cases, misaligned data can also cause general protection faults, or GPFs.
Causes of Misalignment
(TODO)
Improper Structures
(TODO)
Data type organization
(TODO: Strings and data types order)
Misaligned Stack Data
(TODO)
Aligning Data
It is worth aligning code labels that are frequent jump targets, because speed increases are often observed. With data, however, it is important to align to at least 4-byte (DWORD) boundaries; otherwise the processor has to make 2 reads to get the value, which slows down processing considerably. Some of the SIMD (Single Instruction, Multiple Data) instructions require memory aligned at 16-BYTE boundaries (32-BYTE for the 256-bit AVX forms), which usually means allocating a little more memory than needed and aligning the start position used for reads and writes.
As a general rule, you should try to define the larger-sized data first. For example, you should define DWORDs before WORDs, and WORDs before BYTEs. You should make it a point to align data after you've defined your strings.
(TODO: Structure and stack Alignment)
The stack should always be aligned to 4 in Windows-based programs, because misalignment often causes some API functions to fail.
Begin MASM Specific
To align data using MASM, use the ALIGN directive. The ALIGN directive aligns the next instruction or data to the boundary specified. To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.
Syntax:
ALIGN <<boundary>>
Example:
ALIGN 4 ; align next data or instruction to DWORD boundary.
ALIGN 16 ; align next data or instruction to 16-BYTE boundary.
These are the two most common alignment directives but, generally, you can use any power of two from 2 through 16. MASM, however, will complain if you ask for alignment that is greater than the segment alignment. If you use full segment definitions and specify "page", you can specify up to "ALIGN 256". (This page is not the same 4 KB page that the 80x86 microprocessor uses for paging with segment descriptors defined as in Windows. Rather, it is 256 bytes.)
MASM will properly align variables declared with LOCAL to their natural boundaries up to DWORD. QWORDs are not properly aligned. With a 16-bit stack, it seems to do the same alignment of variables, but it makes no effort to align the stack properly, so its alignment of the DWORD variables will not be of much use half the time on average. Here, or with a 32-bit stack when variables larger than 32 bits are used extensively, efficiency will be improved by forgoing the convenience of PROC and manually assigning variables and aligning the stack. I have found this to be somewhat awkward to do, especially if you want to preserve all registers on entry.
Example: Aligning after defining strings.
; a 17-byte string
string1 db "this is a string",0 ; one reason to cause misalignment
ALIGN 4 ; Align next piece of data at the next 4-byte boundary.
dwValue dd 0 ; This data is now aligned.
In the above example, the string is 17 bytes long. If you do not use the ALIGN 4 directive, the next piece of data gets deposited at the next byte (byte 18). In order to get the value of dwValue, the microprocessor will fetch data twice. You don't want it to do that, do you? We guess not.
End MASM Specific
Begin HLA Specific
To align data using HLA, use the ALIGN directive or procedure option. The ALIGN directive aligns the next instruction or data to the boundary specified. To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.
Syntax:
ALIGN( <<boundary>> );
Example:
ALIGN( 4 ); // align next data or instruction to DWORD boundary.
ALIGN( 16 ); // align next data or instruction to 16-BYTE boundary.
These are the two most common alignment directives but, generally, you can use any even number from 2 through 16, though in general you should use a power of two. In theory, HLA supports alignments of any value, but in certain circumstances you may not be allowed to use values greater than 16. Also, as HLA's alignment capabilities depend on the underlying assembler that processes HLA's output, there may be additional restrictions based on the assembler you're using with HLA.
To force the first instruction of a procedure to begin on some boundary, you may use the HLA align procedure option as follows:
procedure procName( <<OptionalParameters>> ); align(4); <<otherOptions>>
begin procName;
<<statements>>
end procName;
This aligns the first instruction of the procedure on the specified boundary.
HLA automatically pads all procedure parameters to 32 bits (a requirement of Windows). HLA does not, however, provide this padding to local variables. If you want to align the addresses of your local variables on the stack to some particular boundary, you can use the align directive for this purpose:
procedure procName( <<OptionalParameters>> ); <<otherOptions>>
var
b:byte;
align(4);
d:dword;
begin procName;
<<statements>>
end procName;
Note that the alignment is only within the activation record; true address alignment depends on the stack being properly aligned upon entry into the procedure. Most of the time you can count on the stack being aligned on a double-word boundary upon entry into your procedure. However, it's possible to mess with the stack prior to calling a procedure and invalidate this assumption. To help overcome this problem, HLA, by default, emits some extra code to align the stack upon entry into a procedure. For example, compiling the following HLA code
procedure TestProc(parameter: dword); @nodisplay;
var
b:byte;
w:word;
d:dword;
begin TestProc;
mov( b, al );
mov( w, ax );
mov( d, eax );
end TestProc;
produces the following MASM code:
L1_TestProc__hla_ proc near32
push ebp
mov ebp, esp
sub esp, 8 ;Allocate storage for 7 bytes + 1 byte padding.
and esp, 0fffffffch ;Align stack to four-byte boundary!
mov al, byte ptr [ebp-1] ;/* b */
mov ax, word ptr [ebp-3] ;/* w */
mov eax, dword ptr [ebp-7] ;/* d */
mov esp, ebp
pop ebp
ret 4
L1_TestProc__hla_ endp
Unfortunately, the "and esp, 0fffffffch" instruction does not align the current activation record to a four-byte boundary, because the local variables are addressed relative to EBP, which was loaded before the AND. However, if TestProc calls any other procedures, those procedures' stacks will be dword aligned (unless TestProc also messes with the stack before calling those procedures).
If your program doesn't mess up the dword alignment of the stack, you can use the @noalignstack procedure option to tell HLA not to bother emitting the "and esp, 0fffffffch" instruction, thus making your code a tiny bit more efficient:
program t;
procedure TestProc(parameter: dword); @nodisplay; @noalignstack;
var
b:byte;
w:word;
d:dword;
begin TestProc;
mov( b, al );
mov( w, ax );
mov( d, eax );
end TestProc;
begin t;
end t;
Emits the following MASM code:
L1_TestProc__hla_ proc near32
push ebp
mov ebp, esp
sub esp, 8
mov al, byte ptr [ebp-1] ;/* b */
mov ax, word ptr [ebp-3] ;/* w */
mov eax, dword ptr [ebp-7] ;/* d */
mov esp, ebp
pop ebp
ret 4
L1_TestProc__hla_ endp
Note in the examples to this point that the w and d local variables have been misaligned in the activation record. This is easy to fix with an align directive in the VAR section of the procedure:
program t;
procedure TestProc(parameter: dword); @nodisplay; @noalignstack;
var
b:byte;
align(2);
w:word;
align(4);
d:dword;
begin TestProc;
mov( b, al );
mov( w, ax );
mov( d, eax );
end TestProc;
begin t;
end t;
MASM code generated by the HLA compiler:
L1_TestProc__hla_ proc near32
push ebp
mov ebp, esp
sub esp, 8
mov al, byte ptr [ebp-1] ;/* b */
mov ax, word ptr [ebp-4] ;/* w */
mov eax, dword ptr [ebp-8] ;/* d */
mov esp, ebp
pop ebp
ret 4
L1_TestProc__hla_ endp
Note that HLA always guarantees that literal string constants you create in an HLA program are stored in memory aligned to a four-byte boundary and always consume a multiple of four bytes. For example, consider the following HLA string constants appearing in a program:
program t;
static
s1: string := "Hello World";
s2: string := "Hello World.";
s3: string := "Hello World..";
s4: string := "Hello World...";
begin t;
end t;
Note the code that HLA emits for this string data (keep in mind that HLA prefixes string data with the maximum length and current length of the string):
align 4 ;align to dword boundary
L2_len__hla_ label dword
dword 0bh ;maximum length
dword 0bh ;current length
L2_str__hla_ label byte
db "Hello World"
db 0 ;zero terminating byte
align 4
L4_len__hla_ label dword
dword 0ch
dword 0ch
L4_str__hla_ label byte
db "Hello World."
db 0
byte 0 ;Extra padding to ensure that string
byte 0 ; object is a multiple of four bytes long
byte 0
align 4
L6_len__hla_ label dword
dword 0dh
dword 0dh
L6_str__hla_ label byte
db "Hello World.."
db 0
byte 0 ;Extra padding to ensure that string
byte 0 ; object is a multiple of four bytes long
align 4
L8_len__hla_ label dword
dword 0eh
dword 0eh
L8_str__hla_ label byte
db "Hello World..."
db 0
byte 0 ;Extra padding for dword alignment.
End HLA Specific
Begin FASM Specific
(TODO)
End FASM Specific
Begin GoASM Specific
Achieving correct data alignment in GoAsm
Good alignment can usually be achieved automatically by declaring data in size sequence in the data section. So you would declare all qwords first, then dwords, then words, then bytes and strings. Twords, being 10 bytes, would upset the sequence - you could do them all first then correct the alignment using ALIGN.
Example:
DATA
TWORDINTEGER DT 0.0 ;for floating point operations
TWORDRESULT DT 0.0
ALIGN 8 ;re-align data to 8-byte boundary
QWORD_DATA1 DQ 0
QWORD_DATA2 DQ 0
COUNTD1 DD 0
COUNTD2 DD 0
COUNTW1 DW 0
COUNTW2 DW 0
COUNTB DB 0
Mess1 DB 'Input message',0
Mess2 DB 'Output message',0
Here ALIGN is used to pad the DATA section with zeroes to bring it back into alignment for the qwords. The same can be done in a CONST section or for uninitialized data (using ? as the initializer).
For Win32, GoAsm automatically aligns structures on a dword boundary, both when they are declared as local data and in the data section. For Win64, GoAsm automatically aligns structures and structure members to suit the natural boundary of the structure and its members. GoAsm also pads the size of the structure to suit. GoAsm also automatically aligns the stack pointer (RSP) ready for an API call. See the GoAsm help file for more details.
Code alignment in GoAsm
Correct code alignment will differ between processors. There are some speed tests in TestBug which show what difference correct alignment can make when reading from, writing to, or comparing the contents of memory. When you use ALIGN in a CODE section, GoAsm pads with the instruction NOP (opcode 90h), which performs no operation.
End GoASM Specific
Alignment to x (if x is a power of 2) is simple. For example, alignment to 16:
add esi, 16-1 ;esi holds the pointer to be aligned
and esi, -16