In ObjAsm32's Collection.inc at the end of the Collection.SetLimit method the following code can be found:

@@Err:
    OCall esi.ErrorReport, NULL, COL_OUTOFMEMORY

align @WordSize
@@Exit:
    pop esi


Does this mean that OCall never returns?
Because if it returns, you could have a problem due to the aligned @@Exit.
Couldn't there be code in between OCall and @@Exit?
Shouldn't you jump over that code to the @@Exit label instead?

Friendly regards,
mdevries.
Posted on 2006-07-16 17:21:25 by mdevries
Oops... that's wrong... the "align" should be removed!  :O

Regards,

Biterider

Posted on 2006-07-17 01:13:53 by Biterider
Revising the generated code, the align directive fills the gap with meaningless instructions like lea ecx, [ecx+0], mov edi, edi, etc. that don't affect the main code, and you profit from the loop alignment.
Conclusion: the align should not be removed.

Regards,

Biterider
Posted on 2006-07-17 01:43:32 by Biterider
If you align on a 16 bit boundary I expect there will be a maximum gap of 1 byte.
If you align on a 32 bit boundary the gap would be a maximum of 3 bytes.
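
The gap sizes above follow from simple modular arithmetic. A minimal sketch in Python, assuming "16 bit" and "32 bit" boundaries mean alignments of 2 and 4 bytes; the function names are made up for illustration and are not part of MASM or ObjAsm32:

```python
def padding(address: int, alignment: int) -> int:
    """Bytes needed to move `address` up to the next multiple of `alignment`."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Python's % always returns a non-negative result for a positive divisor.
    return (-address) % alignment


def max_gap(alignment: int) -> int:
    """Worst-case padding for a given alignment: one byte short of a full unit."""
    return alignment - 1


if __name__ == "__main__":
    for alignment in (2, 4, 16):
        print(alignment, "byte alignment, worst-case gap:", max_gap(alignment))
```

So a 2-byte boundary gives at most 1 byte of gap, a 4-byte boundary at most 3, and a 16-byte boundary at most 15.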

The bigger the gap, the more room for meaningful code.

Revising the generated code, the align directive fills the gap with meaningless instructions like lea ecx, [ecx+0], mov edi, edi, etc. that don't affect the main code


You expect the code always to be meaningless.
But how can you be sure the code will always be meaningless?
From the tests you did? Or have you found anything in the documentation? I would be interested.

I wonder: does the assembler know that we are dealing with a gap?
If so, why would the assembler produce different code then?
If the gap remains the same, I would expect the assembler to always produce the same code in the gap. But you mention different kinds of filling for the gap.

Friendly regards,
mdevries.
Posted on 2006-07-17 15:15:09 by mdevries
Hi mdevries
MASM (ML.exe) always uses the following instructions to fill code gaps, depending on the gap size:

8D 49 00        lea        ecx, [ecx+0]
8B FF           mov        edi, edi
90              nop


As you can see, none of these instructions change the content of the registers or the CPU flags. When align is used, the assembler detects the gap and automatically fills it with those instructions. AFAIK the linker (Link.exe) uses int 3 to fill the gap between procedures, but this code should never be executed unless something goes wrong.

Regards,

Biterider
Posted on 2006-07-17 16:05:22 by Biterider
Here's the list of instructions that MASM uses to align:

lea    esp, [esp+0] ; 8DA42400000000 ; 7 bytes
lea    ebx, [ebx+0] ; 8D9B00000000 ; 6 bytes
add    eax, 0 ; 0500000000 ; 5 bytes
lea    esp, [esp+0] ; 8D642400 ; 4 bytes
lea    ecx, [ecx+0] ; 8D4900 ; 3 bytes
mov    edi, edi ; 8BFF ; 2 bytes
nop ; 90 ; 1 byte

As you see, no registers are modified. MASM uses a combination of these instructions to fill in the 1-15 byte gap. Just be aware of the "add eax,0" case (it modifies the flags). But for MASM this was a reasonable choice, since it'd be hilarious/stupid to bluntly align just before using the flags.
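
To see that the seven encodings above are enough to cover every gap from 1 to 15 bytes, here is a minimal Python sketch. The opcode bytes are copied from the list; the greedy "largest filler first" strategy and the name `fill_gap` are assumptions for illustration only, and the combinations ML.exe actually picks may differ:

```python
# Filler encodings taken from the list above, longest first (7 down to 1 byte).
FILLERS = [
    bytes.fromhex(h)
    for h in (
        "8DA42400000000",  # 7 bytes
        "8D9B00000000",    # 6 bytes
        "0500000000",      # 5 bytes (add eax,0: the one that touches flags)
        "8D642400",        # 4 bytes
        "8D4900",          # 3 bytes
        "8BFF",            # 2 bytes
        "90",              # 1 byte
    )
]


def fill_gap(gap: int) -> bytes:
    """Return `gap` bytes of filler, chosen greedily (hypothetical strategy)."""
    if not 0 <= gap <= 15:
        raise ValueError("gap must be between 0 and 15 bytes")
    out = b""
    for filler in FILLERS:
        # Use the current filler as often as it still fits in the remaining gap.
        while gap - len(out) >= len(filler):
            out += filler
    return out


if __name__ == "__main__":
    for gap in range(16):
        print(gap, fill_gap(gap).hex().upper())
```

With this strategy a 15-byte gap costs three instructions (7 + 7 + 1) instead of fifteen NOPs.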

Sections in .obj are 16-byte aligned, thus it's easy for MASM to know whether a symbol is aligned to 2,4,8,16 bytes or not.

Biterider, I'd recommend using "align 16", since it improves branching speed with at least 1 cycle :) .

P.S.: And why use different instructions to fill different gaps? Compare the speed of 1, 2 or 3 of these instructions against the number of NOPs that would otherwise have to be generated :)
Posted on 2006-07-17 20:34:43 by Ultrano
Hi Ultrano
About the alignment note: AFAIK the benefit you get is related to how memory is read into the cache. On 486 machines a cache line is 16 bytes wide, so it is logical to align the code to a 16-byte boundary to avoid double reads when filling the cache. On 586 machines a cache line is 32 bytes wide. Should we now align to 32 bytes? Too bad that MASM doesn't support it...
Correct me if I'm wrong!
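
The "double reading" argument can be illustrated by counting how many cache lines a block of code touches. A rough Python sketch, assuming a simple model where a line fill is needed for every line the block overlaps; the function name and the addresses are hypothetical, and real fetch behaviour depends on the CPU:

```python
def lines_touched(start: int, length: int, line_size: int) -> int:
    """Number of cache lines overlapped by `length` bytes starting at `start`."""
    if length < 1 or line_size < 1:
        raise ValueError("length and line_size must be at least 1")
    first = start // line_size                # line holding the first byte
    last = (start + length - 1) // line_size  # line holding the last byte
    return last - first + 1


if __name__ == "__main__":
    # A 16-byte loop body, aligned and misaligned, with 16-byte lines.
    print(lines_touched(0x1000, 16, 16))
    print(lines_touched(0x1008, 16, 16))
    # A 32-byte loop body that is only 16-byte aligned, with 32-byte lines.
    print(lines_touched(0x1010, 32, 32))
```

In this model a block no larger than one line fits in a single line only when it starts on a line boundary, which is the case for aligning to the line size.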

Regards,

Biterider
Posted on 2006-07-20 03:32:56 by Biterider
I'm not 100% sure, it's just that the AMD optimization guide had it, iirc, and it was mentioned in a lecture in my uni, while the lecture was about AMD's code-optimization block (instruction resequencing).

Really, one could try to align some code to 32-byte boundaries, but in MASM that requires a runtime check, either always or just while developing an .obj/.lib with optimized code/snippets.
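
The reason a runtime check is needed: a section that is only guaranteed to be 16-byte aligned can end up at an address that is or is not a multiple of 32, so a label's final alignment is not known at assembly time. A minimal Python sketch of the check itself; the section bases and the label offset are made-up example values:

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if `address` is a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    # The low bits of an aligned address are all zero.
    return (address & (alignment - 1)) == 0


if __name__ == "__main__":
    label_offset = 0x20  # hypothetical offset of a loop label inside its section
    for section_base in (0x401000, 0x401010):  # both are 16-byte aligned
        address = section_base + label_offset
        print(hex(address), "32-byte aligned:", is_aligned(address, 32))
```

The same label is 32-byte aligned for the first base but not for the second, even though both bases satisfy the 16-byte section alignment.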
Posted on 2006-07-20 04:37:44 by Ultrano