Assembler for programmers beginning in the area is definitely a "garden spade". The magic rule in assembler is NEVER do more than you need (Occam's Razor).

Write the simplest code possible to do the job. Know what is relevant and what is not, never waste your effort on something useless, and make sure your results work well.

Assembler is about small and efficient code, the alternative ends up as rubbish when the criterion for its production is not functionality but opinion or fashion.

As an old expression goes in motor racing, "When the flag drops, the bullshit stops."


Posted on 2001-08-30 21:27:22 by hutch--
Hutch, sorry but I don't understand you:
"Assembler is about SMALL and EFFICIENT code, the alternative ends up as rubbish when the criterion for its production is not FUNCTIONALITY but opinion or fashion."

1. What is EFFICIENT code? Faster, maybe?
2. Can you make SMALL and EFFICIENT code without FUNCTIONALITY? I can't...
Posted on 2001-08-30 21:46:17 by buliaNaza
I'm sorry, buliaNaza. That algorithm is not an introduction to MASM macros, but I will put together an explanation for you and email it to you or post it here. It is not my intent to be rude. I am a bit too aggressive = 'bitRAKE'.
Posted on 2001-08-30 21:50:43 by bitRAKE
bitRAKE, thank you for answering all my newbie questions...
Posted on 2001-08-30 22:02:51 by buliaNaza
1. I primarily program in ASM
2. I don't know what type of newbie you are. I use the MASM manual
3. You know the answer to this :)
4. A disassembly will be with the explanation
5. K
6. Fastest code is code that doesn't exist. This macro doesn't create any code.
7. What works for you. (see Hutch's message above)
8. See answer 6.
9. They are not mutually exclusive
10. (the rest will follow with my explanation - I must sleep now)
Posted on 2001-08-30 22:42:19 by bitRAKE
Ok.. side stepping a bit...

How does any silicon chip realize, if I fill one register with the address of some memory location and another register with a length, that it may be misaligned before I start anything???

If the length is 63 it's misaligned, but if I pass 60 it's not??

Pondering on this I *could* see the last and final 4 byte mov becoming a misalignment.. but then it never does this last bit, as the movs are changed to byte movs for up to the last 3 bytes... So I don't see how any misalignment would ever show up??

Does the *location* of the buffer in memory matter then? Do all buffer addresses need to be on 4 byte multiples?? If so, from what offset, the start of the data segment?? If this were true, it would be possible to pass 60 as a length and still be misaligned if I had a 3 byte piece of data that preceded the buffer address that is to be filled..

I really don't get this stuff, and I've been trying forever. Meanwhile everyone is kicking up dust over how it works "best"...

The example my thoughts stem from is this one (from the M32Lib, or not too different):

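mov esi, [Source]    ; esi = source address
mov edi, [Dest]      ; edi = destination address
mov ecx, [ln]        ; ecx = length in bytes

shr ecx, 2           ; number of whole DWORDs
rep movsd            ; copy 4 bytes at a time

mov ecx, [ln]
and ecx, 3           ; leftover 0 to 3 bytes
rep movsb            ; copy the tail one byte at a time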


Or is this misaligned code as well?

Posted on 2001-08-31 00:27:18 by NaN

Alignment is no big deal, just get the address and see if it divides by 4 with no remainder. The memcopy proc from the MASM32 lib will copy misaligned data but it will be a lot more efficient if the data is aligned at least by 4.

Memory from GlobalAlloc() is at least 4 byte aligned at its start address and OLE string memory is 16 byte aligned, so unless you are doing something very unusual, most of what you will copy is already aligned to start with.

If you cannot help working on data that is not at least 4 byte aligned, a direct BYTE copy may be more efficient, as it has no penalty reading at a byte level whereas a DWORD copy pays a penalty every time a read crosses a 4 byte boundary.
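
The "divides by 4 with no remainder" test comes down to looking at the low two bits of the address. A minimal sketch follows; IsAligned4 is a made-up name for illustration, not a proc from the MASM32 library. Note that only the address is examined, the length plays no part.

```asm
; Sketch only. IsAligned4 is a hypothetical name, not a MASM32 library proc.
.386
.model flat, stdcall
option casemap:none

.code

; Returns eax = 1 if pMem is 4 byte aligned, eax = 0 otherwise.
IsAligned4 proc pMem:DWORD
    mov ecx, pMem
    xor eax, eax        ; assume "not aligned"
    test ecx, 3         ; low two bits = address mod 4
    setz al             ; al = 1 only when the remainder is 0
    ret
IsAligned4 endp

end
```

Inside a copy routine you would normally skip the call and just do the test on the pointer register, followed by a jnz to the code that handles the misaligned case.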


Posted on 2001-08-31 04:07:06 by hutch--
The reason I posted that as an alternative is that it will be faster when the buffer is misaligned by 2 or 3 bytes.
The cost of moving around your loop outweighs the cost of dealing with the data as a group of 4.
The cost of looping is quite high, especially as only one of the two branches should be predicted on the first iteration.

You are correct about the initial cmp; it should be ecx.
It should also be compared to 6 instead of 3.
Originally I accidentally missed the "Start_1" section (doh!), so it would corrupt memory under certain circumstances.
You could always put the same code back, and as I haven't tested it either way it may be better.

I didn't doubt that you knew how to align using and. It is an alternative piece of code, and under certain circumstances is faster. That was purely my motivation for posting it.

Whether or not it is better, well, that's up to you (which is why I posted it as an alternative rather than an improvement)!
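
For anyone reading without the earlier listing, the general align-first idea can be sketched roughly as below. This is not the code discussed above and not the MASM32 memcopy proc; AlignedCopy and its parameter names are made up for the example, and it has not been timed. It byte-copies until the source sits on a 4 byte boundary, then copies DWORDs, then byte-copies the tail.

```asm
; Sketch only. AlignedCopy is a hypothetical name; this is neither the
; MASM32 memcopy proc nor the listing discussed in this thread.
.386
.model flat, stdcall
option casemap:none

.code

AlignedCopy proc uses esi edi pSrc:DWORD, pDst:DWORD, cbLen:DWORD
    cld                 ; copy upward
    mov esi, pSrc
    mov edi, pDst
    mov edx, cbLen      ; edx = bytes still to copy

    ; Head: 0 to 3 bytes to bring the source up to a 4 byte boundary.
    mov ecx, esi
    neg ecx
    and ecx, 3          ; ecx = (4 - (esi mod 4)) mod 4
    cmp ecx, edx
    jbe @F
    mov ecx, edx        ; buffer is shorter than the head
@@:
    sub edx, ecx
    rep movsb

    ; Body: whole DWORDs, every read now starts on a 4 byte boundary.
    mov ecx, edx
    shr ecx, 2
    rep movsd

    ; Tail: the last 0 to 3 bytes.
    mov ecx, edx
    and ecx, 3
    rep movsb
    ret
AlignedCopy endp

end
```

Only the source is aligned here. The destination ends up aligned as well only when both addresses have the same remainder mod 4, so the writes may still cross boundaries.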

Posted on 2001-08-31 05:27:45 by Mirno
Hutch.. Thanks for the plain English...

So: If ( address % 4 == 0 ) then it's aligned..
and If ( address % 4 != 0 ) then it's not aligned...

Gotcha... This helps nail down the variables..


Posted on 2001-08-31 12:11:43 by NaN