Suppose I had a buffer defined in the .data section as:

Buffer db 60 dup (?)

... my question is, what is the correct way to go about clearing the buffer once it has been used?

Sorry if this is an overly simplistic question :(

thanks for your time,
- Fourfty
Posted on 2001-08-29 15:50:42 by Fourfty
If it's a buffer used for null-terminated strings, then mov Buffer,0 should do.
Posted on 2001-08-29 16:01:54 by Eóin
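Here is a C sketch of the same idea. It assumes a 60-byte array named `text_buf` standing in for `Buffer`, and `clear_string` is a hypothetical helper. Writing one zero byte makes the string empty for every routine that stops at the terminator, but the rest of the old text is still in memory:

```c
#include <string.h>

/* 60-byte text buffer, standing in for "Buffer db 60 dup (?)". */
static char text_buf[60];

/* Empty a NUL-terminated string by writing only its first byte
   (the C counterpart of "mov Buffer,0"). The bytes after index 0
   keep their old contents; string routines simply stop at index 0. */
static void clear_string(char *s)
{
    s[0] = '\0';
}
```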
thanks... that seems to work. But just out of interest, is there any way of completely clearing the buffer?
Posted on 2001-08-29 16:07:49 by Fourfty
... I think I got it. I filled the entire buffer full of 0's:

L1: mov eax,0
mov [Buffer+eax],0
inc eax
.if eax!=60
jmp L1
.endif

... is this the best way of doing it?
Posted on 2001-08-29 16:12:55 by Fourfty
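For comparison, here is a C sketch of the same byte-by-byte clear with the counter initialized once, before the loop. The reply below explains why resetting it inside the loop never terminates. `clear_bytes` is a hypothetical name, and the comments map each step back to the assembly. The test is done at the top of the loop so that a zero length is also safe:

```c
#include <stddef.h>

/* Byte-by-byte clear with the index set once, outside the loop. */
static void clear_bytes(unsigned char *buf, size_t len)
{
    size_t i = 0;              /* mov eax,0  (before the loop, not inside it) */
    while (i != len) {         /* .if eax!=60 / jmp L1 */
        buf[i] = 0;            /* mov [Buffer+eax],0 */
        i++;                   /* inc eax */
    }
}
```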
There's some sort of rep instruction, but I've heard it's not that fast, so I never used it.

If I were going to clear the buffer, I'd move backwards to avoid the final comparison. I'd also clear four bytes at a time; you just need to make sure that the size of the buffer is a multiple of four.


mov eax,(60/4)
@@: and dword ptr [Buffer+eax*4-4],0
dec eax
jnz @B ; Jump if eax != 0

And one thing about your code; the mov eax,0 should be outside the loop. As it stands it will loop forever.
Posted on 2001-08-29 16:30:47 by Eóin
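Eóin's countdown loop can be sketched in C under the same assumption that the size is a multiple of four. `clear_dwords_backward` is a hypothetical name. Counting down means the loop only has to check for zero, which `dec eax` / `jnz @B` provides for free:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero len bytes, one 32-bit word per pass, from the end towards the start. */
static void clear_dwords_backward(unsigned char *buf, size_t len)
{
    const uint32_t zero = 0;
    size_t n = len / 4;                  /* mov eax,(60/4) */
    assert(len % 4 == 0);                /* size must be a multiple of four */
    while (n != 0) {                     /* jnz @B */
        /* and dword ptr [Buffer+eax*4-4],0 : clear word n-1 */
        memcpy(buf + (n - 1) * 4, &zero, sizeof zero);
        n--;                             /* dec eax */
    }
}
```

`memcpy` of a 4-byte value is the portable way to do a possibly unaligned 32-bit store in C. Optimizing compilers typically emit a single store for it.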
On the Pentium III and up, and on the Athlon, 'rep movs' and 'rep stos' are optimized for moving and setting memory. There is a faster way on MMX processors. So this works quite well:

mov edi,OFFSET Buffer
mov ecx,(SIZEOF Buffer) / 4
xor eax,eax
rep stosd

...and here is an opportunity for a macro:

ClearBuffer MACRO buf:REQ
mov edi,OFFSET buf
mov ecx,(SIZEOF buf) / 4
xor eax,eax
rep stosd
ENDM

Then you just do:

ClearBuffer Buffer

...but that isn't very good programming - trashing the registers like that behind our backs in some macro. ;) Also, it isn't very good for 'ClearBuffer edi'.
Posted on 2001-08-29 17:39:35 by bitRAKE
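With `rep stosd` and a count of `SIZEOF buf / 4`, the last `SIZEOF buf mod 4` bytes are never written. The C sketch below mimics that count and returns how many bytes it actually cleared. `clear_like_rep_stosd` is a hypothetical name. With a 75-byte buffer only 72 bytes are zeroed:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store size / 4 zero dwords and nothing else, like ClearBuffer.
   Returns the number of bytes actually cleared. */
static size_t clear_like_rep_stosd(unsigned char *buf, size_t size)
{
    const uint32_t zero = 0;
    size_t count = size / 4;                 /* mov ecx,(SIZEOF buf) / 4 */
    for (size_t i = 0; i < count; i++)       /* rep stosd */
        memcpy(buf + i * 4, &zero, 4);
    return count * 4;                        /* the size % 4 tail is untouched */
}
```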
or you could always use:

invoke RtlZeroMemory,OFFSET Buffer,60

This fills the buffer with null bytes.

best regards,

Posted on 2001-08-29 19:34:29 by czDrillard
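Outside of MASM, the usual portable equivalent is the C standard library's `memset` with a fill value of 0. The sketch below is minimal. `zero_memory` is a hypothetical wrapper that only copies `RtlZeroMemory`'s argument order (destination, then length); it is not the Windows API itself:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: destination first, then length in bytes. */
static void zero_memory(void *dest, size_t length)
{
    memset(dest, 0, length);   /* fill every byte with 0 */
}
```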
Zadkiel and bitRAKE,
your code is slow because you don't care about buffer alignment or about whether the buffer size is a multiple of four...

I'm sorry, but your reply gives the biggest code size and it is also the slowest:

1. First -> the stupid invoke: push 60 and push offset buffer -> 2 pushes, and later 2 pops?
2. Next -> call RtlZeroMemory -> 1 call + return here
3. Here is the code of RtlZeroMemory from Kernel32.dll, with a 2nd call + return here -> call BFF61131:

;Exported fn(): RtlZeroMemory - Ord:02A1h
:BFF67E7F 53 push ebx
:BFF67E80 56 push esi
:BFF67E81 57 push edi
:BFF67E82 55 push ebp
:BFF67E83 68F2000000 push 000000F2
:BFF67E88 68B323F9BF push BFF923B3
:BFF67E8D 64FF3500000000 push dword ptr fs:[00000000]
:BFF67E94 64892500000000 mov dword ptr fs:[00000000], esp
:BFF67E9B 8BC4 mov eax, esp
:BFF67E9D 6A00 push 00000000
:BFF67E9F FF7024 push [eax+24]
:BFF67EA2 FF7020 push [eax+20]
:BFF67EA5 E88792FFFF call BFF61131 ; call Stosd&stosb proc
:BFF67EAA 648F0500000000 pop dword ptr fs:[00000000]
:BFF67EB1 83C408 add esp, 00000008
:BFF67EB4 5D pop ebp
:BFF67EB5 5F pop edi
:BFF67EB6 5E pop esi
:BFF67EB7 5B pop ebx
:BFF67EB8 C20800 ret 0008

;Stosd&stosb proc
:BFF61131 55 push ebp
:BFF61132 8BEC mov ebp, esp
:BFF61134 51 push ecx
:BFF61135 57 push edi
:BFF61136 8B7D08 mov edi, dword ptr [ebp+08]
:BFF61139 8A4D10 mov cl, byte ptr [ebp+10]
:BFF6113C 8AE9 mov ch, cl
:BFF6113E 0FACC810 shrd eax, ecx, 10
:BFF61142 668BC1 mov ax, cx
:BFF61145 FC cld
:BFF61146 8B4D0C mov ecx, dword ptr [ebp+0C]
:BFF61149 C1E902 shr ecx, 02
:BFF6114C F3 repz
:BFF6114D AB stosd
:BFF6114E 8A4D0C mov cl, byte ptr [ebp+0C]
:BFF61151 80E103 and cl, 03
:BFF61154 F3 repz
:BFF61155 AA stosb
:BFF61156 5F pop edi
:BFF61157 59 pop ecx
:BFF61158 C9 leave
:BFF61159 C20C00 ret 000C
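Read as C, the shared routine above does roughly the following. This sketch follows the instruction sequence as listed: the fill byte comes from `[ebp+10]` and the length from `[ebp+0C]`. It is not Microsoft's source, and `fill_memory` is a hypothetical name. `RtlZeroMemory` reaches this routine with a fill byte of 0:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1. replicate the fill byte into all four bytes of a 32-bit value
      (mov ch,cl / shrd eax,ecx,10h / mov ax,cx),
   2. store len / 4 dwords (shr ecx,2 / rep stosd),
   3. store the remaining len & 3 bytes (and cl,3 / rep stosb). */
static void fill_memory(unsigned char *dest, size_t len, unsigned char fill)
{
    uint32_t pattern = fill * 0x01010101u;   /* e.g. 0xAB -> 0xABABABAB */
    size_t dwords = len >> 2;
    size_t tail = len & 3;

    for (size_t i = 0; i < dwords; i++, dest += 4)
        memcpy(dest, &pattern, 4);
    for (size_t i = 0; i < tail; i++)
        *dest++ = fill;
}
```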

here is my code:

lea eax, Buffer
mov ecx, LenBuffer ; May be 63 !?
xor edx, edx ; edx = 0
test eax, 3
jz Start_1
Unalign:
mov [eax], dl
inc eax
dec ecx
jz End
test eax, 3
jnz Unalign
Start_1:
sub ecx, 4
jl Rest
Align:
mov [eax], edx
add eax, 4
sub ecx, 4
jge Align
Rest:
add ecx, 4
jz End
Last_stosb:
mov [eax], dl
inc eax
dec ecx
jnz Last_stosb
End:
Posted on 2001-08-30 00:02:34 by buliaNaza
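The same head/body/tail structure can be sketched in C. `clear_aligned` is a hypothetical name. The sketch checks the remaining length before each head byte, so a zero-length call on an unaligned pointer writes nothing:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* head: single bytes until the pointer is 4-byte aligned (Unalign)
   body: aligned 32-bit stores while at least 4 bytes remain (Align)
   tail: the last 0..3 bytes one at a time (Last_stosb) */
static void clear_aligned(unsigned char *p, size_t len)
{
    const uint32_t zero = 0;

    while (len != 0 && ((uintptr_t)p & 3) != 0) {   /* test eax,3 */
        *p++ = 0;
        len--;
    }
    while (len >= 4) {                                /* sub ecx,4 / jge Align */
        memcpy(p, &zero, 4);                          /* p is aligned here */
        p += 4;
        len -= 4;
    }
    while (len != 0) {                                /* Last_stosb */
        *p++ = 0;
        len--;
    }
}
```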
Well, I'll stick to the simpler method as I am still learning...

They never said it was the best method, but it works great in my test: my buffer was 19 bytes and I just counted byte by byte rather than 4 bytes at a time.

How do the clock cycles compare between bulia's code and the ol'

L1: mov eax,0
mov [Buffer+eax],0
inc eax
.if eax!=60
jmp L1
.endif
Posted on 2001-08-30 01:21:07 by drarem
With a buffer under a couple of K in size, speed is not your problem; just keep the code small and easy to read. The simplest code would be a LODSB loop with the fill character you need in the AL register.

Posted on 2001-08-30 02:50:16 by hutch--
Err, hutch, don't you mean a STOSB loop... :)
Posted on 2001-08-30 04:01:56 by S/390

Senile decay showing again. :grin:

Posted on 2001-08-30 07:59:58 by hutch--
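Here is the simple loop hutch describes, with the STOSB correction applied, as a C sketch. `fill_bytes` is a hypothetical name. The fill character plays the role of AL, the pointer plays EDI, and the count plays ECX:

```c
#include <stddef.h>

/* Store 'fill' into 'count' bytes starting at 'dest', one byte per step,
   like REP STOSB. */
static void fill_bytes(unsigned char *dest, unsigned char fill, size_t count)
{
    while (count--)
        *dest++ = fill;
}
```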
buliaNaza, I respect your obvious coding skills, but I handle those factors at the definition of my buffers, in order that I can inline initialization code where I please. I'll use your algorithm, or an MMX algorithm when that is not the case. :)
Posted on 2001-08-30 08:06:43 by bitRAKE
bitRAKE, I respect you too, but the Fourfty's question was:
"... is this the best way of doing it?"
Posted on 2001-08-30 11:46:55 by buliaNaza
An alternative to buliaNaza's code (based heavily upon it)!

lea eax, Buffer
mov ecx, LenBuffer
xor edx, edx
cmp eax, 3
jle Last_stosb

mov [eax], edx
and eax, 3
sub ecx, eax
lea eax, Buffer
and eax, 0FFFFFFFCh
Align:
mov [eax], edx
add eax, 4
sub ecx, 4
jge Align
add ecx, 4
jz End
Last_stosb:
mov [eax], dl
inc eax
dec ecx
jnz Last_stosb
End:

This code can write some bytes with zeros twice, but it avoids the loop in the initial aligning code. That does no harm: at most 4 bytes are written twice, and they are written as a single full dword.

It's also slightly shorter in terms of instruction count (not sure about code length or speed).

Its overall speed should be about the same, as it only differs in the alignment routine. The main speed difference should be seen when zeroing buffers between 1 and 7 bytes long; after that, any differences fade into the background because the main loop is identical.

Whether or not it is the "best" way of doing it depends entirely on the definition of best! Arguing over which way is "best" is a dangerous pastime, and also one of the "best" ways to start wars.

Posted on 2001-08-30 12:31:52 by Mirno
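Mirno's idea trades a few double writes for a loop-free head. It can be sketched in C as below. This is a variant, not a transcription. It tests the length up front, which is the `cmp ecx, 3` that buliaNaza's later comment suggests. It also rounds the pointer up to the next 4-byte boundary and subtracts exactly the skipped bytes. The `and eax, 0FFFFFFFCh` above rounds the address down, which for an unaligned `Buffer` lands before its first byte. `clear_overlap` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One unaligned 4-byte store covers the misaligned head, then aligned
   dword stores and a 0..3 byte tail. Up to 4 head bytes may be
   written twice, which is harmless. */
static void clear_overlap(unsigned char *p, size_t len)
{
    const uint32_t zero = 0;

    if (len < 4) {                                  /* short buffer: bytes only */
        while (len--)
            *p++ = 0;
        return;
    }
    memcpy(p, &zero, 4);                            /* unaligned head store */
    size_t skip = (4 - ((uintptr_t)p & 3)) & 3;     /* bytes to next boundary */
    p += skip;                                      /* round UP, never down */
    len -= skip;                                    /* skip <= 3 < 4 <= len */
    while (len >= 4) {                              /* aligned body */
        memcpy(p, &zero, 4);
        p += 4;
        len -= 4;
    }
    while (len--)                                   /* tail, 0..3 bytes */
        *p++ = 0;
}
```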
buliaNaza, the best way is a very subjective thing. Mine is best under the conditions in which I use it. :)
Posted on 2001-08-30 14:21:49 by bitRAKE
Mirno, I know how to align the Buffer with "and eax, -4" but...

"How to optimize for the Pentium family of microprocessors
Copyright © 1996, 2000 by Agner Fog. Last modified 2000-03-31.

6. Alignment
On PPlain and PMMX, misaligned data will take at least 3 clock cycles extra to access
if a 4 byte boundary is crossed.
The penalty is higher when a cache line boundary is crossed.
On PPro, PII and PIII, misaligned data will cost you 6-12 clocks
extra when a cache line boundary is crossed."
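The alignment arithmetic behind this discussion can be shown in a small C sketch with hypothetical helper names. `and eax,-4` (the same as `and eax,0FFFFFFFCh` in 32 bits) rounds an address down, adding 3 first rounds it up, and `addr & 3` is the misalignment. `line_size` is left as a parameter because cache line size differs between processors:

```c
#include <stdint.h>

static uintptr_t align_down4(uintptr_t addr) { return addr & ~(uintptr_t)3; }
static uintptr_t align_up4(uintptr_t addr)   { return (addr + 3) & ~(uintptr_t)3; }
static unsigned  misalign4(uintptr_t addr)   { return (unsigned)(addr & 3); }

/* A 4-byte access at addr crosses a cache line when its first and
   last byte fall into different lines of line_size bytes. */
static int crosses_line(uintptr_t addr, uintptr_t line_size)
{
    return (addr / line_size) != ((addr + 3) / line_size);
}
```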

lea eax, Buffer
mov ecx, LenBuffer
xor edx, edx
cmp eax, 3 ; may be cmp ecx, 3
jle Last_stosb

mov [eax], edx ; this one is good IF the Buffer is aligned, ELSE it is slower,
and eax, 3 ; and this is the reason I use the 1st stosb loop...
sub ecx, eax
lea eax, Buffer
and eax, 0FFFFFFFCh

bitRAKE, I don't want to insult you, because I respect you and I hate wars over stupid things,
but I don't agree with you:
"the best way is a very subjective thing. Mine is best under the conditions in which I use it."
That is your human point of view rather than your processor's "point of view"...

Now I have a question for you:
I hate the HLL and I don't understand why I need to use a macro and spend a time to learn it!?
I just copy & paste code with the same or different registers/variables,
and this works fine for me because I have full high- and low-level control of my code!?
Posted on 2001-08-30 18:34:53 by buliaNaza
buliaNaza, so you're saying that aligning your buffer and keeping it rounded to DWORD alignment isn't faster than your code? Do you care that much about 0-3 bytes plus the size overhead of the inline code (14 bytes per use) in Windows programming? In the method that I presented, I can change the size of the buffer and I don't have to change the code.

I don't lose any control using macros unless I choose to. It's like an advanced cut&paste. And I can do things with macros that would be very hard for anyone to do with code alone. :) I suppose you want examples. :) Look at the macro HERE. Can you do that in code with that kind of ease and flexibility?
I don't understand why I need to use a macro and spend a time to learn it!?
I don't think you need to use macros, but I know that you are missing out on a great tool to help you code as you have never coded before. You can write macros to be as flexible as you want. You can write code that is more self-documenting. An example would be renaming all the registers to something more indicative of their function.

I will end in saying that macros can only add to your existing skills - they don't take anything away. Have you really taken a look at some of the more advanced macros that exist? If used within the proper context they add function where before there was none. Please, give them a try. ;)
BUFFER MACRO thename:REQ, thesize:REQ

_BSS segment DWORD public 'BSS'
thename dd (thesize + 3) / 4 dup (?)
_BSS ends

thename&Clear MACRO
mov edi,OFFSET thename
mov ecx,(SIZEOF thename) / 4
xor eax,eax
rep stosd
ENDM
ENDM

;Like this:

BUFFER MyBuffer, 75 ;Any size works

MyBufferClear ;No need to document this line
Like I said before, it is not a good idea to trash registers in a macro behind the programmer's back. I would only use this example in a small file; it would certainly not be a global macro to use without some changes.
Posted on 2001-08-30 19:09:50 by bitRAKE
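Here is a rough C-preprocessor analog of the `BUFFER` macro, as a sketch with hypothetical names. Like bitRAKE's version, it rounds the size up to whole dwords, so 75 bytes become 19 dwords (76 bytes). It also generates a matching clear routine, `name##_clear`, the counterpart of `thename&Clear`. Unlike the MASM macro, the generated function leaves register allocation to the compiler, so nothing is trashed behind the caller's back:

```c
#include <stddef.h>
#include <stdint.h>

/* Reserve (size + 3) / 4 32-bit words and define a clear function for them. */
#define BUFFER(name, size)                                        \
    static uint32_t name[((size) + 3) / 4];                       \
    static void name##_clear(void)                                \
    {                                                             \
        for (size_t i = 0; i < sizeof name / sizeof name[0]; i++) \
            name[i] = 0;                                          \
    }

BUFFER(MyBuffer, 75)   /* any size works: 75 rounds up to 19 dwords */
```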

This discussion sounds like it is turning into a "storm in a teacup". BEST is a very situational criterion: the size of the buffer, which registers need to be preserved and what the buffer is used for all determine what is "best" in the circumstances.

For a small text buffer, perform the operation by writing zero to the first byte. If it must be filled, a very simple REP STOSB loop will do it very efficiently. If the data size is large enough, use DWORD-size writes, and if that is not fast enough, Ricky's MMX version may help if the processor has MMX support.

When the task at hand is a small buffer for text, elaborate solutions are not only a waste of time but are slower as well.

Why use a bulldozer when you can do the job with a garden spade ?

Posted on 2001-08-30 19:56:21 by hutch--
Dear Macro Warrior, thanks for the answer, but:
1. I'm so stupid and just don't have the time to learn it, but what kind of language are you using: C/C++, Basic, UML, Java or...?!
"but I know that you are missing out on a great tool to help you code as you have never coded before"
2. Which of the great tools is easy to learn for an asm newbie like me:
this one, R. Hyde's HLA, Betov's assembly, ....... ?
3. Which is faster: using registers, or global and local variables in memory?
4. Have you looked at this at the low level, and do you have a disassembly listing... Maybe you have:
"I don't lose any control using macros unless I choose to."
5. If you have, please post it so I can understand what happens.
6. Maybe this macro is heavily speed-optimized and I can copy it multiple times into my programs?
7. Is it the best programming practice in MASM, or "Mine is best under the conditions in which I use it."?
8. Is this the best "food" for your processor?
9. "You can write macros to be as flexible as you want." -> I prefer to spend my time writing programs in assembly rather than writing macros!
10 ..........
11 ..........

LOCAL myStart, myEnd, num, flag ->what is this: byte, dword or word

num=0 ; what is this : xor eax,eax /mov num,eax /mov flag,eax
flag=0 ; or mov num,0 / mov flag, 0
WHILE flag EQ 0 ; is this a loop, maybe?
% IFDEF @CatStr(<RealEnd>,<%num> )
num = num + 1 ; what is this : inc num or add num, 1
ELSE ; or mov eax, 1/ add num,eax.......
flag = 1 ; or this: mov eax, 1/ mov num,eax or mov flag,1.......

db offset myEnd - offset myStart ; this one I understood

myStart LABEL BYTE ; this one too

@CatStr(<RealEnd>,<%num> ) %num ; I want to see this one in assembly

@CatStr(<RealEnd>,<%num> ) MACRO depth:REQ
IF (OPATTR (myEnd)) AND 0100000y ; this one too
RealEnd&depth %(depth-1)

bitRAKE, I appreciate your effort in making this, but from my point of view it is so old... Sorry!

Hutch, which is better for teaching asm newbies: driving a bulldozer or using a garden spade?
Posted on 2001-08-30 21:07:16 by buliaNaza