Because I only have 4 GP registers (eax, ebx, ecx, edx) for my subroutines (esi and edi are used for addressing, and esp is a no-no), I wish I could use the ebp register.

But I (sometimes) had some nasty crashes while trying to use it, so I ignored ebp. Still, I often think a fifth GP register would be handy.

1) When can I use ebp in Windows assembler?
2) Why can Windows crash while I am using ebp?
3) Is it always safe for my subroutines (no calls to Windows subroutines) to push esp, use esp, pop esp?

Posted on 2002-02-05 08:29:21 by VShader
1. if it's not already in use, you can use it whenever you want
2. if your proc is frame based, you can't use ebp. also windows expects ebp to remain the same across calls, so if you wanna use it in a windows callback proc, push/use/pop
3. no
Posted on 2002-02-05 08:36:29 by cynix

"push esp, use esp, pop esp"

How would you do that? Pop takes the value from ... if you have
modified esp, you'll not get the old value of esp back :). You cannot
use esp as a general purpose register (you can do stuff with it,
but I'd only do that in extreme situations)...
Posted on 2002-02-05 08:39:11 by f0dder
"push esp, use esp, pop esp"

That should be:

"push ebp, use ebp, pop ebp".

So no. 3 is now a clear yes?

Posted on 2002-02-05 08:59:17 by VShader
that's possible, yes. As long as you don't try to access locals/params
while you have a modified EBP.
Posted on 2002-02-05 09:04:17 by f0dder
Why do you need an extra register??

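; Note: the bracketed operands did not survive in this post; the
; [Param]/[Apple]/[Orange] choices below are an assumed reconstruction.
Banan Proc Param:DWORD                  ; <- goes into proc
    LOCAL Apple:DWORD, Orange:DWORD     ; <- locally used stack

    mov eax, [Param]
    xor eax, 12345678h
    mov [Apple], eax
    rol dword ptr [Apple], 1
    push dword ptr [Apple]
    pop dword ptr [Orange]
    mov eax, [Param]
    sub dword ptr [Orange], eax
    mov eax, [Orange]
    ret 4

Banan Endp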

use locals in your own subroutines :=)
You will never run out of registers if you do
Posted on 2002-02-05 09:45:15 by tired
tired, registers are faster as you very well know :). Especially in
gfx routines it's very easy to run out of registers... sucks having
to fall back to memory access for variables.
Posted on 2002-02-05 09:52:39 by f0dder
Use MMX!

movd MM0,REG

(use REG)

movd REG,MM0

...eight more fast storage areas.
Posted on 2002-02-05 10:08:58 by bitRAKE
do i even dare to comment on that? :=)
Without being disrespectful or anything... but i dont think that vshader is writing a 3dfx game in assembly (not yet anyway)
and I must point out that all that pushing and popping to restore ebp while borrowing it is also memory access, since you write to and read from the stack, probably even more extensive since you probably pass the variable via the invisible interim stack register thingy.
But nonetheless, local variables may slow things down, but most likely not as badly as the Windows thread scheduler.
I would go with locals any day, and when I need something optimized I will use small loops to do small things and let them work together, rather than trying to write a big loop to do everything and risk running out of registers.
I'm here to help, not to confuse, and I do think that you are here for the same reason, f0dder.

PS, no f0dder im not angry :=)
Posted on 2002-02-05 10:10:45 by tired
"Without being disrespectful or anything... but i dont think that vshader is writing a 3dfx game in assembly (not yet anyway)"

Well, I am not writing a 3dfx game but an assembly-only 3D game (software-only engine) where speed is critical.
Right now it does about 500,000 flat-shaded tris/s in 320x240*16 on my P200MMX.
Today I am implementing an octree for view frustum culling.
Did you know that it is possible to check more than 50,000,000 vertices/s against a plane (with MMX) on my "slow" machine?!

I think the program/Windows crashed (with ebp) because I use locals.

"Use MMX"

Yes, I should force myself to do this more.
Of course I use mmx where I can (even transforming 16 bit fixed-point-vertices with mmx).

Posted on 2002-02-05 12:55:27 by VShader
3D software engine? Want it to be fast? Then any register that would hold a constant value (e.g. the screenbuffer address, or even better the screen width) should be implemented as an immediate constant in the code, which you modify (via self-modifying code) at program load time, or in some other not too frequent initialization phase.

Beware that writing to the code segment causes that cache line to be flushed from the L1 instruction cache, TLB flushes, etc.. so it's not that you can do it in your inner loop. :grin:

Nevertheless I had BIG speed advantages using this technique.
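
A minimal sketch of the idea, not a tested implementation. It assumes MASM with MASM32-style includes (windows.inc, kernel32.inc) for VirtualProtect and PAGE_EXECUTE_READWRITE; NextRow, PatchSite and PatchPitch are hypothetical names. The screen pitch becomes a 32-bit immediate inside an add instruction, so it no longer occupies a register, and an init routine overwrites that immediate once before the inner loops run.

```asm
; Sketch only. Assumes MASM32-style includes (windows.inc, kernel32.inc).
; NextRow, PatchSite and PatchPitch are hypothetical names.

.data?
oldProtect  dd ?                    ; receives the previous page protection

.code
NextRow PROC
PatchSite::                         ; "::" makes the label visible outside the PROC
    add  edi, 12345678h             ; encodes as 81 C7 imm32; the placeholder must
                                    ; not fit in a signed byte, or the assembler
                                    ; would pick the short imm8 form instead
    ret
NextRow ENDP

PatchPitch PROC newPitch:DWORD
    ; Code pages are normally not writable, so unlock the 4 immediate bytes.
    invoke VirtualProtect, offset PatchSite+2, 4, PAGE_EXECUTE_READWRITE, offset oldProtect
    test eax, eax
    jz   patchDone                  ; eax = 0 tells the caller it failed

    mov  edx, offset PatchSite+2    ; the imm32 starts after the 2 opcode bytes
    mov  eax, newPitch
    mov  [edx], eax                 ; overwrite the placeholder constant

    ; Put the original protection back.
    invoke VirtualProtect, offset PatchSite+2, 4, oldProtect, offset oldProtect
    mov  eax, 1
patchDone:
    ret
PatchPitch ENDP
```

Call PatchPitch once at load time (or on a mode change), never from inside the loop that executes NextRow.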

Also, as others have suggested, ab-use MMX.. it's very standard nowadays. Why not use it?

Posted on 2002-02-05 15:50:40 by Maverick
Well, one problem with 3D and MMX is that the FPU and MMX lock each other out: the MMX registers are mapped onto the FPU registers, so you can NOT use them at the same time... ouch, this is a very nasty thing to do to a 3D application (i.e. leave it without the FPU)... I guess MMX is almost out for 3D, maybe useful only for 2D or sound calculations...
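
For reference, a small sketch of how the two are usually kept apart: finish the MMX part, execute emms, and only then touch the FPU. This is a fragment under assumptions (MASM with .586 and .mmx enabled, esi pointing at a float; both are hypothetical here), combining the "park a register in MM0" trick from above with the switch back to FPU code.

```asm
; Fragment only; assumes .586 and .mmx are in effect.
    movd  mm0, ebx            ; park ebx in an MMX register
    ; ... inner loop is free to clobber ebx here (no FPU instructions!) ...
    movd  ebx, mm0            ; get the parked value back
    emms                      ; mark the x87 registers as empty again
    fld   dword ptr [esi]     ; FPU code is usable from here on
    fstp  st(0)               ; (pop it again so the FPU stack stays balanced)
```

Since emms is not free, it pays to batch the MMX work and switch once per block rather than once per vertex.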
Posted on 2002-02-05 16:00:44 by BogdanOntanu

You really need to get the swing of how a procedure is set up on the stack to try and use the base pointer EBP.

push ebp ; preserve base pointer
mov ebp, esp ; stack pointer into ebp

; write your own code here.

mov esp, ebp ; restore stack pointer
pop ebp ; restore base pointer


This tells you what you can do with EBP. What you are after when trying to use the base pointer as an extra register is a procedure that does not use EBP at all which allows you to remove the extra step of using the base pointer to store the stack pointer ESP.

This usually means not using any stack-based variables, which can be done but only in simpler code. The alternative is to use variables in the .DATA section in place of LOCAL in the procedure.

What is usually the practice is to code up the procedure you are after using a combination of registers and memory operands and when you have the design up and working, start selectively replacing memory operands with the remaining registers and benchmark them to see if you get a speed increase.

This allows you the maximum flexibility in allocating registers on the basis of where you get the most advantage.

Posted on 2002-02-05 16:30:28 by hutch--
You can use EBP even with locals, as long as you don't try
to access locals (or function arguments) while EBP is modified.
So, you can load registers from arguments/locals, push
ebp, use all available registers in your superspeedintensive
innerloop, pop ebp, and continue accessing locals/arguments.
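
Roughly like this (a sketch with made-up names, assuming MASM's automatic stack frame and numPixels > 0; the loop is deliberately trivial):

```asm
; Hypothetical routine: writes a colour ramp of numPixels dwords to pDest.
FillSpan PROC USES edi pDest:DWORD, numPixels:DWORD, startColor:DWORD, colorStep:DWORD
    ; 1) Load everything from the params while ebp is still the frame pointer.
    mov  edi, pDest
    mov  ecx, numPixels
    mov  eax, startColor
    mov  edx, colorStep

    ; 2) Borrow ebp. No locals/params may be touched until the pop below.
    push ebp
    mov  ebp, edx             ; ebp now holds the per-pixel step
fillLoop:
    mov  [edi], eax
    add  eax, ebp
    add  edi, 4
    dec  ecx
    jnz  fillLoop

    ; 3) Give the frame pointer back, then locals/params work again.
    pop  ebp
    ret                       ; MASM emits the epilogue (leave / ret 16) here
FillSpan ENDP
```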
Posted on 2002-02-05 16:40:43 by f0dder
EBP is only useful if there are variable allocations on the stack. Often the stack allocations are totally predictable and the use of EBP is redundant. If you don't use EBP you have to do the calculations yourself, or write a better assembler that will do it for you. ;) Locals/parameters can be referenced through ESP without problems. Not that stack frames don't have their uses, but who here is really using the benefits of the frame?
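
A sketch of what "doing the calculations yourself" looks like, assuming MASM and a hypothetical stdcall routine SumDwords(pData, count). No params or locals are declared on the PROC line, so MASM emits no frame and ebp is just another callee-saved register:

```asm
; Hypothetical frameless routine: returns the sum of count dwords in eax.
SumDwords PROC
    push ebp                  ; ebp must still be preserved for the caller
    ; Stack now: [esp] = saved ebp, [esp+4] = return address,
    ;            [esp+8] = pData,   [esp+12] = count
    mov  edx, [esp+8]         ; pData
    mov  ecx, [esp+12]        ; count
    xor  eax, eax
    test ecx, ecx
    jz   sumDone
    lea  ebp, [edx+ecx*4]     ; ebp = end pointer, used as a plain register
sumLoop:
    add  eax, [edx]
    add  edx, 4
    cmp  edx, ebp
    jb   sumLoop
sumDone:
    pop  ebp
    ret  8                    ; stdcall: remove the two dword arguments
SumDwords ENDP
```

The catch is that every extra push or pop shifts the [esp+N] offsets by 4, which is exactly the bookkeeping a frame pointer saves you from.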
Posted on 2002-02-05 16:52:20 by bitRAKE
Hi Maverick,
Writing to the code segment causes tlb flush????
Are you really sure of that? In Windows the code and data segments overlap, so basically the code segment and the data segment are the same. I don't see how the CPU can distinguish a code page from a data page in this case (a code page is a data page when accessed as such). Maybe you are referring to dirty pages: keeping write-protected code pages clean and untouched and using dirty data pages for the self-modifying things?
Code segments are never writable so it's a very odd statement.
Afaik you can only guarantee a flush of the TLBs by loading cr3 (invalidation can fail under certain circumstances according to Intel).
I would appreciate if you could clarify this, because I find these things interesting.

Vshader, congratulations on your 50 trillion flipsyflops and a 200 megathing blah blah, I have no idea what those things are but high numbers are always very arousing (not).
I was only trying to be helpful when suggesting locals, but since you already use them I assume you know how they work.
As for speed, you may wish to consider using them anyway; they shouldn't slow things down too much.
If you want to really optimize things i suggest you write some C code in the Intel C compiler and look at the output.
I have a feeling that with the correct switches it will produce code that can be well worth learning from.
Posted on 2002-02-05 17:17:35 by tired

"Hi Maverick,
Writing to the code segment causes tlb flush????"

Sorry, I meant that it will very likely cause at least two TLB misses, one data and one code. Add to that the time spent in reloading the cache lines.

If you want to use SMC, do it from another page than the one being modified, so that the code that does the modifying at least doesn't get flushed away from the cache and pipeline.

SMC *is* anyway officially supported in the IA32 architecture, although for the designers of Intel CPUs it may be a pain. Too much software would break if they stopped supporting this feature.

And it can be really handy at times.

"Code segments are never writable so it's a very odd statement."

You forget that you CAN write data to a memory location that contains code.. so the CPU needs a mechanism to ensure L1 caches coherency, flush pipelines when necessary, etc..

"I would appreciate if you could clarify this, because I find these things interesting."

I hope I was more clear and detailed this time. I tend to write in a hurry.. too many things to do at once (I'm a bad multitasker it seems).

Posted on 2002-02-05 18:47:10 by Maverick