"call" would use a hardware stack, what is hardware stack?

The hardware stack is faster than memory statck?
Posted on 2004-06-18 16:06:21 by bloggs
I'm not sure what you mean, could you provide a reference ? The x86 provides the SS segment selector to allow for 1 segment of memory to be used for the stack. In order to use it in real mode you are limited to a 64K stack, in protected mode there is a theoretical 4 Gig limit but that must be shared with everything else in a FLAT model so it is generally limited to 1MB. I am assuming that you mean that the real mode stack is faster than the virtual stack that is used in Windows, I am not sure if that is true, it's all just memory though caching may be an issue. Of course I could be completely wrong about this, Scali or Bogdan would probably know...
Posted on 2004-06-18 16:38:12 by donkey
A stack is a stack, always hardware (at least, a CPU and memory are both hardware, right? :))...
But I think this statement may be related to a feature that some CPUs have. As you may or may not know, modern CPUs are pipelined, and this means that they are effectively a number of instructions ahead of the instruction(s) that are currently executing. That is, the pipeline is already filled with the following instructions.
Now, the problem is this... if you return from a function, the return address is popped from the stack, and then the jump is made. Obviously the return address determines where the next instructions will be coming from. But it doesn't know where this is until the address is popped, and that is too late. The pipeline cannot be filled with new instructions, so the CPU will have to wait until the pipeline is filled again (it stalls).

So, one way around this is to store the return address in some internal storage in the CPU at each call. Then when a return is hit, the return address is most probably the same as the one in the internal storage, so the CPU can just continue filling the pipeline. It still has to check if the return address from the stack is actually the same though. If it turns out that the stack was altered, and another address is popped, the CPU will have to flush the pipeline and start fetching from the proper position, again stalling. But in 99.9999% of all cases (compiler-generated code and most asm code aswell), the return address is never altered, so it works fine.

As a sidenote, other CPUs actually have a link-register, which the programmer can fill manually, if he wants to store the return address. They do not use the stack for calls, only the link register. This has the same advantages as the x86 solution. Another advantage is that no push or pop is actually performed for a call, so it is faster and simpler to execute.
Should you want more than one level of calls though (most notably recursion), then you have to manually push the link register on the stack.

Anyway, that's what I think the 'hardware stack' means in this case. If not, well perhaps you found it interesting anyway :P
Posted on 2004-06-18 16:57:05 by Scali
I get "hardware stact" from here, but I can not understand it :-(

Not all computer architectures provide a hardware stack. One can always implement a software stack
by setting aside a block of memory, thinking of it as a stack, maintaining the stack top in a variable, and
pushing or popping data by copying to or from the stack. However, this is much less convenient than
having an architecture like the 80x86 that automatically adjusts the stack top for you as you push
values, pop values, call procedures, and return from procedures.
Obviously the stack plays a large role in 80x86 procedure implementation. How can you reasonably
implement procedures in an architecture that has no stack? This section gives a brief description of one
system for doing this. It is based on the conventions commonly used in IBM mainframe computers
whose architecture is derived from the System/360 (S/360) systems first introduced in the 1960s.
The S/360 architecture includes sixteen 32-bit general purpose registers (GPRs), numbered 0 to 15.
Addresses are 24 bits long and an address can be stored in any register. The architecture includes
addressing modes comparable to direct, register indirect, and indexed.
A procedure is usually called by loading its address into GPR 15 and then executing a branch and link
instruction that jumps to the procedure code after copying the address of the next instruction into GPR
14. This makes return easy; simply jump to the address in GPR 14.
Parameter passing is more challenging. Normally GPR 1 is used to pass the address of a parameter
address list. This is a list of 32-bit storage locations (32 bits is called a word in the S/360 architecture),
the first word containing the address of the first parameter, the second word containing the address of
the second parameter, etc. To retrieve a word-size parameter, one must first get its address from the
parameter address list, then copy the word at that address.
Since the same general purpose registers are normally used for the same tasks each time a procedure
is called, problems may occur if one procedure calls another. For instance, a second procedure call
would put the second return address into GPR 14, wiping out the first return address. To avoid this
problem, the main program and each procedure allocates a block of storage for a register save area
and puts its address in GPR 13 prior to a procedure call. The procedure then saves general purpose
registers 0-12, 14, and 15 in the register save area of the calling program and GPR 13 in its own
register save area. This system is relatively complicated compared to using a stack, but it works well
except for recursive procedure calls. Since there is only one register save area per procedure,
recursive procedure calls are impossible without modifying the scheme.
Posted on 2004-06-18 19:43:22 by bloggs
I also want to know, what is the difference between a jmp and a call?
One can use a jmp to simulate a call, so call is a special jmp, isn't it?
Posted on 2004-06-18 19:50:25 by bloggs
In the case of the x86 they are more or less talking about stack management, the x86 will automatically adjust esp when you push or pop anything onto/off of the stack. I beleive this is what they mean by a hardware stack. The stack is still resident in memory however, it is only that the processor handles the operation of tracking the top of the stack through adusting ESP. You could just as easily implement this in software but in say the x86 which is not designed for that, it would obviously be slower. That is not to say that it would always be slower in every architecture, it depends on how it was designed. Generally though the x86 is more limited in addressing and data transfer when compared to other modern processors so you are definitely better off using the processor to do it for you. However I suspect on any processor using registers would always be faster, any architecture would be designed to maximize access speed for those.

The stack is only really useful for procedures, one of the upcoming conventions will be FASTCALL which uses registers instead of a stack. This would not be wise on the IA32 which has a fairly limited number of registers but I suspect it will be quite popular on the 64 bit machines that have a multitude of surplus registers to play with.

jmp does not autmatically return to the instruction following itself. It transfers control and does not remember where it came from. Call simply pushes the return address onto the stack and jmps to the specified address. When a ret is encountered the return address is popped and jumped to.
Posted on 2004-06-18 20:04:03 by donkey
You are always online, so quickly gave me a answer, thank you.
Posted on 2004-06-18 22:22:16 by bloggs
Originally posted by donkey
The stack is only really useful for procedures, one of the upcoming conventions will be FASTCALL which uses registers instead of a stack. This would not be wise on the IA32 which has a fairly limited number of registers but I suspect it will be quite popular on the 64 bit machines that have a multitude of surplus registers to play with.

To add some more to this...

Currently known calling convention for IA-64 uses 8 registers and others go to the stack. x86-64 uses 4 registers and others go to the stack. IA-64 stores the return address in a register, x86-64 stores the return address on the stack. This is for Win64. (Then again, this forum is for Windows environment. :) No need to mention others.)
Posted on 2004-06-19 00:06:26 by Starless

You are always online, so quickly gave me a answer, thank you.

Actually I use RSS feed so I see the stuff as it's posted with a little popup in my system tray...

RssReader Version
feed = http://www.asmcommunity.net/board/rdf.php

Posted on 2004-06-19 01:34:52 by donkey