Maverick,

That sounds like an interesting technique for avoiding memory fragmentation. Using the stack to avoid the hole left in the heap when the temporary block sitting in front of the larger allocation is freed is a good approach.

I am sure many will be interested in the further development of these ideas, so whenever you are able to post more, please do.

Regards,

hutch@movsd.com
Posted on 2002-05-07 20:41:10 by hutch--
Ok, you sold me... what do I have to buy now ;)

No, seriously, I use the heap a lot and would love to learn some basics of the method you're talking about. Any chance you can provide a simple example? It sounds like using the stack as you suggest would fix the problem I discussed in a recent post about rapid memory allocation and deallocation.

So.. Please Please Please Please :rolleyes: (~Sad begging look ~)

NaN
Posted on 2002-05-07 22:43:03 by NaN
Sorry pals, but there are some important things I cannot disclose. They're the result of years and years of work and unorthodox thinking; they're my very own creations and I'm very jealous of them (if for no other reason than that they are my "trade secrets" and I earn my living with them).
One of these is an original and ultrafast stack technique which, in any case, doesn't work on Win32 without some serious tweaking elsewhere (I'm not referring just to the TIB stuff from my last post), and would also require a general rethinking of the whole program.

That said, there's something less precious I can share, which will probably suffice:

In all these years of teaching myself, one of the things I've learnt is that one should always respect the very nature of objects. Here I call "object" each conceptual unit of a program. Since every system in nature can be considered as made of sub-units (i.e. objects), I came to believe in this concept, although at first I was very skeptical (I thought of a whole program as a single object). Optimization is then the art of blending various objects to make things faster.. but conceptually the objects have to exist, that's what I mean.
Every object has a particular nature of its own, an identity. That of the stack is to be an excellent temporary store: it's best suited to this task, it's in its very nature. That of the heap, for the same reasons, is to be a good keeper of global, "resident" data.

Every routine has its own nature as well: if, as in the example from my last post, I allocate and free a buffer *within* my routine, so that the buffer does not survive the routine's exit, then it's against the nature of this routine to allocate that buffer on the heap. It's common practice, true, but it's wrong as well. So one should put logic and one's own thinking well ahead of common practice.
The stack suits this scenario perfectly, because temporary allocation is its nature. It's just a matter of logic.

Also, the stack is much faster than the heap (even without some special tricks I won't describe), for these reasons:

1) The practical nature of the stack, where, to allocate e.g. 1 MB, you can do:


SUB ESP,1048576 ; update stack depth possibly (I described the technique elsewhere)
MOV P32 [MyBufferPt],ESP

and to free it:


ADD ESP,1048576 ; update stack depth possibly (I described the technique elsewhere)

Much faster than any heap routine.. and you can allocate as many buffers as you wish without fragmenting anything (you should free all of them at once, though, not individually. Another side effect that makes things even faster ;) ). Also, being able to make the stack as big as you wish (I explained in another post how to neutralize the practical effects of Win32's stack checking) lets you lean on the stack whenever that is convenient.

2) The very "temporary" nature of the stack.. which makes it very logical to allocate my first buffer (the temporary one) to resolve that problem locally, and then use the heap only for the global, resident buffer, which is the final result of the routine.

Using the heap would just be improper and illogical, even though it is very standard practice.


Hope I didn't bore you, and especially that I answered your question. If not, I'll help as much as I can, though without going into the very special techniques.
Posted on 2002-05-08 05:11:18 by Maverick
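
For readers who work in C rather than assembly, the split described in the post above (scratch space on the stack, only the final result on the heap) might look roughly like the sketch below. It is not Maverick's technique: `strip_spaces` is a hypothetical routine made up for illustration, and the fixed 256-byte scratch size is an assumption chosen to keep the example short.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example routine: returns a heap-allocated copy of `src`
   with all whitespace removed. The scratch buffer lives on the stack and
   is released implicitly on return; only the final result uses the heap.
   Input longer than the scratch buffer is truncated (sketch limitation). */
char *strip_spaces(const char *src)
{
    char scratch[256];                 /* temporary: stack storage */
    size_t n = 0;

    for (; *src != '\0' && n < sizeof scratch - 1; ++src) {
        if (!isspace((unsigned char)*src))
            scratch[n++] = *src;
    }
    scratch[n] = '\0';

    char *result = malloc(n + 1);      /* resident: the only heap allocation */
    if (result != NULL)
        memcpy(result, scratch, n + 1);
    return result;                     /* caller is responsible for free() */
}
```

Because the temporary never touches the heap, freeing it cannot leave a hole in front of the result block, and the result can be allocated at its exact final size.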


MOV U32 [FS:4],0x7FFFFFFF
MOV U32 [FS:8],0x00000000


Hmmmm, now I think I know why my own attempt at an ultra fast stack technique failed. (I tracked the problem down to somewhere in the FS: segment.)

Not as good as you Maverick, but I decided awhile ago that I don't like MASM's PROC directive. I find offsets from esp much easier to read. Kinda like 68k.

The stack depth tip is interesting, thanks for sharing that.




A little side note: I did some C++ programming and set VC7 to generate the fastest code. I noticed some of my procs ended with:


ret
add esp,4 ; (or some other number)

Yep, a stack fix outside the proc. I thought the standard wants the called proc to fix the stack before exiting. (FYI)
Posted on 2002-05-08 06:25:25 by ThoughtCriminal
I can't tell you for sure without looking at the whole code, but I believe that the instruction after the RET is there *just* to align the following routine.. and will never get executed anyway.

It could be NOP's, LEA's which behave as NOPs, or all 0's... or even that ADD ESP,4. It's there just for alignment purposes.

If you post a disassembly (with offsets) we will be able to tell for sure.
Posted on 2002-05-08 06:29:52 by Maverick
For some reason this does not work in MASM:



ASSUME fs:NOTHING
.
.
.
mov fs:[4],7FFFFFFFh
mov fs:[8],0


I get an error "A2070 invalid instruction operands"



mov eax, fs:[0]


works fine. Can anyone clue me in? Not used to segments in 32bit.

Is there a ptr type I need to add? DWORD PTR does not work.
Posted on 2002-05-08 07:56:00 by ThoughtCriminal
It's not possible in the second case to determine if you are storing 8, 16, or 32 bits. You need to add dword ptr or Maverick's U32 macro. You may need to add parens to make it work.
Posted on 2002-05-08 15:15:49 by tenkey
mov fs:DWORD PTR [8],0 ; might work?
mov fs:,0 ; might work?
Posted on 2002-05-08 15:27:18 by bitRAKE
Thanx for the info, Maverick. It didn't go unread ;)

I also noticed the other post from a few days ago, but I will be honest: I *do* understand the stack manipulations, but I don't understand what you're trying to achieve with the FS:[] accesses.

As far as I knew, FS was reserved for exception handling, and had a specific method for adding your own thread handlers. I don't get what you're trying to do with 7FFFFFFF and NULL.

Can you enlighten me here?

Thanx
NaN
Posted on 2002-05-09 03:36:27 by NaN
Sure Pal.. here's some NASM syntax:



%define TIB.ExceptionList [FS:0] ; Pointer to the head of the SEH chain (first EXCEPTION_REGISTRATION record).
%define TIB.StackBase [FS:4] ; Used by functions to check for stack overflow: upper limit.
%define TIB.StackLimit [FS:8] ; Used by functions to check for stack overflow: lower limit.
%define TIB.SubSystemTib [FS:12] ; ?
%define TIB.FiberDataOrVersion [FS:16] ; ?
%define TIB.ArbitraryUserPointer [FS:20] ; ?
%define TIB.Self [FS:24] ; Linear address of the TIB, base of FS segment.

As you see from above, SEH is only a "sub-system" of the above TIB.

FS:4 and FS:8 set the possible, i.e. allowed, extent of your stack (i.e. base and limit).
Posted on 2002-05-09 04:35:57 by Maverick
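
The offsets in the %define list above can be mirrored in C as a sanity check. This is only a sketch: `Tib32` and `tib32_sp_in_range` are hypothetical names, the struct assumes 32-bit Windows where each of these fields is 4 bytes wide, and the range check merely illustrates what "allowed extent" means. It is not a claim about how Windows itself performs its stack checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first TIB fields on 32-bit Windows.
   Every field is 4 bytes wide there, so uint32_t reproduces the
   FS-relative offsets from the %define list. */
typedef struct Tib32 {
    uint32_t exception_list;         /* FS:0  */
    uint32_t stack_base;             /* FS:4  upper bound of the stack */
    uint32_t stack_limit;            /* FS:8  lower bound of the stack */
    uint32_t sub_system_tib;         /* FS:12 */
    uint32_t fiber_data_or_version;  /* FS:16 */
    uint32_t arbitrary_user_pointer; /* FS:20 */
    uint32_t self;                   /* FS:24 */
} Tib32;

/* Illustrative only: nonzero when `sp` lies between limit and base. */
int tib32_sp_in_range(const Tib32 *tib, uint32_t sp)
{
    return sp >= tib->stack_limit && sp <= tib->stack_base;
}
```

With the base set to 0x7FFFFFFF and the limit to 0, as in the two MOV instructions quoted earlier in the thread, every stack pointer value in the lower 2 GB passes such a check.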

This method of accessing parameters on the stack is very interesting.
And you can possibly make it a bit easier with some macroinstructions like these (here in fasm format):



stdp = 0 ; stack depth variable

macro push arg
{ push arg
stdp = stdp+1 }

macro pop arg
{ pop arg
stdp = stdp-1 }

param equ esp+4*stdp+4*

; example of use:
push eax
mov eax,[param 2]
; ...
pop eax
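
The arithmetic that `param` expands to can be modelled in a few lines of C. This is a sketch under stated assumptions: 32-bit code, dword-sized parameters, no frame pointer or locals, and the return address in the slot at [esp] on entry. `param_offset` is a hypothetical helper, not part of fasm.

```c
#include <stddef.h>

/* Models the fasm `param` symbol: the byte offset from ESP to the
   n-th dword parameter (1-based) after `depth` tracked pushes.
   Slot 0 holds the return address, which is why n starts at 1. */
size_t param_offset(size_t depth, size_t n)
{
    return 4 * depth + 4 * n;
}
```

On entry (depth 0) the second parameter sits at [esp+8]; after the single `push eax` in the example, `[param 2]` becomes [esp+4*1+4*2], i.e. [esp+12], which is the same memory location.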

There's a problem with the above macro. If one uses it between a forward reference to a local symbol, and the symbol, FASM generates "symbol already defined" or "invalid value" errors. For example:


JZ .exit
PUSH EAX
.exit: RET
---

By the way, although case sensitivity is a must, it makes a lot of sense not to have case sensitivity on instructions, registers and assembler directives. You already implemented this, thanks.
The ability to overload instructions is excellent, but to be exploited fully there should be an "IMACRO" directive which also defines a case-insensitive macro (to be used e.g. for instruction overloading), as well as an "IEQU" one (e.g. for registers).

I hope I'm not annoying you with all my requests and suggestions (especially on the FASM mailing list). I want to support FASM as much as possible, and this doesn't mean only publicity (or the code I will release, from now on, only in FASM syntax), but also suggestions to make the "definitive" assembler even better.
Posted on 2002-05-23 02:21:13 by Maverick
You're right, you can get rid of this problem by using ..stdp variable instead of stdp.
Posted on 2002-05-23 03:26:52 by Tomasz Grysztar

You're right, you can get rid of this problem by using ..stdp variable instead of stdp.
Interesting! Can you explain why/how it works? Is there any other useful application for ".."?
I did some experiments but found none.
Posted on 2002-05-23 04:49:24 by Maverick
This convention is borrowed from NASM. Symbols beginning with two dots are global, but do not affect the scope of local labels. There is one example in the documentation, but it's not explained there (yes, I know, I should write better docs, but when I have to choose whether to devote my free time to improving fasm or to writing documentation/tutorials, I choose the former).
Posted on 2002-05-23 05:07:12 by Tomasz Grysztar
That's the right choice. ;)
Posted on 2002-05-23 05:09:27 by Maverick