I wonder why most programmers use this technique to allocate memory:

invoke GlobalAlloc, GMEM_MOVEABLE, dwFileSize
invoke GlobalLock, eax

Isn't it better to use this technique? Fixed memory doesn't need to be locked (I think).

invoke GlobalAlloc,GMEM_FIXED,dwFileSize

Posted on 2002-08-01 14:35:12 by Nordwind64
use HeapAlloc or VirtualAlloc :)
Posted on 2002-08-01 14:46:04 by stryker
...and why, please? Is it faster?
I have read in this forum that GlobalAlloc and HeapAlloc use the same code...

Posted on 2002-08-01 15:14:11 by Nordwind64
Ask f0dder, he seems to explain every bit in detail. :tongue: Try searching; there have been a lot of discussions in the past about GlobalAlloc vs. HeapAlloc vs. VirtualAlloc ... Oh and BTW, there have been a lot of "Flame Wars" about this. :grin:

I think there's a test program that benchmarks each of the memory allocation functions. Can't seem to find it...
Posted on 2002-08-01 15:18:32 by stryker
Use moveable, as fixed seems to fail on NT.
Posted on 2002-08-01 16:50:28 by comrade
Yep, you're right, FIXED doesn't need Lock. I've never seen any problems with it, but then I haven't used NT.

Stryker's probably right though, f0dder does a good job of explaining why.
Posted on 2002-08-01 17:18:50 by Eóin
I've had no trouble with GlobalAlloc + fixed under NT. Using "moveable" is sorta silly, as it isn't necessary under win32: we have page-based memory management rather than the segment-based stuff of win16. The value you get back from GlobalLock() on a GMEM_MOVEABLE block *does* differ from the handle GlobalAlloc returned, but GlobalAlloc (and LocalAlloc, for that matter) end up calling HeapAlloc on NT. 9x is somewhat muddier; I didn't bother digging in too deep. Could be that it works differently under 9x, considering the amount of crappy 16bit code that is still present in that dos extender (or OS, if you prefer...)

VirtualAlloc is not good for 'generic' allocations. It has a fair amount of overhead, and every request is rounded up to a multiple of the 4k page size. Use it if you need to allocate large (>64k) buffers and want the alignment and/or page-level protection flags VirtualAlloc offers. It's good for e.g. framebuffers. You could also implement your own memory management scheme on top of VirtualAlloc; I believe the Visual C++ runtimes do this rather than using HeapAlloc... I haven't timed that against HeapAlloc, but I would assume it's faster. With VirtualAlloc you can also commit/decommit pages as you see fit. But I repeat, don't think VirtualAlloc is a replacement for HeapAlloc - it's not.
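To put a number on the padding mentioned above, here is a minimal sketch of the rounding arithmetic in portable C. It does not call VirtualAlloc; the 4 KB page size is an assumption that holds on x86 (real code would read dwPageSize from GetSystemInfo), and the names round_up_to_pages and page_padding are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages, as on x86. Real code would query the page size
   instead of hard-coding it. */
#define PAGE_SIZE ((size_t)4096)

/* Round a request up to a whole number of pages. */
size_t round_up_to_pages(size_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}

/* Bytes lost to padding for a single request of the given size. */
size_t page_padding(size_t bytes)
{
    return round_up_to_pages(bytes) - bytes;
}
```

With these assumptions a 100-byte request costs a full page, so 3996 bytes are padding, which is why page-granular allocation is a poor fit for many small blocks.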

I advise people to use HeapAlloc for generic memory allocations. It's the preferred method under win32, and the legacy local/globalalloc are deprecated. You get more control with the heap functions; you can create your own heaps, and you can specify whether heap access should be synchronized or not (haven't timed the speed difference, but I assume it's extremely small).

I wrote some lengthy posts on the various allocation methods, and the conclusion is that the speed of the 'regular' allocation functions is nearly identical, and since HeapAlloc doesn't depend on COM or other stuff, you might as well use it and have as little memory bloat in your app as you can.
Posted on 2002-08-01 19:08:33 by f0dder
Using GMEM_FIXED/LMEM_FIXED can be a problem if you want to use GlobalReAlloc/LocalReAlloc later. AFAIK this doesn't work in Win9x.
Posted on 2002-08-01 21:31:31 by japheth
I use GlobalAlloc,GPTR,xxx in several of my programs (GPTR = GMEM_FIXED+GMEM_ZEROINIT) and have no problems relocating it using GlobalReAlloc,ptr,bytes,GMEM_MOVEABLE. I've tested this on 98 and NT. In one particular piece of code I grow the allocated memory in 512k blocks as the size of the data grows and I have not experienced any problems even after multiple calls.

Posted on 2002-08-01 21:44:14 by chorus
I have never made a call to Heap/Global/Local-XXX(), I use VirtualXXX() with my own memory manager, but intuitively I'm sure++ it all goes down to the Win32 internal garbage collection system, which can work only when you UNLOCK or RESIZE (in both cases your memory pointer will have a chance to change, so Windows can move the memory blocks as it wishes to compact them together and thus remove "holes", i.e. fragmentation. That's the whole point). If you use GMEM_FIXED, you will make all those pointers "unmovable", and any fragmentation will remain there.
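The handle-and-lock idea described above can be sketched as a toy model in portable C. This is only an illustration of the concept, not how Windows implements GlobalAlloc/GlobalLock; ToyBlock and all the toy_ functions are hypothetical names, and malloc stands in for the real allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy movable block: callers keep the handle (a pointer to this struct),
   not the data address itself. */
typedef struct {
    void  *data;        /* current location of the bytes */
    size_t size;
    int    lock_count;  /* > 0 means the block must not move */
} ToyBlock;

ToyBlock *toy_alloc(size_t size)
{
    ToyBlock *h = malloc(sizeof *h);
    if (!h) return NULL;
    h->data = calloc(1, size);
    if (!h->data) { free(h); return NULL; }
    h->size = size;
    h->lock_count = 0;
    return h;
}

/* Locking pins the block and hands out its current address. */
void *toy_lock(ToyBlock *h)  { h->lock_count++; return h->data; }
void toy_unlock(ToyBlock *h) { if (h->lock_count > 0) h->lock_count--; }

/* Simulate compaction: move the bytes if (and only if) the block is unlocked.
   Returns 1 if the block moved, 0 if it was locked or memory ran out. */
int toy_compact(ToyBlock *h)
{
    void *moved;
    if (h->lock_count > 0) return 0;
    moved = malloc(h->size);
    if (!moved) return 0;
    memcpy(moved, h->data, h->size);
    free(h->data);
    h->data = moved;
    return 1;
}

void toy_free(ToyBlock *h) { free(h->data); free(h); }
```

In this model a pointer obtained from toy_lock is only good until the matching toy_unlock; after that the data may have moved, and the caller has to lock again to get the current address.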

Sure, as f0dder points out, the paged system of Win32 will anyway be able to remove the fragmented 4KB pages, but since the whole point of using Heap/Global/Local-XXX() is to make small allocations, fragmentation will really be a problem (of course proportional to how much total memory you use, vs how much is there in the system).

In general, having written my own memory management routines and having dealt with their problems and side effects for years, it's my own strong belief that one should avoid keeping pointers if it doesn't give benefits. From time to time, e.g. when you wait for user input (PeekMessage + sleep loop, not GetMessage .. if you base your input on that, that is) you should UNLOCK as many heap blocks as possible, thus letting Windows reorganize and compact the HEAP (or asking it explicitly to do it, if there's such a function - I hope there is), and generally making wiser use of the system memory, avoiding waste and fragmentation holes.

I believe that this Heap/Global/Local-XXX() system is a heritage of Win16 cooperative multitasking.. i.e. if Win32 had been there from the start they wouldn't have even bothered to implement it, maybe. But IMHO this system is good.. or at least it's something similar to what I came up with myself for my own "OS", and for garbage collection.. let alone that cooperative systems generally perform noticeably better than preemptive-and-the-like ones, bugs excluded.

Also, what I like about my "OS" is that all code/data modules are relocatable in real-time. This means that I can access my allocated heap block with direct pointers, and if they change, I'll just relocate (after allocation the first time, and again every time the pointer changes) all the locations that point to it. So I don't have the overhead of loading a buffer pointer before accessing the buffer. Because of cache considerations this gives a very important speedup. I just love this system.. e.g. I can at run-time plug or unplug "file system devices" or "gfx load modules" and all will work without indirect pointers. The goal is having an OS which never needs to be rebooted, and is always clean (i.e. fragmentation is kept under control), and can be expanded, replaced or anything while running (provided you take care to know if there are copies of pointers floating around, etc..).. never rebooting during its whole lifetime.

To return to the topic.. Why bother.. always allocate your memory as MOVEABLE, and then you can keep it always LOCKED if you really need so. From time to time, if you want to make some "garbage collection", you will have a chance to do it. The best of both worlds, as Van Halen says.
Posted on 2002-08-02 02:39:11 by Maverick
That's from MSDN:

Memory Management
In Windows 95/98/Me, fixed memory blocks cannot be reallocated to be movable. The GMEM_MODIFY and GMEM_MOVEABLE combination of values has no effect when a memory block is reallocated by using the GlobalReAlloc function. Similarly, the LMEM_MODIFY and LMEM_MOVEABLE combination has no effect when a memory block is reallocated by using the LocalReAlloc function.

I had some problems with the realloc versions and avoid using them. Maybe it fails in win9x only if the memory block has to change its address.
Posted on 2002-08-02 03:21:12 by japheth

Thank you for your opinions!
But is there a speed difference between the functions? Which function gives the fastest read/write access to memory?

Posted on 2002-08-02 07:31:04 by Nordwind64
Maverick, while win16 would definitely move memory blocks
around (it was about the only way to do memory management
without paging features), I doubt this happens under win32.
It *might* happen on win9x (I doubt it though), but it definitely
won't happen on NT, as NT uses HeapAlloc inside local/globalalloc.
HeapAlloc is *not* allowed to change pointer values.

Imo, fragmentation is not that much of a problem, unless you
are "out of control". If so, you could always try creating an
extra heap, as all its memory allocations can then be freed at once,
or you could implement your own memory management system.

Also, you say "PeekMessage + sleep loop, not GetMessage" - why?
With GetMessage, your thread will not be scheduled until there's
a message in your queue... should give better utilization of the
CPU than the "polling" loop of PeekMessage+Sleep.

You say that in your "OS" everything is "relocatable in real-time".
How? Delta-pointers? Or actual relocation of the code? Actual
relocation (if it has to be done more than 'seldomly') seems like
an awful waste of time to me (plus, if implemented "ontop" of a
regular OS, or if your OS is a 'real' OS :) with paging and COW,
you'll get dirty pages). If it's via delta pointers, aren't you
wasting a precious register that could be used for something else?
The idea sounds interesting, but I'd like to hear a bit about how
it's done.

Japheth, that MSDN blob only seems to indicate that you cannot
change the 'allocation type' of a memory block allocated with
GMEM_FIXED? It doesn't say anything about not being able to
just adjust the allocation size?

NordWind, all the 'normal' memory allocation functions have same
access speed to the memory. Memory Mapped Files are a little slower
in access, for some weird reason. As for allocation/deallocation
speed, Memory Mapped Files are the slowest, followed by VirtualAlloc.
The rest of the functions are about the same speed... there's a difference
between "zeropage" memory and "uninitialized" memory (0page is, i.e.,
HeapAlloc with the HEAP_ZERO_MEMORY flag, or any Global/LocalAlloc).
The COM IMalloc interface (and thus CoTaskMemAlloc), HeapAlloc without
the zeromem flag, and SysAllocStringByteLen all return uninitialized memory,
and are a bit more than twice as fast as the 0page allocations. I wrote
some lengthy posts about this earlier :).
Posted on 2002-08-02 08:59:26 by f0dder
From GlobalReAlloc in my WinAPI Guide (not MSDN):


If this parameter does not specify GMEM_MODIFY, it can be any combination of the following flags:

GMEM_MOVEABLE If dwBytes is zero, discards a previously movable and discardable memory block. If the lock count of the object is not zero or if the block is not movable and discardable, the function fails. If dwBytes is nonzero, enables the system to move the reallocated block to a new location without changing the movable or fixed attribute of the memory object. If the object is fixed, the handle returned may be different from the handle specified by the hMem parameter

It seems to me that you can realloc fixed blocks with no problem. Just don't expect the old pointer to be any good afterwards. As I mentioned above, I use this technique plenty and it always works if I track the new pointer (if I don't, the system crashes, which to me suggests that the old ptr is invalidated by the call).
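The "track the new pointer" rule can be sketched with the standard C realloc as a stand-in for GlobalReAlloc (a deliberate substitution so the example is portable; the same discipline applies). GrowBuf and growbuf_append are hypothetical names, and the 512 KB step mirrors the growth size mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GROW_STEP ((size_t)512 * 1024)  /* grow in 512 KB steps */

typedef struct {
    unsigned char *ptr;   /* always the CURRENT address of the block */
    size_t capacity;
    size_t used;
} GrowBuf;

/* Append n bytes. Returns 0 on success, -1 on failure (buffer unchanged). */
int growbuf_append(GrowBuf *b, const void *src, size_t n)
{
    if (n == 0) return 0;
    if (n > b->capacity - b->used) {
        size_t need = b->used + n;
        size_t new_cap = (need + GROW_STEP - 1) / GROW_STEP * GROW_STEP;
        /* realloc may move the block: keep the result in a temporary so the
           old pointer survives if the call fails, then store the new one. */
        unsigned char *moved = realloc(b->ptr, new_cap);
        if (!moved) return -1;
        b->ptr = moved;
        b->capacity = new_cap;
    }
    memcpy(b->ptr + b->used, src, n);
    b->used += n;
    return 0;
}
```

Starting from a zeroed GrowBuf, every access goes through b->ptr, so no stale copy of the old address is ever used after a grow.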

Posted on 2002-08-02 10:29:24 by chorus
Well, GlobalReAlloc *does* return a new pointer/handle, so of course
you cannot depend on the old pointer ;). For GMEM_FIXED allocations,
the return value is a pointer, for the others it's a handle.

Why do people keep on using global/localalloc? Is there some reason
behind this, or just old preferences?
Posted on 2002-08-02 10:37:03 by f0dder
Personally, the GlobalAlloc function is easy to implement. And I've never needed it to be "time critical" so why not? I should probably get into the habit of HeapAlloc. For the most part, though, I'm busy working on the data *in* the memory and making sure that's working, rather than implementing the memory itself.

Posted on 2002-08-02 11:55:14 by chorus
Hi f0dder, you wrote: You say that in your "OS" everything is "relocatable in real-time".
How? Delta-pointers? Or actual relocation of the code? Actual
relocation (if it has to be done more than 'seldomly') seems like
an awful waste of time to me (plus, if implemented "ontop" of a
regular OS, or if your OS is a 'real' OS with paging and COW,
you'll get dirty pages). If it's via delta pointers, aren't you
wasting a precious register that could be used for something else?
The idea sounds interesting, but I'd like to hear a bit about how
it's done.

Simply, I've my own object file format. The structure is (square brackets mean "optional"):





The source of course can be stripped away, and is used only by the compiler anyway.
Exported types can be stripped away too, and make the module usable by the compiler even if the source is missing.

Now, there are 1 or more modules. Only one will be chosen, based on some of the host's environment variables. These variables can be X86 rather than 68000; WIN32 rather than DOS32 or AMIGAOS; WIN9X rather than WINNT, and so on. Each of these modules has a logical expression of the kind "X86&(WIN32|DOS32)". As I wrote above, only one MODULE (per file) will be chosen.
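A minimal sketch of how such a per-module condition could be checked, assuming the environment variables are represented as bit flags. This is a guess at one possible encoding, not the actual object format; the ENV_ constants, ModuleVariant and choose_variant are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit flags for the host environment. */
enum { ENV_X86 = 1, ENV_68000 = 2, ENV_WIN32 = 4, ENV_DOS32 = 8, ENV_AMIGAOS = 16 };

typedef struct {
    const char *name;
    int (*loaded_if)(unsigned env);  /* the module's logical expression */
} ModuleVariant;

/* "X86&(WIN32|DOS32)" */
int x86_win32_or_dos32(unsigned env)
{
    return (env & ENV_X86) && (env & (ENV_WIN32 | ENV_DOS32));
}

/* "68000&AMIGAOS" */
int m68k_amigaos(unsigned env)
{
    return (env & ENV_68000) && (env & ENV_AMIGAOS);
}

/* Return the first variant whose condition holds for env, or NULL. */
const ModuleVariant *choose_variant(const ModuleVariant *v, size_t n, unsigned env)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (v[i].loaded_if(env))
            return &v[i];
    return NULL;
}
```

Picking the first match is one simple way to guarantee that only one module per file is chosen.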

The module has then this structure:

loadedif section (explained above)
code or data section
relocation table
exported objects
(points to the source, quickinfo and docs areas.. all are in the "source" area)
imported objects

A module can be mapped but not yet loaded, and it can be unloaded at any time.
When, by request, loaded, a module is allocated in memory and its raw data loaded there. Then it gets relocated. Then all its exported objects are noted by the OS, and made available to all other modules which will be installed in the future. Then its imports list is walked, and all the requested objects' addresses, sizeof or other info are inserted into the module.. and if those objects weren't yet available, their respective module gets loaded (recursion takes place here). Then the optional module constructor routine gets executed. There's more than just this, though.

In practice it's a dynamic, run-time linker. There's a list of objects, etc.. which (if I need that little memory) I can discard with no problems, but losing future relocability, etc..

The modules are built by my programming language compiler, but I've written some utilities to help FASM produce them as well (a bit less comfortably, though).

With this complete info I can move an object elsewhere in memory, and know all the locations that referenced it, and thus make them point to the new object's address (or other characteristics, which are very specific to my programming language). The objects can be accessed through a bias, with no overhead.
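The idea of moving an object and fixing up every location that references it can be sketched like this. It is a toy under stated assumptions, not the actual relocation table format: TrackedObject, track_ref and relocate are hypothetical names, and the reference list is a small fixed array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 8

/* Hypothetical record: an object plus every location holding its address. */
typedef struct {
    void  *addr;             /* where the object currently lives */
    size_t size;
    void **refs[MAX_REFS];   /* addresses of the pointer slots that point at it */
    size_t ref_count;
} TrackedObject;

/* Register a pointer slot and make it point at the object. */
int track_ref(TrackedObject *obj, void **slot)
{
    if (obj->ref_count == MAX_REFS) return -1;
    obj->refs[obj->ref_count++] = slot;
    *slot = obj->addr;
    return 0;
}

/* Move the object's bytes and patch every registered slot. */
void relocate(TrackedObject *obj, void *new_addr)
{
    size_t i;
    memmove(new_addr, obj->addr, obj->size);
    for (i = 0; i < obj->ref_count; i++)
        *obj->refs[i] = new_addr;
    obj->addr = new_addr;
}
```

After relocate, every registered slot holds the new address directly, so users of the object pay no indirection on access. The cost is that unregistered copies of the pointer go stale, which is the "copies of pointers floating around" caveat.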

It's rare that I need to move objects around.. it's a matter of initialization or SMC-like techniques.. but it's extremely handy in certain situations, and is part of the reasons why my programming language's OO nature has no overhead. And it anyway lets me use direct pointers anywhere I wish, add plug-in modules for my file system, or for e.g. the gfx file load system, etc.. unplug them, all with total freedom.

The compiler compiles itself, the OS has everything necessary to be a standalone system or, thanks to the "loadedif" sections I described earlier, to stay on top of any OS, being also extremely multiplatform by design. It's so modular that only the code/data (or part of) that is really needed is in RAM, although on disk there may be also the Amiga version, etc.. (Amiga version not done, but someday, who knows..).

It's modular and object oriented ad nauseam. Also, a collection of IMHO very clever ideas make it better than it may seem from the above description. I can't imagine a better system for code/data reuse, so as to reduce development time as much as possible and be more and more productive program after program (much more than LIBs or DLLs or messy source code allow, and with no overhead).
Posted on 2002-08-02 12:06:37 by Maverick
Hello. Found this article on MSDN knowledge base, you might find it interesting:


among other things it confirms that GlobalAlloc/LocalAlloc call HeapAlloc on 9x platforms (I think, f0dder, you had said earlier that this was for sure true on NT but you weren't sure on 9x).

Posted on 2002-08-02 12:46:03 by chorus
Thanks chorus. I didn't bother tracing much around in 9x stuff, as there's a lot of unnamed stuff etc. Btw on NT, the local/globalalloc symbols do NOT point to the same address as I've previously said - dunno why I thought so. Oh well, perhaps they have done so on a different service pack :).

The functions are largely identical, though. There's some parameter conversion done (like GMEM_ZEROINIT to HEAP_ZERO_MEMORY). Furthermore, there's some seems-to-be undocumented values added to the HeapAlloc flags: 0x100000 for GlobalAlloc, 0x140000 for LocalAlloc.

Calls to GetProcessHeap() are not being done by Local/GlobalAlloc, they use a preinitialized dword, which seems to be initialized from "somewhere in the TIB"... study of GetProcessHeap shows it uses the same fields of the TIB.

So, on NT, Global/LocalAlloc are HeapAlloc + some_undocumented_flags :).

I usually do a "ghHeap = GetProcessHeap();" at the start of my program, and later on I have halloc, hrealloc and hfree like this:

void *halloc(u32 size)
{
    return HeapAlloc(ghHeap, HEAP_ZERO_MEMORY, size);
}

void *hrealloc(void *block, u32 size)
{
    return HeapReAlloc(ghHeap, HEAP_ZERO_MEMORY, block, size);
}

void hfree(void *block)
{
    HeapFree(ghHeap, 0, block);
}

I'm considering adding a bool to halloc and hrealloc to specify whether HEAP_ZERO_MEMORY should be used or not, but I like the simplicity of these routines, need zero-initialized memory most of the time, and for code where the performance matters, I'd probably do it differently.
Posted on 2002-08-02 13:37:29 by f0dder
I thought about implementing such wrapper functions, although the fact of having a global variable was still haunting me... (I know there is not much impact to having some global variables, especially in win32... but...) but it is still without a doubt a better solution than calling GetProcessHeap(); every time anyway. ;)

I was wondering if there were a reason to implement these wrappers as functions rather than "#define"s...

I know the call overhead will not be important, and that functions are more elegant, but with inline... won't the compiler inline such a function? Some of your past posts say that the compiler often ignores the keyword... even with such a simplistic function?
I have been using VC6 __inline (through an inline redefined by the C preprocessor... it allows me to compile c99 code made for gcc easily without errors and warnings) and my (simple) test functions were always inlined...
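On functions versus "#define"s, one general difference can be shown with a small sketch. It illustrates macros in general and is not a claim about the halloc wrappers above, which use each argument only once and so would not hit this particular pitfall; SQUARE_MACRO, square_inline and next_value are made-up names.

```c
#include <assert.h>

/* Macro version: the argument text is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function version: the argument is evaluated once and is type-checked. */
static inline int square_inline(int x)
{
    return x * x;
}

/* A helper with a side effect, to make the difference visible. */
static int calls = 0;
static int next_value(void) { return ++calls; }
```

With these definitions, square_inline(next_value()) calls next_value once, while SQUARE_MACRO(next_value()) calls it twice and multiplies two different values. A function also gives you argument type checking and something you can step into in a debugger; whether it actually gets inlined is up to the compiler.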

Posted on 2002-08-02 13:53:07 by JCP