I'm at a point where my program requires a significant amount of 'dynamic' memory: I need to keep a list of variable-sized strings where any one of them can be removed at any time (a linked list), structures holding variable-sized data, and so on. The problem I'm having is that I don't know how to handle all these memory allocation/reallocation and deletion requests. I've been using GlobalAlloc/GlobalLock on each request for a new list item or whatever, but that gets very messy quickly. Are there any examples of neat memory management? Thanks
Posted on 2004-01-20 12:30:12 by FearHQ
First of all, LocalAlloc/GlobalAlloc are outdated, so you should use HeapAlloc.

What's messy about your approach?

If you have a lot of fixed-size (and especially small) allocs/deallocs, you may want to implement a "pool" system around the lower-level memory allocation primitives.
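
A minimal sketch of swapping GlobalAlloc/GlobalLock for the heap API (assuming the usual MASM32 includes; InitMem/AllocBlock/FreeBlock are just made-up helper names):

; Sketch: one process-heap handle fetched at startup, then plain
; HeapAlloc/HeapFree calls - no GlobalLock step needed.
.data?
hHeap   dd ?                    ; process heap handle

.code
InitMem proc
    invoke GetProcessHeap
    mov hHeap, eax
    ret
InitMem endp

AllocBlock proc cbBytes:DWORD
    ; returns a usable pointer in eax, zero-initialized
    invoke HeapAlloc, hHeap, HEAP_ZERO_MEMORY, cbBytes
    ret
AllocBlock endp

FreeBlock proc pMem:DWORD
    invoke HeapFree, hHeap, 0, pMem
    ret
FreeBlock endp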
Posted on 2004-01-20 12:49:07 by f0dder
Memory management is a tough subject. You might want to look into "garbage collection" - that is the term for what you want. I am trying to wrestle with the same issue right now and have not yet worked out a good algorithm to calculate how much "dead" space I have. I am thinking about scanning for unreferenced blocks every once in a while, and if they represent a certain percentage of the total allocated then I compact the memory during some idle time. I can't give you any code as I have not written it yet - I am putting it off until later - but that might give you a few ideas. If you find something let me know.

PS: HeapAlloc is not what you want; use VirtualAlloc. If you are worried about large memory then you are obviously talking about multi-megabyte blocks, and VirtualAlloc is the right function for large blocks; HeapAlloc is good for smaller ones.
Posted on 2004-01-20 12:51:28 by donkey
Well, one of the main problems in my code is that almost NOTHING is the same size... Everything is variable, be it a string or a "large integer" structure, so I'm not sure if pools are a good idea... The mess comes from scattered GlobalAlloc's (which I will replace with HeapAlloc)... Should I use HeapCreate or GetProcessHeap? How often should I create a 'new' heap? I'm confused because I need space for the lists, strings and "Integer" structs - should each have its own heap or share a common heap?

My Integer struct looks like this



Integer STRUCT
  Sign  WORD  ?     ; sign of the number
  wSize WORD  ?     ; size of pMem, in DWORDs
  pMem  DWORD ?     ; pointer to the digits, most significant DWORD first
  hMem  DWORD ?     ; obsolete. From GlobalAlloc.
Integer ENDS


where wSize is the size (in DWORDs) of pMem and pMem is a pointer to the integer itself (starting with the most significant DWORD). Should I change this structure, or does it seem fine?
Posted on 2004-01-20 13:04:00 by FearHQ
You are taking a different approach than I am. I prefer copy/compact for my purposes; you are using the mark/sweep method. That method is fine if you do not expect huge fragmentation in the heap. My application would quickly fragment the heap hopelessly and performance would be disastrously affected. I am working with the idea of creating a static array of structures that contain pointers to the data. Each pointer is zeroed when the memory is no longer needed. Occasionally I scan the pointers and count the number of zeroes, and if it reaches a certain percentage (determined by the amount of memory currently allocated) I flag the list for collection; then, during some idle time, I copy the existing items one at a time to a new block and free the old one, updating the pointers as I go. I may also have to implement a pattern search to find the degree of fragmentation and set the flag when it reaches a certain level as well, but I am dealing with thousands of items...
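
Just to make the scan concrete, something like this is what I have in mind (untested sketch - ItemTable, ITEM_COUNT and fCompactNeeded are made-up names, and the 50% threshold is arbitrary):

ITEM_COUNT      equ 1000            ; made-up table size

.data?
ItemTable       dd ITEM_COUNT dup(?) ; pointers to item data, 0 = dead entry
fCompactNeeded  dd ?

.code
CheckDeadSpace proc uses esi
    xor edx, edx                    ; edx = count of dead (zeroed) entries
    mov esi, offset ItemTable
    mov ecx, ITEM_COUNT
scan_loop:
    cmp dword ptr [esi], 0
    jne not_dead
    inc edx
not_dead:
    add esi, 4
    dec ecx
    jnz scan_loop
    mov eax, ITEM_COUNT             ; flag a compact if over half the entries are dead
    shr eax, 1
    cmp edx, eax
    jbe no_compact
    mov fCompactNeeded, 1
no_compact:
    ret
CheckDeadSpace endp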
Posted on 2004-01-20 13:14:16 by donkey
He doesn't really mention large blocks, but he does mention strings - which usually aren't too large :)

Unless you're dealing with *very* large blocks, HeapAlloc is still pretty good (<4meg blocks on 9x) - and on NT, HeapAlloc is pretty efficient. The problem with VirtualAlloc is that you can't resize memory blocks, and thus could end up having to write a pretty full/complex memory system - and end up duplicating a lot of HeapAlloc's functionality.

But for a few large buffers of fixed size, or where you need *big* alignment, or protection flag features, VirtualAlloc is king.

Garbage collection, ho hum. Either you manually do calls to the garbage collector, or you have no control over when the garbage collection occurs, and thus risk performance hits at unwise times. I'm not too much of a fan of traditional garbage collection - I prefer C++ datatypes that dealloc when they go out of scope. This might be a form of garbage collection too, but it's a form where you can predict when cleanup happens - which is when you leave a codeblock.

Anyway, memory pooling is a good thing - rather than freeing memory, you tag it as 'available', and ready for re-use. You can create some pretty efficient systems this way, and you can reduce heap fragmentation a lot.
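
A very minimal sketch of such a pool for one fixed node size (everything here - NODE_SIZE, PoolAlloc, PoolFree - is made up for illustration, not a finished allocator):

NODE_SIZE   equ 32              ; made-up fixed node size

.data?
PoolHead    dd ?                ; head of the free list, 0 = empty

.code
PoolAlloc proc
    mov eax, PoolHead
    test eax, eax
    jz fresh                    ; nothing recycled - fall back to the heap
    mov ecx, [eax]              ; first dword of a free node holds the next-free link
    mov PoolHead, ecx
    ret
fresh:
    invoke GetProcessHeap
    invoke HeapAlloc, eax, 0, NODE_SIZE
    ret
PoolAlloc endp

PoolFree proc pNode:DWORD
    mov eax, pNode              ; 'freeing' just links the node back into the pool
    mov ecx, PoolHead
    mov [eax], ecx
    mov PoolHead, eax
    ret
PoolFree endp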

Generally, use GetProcessHeap. The main advantage of additional heaps is that you can destroy a whole heap at a time. The main benefit of this is if you have memory leaks somewhere, so it can be seen as a quite dirty hack ;). But still, if you have a large number of allocs that are only needed in one part of the program and will be freed later, plus a large number of allocs in another part of the program, you can probably reduce heap fragmentation by using a temporary heap for the temporary allocs.
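
For the temporary-alloc case, something along these lines (just a sketch; the sizes are placeholders):

.data?
hTempHeap   dd ?

.code
    ; create a growable private heap: no flags, 64k initial size, no maximum
    invoke HeapCreate, 0, 10000h, 0
    mov hTempHeap, eax

    ; ...lots of temporary HeapAlloc calls against hTempHeap...
    invoke HeapAlloc, hTempHeap, 0, 256

    ; when that phase of the program is done, everything goes away in one call
    invoke HeapDestroy, hTempHeap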

You may want to have a look at the "low-fragmentation heap" stuff. It isn't available before WinXP, but the mechanisms aren't too hard to implement yourself.
Posted on 2004-01-20 13:39:26 by f0dder
Yes, if you are using less than 4MB the heap is fine. My purpose for garbage collection is that in a worst case scenario I can have 70 or 80 MB of highly fragmented data with around 90% holes (i.e. only 7 MB of real data). The heap would of course do its best to split the data across several holes, leading to insufferable fragmentation. I use SEH to test for page faults ("touch" the memory) and commit another page if necessary; this solves the problem of ungrowable memory blocks. When you want to grow virtual memory you just commit another page, with the offset of the next contiguous page as the first parameter, allowing you to grow it a hell of a lot more easily than with HeapReAlloc or any of that archaic junk. It also allows me to copy the buffers one page at a time and conserve memory during the compact. I do lose a few bytes making sure that an entry does not cross page boundaries, but it is insignificant.
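
The reserve/commit part looks roughly like this (untested sketch - pBase and dwCommitted are made-up names, and a real version would do the commit from inside the SEH handler):

.data?
pBase       dd ?                ; base of the reserved region
dwCommitted dd ?                ; bytes committed so far

.code
    ; reserve a big range up front, commit nothing yet
    invoke VirtualAlloc, NULL, 100000h, MEM_RESERVE, PAGE_READWRITE
    mov pBase, eax

    ; commit the first page
    invoke VirtualAlloc, pBase, 1000h, MEM_COMMIT, PAGE_READWRITE
    mov dwCommitted, 1000h

    ; later, when a write touches uncommitted memory, commit the next
    ; contiguous page at pBase + dwCommitted
    mov eax, pBase
    add eax, dwCommitted
    invoke VirtualAlloc, eax, 1000h, MEM_COMMIT, PAGE_READWRITE
    add dwCommitted, 1000h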

It all depends on the complexity of the list, from a glance yours is not as complex as mine, I need to implement a _this and _next pointer to scan the list quickly and also search hints. For my purposes garbage collection is essential, and in ASM I know of no way to do it except manually, maybe in other languages they do it whenever they want but I agree that would be a pain in the a**. As I said I just flag it and do it when I have some time (I am thinking about WM_ENTERIDLE).
Posted on 2004-01-20 14:33:50 by donkey

Yes, if you are using less than 4MB the heap is fine

Not less than 4mb - individual blocks of less than 4mb.
EDIT: and that's on 9x - no such silliness is reported on NT, though you should keep below 256meg block allocs ^_^

If you have an estimate of your average data size(s), or even better, a number of static structure sizes, you can reduce fragmentation a *LOT* by using pooled allocation, and you may be able to avoid garbage collection methods.

Btw, are the holes in your heap really that bad (apart from locality of reference, of course)? Unless they're too small, the heap manager should be able to re-use the holes... Hm, I dunno if free heap pages can be discarded cleanly from memory - I would assume they can, at least on 9x. Of course this doesn't help if your average heap holes are <4k and/or mixed with pages of used heap memory.

But again, a lot of issues can be fixed by not blindly calling HeapAlloc but applying more efficient schemes on top of it - HeapAlloc isn't bad either as a backend or for 'normal' allocations.

Donkey, your scheme doesn't sound all dumb :) :alright:, but it would be interesting to know the overhead of... 1) your garbage collection scheme, 2) the overhead of relying on page faults and VirtualAlloc commit/decommit calls - especially if you manage 4k at a time, instead of larger ranges.
Posted on 2004-01-20 14:53:07 by f0dder
Hi f0dder,

Well, it isn't written yet; the 4K page allocation was just for explanation, the actual size of the blocks I will be using is closer to the order of 1MB at a chomp. The SEH method works because in practical terms you write something once and read it many times. In reality the building of the linked list is only a relatively minor part of the program time-wise. For example, if I am building a symbol table with 100,000 unique symbols I build the table once then refer to it a number of times. However, as I begin to discard symbols I end up with a massive number of holes in memory, and the heap will naturally fill them, but in a fragmented way. This is the reason for a copy/compact garbage collect. The length of time the program runs is what determines which scheme is best; if you are going to (as I am) be building large tables, analyzing them and discarding nearly 90% of the data, then starting another table, it makes sense. I am not saying that this method makes sense in all cases - you have to think it through and decide which is best for your application. For example, if the list uses fixed sizes and cannot grow beyond a reasonable level my method would be overkill and only slow you down. My problem is that my lists can be huge: the average data size is on the order of 250,000 items and can be well over 500,000, each tagged with a variable-sized label, a formula and an unknown number of links to affected cells. This was the most reasonable way that I could come up with to handle the memory. The compaction process goes like this:

LIST STRUCT
  _this dd ?        ; pointer to this record's data, 0 = entry deleted
  _next dd ?        ; pointer to the next entry, 0 = end of list
  ; ... the rest of the record ...
LIST ENDS

You allocate a block of memory for the new array and begin at the base of the old array. Check _this: if it is 0, jump to the _next address. If _this is not 0, copy it to the new block and jump to _next. Repeat until _next is 0. Once a block is finished, if you still have data, you decommit and release the old block and commit another in the new array. All pointers are updated on the move, as you know _this and _next already.
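
In (untested) code the walk would be something like this - CopyEntry is just a placeholder for whatever copies one live record into the new block and fixes up its pointers:

CopyEntry PROTO :DWORD              ; placeholder routine, not shown

.code
CompactList proc uses esi pOldBase:DWORD
    mov esi, pOldBase
walk:
    cmp [esi].LIST._this, 0
    je  skip                        ; dead entry - nothing to copy
    invoke CopyEntry, esi           ; live entry - move it to the new block
skip:
    mov eax, [esi].LIST._next
    test eax, eax
    jz  done                        ; _next = 0 ends the list
    mov esi, eax
    jmp walk
done:
    ret
CompactList endp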
Posted on 2004-01-20 15:36:22 by donkey

The SEH method works because in practical terms you write something once and read it many times.

Yep, and if you allocate large blocks the SEH overhead shouldn't be too bad.


you have to think it through and decide which is best for your application.

Exactly.

Since you say you'll be discarding most entries, pooled allocation probably isn't a solution for you - there'd be lots of holes. It could work if you discard 90% of the symbols but then add new symbols in their place, though? *shrug*

I hope you design your memory allocation in a "pluggable" way, so you can replace the internals to test various different methods :)
Posted on 2004-01-20 18:19:13 by f0dder

I hope you design your memory allocation in a "pluggable" way, so you can replace the internals to test various different methods :)


It will just be two procs, add and compact, so it will be "pluggable". I can't see any other method being faster though. If you consider what happens when a string is changed: unless it is exactly the same length as the existing one, you must add a new entry - if it is smaller you are wasting space, if it is larger you are overwriting data. With individual allocations, every time you do this you must call HeapFree to free the old record and HeapAlloc to create a new one; with my method I just set the _this member to 0 and it will be freed when I compact. It is always faster to write a single DWORD to memory than to call an API, and memory is allocated in large chunks so I may have to allocate only once per 1000 writes. For me, processing the list has to be fast, and this method lends itself to very fast processing and modification; it only has to pay the overhead during idle time, though in extreme circumstances it may actually compact while something is going on, slowing things down. With the heap functions every write to a record requires an API call; with a few hundred records that would be nothing, but when dealing with large lists that may change often it is too slow.
Posted on 2004-01-20 18:49:52 by donkey

If you consider what happens when a string is changed: unless it is exactly the same length as the existing one, you must add a new entry - if it is smaller you are wasting space, if it is larger you are overwriting data.

Yup... and with a potential 500k symbols, even 3 wasted bytes on average per string would add up quickly :)


With the heap functions every write to a record requires an API call,

Well, if using the Heap functions directly - you can build custom allocation on top of HeapAlloc, just like you can on top of VirtualAlloc.

Sounds like the scheme you've thought up is pretty good for your purpose - but never be satisfied :P

Which kind of symbols are you handling, anyway? What's your app do? Sounds like some code that's fun to tweak around and play with - where optimizations (both algorithms and structures) actually matter...
Posted on 2004-01-20 19:22:23 by f0dder
I think what I would do is allocate a large chunk of memory with GlobalAlloc, as you're using. When you lock the memory with GlobalLock it returns a pointer to the base of the allocated memory. From there you can use offsets from the base memory address and even store them in variables if you like.

invoke GlobalAlloc, GMEM_MOVEABLE, 400000h   ; one big 4MB block
invoke GlobalLock, eax
mov Memory1, eax

Now just use offsets from that base pointer, all in the same memory:

mov ecx, Memory1
mov eax, dword ptr [ecx+1000h]

that would read the dword 4K past the base

you could store that address as a separate 4K block pointer while staying in the same memory, like:

mov eax,Memory1
add eax,1000h
mov Memory2,eax

Just use the same memory. I doubt you're using more than 4MB unless you're doing motion graphics.

I like the KISS method best of all. (Keep It Simple Stupid) :)
Posted on 2004-01-20 20:42:46 by mrgone

Which kind of symbols are you handling, anyway? What's your app do? Sounds like some code that's fun to tweak around and play with - where optimizations (both algorithms and structures) actually matter...


It is very work-related: it scans through a section of a spreadsheet, say 1 million cells, and looks for relationships between items and formulas. Say someone has entered W and is calculating its value based on X, Y and Z, and in turn Y has its value depending on A, B and C - what else is also dependent on those parameters if one was to change, and would the net change be positive or negative? I could recalculate the whole spreadsheet, but finding the differences is phenomenally hard and the report that prints out is very long to go through; I just want a tool that gives me a quick reference in say 30 or 40 seconds.
Posted on 2004-01-20 21:47:41 by donkey