This article and its second part talk about the garbage collector and memory management in .NET:
Article

Is it possible in any way to do such fast allocations on the heap without writing a full-blown garbage collector?

The article says that in the usual way of allocating memory on the heap, the program keeps a linked list which it has to traverse each time. Can anyone explain to me why the program has to do this? Why does it have to traverse the list each time?

Also, how is allocation on the stack so fast?
Posted on 2002-11-06 11:55:21 by clippy
The reason the stack is fast is that it is simple.

The parameters under which the stack operates are tight and predictable, and so it can be heavily optimised. The processor is fairly certain that usage will be around the top of the stack, and the further down you move, the less likely a variable is to be accessed; it caches accordingly.

Data on the stack is also generally aligned, by virtue of the fact that it is pushed on in a set size (32 bits in 32-bit mode). This gives a general advantage (it is seen in all memory access, but thanks to the self-maintaining aspect of the stack, the stack sees it more).
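To illustrate roughly why it is so cheap, here is a C++ sketch (the exact instructions depend entirely on the compiler):

// A local array is "allocated" by a single pointer adjustment -
// no search, no bookkeeping, no system call.
void demo()
{
    int buffer[256];   // typically just: sub esp, 1024
    buffer[0] = 1;     // afterwards, plain [esp + disp] addressing
}                      // "freed" by: add esp, 1024 (or mov esp, ebp)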

The reason for the linked list is that the amount of allocated memory is not necessarily known at compile time, and the order in which chunks of memory will be de-allocated is also potentially random. Using an array puts an upper limit on the number of blocks of memory you can keep track of, which can be a very serious limitation.

If you want to make something faster, and this applies as a general rule to every program you write, constrain it. The less it has to cope with, the quicker it will do it. Look at the scenario you intend to implement, and pick your algorithms accordingly. There are times when the "slowest" algorithm can be the most preferable.

Mirno
Posted on 2002-11-06 12:44:45 by Mirno
But what I still don't understand is why the linked list has to be traversed fully each time an object is created.

One can point to the top of the list, and when an object is created it is just added at the top. Why does it have to go through the whole list again?

Yes, when an object is deleted, it might be necessary to traverse the whole list to deallocate the object and delete its reference from the linked list.

And how is a large enough block found in the big 'pool' of memory when allocating an object?
Posted on 2002-11-06 13:09:12 by clippy

But what I still don't understand is why the linked list has to be traversed fully each time an object is created


As you noted, it does not need to be, depending on the algorithm. There are many ways to manage a heap.

.NET does not move through it, per the article. It always allocates at the very end (until you run out of
memory). At that point your application will stall or seem sluggish as it compacts the heap all
at once.

C/C++ and some others which use the linked-list approach move through the list each time, looking for
a large enough free slot to house the request. Per allocation it is slower than the above method;
however, it does not suffer the consequence of having to compact the heap either.

Take your pick of which you prefer: either 90% blazing fast with 10% occasional stalls, or 100% decently
fast and 0% stalls...
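A rough C++ sketch of the "always allocate at the very end" style (all names are made up, and a real GC heap tracks live objects too - this only shows why the common case is so fast):

#include <cstddef>
#include <cstdint>

// Bump allocator: an allocation is just moving a pointer forward.
// Reclaiming space (compaction) is a separate, occasional operation -
// that is the 10% stall.
class BumpHeap {
    std::uint8_t* base;
    std::size_t   capacity;
    std::size_t   top = 0;   // next free offset
public:
    BumpHeap(std::uint8_t* mem, std::size_t size) : base(mem), capacity(size) {}

    void* alloc(std::size_t n)
    {
        n = (n + 3) & ~std::size_t(3);           // keep 4-byte alignment
        if (top + n > capacity) return nullptr;  // heap full: time to compact
        void* p = base + top;
        top += n;                                // the entire cost of an allocation
        return p;
    }
};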
Posted on 2002-11-06 14:19:59 by Graebel
>But what I still don't understand is why the linked list has to be traversed fully
It is not fully traversed, only up to the first free block of memory that is large enough.
>One can point to the top of the list and when an object is created, it is just added to the top.
>Why does it have to go through the whole list again?
Because free memory can be anywhere, not just at the top, except for stack allocation.
Posted on 2002-11-06 14:26:54 by octavio
It is not fully traversed, only up to the first free block of memory that is large enough.

I am confused. Doesn't the linked list hold the objects allocated in memory, removing items from itself once objects are deallocated? If so, then how will iterating through the linked list help in finding free blocks of memory?
Posted on 2002-11-07 06:59:05 by clippy
Yeah, OK, I can see where you are confused. Let's say (stupid example) that we
start with, umm, 5 objects allocated on the custom heap, each of which is 20 bytes long.

<item 5> <link null> <flag used> 20 bytes
<item 4> <link to 5> <flag used> 20 bytes
<item 3> <link to 4> <flag used> 20 bytes
<item 2> <link to 3> <flag used> 20 bytes
<item 1> <link to 2> <flag used> 20 bytes

At this time all is well and good. Now we don't need item 2 anymore, so we mark it
as unused, so that the garbage collector can do something with it.

<item 5> <link null> <flag used> 20 bytes
<item 4> <link to 5> <flag used> 20 bytes
<item 3> <link to 4> <flag used> 20 bytes
<item 2> <link to 3> <flag not used> 20 bytes
<item 1> <link to 2> <flag used> 20 bytes

If we now create a new object (for simplicity, 20 bytes), what happens? One of two
things: 1) if the garbage collector has compacted the heap already, take directly
from the top of the heap, or 2) iterate through the list and look for an open slot
that's big enough to house the request (which will be item 2). After getting to item
two, no further work is required. You can re-mark it as used and pass the pointer
back to the application requesting the storage...
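In C++ the lookup from option 2 might look roughly like this (a sketch; the field names are invented to match the diagram, and splitting an oversized block is left out):

#include <cstddef>

struct Block {
    Block*      next;  // the <link ...> field in the diagram
    std::size_t size;  // payload size in bytes
    bool        used;  // the <flag used> / <flag not used> field
};

// First fit: stop at the *first* free block that is big enough -
// the list is only traversed fully when nothing fits.
Block* alloc_first_fit(Block* head, std::size_t wanted)
{
    for (Block* b = head; b != nullptr; b = b->next) {
        if (!b->used && b->size >= wanted) {
            b->used = true;   // re-mark it as used
            return b;         // hand the pointer back to the caller
        }
    }
    return nullptr;           // nothing big enough: extend the heap
}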
Posted on 2002-11-07 09:24:24 by Graebel
Another memory management strategy is arena allocation. This is useful when dealing with temporary data structures.

You allocate large memory pools, or arenas, and suballocate your items from them. As with typical GC strategies, you suballocate at the end of the used space for speed, and you ignore items which are no longer used. At the end of your processing, when you don't need the data structures any more, you deallocate whole arenas instead of each item.
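A minimal C++ sketch of one arena (alignment and error handling are ignored for brevity):

#include <cstdlib>
#include <cstddef>

// One arena: suballocate by bumping 'used'; individual items are never freed.
// Destroying the arena releases everything at once - no per-item free,
// no traversal of what was allocated.
struct Arena {
    char*       mem;
    std::size_t size;
    std::size_t used = 0;

    explicit Arena(std::size_t n)
        : mem(static_cast<char*>(std::malloc(n))), size(n) {}
    ~Arena() { std::free(mem); }

    void* alloc(std::size_t n)
    {
        if (used + n > size) return nullptr;  // arena full: chain a new one
        void* p = mem + used;
        used += n;
        return p;
    }
};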
Posted on 2002-11-07 12:09:13 by tenkey
Hmm... OK, I think I am starting to get a clear picture now.

An idea - tell me whether it's right or wrong, and about its benefits or shortcomings.

In languages like C++ which don't support garbage collection, one could speed up allocation the following way.

Reserve one big block of memory, the way .NET does. On this block you keep adding objects on top of each other, and you also store their sizes in a linked list.
When an object is deallocated, just mark it so in the list.
After a certain period of time, or when the big block of memory is full, compact the block by removing all the objects which are marked deallocated.
If after compacting there is enough memory at the top to allocate the requested object, do so; otherwise reserve another big block of memory.
Sure, the program would slow down here, but 90% of the time it would go super fast.

One could add this functionality to an open-source compiler like gcc and obtain the same super speed that Microsoft is claiming for .NET.
Also, for people who still prefer the old style of allocating memory, there could be a switch in the compiler to select which type of allocation to use.

This method would require no change to existing code, as it would be a feature of the compiler only. It would also mean no fragmentation of memory, and so faster access :-)

So tell me what you think of the idea. :-)
Posted on 2002-11-08 01:05:33 by clippy
If you are worried about allocation strategies in terms of speed, simply design your own. There is no problem in allocating a large block of memory and controlling access to it yourself. It means that the OS is not holding your hand, but if you know what you are doing, a fixed block of memory that you control pointers to cannot be beaten in terms of speed.

Regards,

hutch@movsd.com
Posted on 2002-11-08 01:56:01 by hutch--
Why are you worried about speed?
The malloc + free routines only take 2-3 microseconds on my P133; is that not fast enough?
About using a fixed memory block size: the memory blocks are then limited in size,
and you will waste a lot of memory, so virtual memory will be used...
Posted on 2002-11-08 12:52:11 by octavio
If objects in a "simulated" heap have pointers to each other, compaction requires these pointers to be adjusted to point to new locations. (A "simulated" heap is actually a real heap.) It also means any pointers into the heap will need to be adjusted.
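A rough C++ sketch of what that adjustment involves (all names are invented; it assumes the live-object list is sorted by address):

#include <cstring>
#include <map>
#include <vector>
#include <cstddef>

struct Object {
    std::size_t size;  // total size of this object in bytes
    Object*     ref;   // a pointer field into the same heap (or null)
};

// Pass 1: slide the live objects down to the bottom of the heap,
// recording a forwarding address (old -> new) for each one.
// Pass 2: rewrite every pointer field through that table.
// The second pass is exactly the adjustment cost in question.
void compact(std::vector<Object*>& live, char* heapBottom)
{
    std::map<Object*, Object*> forward;
    char* dest = heapBottom;
    for (Object*& obj : live) {
        std::size_t n = obj->size;             // read before the move
        Object* moved = reinterpret_cast<Object*>(dest);
        std::memmove(dest, obj, n);
        forward[obj] = moved;
        dest += n;
        obj = moved;                           // the root pointer is fixed too
    }
    for (Object* obj : live)                   // now fix interior pointers
        if (obj->ref) {
            auto it = forward.find(obj->ref);
            if (it != forward.end()) obj->ref = it->second;
        }
}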
Posted on 2002-11-08 17:59:26 by tenkey
Anyway, I just thought of another solution for optimizing the normal heap allocation method.

Instead of keeping one linked list for the memory, we keep two:
one for the allocated memory, and the other for the deallocated memory. The second one is in the form of a tree, ordered by the size of the blocks.

Once a block of memory is deleted, we remove its reference from the list of allocated memory and put it in the deallocated memory tree.

We keep 2 variables, max and min for the maximum and minimum block sizes of the deallocated memory tree.

Now suppose you want to allocate more memory of some size, say MemSize.
We first compare it to the max and min values to see whether a block of that size has already been deallocated and is in the tree. If so, we search the tree for an appropriate size, allocate the block, and move that node to the list of allocated memory.

If the size is not between the max and min values, then we just allocate memory on top of the existing list.

This way there would be no need to go through the entire list each time.
Fragmentation would also be MUCH less, since each time we would find a "best fit" hole, compared to the first-fit hole which is usually searched for.
And as everything is in a tree, searching for the hole would also be blazingly fast.
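In C++ terms the idea could be sketched with a size-keyed multimap standing in for the tree (a rough sketch; all names are made up, and coalescing of neighbouring blocks is ignored):

#include <map>
#include <cstddef>

// Free blocks keyed by size. lower_bound() finds a best-fit block in
// O(log n) - no walk over the allocated list at all. Checking against
// the largest key first plays the role of the max shortcut above.
class FreeTree {
    std::multimap<std::size_t, void*> freeBySize;
public:
    void release(void* block, std::size_t size)
    {
        freeBySize.insert({size, block});
    }

    // Returns a recycled block of at least 'size' bytes, or null when the
    // caller should allocate fresh memory at the top instead.
    void* acquire(std::size_t size)
    {
        if (freeBySize.empty() || freeBySize.rbegin()->first < size)
            return nullptr;                      // bigger than anything free
        auto it = freeBySize.lower_bound(size);  // smallest block >= size
        void* block = it->second;
        freeBySize.erase(it);
        return block;
    }
};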

So what do you think?


tenkey,
If objects in a "simulated" heap have pointers to each other, compaction requires these pointers to be adjusted to point to new locations. (A "simulated" heap is actually a real heap.) It also means any pointers into the heap will need to be adjusted.

Yes, I thought about it a lot after I wrote my post, which is why I thought of the above method to optimize the existing style of allocation.
But back to my previous method and what you said:
won't that mean going through each and every object, checking its pointer, and changing it?
.NET does this, but I read somewhere that a full garbage collection of generation 0 objects in .NET takes only about 1 millisecond on a Pentium 200 MHz machine.
How does it achieve so much speed? Even a medium-sized program can have LOTS of objects in memory. Won't going through each one to check for pointers take a LOT of time?
Posted on 2002-11-09 01:01:47 by clippy
gladiator, in the 'deallocated memory tree' how would sorting be done on same-size blocks? How would the tree collapse when contiguous blocks become available? ...or would it?
Posted on 2002-11-09 01:23:25 by bitRAKE
bitRAKE,
I am not sure I am getting what you mean, but I will try to answer what I understand. Please remember that the solution I am posting now is just a quick one and can probably be optimized much, much more when actually writing the code.



in the 'deallocated memory tree' how would sorting be done on same-size blocks?

A quick solution is to keep, in each node of the tree of deallocated memory, a list of the blocks of the same size. So we have a structure like:

struct mem {
    int   memsize;    // size of each block in this node
    int   nBlocks;    // how many free blocks of this size remain
    void* blockList;  // head of the linked list of the free blocks themselves
};

Let's say a user deletes a linked list of 10 elements, each of size, say, 20 bytes.
So the binary tree would have a 'mem' node with
memsize = 20, nBlocks = 10.
We keep a linked list of the deleted blocks themselves.

Now when we need a block of 20 bytes, we take an element from the linked list of 20-byte blocks in the deallocated-memory tree and put its reference in the 'allocated memory' list.
We decrement nBlocks by 1, so nBlocks is now equal to 9. If nBlocks ever goes to 0, we remove the whole 'mem' node from the tree, meaning 20-byte blocks are no longer available in the tree.

How would the tree collapse when contiguous blocks become available? ...or would it?

I am probably not getting what you mean by this. I have just told you the method of deleting references from the list of blocks of the same size. If you want to completely remove a 'mem' node, it would be done in the normal way of deleting nodes from a binary tree.

P.S. I don't think I need to mention this again, but the tree will be built according to the size of the blocks, i.e. 'mem.memsize'.

I hope I have managed to explain properly what I mean to say, and that I have not confused you more by overspecifying things.

So again then, what do you think of the idea?
Posted on 2002-11-09 02:46:27 by clippy
gladiator, I think the idea is good. Yes, I did not understand the trees were size based. :)
Posted on 2002-11-09 03:29:27 by bitRAKE
What is missing in this discussion is what the use will be, as that will dictate the most efficient way to allocate memory.

Garbage collection originally comes from BASIC, and it has to do with a particular style of dynamically allocated memory, usually string memory.

What it sounds like is a lot of small allocations of different sizes that are changing on a regular basis. Now what needs to be worked out is if this is the most efficient way to manage memory in the particular instance.

Choices would be,

OLE string memory, it is designed for rapidly changing small allocations up to about 250 meg.

Allocating a single block and managing the pointers yourself and this could be done with a preset maximum size if the app design allowed it.

Once a single contiguous block of memory is allocated, the speed of access to an address is much the same whatever function was used to allocate it; about the only thing that is a bit slower is memory-mapped files, and that only in the allocation.

Regards,

hutch@movsd.com
Posted on 2002-11-09 04:00:41 by hutch--
hutch,
About the single block of memory: I had edited my second-to-last post and asked this question -
if you have a large block of memory containing a lot of variables, and you then compact the memory to make everything contiguous, won't it take a long, long time to scan through each pointer and correct what it is pointing to?
But how does .NET manage to do it so fast? I read somewhere that it can complete a full generation 0 garbage collection (including compacting the memory) within 1 millisecond on a Pentium 200 MHz machine.


One more thing:
I have found a better algorithm for heap memory management, but I still don't know where to implement it.
Should it be done in the malloc function or somewhere else? When I use "new" in C++, does it call malloc?
My knowledge of asm is very basic, so I want to know how you allocate memory on the heap in asm, so that that function can be optimized too.
Posted on 2002-11-09 05:57:33 by clippy
It still has a lot to do with the task you have in mind. If you have a large number of variables that are being used and then discarded, what you need to do is work out what the maximum at any time will be and then what the average of memory usage will be and see if you can work out a viable block size to start with.

If they are predictably sized variables, you can simply reassign the memory locations and you have no compaction problems at all. Where it can become problematic is if you are using a large number of variable-length strings at the same time.

If this was the case, you could try to work out a strategy from the most used to the least used and allocate on that basis. The general idea is that if memory allocation time is a problem for you in performance terms, the more predictably you can allocate larger blocks to cut down the overhead, the less it will interfere with your program's performance. What it means is that you will have to manage the addressing within those blocks.

What you are contrasting is the difference between allocating a large number of small blocks as against allocating a smaller number of large ones that you manage yourself.

Regards,

hutch@movsd.com
Posted on 2002-11-09 15:45:46 by hutch--
Providing your own malloc + calloc + free and "operator new" and "operator delete"
should handle most cases - but might not catch everything. You should try having
a look at Paul Nettle's mmgr library; it's designed for finding memory leaks (and does
a good job at this), and it shows how to override the memory allocation routines. The nice
thing is that this can be done without any compiler changes, just library changes; you'll
probably need a source recompile though, and there are a few odd problems (any
standard/3rd-party library functions using memory allocation...).

Thing is... do you need this? For me, it would make most sense to use the standard
library functions most of the time, since memory allocation hasn't really been a
time-critical thing for me yet (no, I'm not into RealTime programming, at least not
yet ;)). There might be cases where you have other requirements though, like
a zillion tiny allocations... so write custom code for those occasions :).

I don't know if there's any "generic best way" to allocate memory; there are
speed and size considerations. For example, you might construct tables for free-block
tracking that will allow you to quickly find an appropriately sized memory
block on alloc, but this requires more memory, and has some overhead in alloc
and free.

When writing "tiny application", I wrap most of the libc stuff I use directly
to win32 api functions (malloc->HeapAlloc, etc) and this works fine for these
type of apps. For 32bit dos coding, I had a simplistic linked-list heap scheme,
which worked fine for generic allocations. I can't remember how microsoft libc
handles memory, but I think it was somewhat more spiffy than a simple linked
list. One version of GNU LIBC I looked at years ago (djgpp coding) had something
fancy-looking with hash buckets and whatnot...
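The malloc -> HeapAlloc wrapping is basically this (the my_* function names are mine):

#include <windows.h>
#include <cstddef>

// Thin wrappers for "tiny app" builds: bypass the CRT heap and go
// straight to the Win32 process heap.
void* my_malloc(std::size_t n)
{
    return HeapAlloc(GetProcessHeap(), 0, n);
}

void my_free(void* p)
{
    if (p) HeapFree(GetProcessHeap(), 0, p);
}

void* my_realloc(void* p, std::size_t n)
{
    if (!p) return my_malloc(n);
    return HeapReAlloc(GetProcessHeap(), 0, p, n);
}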

Nettle's mmgr - have a look here: http://www.fluidstudios.com/publications.html
Posted on 2002-11-10 08:20:34 by f0dder