doby,

It very much depends on how big the file you want to load is. If it's something you can safely fit into available memory, then you can load it into a memory buffer and process it whatever way you wish.

If it is much larger, like a database file that can be gigabytes in size, you will need some paging mechanism to manipulate the data.

If you work out what is the safest buffer size you can run, then you can load the file in bits using the normal file IO APIs and handle it that way.

Japheth,
===========
- MapViewOfFile will enlarge Paging File by the number of bytes to map, regardless how much free physical memory is in system - it will reduce shared memory area in Win9x systems (which is "only" 1 GB)
===========

Sure, it must come from somewhere but the documentation says that MMFs are backed up by the system paging mechanism. I have yet to see a memory allocation method that happens by magic, it must always come from physical memory or disk somewhere.

==============
- it will reduce shared memory area in Win9x systems (which is "only" 1 GB)
==============

As win95/98 can only use under 1 gig of physical memory (Tested on 2 of my boxes as per the documentation) such a restriction is vacuous of content. I run 768 meg on both win9x machines.

f0dder,
==============
refer to "inside microsoft windows 2000" for more details.
==============

The problem with this approach among others is that a published interface for a function is not and does not have to be implemented in the same way in each OS version. GlobalAlloc() was available in win 3.0 in 1990 but I can promise you that it does not work the same way in later 32 bit versions so your detailed analysis of win2k is trivial.

Just to drop you a little hint about OLE memory as a system resource, call a function GlobalMemoryStatus() both before and after you allocate 100 meg from the OLE string pool and on win95/98 you will see no difference. The system uses OLE string to do many things, being trapped in a world of zero terminated buffers is like being trapped in a time zone that ended a long time ago.

Like it or lump it, there are many ways of accessing memory in 32 bit windows and each method has its advantages and vices, a predisposition to one method without comprehending the differences is a mistake.

Regards,

hutch@movsd.com
Posted on 2002-06-29 23:51:23 by hutch--
hi hutch,

the file size is too large, i can not load the whole file into memory.

in my case, the speed is very critical, i really need the speed as high as possible.

so, my question is: which of the following two ways is better?
1) using MMF (as i understand it, MMF implements the paging for me)
2) load some part into memory as a page cache; if the program has a cache miss, then fetch the new part from the real file and replace a part in memory by some algorithm (doing the paging myself)

because the I/O access is very slow, i need to reduce the I/O access as much as possible to increase the speed.

if there is the other way better please give me some suggestion.

or if i misunderstand something, please correct me.

thanks,
doby.
Posted on 2002-06-30 01:19:06 by doby
doby,

============
the file size is too large, i can not load the whole file into memory.
============

I guess you have no other choice than to allocate a buffer that you think is a safe size for the machine that you wish to run and process the file a section at a time. You could try and use virtual memory but you are then at the mercy of the operating system in how it is paged.

If the file is, for example, 100 meg and you can only safely allocate 10 meg in a buffer, then you would process it 10 meg at a time. A buffer this big should be large enough that disk IO overhead does not slow things down.
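
As a rough illustration of processing a file one buffer at a time, here is a minimal C sketch using the portable stdio calls rather than the Win32 file APIs. The function name sum_bytes_chunked is hypothetical, and the byte summing is only a stand-in for whatever work the real program does on each buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sums every byte of an open file, reading it
 * chunk_size bytes at a time. The summing stands in for whatever
 * per-buffer processing the real program does. */
unsigned long long sum_bytes_chunked(FILE *fp, size_t chunk_size,
                                     size_t *chunks_read)
{
    unsigned long long total = 0;
    size_t chunks = 0;
    size_t got;
    unsigned char *buf;

    assert(fp != NULL && chunk_size > 0);

    buf = malloc(chunk_size);   /* the one "safe size" buffer */
    if (buf == NULL)
        return 0;

    /* fread returns fewer bytes than requested only at EOF or on error. */
    while ((got = fread(buf, 1, chunk_size, fp)) > 0) {
        size_t i;
        for (i = 0; i < got; i++)
            total += buf[i];
        chunks++;
    }

    free(buf);
    if (chunks_read != NULL)
        *chunks_read = chunks;
    return total;
}
```

With a 100 meg file and a 10 meg chunk_size the loop body runs ten times, and only one buffer is ever allocated.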

I would have a look at the file APIs to see about either buffering the read or the write to disk but not both. The logic here is to buffer read and direct write or direct read and buffered write.

You may have to flush the file buffer after each write so that you don't overload the file buffer.

Regards,

hutch@movsd.com
Posted on 2002-06-30 04:14:33 by hutch--
Just played around with some of the alloc functions mentioned here. Seems there isn't much difference between LocalAlloc (+ GlobalAlloc, HeapAlloc), CoTaskMemAlloc and SysAllocString. They all alloc committed memory in private address space and are shown with Heap32Next function (so they are normal heap items)

That's my little test proc (it can't be compiled without changes, I'm afraid, because I used some CRT functions)



.386
.Model flat,stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
include \masm32\include\ole32.inc
include \masm32\include\oleaut32.inc
include debugout.inc
include crt.mac

MEMSIZE equ 7000000h ;112 MB

?SYSALLOC equ 0
?COTASK equ 0
?NORMAL equ 1

.code

main proc c

invoke CoInitialize, NULL

invoke printf,CStr(<"press any key to start...",13,10>)
invoke _getch

if ?SYSALLOC
invoke SysAllocStringByteLen, NULL, MEMSIZE
mov esi,eax
invoke printf, CStr(<"SysAllocStringByteLen returned %X",13,10>),eax
endif
if ?COTASK
invoke CoTaskMemAlloc, MEMSIZE
mov esi,eax
invoke printf, CStr(<"CoTaskMemAlloc returned %X",13,10>),eax
endif
if ?NORMAL
invoke LocalAlloc, LMEM_FIXED, MEMSIZE
mov esi,eax
invoke printf, CStr(<"LocalAlloc returned %X",13,10>),eax
endif

invoke printf,CStr(<"press any key to stop...",13,10>)
invoke _getch

invoke CoUninitialize
ret

main endp

end


Tested with Win98 and WinXP

Since memory is just allocated but not touched by this app, only "committed memory" is increased. "Free physical memory" doesn't change and even "Paging file in use" sometimes remains unchanged. It seems that Windows just checks that "total size of committed memory" doesn't exceed "total size of physical memory" + "max paging file size".

As a consequence: you can't be sure that you can alloc 100 MB if "free physical memory" indicated 128 MB.
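
The check guessed at above can be written down as a tiny model. This is only a sketch of that guess in C, not how Windows is known to do it; the function name and the megabyte figures are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* All sizes in megabytes. Returns 1 if the request fits under the
 * assumed commit limit (physical RAM + maximum paging file size). */
int commit_would_succeed(uint32_t committed_mb, uint32_t request_mb,
                         uint32_t physical_mb, uint32_t max_pagefile_mb)
{
    uint64_t limit;
    uint64_t wanted;

    assert(physical_mb > 0);   /* a machine with no RAM makes no sense */

    limit  = (uint64_t)physical_mb + max_pagefile_mb;
    wanted = (uint64_t)committed_mb + request_mb;
    return wanted <= limit;
}
```

Under this model a box with 128 MB of RAM and a 64 MB maximum paging file refuses a 100 MB request once 150 MB is already committed, no matter how much physical memory is reported free at that moment.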

japheth
Posted on 2002-06-30 05:53:29 by japheth

Sure, it must come from somewhere but the documentation says that MMFs
are backed up by the system paging mechanism. I have yet to see a memory
allocation method that happens by magic, it must always come from physical
memory or disk somewhere.

Of course you don't get anything for free - but expanding the pagefile all
at once when an allocation is done... isn't really necessary. And yes, this
*does* happen on 9x, I just verified it on my kid brother's 98se box. Right
after the MMF allocation is done, the paging file increases to ~512 megabyte.
On NT, the pagefile isn't increased until necessary. So, I would suggest that
using memory mapped files for large allocations isn't a good idea on 9x.


As win95/98 can only use under 1 gig of physical memory (Tested on 2 of my
boxes as per the documentation) such a restriction is vacuous of content.
I run 768 meg on both win9x machines.

It actually *is* quite a limitation. Why? Well, you can't map two 600meg
files at once, for instance. You will have to map a smaller view of the
files, meaning more code et cetera.
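
To give an idea of the extra code involved, here is a small C sketch of the arithmetic for picking a smaller view. It relies on the fact that the file offset passed to MapViewOfFile has to be a multiple of the system allocation granularity (64 KB is typical). The struct and function names are hypothetical, and the actual mapping call is left out so the arithmetic can be read on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one view onto a larger file. */
struct view_window {
    uint64_t offset;   /* where the view starts; multiple of granularity */
    uint64_t length;   /* number of bytes to map */
    uint64_t delta;    /* position of the wanted byte inside the view */
};

/* Picks a view of at most max_view bytes that contains file offset pos.
 * granularity is assumed to be a power of two (64 KB is typical). */
struct view_window view_for_offset(uint64_t file_size, uint64_t pos,
                                   uint64_t max_view, uint64_t granularity)
{
    struct view_window v;

    assert(pos < file_size);
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    assert(max_view >= granularity);

    v.offset = pos & ~(granularity - 1);      /* round down */
    v.delta  = pos - v.offset;
    v.length = file_size - v.offset;          /* bytes left in the file */
    if (v.length > max_view)
        v.length = max_view;
    return v;
}
```

The caller would map length bytes at offset, work on the data starting at delta inside the view, and remap when the position leaves the window.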


The problem with this approach among others is that a published interface for a
function is not and does not have to be implemented in the same way in each OS
version. GlobalAlloc() was available in win 3.0 in 1990 but I can promise you
that it does not work the same way in later 32 bit versions so your detailed
analysis of win2k is trivial.

Of course the implementation can change... but there are fundamental things
about memory mapped files that just don't change, at least not unless the
CPU changes as well. In case you don't believe me, you should read up on
paging in the intel system programmers manual.


Like it or lump it, there are many ways of accessing memory in 32 bit windows
and each method has its advantages and vices, a predisposition to one method
without comprehending the differences is a mistake.

Ahem. I am merely pointing out why you shouldn't use memory mapped files for
generic memory allocations. It's about using the right tool for the right job,
and memory mapped files just aren't the right tool for generic memory allocations.
I believe I have given enough reasons why.

Doby, if speed is very critical, do not use memory mapping to handle your file
(the exception could be if you have an algorithm that works on one big buffer
and would lose a lot of speed being converted to 'chunked' code). In almost all
situations, ReadFile on blocks of data will be faster than memory mapped files,
as you avoid the numerous pagefaults of MMF. While the bottleneck in file I/O
is usually the harddrive speed, you *will* be able to feel the impact of MMF.


You could try and use virtual memory but you are then at the mercy of the
operating system in how it is paged.

All ring3 memory management (ie, the stuff you can do with the WIN32 API) is
'virtual memory'. Any memory can be discarded or paged out. Like it or not,
but that's how it is. It's a good reason to think about your memory allocation
strategies to avoid discarding/paging.


I would have a look at the file APIs to see about either buffering the read or
the write to disk but not both. The logic here is to buffer read and direct write
or direct read and buffered write.
You may have to flush the file buffer after each write so that you don't overload
the file buffer.

Unless you do this *very* carefully, you'll probably end up with worse speed than
normal Read/WriteFile (without any "special flags" or calls to flush). I don't say
you can't get it better with 'special code', but you'd better check the speed with
and without, and on a multitude of systems. And of course everything depends on
what exactly you're doing...

By the way japheth, perhaps you can answer this... even when committed, pages aren't
actually taken from the physical pool and zeroed out before first use, are they?
Zeroing out a 512meg allocation would take at least *some* time, but VirtualAlloc
seems to return immediately - and VirtualAlloc *does* guarantee you that the pages
will be zeroed... obviously the "just in time" zeroing (and physical allocation?)
is not done per-page or you'd have even worse speed than memory mapped files, but
do you have any idea how it's done? Pages marked not present or read-only? Allocating
in 64k regions, or more, or less, or with some other heuristic? Guess I should look
in 'inside windows 2000' and see if I can find the answers - I seem to remember something
about only VADs being reserved until actual use...
Posted on 2002-06-30 10:17:59 by f0dder
f0dder, my little tests have proved (at least for me :) ): all the standard memory allocation functions (except VirtualAlloc, which I haven't tested) in Win9x do enlarge the paging file too, whereas the free page pool remains more or less unchanged until the memory is touched (I haven't used XXX_ZEROINIT in my tests). In consequence that means that physical memory will be assigned thru page faults too, like with MMF. So the one main difference compared to MMF may be that MMF memory is in the shared region. Maybe enlarging the paging file isn't a costly operation, so this strategy makes sense. (And in this point NT and Win9x really differ because paging file size in Win9x is dynamic).
Posted on 2002-06-30 11:07:59 by japheth

In consequence that means that physical memory will be assigned thru
page faults too like with MMF.

Yes... it seems that memory allocations generate "demand-zero" paging,
ie the pages will not be actually allocated and zero-filled until they
need using. inside win2k says this is done in batches, which makes sense.
The thing that puzzles me a bit is why MMF allocated memory is so much
slower than 'normal' memory allocation, since both of them depend on
pagefaults... perhaps there's more checks involved with mapped memory?
Or perhaps the 'batch' size is smaller for MMF, causing more PFs?


Maybe enlarging the paging file isn't a costly operation

It is :). There was "some" disk I/O activity involved on 9x when
allocating a large buffer, and I got the popup about "drive is
running low on space".


(And in this point NT and Win9x really differ because paging file
size in Win9x is dynamic).

Hmm, it's dynamic on NT as well - I have min/max sizes for my pagefile.
Or perhaps you mean it's dynamic in some other way?
Posted on 2002-06-30 11:18:27 by f0dder
hmmmm,

============
Ahem. I am merely pointing out why you shouldn't use memory mapped files for generic memory allocations. It's about using the right tool for the right job, and memory mapped files just aren't the right tool for generic memory allocations. I believe I have given enough reasons why.
============

You seem to have missed the point here, your notion of the right tool for the job may vary considerably from someone else's.

The person who needs a single large block of memory to read and write to that is sharable if necessary has MMFs at their disposal.

If memory granularity is what you are worried about, use memory allocation functions that have a fine enough granularity to start with, GlobalAlloc() and OLE string both handle large numbers of small allocations better than functions like VirtualAlloc().

Having noted that the tests presented here by Japheth support the standard documentation on memory allocation, I am yet to see the big deal about preferring VirtualAlloc() over all other memory allocation functions when it is a bad choice in many normal applications, particularly when a large number of smaller allocations are needed.

For large single blocks, MMFs work well and I guess that's why the interface has been made available. It finally does not matter if you understand how to use them or not, it is an interface that was published about 7 years ago with the introduction of win95oem and it has been widely used by countless numbers of programmers since.

Fortunately the world at large does not have to conform to the world according to f0dder and it just gets on with writing successful applications/drivers/dlls/operating systems etc .... :tongue:

regards,

hutch@movsd.com
Posted on 2002-06-30 21:44:17 by hutch--
Once again you have chosen to misinterpret what I have written. If you
want to reply to this, please read through the following post carefully.
I have tried to state my views very clearly this time, to avoid
misunderstandings along the lines of "...preferring VirtualAlloc() over
all other memory allocation functions...".

I don't say you shouldn't use MMF for shared memory, as that is one of
the main points of using MMF - and one of the few ways (if not the only
way?) of allocating shared memory with the WIN32 API. I am, however,
stating that it's not good for *generic* allocations, as the memory
access is slower, and you use (on 9x) address space from the shared
memory region. I hope you aren't saying that memory access to MMF isn't
slower than 'ordinary' access, because it is. I have empirical and
theoretical data to back up this claim.

Furthermore I don't say you should use VirtualAlloc instead of HeapAlloc,
OLE memory, or whatever... I say that VirtualAlloc is good for large
allocations where page alignment can be a nice thing, and that it is a
nice way to allocate memory when you need certain page protection. I think
I even stated that it would be foolish to use VirtualAlloc for lots of small
allocations - if not, I have done so now. I've never claimed that
VirtualAlloc is the holy grail, it has characteristics that make it less
generic-purpose than eg HeapAlloc, namely the 4k granularity of allocations,
and the inability to reallocate the memory blocks. It is, however, more
generic purpose than MMF - valloc has alignment and granularity equal to
MMF, but will allocate from private pools even on 9x, and offers a bit
more page-level protection flexibility than MMF.

I still believe you shouldn't use the SysAllocString for generic memory
allocation. Why? Because that function family is designed for working
with BSTRs (length-prefixed unicode strings to those who wonder). Sure,
you can use SysAllocStringByteLen with the string argument set to NULL
to get a chunk of memory... but I'd advise anybody thinking about doing
this to take a look at the code path of this function.
Furthermore, when you can use HeapAlloc (which, after all, was designed
with versatile memory allocation in mind), why "hack" around with string
allocation functions to emulate "normal" memory allocation? I fail to see
any clear advantage in doing this, unless it turns out that allocation and
deallocation speeds are faster than the Heap* family. SysAllocString*
requires you to map in oleaut32.dll (and thus ole32.dll and a bunch of
other DLLs), and while this isn't too bad (the DLLs ought to be already
loaded by other processes), it does mean a little extra memory is wasted
on the additional page table entries. Furthermore, SysAllocString* has
long code paths, and internally depends on Co* functions (Co* == COM, right?).

I think I'm going to do testing of allocation/deallocation speed of the
various memory allocation methods within the next few days, as these ought
to vary a lot more than the actual data access speed of the memory.
As I already mentioned, the code path for SysAllocStringByteLen is *long*,
and I doubt it will be faster than HeapAlloc. I believe hutch talked about
SysAllocString* being 'preallocated' from 'string pools' (correct me if I'm
wrong, but that's how I read your post). However, the address returned by
this function lies in the same range as that returned by VirtualAlloc,
HeapAlloc (and all the other ordinary private memory functions),
so the idea that this memory is 'preallocated' doesn't seem very likely to me.
The addresses my 256meg allocations gave were the following (note that the
addresses are the same each time the program is run, at least as long as
there haven't been any major memory-related changes in the system):


VirtualAlloc: 0AC70000
HeapAlloc: 0AC70020
SysAllocStringByteLen: 0AC70024

Of course 'preallocated' could mean the system keeps a separate pool of
zero-inited pages (ie pages that can be given directly to a usermode memory
allocation request because the kernel knows the state of these pages to be
clean and zeroed), but I doubt this.

Please note that I don't discourage people from using the SysAllocString*
API family, it does seem to be fine and dandy when you're working with BSTRs.
I just discourage the use of it for 'generic' memory allocation. Also I believe
it could be advantageous to code your own BSTR handling routines - there seems to
be an awful lot of code in the SysAllocString* family for what it does.

Btw, SysAllocStringByteLen has the same access speed as the HeapAlloc and
friends, so it seems to me like it's just ordinary demand-zero memory allocation.
Matters would of course be different if you use a non-NULL string argument, as
the memory is then initialized... but then you have longer setup time, and you
can't specify that you want a memory allocation larger than your source string.

Obviously the internals of the various functions can be changed by microsoft as
long as the interface stays the same, and the functions keep on doing what
PlatformSDK says they do. Also there are probably some differences between the
way 9x and NT handles things - which is why I plan to do testing on both, and
advise people to do their own tests as well. But even though you should treat
the win32 API as a "black box", I do believe it's advantageous to examine the
implementation... if this leads to the discovery that two seemingly identical
functions have different performance, well, then you can use this knowledge
to choose the fastest of the two and be happy.
While the implementations might change in the next version of windows, or even
in the next service pack, there are certain characteristics that are bound to
stay the same... VirtualAlloc and MMF will have higher granularity and better
alignment than Heap*, VirtualAlloc will allow more pagelevel protection flags
than the other routines, MMF is very likely to remain slower than the rest,
and so on.

Let's try and be technical instead of playing with words politician-style.
This is the win32asmboard, and I believe the idea is to have good performance,
know what you are talking about, and choose the right tool for the job... no?
Posted on 2002-06-30 22:44:55 by f0dder
Hutch and f0dder....you guys are certainly keeping me entertained. I understand perhaps 20% of what you are saying because I'm a newbie, but I find myself always checking email when I log on just to see if there is anything new in this thread.

You both have my applause.

:alright:
Posted on 2002-06-30 22:47:03 by IwasTitan
Titan: I just hope you will choose to do your own experimentation
rather than trust any of us blindly. And that you will take more heed
of well-founded technical material than simple word twisting.
Posted on 2002-06-30 22:49:11 by f0dder
Since this is such an interesting thing I have extended my little test prog to measure time (very simple though). The results for Win98 are a bit surprising:

- VirtualAlloc and MapViewOfFile are almost equally fast
- LocalAlloc, CoTaskMemAlloc have a 10% performance penalty
- SysAlloc... has a 15 % performance penalty

This is only true if no disk activity results from the request. But increasing the paging file size doesn't necessarily mean there is disk activity.

That's my test prog (changed to GUI now to avoid instance faults from VM switching in console apps):



;/* dont delete, needed to make RC ignore the ASM lines

;--- Win32 alloc test

.586
.Model flat,stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
include \masm32\include\ole32.inc
include \masm32\include\oleaut32.inc

IDD_DIALOG1 equ 101
IDC_LIST1 equ 1000
IDC_NORMAL equ 1001
IDC_COALLOC equ 1002
IDC_SYSALLOC equ 1003
IDC_VIRTALLOC equ 1004
IDC_ZEROFILL equ 1005
IDC_MAPVIEW equ 1006

MEMSIZE equ 7000000h ;112 MB

ListBox_AddString macro x,y
invoke SendMessage, x, LB_ADDSTRING, 0, y
endm

CStr macro y:req
local sym
.const
ifidni <y>,<"">
sym db 0
else
sym db y,0
endif
.code
exitm <offset sym>
endm

.data

g_handle DWORD 0

.code

DoAlloc proc uses esi hWnd:HWND, iMode:DWORD

local hWndLB:HWND
local pszFormat:LPSTR
local tsc:QWORD
local szText[128]:byte

invoke GetDlgItem, hWnd, IDC_LIST1
mov hWndLB,eax

rdtsc
mov dword ptr tsc+0,eax
mov dword ptr tsc+4,edx

mov eax, iMode
.if (eax == IDC_SYSALLOC)
invoke SysAllocStringByteLen, NULL, MEMSIZE
mov pszFormat,CStr("SysAllocStringByteLen returned %X")
.elseif (eax == IDC_COALLOC)
invoke CoTaskMemAlloc, MEMSIZE
mov pszFormat,CStr("CoTaskMemAlloc returned %X")
.elseif (eax == IDC_VIRTALLOC)
invoke VirtualAlloc, NULL, MEMSIZE, MEM_COMMIT, PAGE_READWRITE
mov pszFormat,CStr("VirtualAlloc returned %X")
.elseif (eax == IDC_MAPVIEW)
invoke CreateFileMapping, -1, NULL, PAGE_READWRITE, 0, MEMSIZE, NULL
.if (eax != NULL)
invoke MapViewOfFile, eax, FILE_MAP_WRITE, 0, 0, MEMSIZE
mov pszFormat,CStr("MapViewOfFile returned %X")
.else
mov pszFormat,CStr("CreateFileMapping returned %X")
.endif
.else
invoke LocalAlloc, LMEM_FIXED, MEMSIZE
mov pszFormat,CStr("LocalAlloc returned %X")
.endif
mov g_handle,eax

rdtsc
sub eax,dword ptr tsc+0
sbb edx,dword ptr tsc+4
push eax
push edx

invoke wsprintf, addr szText, pszFormat, g_handle
ListBox_AddString hWndLB, addr szText

pop edx
pop eax

.if (edx)
invoke wsprintf, addr szText, CStr("Time was %X%08X"),edx,eax
.else
invoke wsprintf, addr szText, CStr("Time was %X"),eax
.endif
ListBox_AddString hWndLB, addr szText

ret
DoAlloc endp

dlgproc proc hWnd:HWND, message:DWORD, wParam:WPARAM, lParam:LPARAM

mov eax,message
.if (eax == WM_INITDIALOG)
mov eax,1
.elseif (eax == WM_CLOSE)
invoke EndDialog, hWnd, 0
.elseif (eax == WM_COMMAND)
movzx eax,word ptr wParam+0
.if (eax == IDCANCEL)
invoke EndDialog, hWnd, 0
.elseif (eax == IDC_LIST1)

.elseif (eax == IDC_ZEROFILL)
.if (g_handle)
push edi
mov edi,g_handle
mov ecx,MEMSIZE/4
xor eax,eax
rep stosd
pop edi
.endif

.else
invoke DoAlloc, hWnd, eax
.endif
xor eax,eax
.else
xor eax,eax
.endif
ret
dlgproc endp

WinMain proc hInstance:HINSTANCE, hPrevInst:HINSTANCE, lpszCmdLine:LPSTR,iCmdShow:sdword

invoke CoInitialize, NULL
invoke DialogBoxParam, hInstance, IDD_DIALOG1, 0, dlgproc, 0
invoke CoUninitialize
ret

WinMain endp

WinMainCRTStartup proc public
invoke GetModuleHandle, NULL
invoke WinMain, eax, 0, 0, 0
invoke ExitProcess, eax
WinMainCRTStartup endp

end
;*/

#include "\masm32\include\resource.h"

#define IDD_DIALOG1 101
#define IDC_LIST1 1000
#define IDC_NORMAL 1001
#define IDC_COALLOC 1002
#define IDC_SYSALLOC 1003
#define IDC_VIRTALLOC 1004
#define IDC_ZEROFILL 1005
#define IDC_MAPVIEW 1006


/////////////////////////////////////////////////////////////////////////////
// Dialog

IDD_DIALOG1 DIALOG DISCARDABLE 0, 0, 188, 216
STYLE DS_MODALFRAME | DS_CENTER | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "Alloc Test"
FONT 8, "MS Sans Serif"
BEGIN
PUSHBUTTON "VirtualAlloc",IDC_VIRTALLOC,7,172,50,14
PUSHBUTTON "LocalAlloc",IDC_NORMAL,64,173,50,14
PUSHBUTTON "CoTaskMemAlloc",IDC_COALLOC,121,174,60,14
PUSHBUTTON "SysAllocString",IDC_SYSALLOC,7,195,50,14
PUSHBUTTON "ZeroFill",IDC_ZEROFILL,121,195,60,14
LISTBOX IDC_LIST1,7,7,174,159,LBS_NOINTEGRALHEIGHT | WS_VSCROLL |
WS_TABSTOP
PUSHBUTTON "MapViewOfFile",IDC_MAPVIEW,64,195,50,14
END

/* extract following lines to file TEST.MAK.

NAME=test

ASM=\masm32\bin\ml -c -coff
LINK=\masm32\bin\link
RC=\masm32\bin\rc
LOPTS= /LIBPATH:\masm32\lib /SUBSYSTEM:WINDOWS
LIBS=kernel32.lib user32.lib ole32.lib oleaut32.lib

$(NAME).exe: $*.obj $*.mak $*.res
$(LINK) $*.obj $*.res $(LOPTS) $(LIBS)

$(NAME).obj: $*.asm $*.mak
$(ASM) $*.asm

$(NAME).res: $*.asm $*.mak
$(RC) $*.asm
*/


For checking page faults and paging file size I have used a tool (attached to the original post), which regrettably works on Win9x only.
Posted on 2002-07-01 04:05:36 by japheth
I have just posted a tool in the MASM32 forum for testing different types of memory that are documented in the Windows API calls.

The test was to determine half the available physical memory up to a limit of 100 meg.

Selection of the memory allocation type runs a test of allocating the amount of memory, filling it with zeros and displaying the results in milliseconds. It deallocates the memory on exit from the test procedure.

Each test procedure has the same setup and overhead and is clocked locally to ensure that other factors in the code do not affect the time.

On my win95b, they all go past the post at very close to the same times, so close that the times overlap, the "MapViewOfFile" method uses about 1 megabyte more memory which is accounted for by its overhead and usage.

Not tested on NT/2k/XP. (shrug) Could not be bothered turning the NT box on.

LATER: Tested on NT4 sp6a on my old AMD. Results much the same, the display of Percentage Used fails but the tests run OK.

Do the numbers speak falsely? I doubt it.

f0dder,

Perhaps if you concentrated more on mammary than memory you would do better. :tongue:

Regards,

hutch@movsd.com
Posted on 2002-07-01 04:47:02 by hutch--
I modified my benchmark to have the ability to do a full
memory fill instead of just per-page touching, as that
*is* usually a more realistic usage pattern of the memory.
It is clear that the overhead of MMF and static allocation
becomes smaller this way, for obvious reasons: there will
be the same amount of pagefaults, but more time is spent
per page, so it doesn't feel as bad. It is however clear
that MMF and static memory are still slower to access than
the other memory allocation methods. 40ms might not seem
too bad for a 256meg fill, but the speed difference is
there nonetheless.

I am still timing the memory access separately from the
time taken for allocating/deallocating the memory; testing
will be done on alloc/dealloc speed later, and with other
parameters (lots of small blocks instead of one large).

Note that I realized my "bigmem.nas" had a flaw, namely
two 0x5000000 instead of two 0x8000000 - fixed now.

The tests were run 5 times, and the mean value was chosen.
There weren't big fluctuations between the various values
(max ~10ms), and usually only one or two runs were off.

athlon700, 512megs of ram, win2k-sp2 - buffersize set at 256 megs


VirtualAlloc 1032ms
HeapAlloc 1032ms
MMF 1072ms
static 1072ms
CoTaskMemAlloc 1032ms
GlobalAlloc 1032ms
SysAllocStringByteLen 1032ms


Now, the results on the 98 box were rather interesting :)
All routines clocked around the same average. So, either the
memory allocations were too small to show deviations, or
the memory access speed on 9x is the same for all types.
It is well known that 9x and NT are rather different in how
they handle stuff, so the latter is probably the cause.

k6-2 333mhz, 64megs of ram, win98SE - buffersize set at 64 megs


VirtualAlloc 1650ms
HeapAlloc 1650ms
MMF 1650ms
static 1650ms
CoTaskMemAlloc 1650ms
GlobalAlloc 1650ms
SysAllocStringByteLen 1650ms


However, I still do discourage people from using MMF as
a generic memory allocation method when shared memory is not
needed... on 9x, the memory is allocated from the shared
region (limited resource), which also means a buffer overrun
can be a lot more severe than from a private pool. And the
memory access *is* slower on NT. Sure, the speed hit should
only be there on first access (or if the pages need to be
swapped out later), but it's there nonetheless.

More information will be posted when I have tested the
allocation/deallocation speed of the various routines.
I expect it will be hard seeing too much speed difference
between allocation/deallocation speed of the various routines
(except that MMF will probably have somewhat higher overhead),
but we'll see.
Posted on 2002-07-01 08:54:51 by f0dder
To be able to compare these methods fairly, they will have to be
put into two categories. The first category gives so-called "0page"
memory - this means the memory blocks are guaranteed to be zero-filled
(demand-0, not "zero when allocated"). The second group is uninitialized
and can contain arbitrary data. Taking this into account, it's not too
strange that CoTaskMemAlloc and SysAllocStringByteLen are faster than
the other routines.

Test parameters:
NUMBLOCKS equ <1024*16>
BLOCKSIZE equ <4096>
Athlon700, 512meg ram, win2k-sp2
Results:
;##### 0page ################
VirtualAlloc: 800
HeapAlloc: 340
GlobalAlloc: 340
MMF: 1161
;##### Uninit ###############
HeapAlloc w/o ZERO_MEM: 150
CoTaskMemAlloc: 150
SysAllocStringByteLen: 130
f0dderCoMalloc: 150

There are a couple of things, however, that do surprise me. First,
the rather slow speed of VirtualAlloc - since it's a rather 'primitive'
function I had assumed it would be fast. Ok, I was wrong - so I guess
you should only use VirtualAlloc if you need the alignment or page
protection flags (or address space reservation) it provides, not just
for any "huge buffer" allocations. I will still be using it myself for
eg framebuffer allocation.

It initially surprised me that SysAllocStringByteLen is faster than
CoTaskMemAlloc, as both use COM memory allocation routines (the
IMalloc interface, retrieved by CoGetMalloc). I discovered the cause
while writing "f0dderCoMalloc" (a small piece of code that uses the
IMalloc interface (calls CoGetMalloc once at program startup,
and uses this object later)). It is identical in functionality to
CoTaskMemAlloc, and was included so those without disassembling or
tracing capabilities can see how CoTaskMemAlloc works.

SysAllocStringByteLen has quite some code. As far as I can see, it
allocates an IMalloc object per thread. It calls TlsGetValue to get
the ppmalloc pointer; if the return value is zero, it calls a routine to
allocate a IMalloc interface (and do a bunch of other stuff), and
will save this value with TlsSetValue.

The memory allocation size is adjusted ((allocsize + 0x15) & 0x0FFFFFFF0),
and this is what causes the speed difference - if I adjust the memory
size like this for CoTaskMemAlloc or f0dderCoMalloc, I get identical
speeds.
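
The size adjustment quoted above is easy to try out in isolation. The following C sketch just evaluates the expression; the function name is hypothetical. One plausible reading of the constant (an assumption, not something the disassembly notes confirm) is 0x15 = 4 + 2 + 15: a 4-byte length prefix, a 2-byte terminator, and 15 to round up to the next multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* The adjustment quoted above: (allocsize + 0x15) & 0xFFFFFFF0. */
uint32_t adjusted_alloc_size(uint32_t requested)
{
    assert(requested <= UINT32_MAX - 0x15u);   /* avoid wrap-around */
    return (requested + 0x15u) & 0xFFFFFFF0u;
}
```

For the BLOCKSIZE of 4096 used in the test this gives 4112 bytes, so every request is slightly larger than a page and always a multiple of 16.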

When SysAllocStringByteLen is done allocating memory, it checks if the
'len' parameter is zero - if not, it copies the source string. Finally
it zero-terminates the string. It also stores the string length in the
memory block (four bytes before the returned pointer).
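
The block layout described here (length stored four bytes before the returned pointer, zero terminator after the data) can be sketched in portable C. This is an illustration of the layout only, built on malloc; the lp_ names are hypothetical and none of this is the actual oleaut32 code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates len payload bytes with a 4-byte length prefix in front and a
 * zero terminator behind, and returns a pointer to the payload.
 * src may be NULL, in which case the payload is left uninitialized. */
char *lp_alloc(const void *src, uint32_t len)
{
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 2);
    char *payload;

    if (block == NULL)
        return NULL;

    memcpy(block, &len, sizeof len);          /* length prefix */
    payload = (char *)block + sizeof(uint32_t);
    if (src != NULL)
        memcpy(payload, src, len);            /* copy the source string */
    payload[len] = '\0';                      /* two zero bytes: room for */
    payload[len + 1] = '\0';                  /* a 16-bit terminator      */
    return payload;
}

/* Reads the length back from the four bytes in front of the payload. */
uint32_t lp_length(const char *payload)
{
    uint32_t len;

    assert(payload != NULL);
    memcpy(&len, payload - sizeof len, sizeof len);
    return len;
}

/* Frees the whole block, starting at the hidden prefix. */
void lp_free(char *payload)
{
    if (payload != NULL)
        free(payload - sizeof(uint32_t));
}
```

The caller only ever sees the payload pointer, which is why the length lookup and the free both have to step back by four bytes.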

It seems that SysAllocStringByteLen keeps 3 pools with 6 entries each,
for blocks of <=0x20, <=0x40 and <=0x100 for faster string allocations.
The pools are initially empty, but when you SysFreeString they will be
filled rather than freeing the string right away.
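
A caching scheme along these lines can be sketched in a few lines of C. The three size classes and the six entries per pool come from the observation above; everything else (the names, the use of malloc/free underneath, the caller passing the size to the free routine) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_COUNT   3
#define POOL_ENTRIES 6

/* Size classes taken from the post; everything else is illustrative. */
static const size_t pool_limit[POOL_COUNT] = { 0x20, 0x40, 0x100 };
static void  *pool_slot[POOL_COUNT][POOL_ENTRIES];
static size_t pool_used[POOL_COUNT];

/* Returns the pool index for a size, or -1 if it is too big to cache. */
static int pool_index(size_t size)
{
    int i;
    for (i = 0; i < POOL_COUNT; i++)
        if (size <= pool_limit[i])
            return i;
    return -1;
}

void *cached_alloc(size_t size)
{
    int p = pool_index(size);
    if (p < 0)
        return malloc(size);                 /* large block: no caching */
    assert(pool_used[p] <= POOL_ENTRIES);
    if (pool_used[p] > 0)
        return pool_slot[p][--pool_used[p]]; /* reuse a parked block */
    return malloc(pool_limit[p]);            /* round up to class size */
}

/* The caller passes the size it originally asked for. */
void cached_free(void *block, size_t size)
{
    int p = pool_index(size);
    if (block == NULL)
        return;
    if (p >= 0 && pool_used[p] < POOL_ENTRIES)
        pool_slot[p][pool_used[p]++] = block; /* park instead of freeing */
    else
        free(block);
}
```

Freeing a 24-byte block and then asking for 30 bytes hands back the same block without touching the heap, which is the effect described above. The pools start empty and only fill as blocks are released.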

Sure, this will make later allocations faster, but if you aren't dealing
with strings, I'd say it's advantageous to code your own similar scheme,
which can skip all the Tls stuff, various string-related checks, string
copying et cetera. And furthermore you can set the pool sizes to get
optimal performance for the task at hand.

So, what should one use? HeapAlloc seems a good choice for generic
allocations, as it can provide both uninitialized and demand-zero memory.
It has the same speed as similar functions, and on NT it maps directly to
NTDLL.RtlAllocateHeap (without parameter conversion stages of Global/LocalAlloc).

The SysAllocString* family seems nice if you work with BSTRs, as they
handle a lot of stuff for you - but it might be beneficial to code your
own BSTR handling routines as you can probably do the string copying
etc faster.

Don't see much reason to use CoTaskMemAlloc or the IMalloc interface
instead of HeapAlloc, but there might be some reason I have missed -
I should probably do further testing with different block sizes...
I don't think there will be much difference, but time will tell.

Time to go do some testing on 9x :).
Posted on 2002-07-01 12:27:25 by f0dder
Compliments f0dder, you have done some good research work here.

Still, there ain't nothing wrong with mammary. :tongue:

Regards,

hutch@movsd.com
Posted on 2002-07-01 13:03:14 by hutch--
Here are the tests from my kid brother's 98SE box. 64 megs of ram,
k6-2 333mhz. Note that VirtualAlloc is by far the fastest function
here - I assume results would be different if I had allocated only
about half of available system memory.



VirtualAlloc: 515
HeapAlloc: 2880
HeapAlloc zero: 3350
MMF: didn't complete... too slow.
CoTaskMemAlloc: 2880
GlobalAlloc: 3400
SysAlloc##: 3025
f0dderCoMalloc: 2880


No, nothing wrong with mammary, but that is hardly related to
this subject in any way.
Posted on 2002-07-01 13:12:02 by f0dder
Guess I ought to post the updated test program as well.
I am in the process of preparing an essay about this whole memory
allocation thing and some technical explanations as to what is
going on - will hopefully be done in a couple of days; I ain't doing
much else before Thursday, when a couple of mates and I are
going sailing (if the weather permits).
Posted on 2002-07-01 13:14:06 by f0dder
Grin,

==============
No, nothing wrong with mammary, but that is hardly related to this subject in any way.
==============

The secret is in the pronunciation, if you get it right, you could sound like a MAC user. :grin:

Regards,

hutch@movsd.com
Posted on 2002-07-01 13:18:43 by hutch--