I've recently run into a problem with big files.
Basically, I'm working on a program that can read any file the user specifies and search it for other files embedded inside, so they can be extracted. However, when the user gives it a really large file, say 100 MB or more, it gets terribly slow. I use GlobalAlloc to allocate a buffer and read the whole file into it. I'm told this is a very bad idea for large files, but I'm not sure how else to search a large file.

Posted on 2003-01-12 08:10:52 by Uradox
Load it in parts.
Posted on 2003-01-12 08:36:30 by Hiroshimator
Yes, but how? I've never seen any examples of this.
Posted on 2003-01-12 10:26:00 by Uradox
Just read part of the file into a memory buffer, do your thing on the buffer, then load the next part.

If you want an example, go to the FAQ section and look for md4, md5, ed2k hash. The ed2k hash routine does that on 9 MB blocks.
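
For what it's worth, here is a minimal sketch of that read, process, repeat loop, written in Python rather than assembly just to keep it short. The names find_all_chunked and chunk_size are made up for illustration. The one detail that is easy to miss when searching is that a signature can straddle two blocks, so the sketch carries the last len(pattern) - 1 bytes of each buffer over into the next one.

```python
import io


def find_all_chunked(stream, pattern, chunk_size=4096):
    """Yield the absolute offset of every occurrence of `pattern` in `stream`,
    reading at most `chunk_size` new bytes at a time.

    `find_all_chunked` and `chunk_size` are hypothetical names for this sketch.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over between blocks
    tail = b""                  # end of the previous buffer
    base = 0                    # file offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break               # end of file
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buf.find(pattern, pos + 1)
        # Keep only the bytes a boundary-spanning match could start in.
        keep = min(overlap, len(buf))
        tail = buf[len(buf) - keep:]
        base += len(buf) - keep


if __name__ == "__main__":
    # An in-memory stream stands in for a real file opened with open(path, "rb").
    sample = io.BytesIO(b"A" * 4094 + b"PK\x03\x04" + b"B" * 100)
    print(list(find_all_chunked(sample, b"PK\x03\x04")))
```

Memory use stays at roughly one block plus the carried-over tail, no matter how big the file is. The block size here is only a starting point to tune.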
Posted on 2003-01-12 10:29:41 by Hiroshimator
Here's an example in Fasm; the variables FileName and Size need to be declared. It works on 4 KB blocks at a time, but it's easy to change that.
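	; Open the file for reading (call reconstructed; share and creation flags are assumed).
	invoke CreateFile,FileName,GENERIC_READ,FILE_SHARE_READ,0,OPEN_EXISTING,0,0
	mov esi,eax			; esi = file handle
	invoke GetFileSize,esi,Size	; eax = low dword of the file size
	mov ebx,eax			; ebx = bytes still to read
	invoke GlobalAlloc,GPTR,4096
	mov edi,eax			; edi = 4096-byte buffer
	sub ebx,4096
	js .beta			; file is smaller than one block
.alpha:	invoke ReadFile,esi,edi,4096,Size,0

	; Process the 4096 byte lump here.
	; Preserve ebx, edi & esi

	sub ebx,4096
	jns .alpha
.beta:	add ebx,4096			; ebx = size of the final partial block
	jz .gamma			; nothing left over
	invoke ReadFile,esi,edi,ebx,Size,0

	; Process the last block of size ebx bytes here.
	; Preserve ebx, edi & esi

.gamma:	invoke CloseHandle,esi
	invoke GlobalFree,edi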

Fixed stupid stupid mistakes :eek: .
Posted on 2003-01-12 10:57:18 by Eóin
Eóin, there is an error if the file is smaller than the block size.
Just need to move the .beta label up two lines.
Posted on 2003-01-12 11:34:05 by bitRAKE
Why not use memory-mapped files? (See Iczelion's tutorials.)
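
To give a rough idea of what that looks like, here is a small sketch in Python (not assembly, and not taken from Iczelion's tutorials). The function name find_all_mapped is made up for illustration. The file is mapped read-only and searched as if it were one big buffer, while the OS pages the data in on demand.

```python
import mmap
import os


def find_all_mapped(path, pattern):
    """Return the offsets of every occurrence of `pattern` in the file at `path`,
    searching a read-only memory mapping instead of a heap buffer.

    `find_all_mapped` is a hypothetical name for this sketch.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    if os.path.getsize(path) == 0:
        return []  # an empty file cannot be mapped
    hits = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; nothing is copied up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            pos = view.find(pattern)
            while pos != -1:
                hits.append(pos)
                pos = view.find(pattern, pos + 1)
    return hits
```

Mapping the whole file in one go assumes the process has enough free address space for it, which may not hold for very large files in a 32-bit process.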
Posted on 2003-01-12 11:55:56 by JCP
Thanks bitRAKE, and I guess no one spotted the GENERIC_WRITE error either :rolleyes: .

That's where improper testing gets you :grin: .
Posted on 2003-01-12 12:02:35 by Eóin
How can reading a file in 4 KB blocks be faster? Wouldn't that be an awful lot more instructions to process?
Maybe I'm missing the point here.
Posted on 2003-01-13 06:41:18 by Uradox
The default block (cluster) size of Windows file systems is 4096 bytes (IIRC), so it is faster because Windows does not need to gather data from various positions on the hard disk for a single read :)
Posted on 2003-01-13 06:55:15 by bazik
Reading the whole file at once may also be slow because when you try to allocate huge chunks of memory, Windows may need to put some of it in the page file, or move other stuff to the page file to make room. Any time you need to hit the page file, things really slow down...

Posted on 2003-01-13 07:36:13 by S/390
"Why not use Memory mapped files"
Beware of this if you will use files larger than 1 GB; there are some problems with mappings larger than 1 GB.
Posted on 2003-01-14 06:33:02 by beaster
I completely agree with Readiosys on this. Memory-mapped files are the obvious choice here. They will be much faster both to implement and during execution.

I've never heard about the >1 GB problem with memory-mapped files that beaster mentioned, but if there is any trouble you can simply map these files in 1 GB chunks (you don't have to map the whole file at once), so it shouldn't be a problem. :)
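
A rough sketch of that chunked mapping, in Python rather than assembly: find_all_windowed and the window parameter are made-up names, and the 64 MB default is just an assumption to tune. Two details matter. View offsets have to be multiples of the system allocation granularity, and each view is extended by len(pattern) - 1 bytes so a match that straddles two windows is still seen exactly once.

```python
import mmap
import os


def find_all_windowed(path, pattern, window=64 * 1024 * 1024):
    """Search the file at `path` for `pattern`, mapping one window at a time.

    `find_all_windowed` and `window` are hypothetical names for this sketch.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    gran = mmap.ALLOCATIONGRANULARITY
    # Round the window down to a multiple of the granularity (at least one unit),
    # so every view offset below is a valid mapping offset.
    window = max(gran, window - window % gran)
    overlap = len(pattern) - 1
    size = os.path.getsize(path)
    hits = []
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            # Extend the view so a match starting near the end of this window fits.
            length = min(window + overlap, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                pos = view.find(pattern)
                while pos != -1:
                    hits.append(offset + pos)
                    pos = view.find(pattern, pos + 1)
            offset += window
    return hits
```

Only one view is mapped at any moment, so the address space needed is bounded by the window size rather than the file size.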
Posted on 2003-01-14 07:04:36 by dELTA
you can read about my experiences with mappings here
Posted on 2003-01-15 03:53:19 by beaster
Files larger than 1 GB could be a problem I would come across, but only very rarely.
Did you ever find a workaround for it?
Posted on 2003-01-15 04:01:23 by Uradox
I used CreateFile and ReadFile instead of mappings; there is not much more to it when you use these basic functions.
Posted on 2003-01-15 06:30:58 by beaster