When I read a file larger than 700 MB, I get an "out of memory" error.

Can anybody tell me how I can read a file of more than 700 MB and store it in a byte array? I want to read the file into a byte array because the encryption algorithm I'm using takes a byte array as input.

Please help.
Posted on 2006-05-30 06:56:28 by sihotaamarpal
sihotaamarpal: there's a limit to how much memory you can allocate under Windows. The heap functions (and thus Global/LocalAlloc, which internally use HeapAlloc) have some limit; you can get more if you switch to VirtualAlloc instead (which is a good idea anyway if you're working with large buffers that you don't need to resize). You're still limited to a couple of gigabytes at most, though, because of the way the address space is split between usermode and the kernel, so you probably won't be able to allocate more than 1-1.5 GB of contiguous memory.
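
A minimal sketch of the VirtualAlloc route (the 700 MB figure is just the size from the question, the rest is illustrative):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T size = 700u * 1024 * 1024;   /* ~700 MB, as in the question */

        /* Reserve and commit one contiguous block. Fails (returns NULL) if
           the process address space is too fragmented for a block this big. */
        BYTE *buffer = (BYTE*)VirtualAlloc(NULL, size,
                                           MEM_RESERVE | MEM_COMMIT,
                                           PAGE_READWRITE);
        if (!buffer) {
            printf("VirtualAlloc failed, error %lu\n", GetLastError());
            return 1;
        }

        /* ... ReadFile the whole file into 'buffer', run the encryption ... */

        VirtualFree(buffer, 0, MEM_RELEASE);
        return 0;
    }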

This is for NT, by the way - things are worse on 9x.

Really, you ought to do your processing in chunks. It's better for the system as a whole...
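
Rough shape of the chunked approach; the 1 MB chunk size and the process callback are placeholders:

    #include <windows.h>

    #define CHUNK_SIZE (1 << 20)   /* 1 MB per read, tune to taste */

    /* Read 'path' sequentially and hand each chunk to 'process'
       (e.g. the encryption routine). Only CHUNK_SIZE bytes of memory
       are needed no matter how big the file is. */
    BOOL process_file_in_chunks(const char *path,
                                void (*process)(const BYTE *data, DWORD len))
    {
        HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                   OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
        BYTE *chunk;
        DWORD bytesRead;

        if (hFile == INVALID_HANDLE_VALUE)
            return FALSE;

        chunk = (BYTE*)VirtualAlloc(NULL, CHUNK_SIZE, MEM_COMMIT, PAGE_READWRITE);
        if (!chunk) {
            CloseHandle(hFile);
            return FALSE;
        }

        /* ReadFile returns TRUE with bytesRead == 0 at end of file. */
        while (ReadFile(hFile, chunk, CHUNK_SIZE, &bytesRead, NULL) && bytesRead > 0)
            process(chunk, bytesRead);

        VirtualFree(chunk, 0, MEM_RELEASE);
        CloseHandle(hFile);
        return TRUE;
    }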
Posted on 2006-05-30 07:16:59 by f0dder
Also, your program will lock up while the file is being loaded unless you move that part of your code into a separate thread. Maybe you should use file mapping?
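
For reference, the basic file-mapping sequence looks something like this (read-only, whole file in one view, error handling kept minimal, helper name made up):

    #include <windows.h>

    /* Map an entire file read-only and return a pointer to its bytes.
       Caller must UnmapViewOfFile(view) and close both handles when done. */
    BYTE *map_whole_file(const char *path, HANDLE *phFile, HANDLE *phMap, DWORD *pSize)
    {
        BYTE *view;

        *phFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (*phFile == INVALID_HANDLE_VALUE)
            return NULL;

        *phMap = CreateFileMappingA(*phFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!*phMap) {
            CloseHandle(*phFile);
            return NULL;
        }

        view = (BYTE*)MapViewOfFile(*phMap, FILE_MAP_READ, 0, 0, 0);  /* 0 = whole file */
        if (!view) {
            CloseHandle(*phMap);
            CloseHandle(*phFile);
            return NULL;
        }

        *pSize = GetFileSize(*phFile, NULL);
        return view;   /* the file's contents, readable like a normal byte array */
    }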

Zcoder....
Posted on 2006-05-30 11:16:18 by Zcoder
Filemapping is often inefficient, though.

*) you have no control over file caching, even if the underlying CreateFile handle was opened with no-buffering flags.

*) you still have to process large files in chunks (MapViewOfFile) - this is especially true on 9x where memory mapped files are created in the global shared space.

*) you will generally get a page fault for each 4 KB you access, because that's how memory mapping works. The kernel transitions involved in this are expensive, and the effect can clearly be seen if you monitor CPU usage while processing files.

Filemapping is great because it's really easy to use, though, especially if you move to 64-bit architectures (where you don't need "chunked" file processing), and there are probably situations where it performs well.

But for sequential processing of files, filemapping slows stuff down.
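
To illustrate the chunked-view point: when one view of a huge file can't be created, you slide a smaller view along the file. A rough sketch; the 64 MB window is an arbitrary choice, and view offsets must be multiples of the 64 KB allocation granularity:

    #include <windows.h>

    /* Walk a large mapping through a sliding MapViewOfFile window. */
    void process_mapping_in_chunks(HANDLE hMap, ULONGLONG fileSize,
                                   void (*process)(const BYTE *data, SIZE_T len))
    {
        const ULONGLONG window = 64 * 1024 * 1024;   /* multiple of 64 KB */
        ULONGLONG offset = 0;

        while (offset < fileSize) {
            ULONGLONG remaining = fileSize - offset;
            SIZE_T len = (SIZE_T)(remaining < window ? remaining : window);

            const BYTE *view = (const BYTE*)MapViewOfFile(hMap, FILE_MAP_READ,
                                                          (DWORD)(offset >> 32),
                                                          (DWORD)offset, len);
            if (!view)
                break;   /* out of address space or other failure */

            process(view, len);
            UnmapViewOfFile(view);

            offset += len;   /* stays 64 KB-aligned because 'window' is */
        }
    }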
Posted on 2006-05-30 12:11:31 by f0dder

*) you still have to process large files in chunks (MapViewOfFile) - this is especially true on 9x where memory mapped files are created in the global shared space.

This is done by the (D)OS, not by userland programmers, at least on Win98.  I once mapped an entire 500 MB file in one go and processed it successfully; I didn't need to do it in pieces.
Posted on 2006-05-30 22:58:38 by Starless
Try running four instances that each maps a different 500meg file... b00m. I can't remember the size of the shared region on 9x, but it's probably around a gigabyte or a bit more. DLLs and mapped files are allocated here and all "compete for space".

On NT, mapped files are allocated from process-private address space, so the problems aren't as grave (but you're still limited to mapping somewhat less than 2 GB at a time).
Posted on 2006-05-31 06:13:38 by f0dder
I was a beta tester for an asm-coded commercial app that used MapViewOfFile; it sometimes couldn't map whole files larger than 400 MB, and often failed on 700+ MB files (this was on Win2k SP4). Instead, process the file in chunks.

You should either encrypt the chunks independently, or have your encryption algo return an encryption key to use for the next chunk. Of course, the latter is much more secure (harder to break), and it usually isn't hard to implement :). The plus is that you get extra speed once you have a good chunk size.

Combine this with overlapped file reading and you can get the best possible read speed (process the current chunk while the next one is being loaded into RAM).
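
A sketch of that double-buffering with overlapped ReadFile; the chunk size and the process_chunk stub are placeholders, and error handling is kept minimal:

    #include <windows.h>

    #define CHUNK (1 << 20)   /* 1 MB, tune to taste */

    static void process_chunk(const BYTE *data, DWORD len) { /* encrypt here */ }

    void read_overlapped(const char *path)
    {
        HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                   OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        BYTE *buf[2];
        OVERLAPPED ov[2];
        ULONGLONG offset = 0;
        int cur = 0;
        BOOL pending;

        if (hFile == INVALID_HANDLE_VALUE)
            return;

        buf[0] = (BYTE*)VirtualAlloc(NULL, CHUNK, MEM_COMMIT, PAGE_READWRITE);
        buf[1] = (BYTE*)VirtualAlloc(NULL, CHUNK, MEM_COMMIT, PAGE_READWRITE);
        ZeroMemory(ov, sizeof(ov));
        ov[0].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        ov[1].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        /* Kick off the first read (offset 0 was set by ZeroMemory). */
        pending = ReadFile(hFile, buf[cur], CHUNK, NULL, &ov[cur]) ||
                  GetLastError() == ERROR_IO_PENDING;

        while (pending) {
            DWORD got;
            int next = cur ^ 1;

            /* Wait for the chunk we started last time around. */
            if (!GetOverlappedResult(hFile, &ov[cur], &got, TRUE) || got == 0)
                break;

            /* Start the next read immediately... */
            offset += got;
            ov[next].Offset     = (DWORD)offset;
            ov[next].OffsetHigh = (DWORD)(offset >> 32);
            pending = ReadFile(hFile, buf[next], CHUNK, NULL, &ov[next]) ||
                      GetLastError() == ERROR_IO_PENDING;   /* FALSE at end of file */

            /* ...and process the chunk that just arrived while the read runs. */
            process_chunk(buf[cur], got);
            cur = next;
        }

        CloseHandle(ov[0].hEvent);  CloseHandle(ov[1].hEvent);
        VirtualFree(buf[0], 0, MEM_RELEASE);  VirtualFree(buf[1], 0, MEM_RELEASE);
        CloseHandle(hFile);
    }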
Posted on 2006-05-31 09:05:54 by Ultrano
Yeah, if you're doing linear processing of massive amounts of data, nothing beats Overlapped I/O with FILE_FLAG_NO_BUFFERING and plain old ReadFile.
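
The relevant CreateFile call, for the record. Note that FILE_FLAG_NO_BUFFERING requires read sizes and file offsets to be multiples of the volume sector size, and sector-aligned buffers (VirtualAlloc's page-aligned memory qualifies):

    #include <windows.h>

    /* Open a file for unbuffered, overlapped reading: the data bypasses the
       system file cache and reads can run while you process the previous chunk. */
    HANDLE open_unbuffered(const char *path)
    {
        return CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                           NULL);
    }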

And do remember to use CBC or some other chained mode for encryption, as Ultrano said.
Posted on 2006-05-31 12:20:37 by f0dder

You should either encrypt the chunks independently, or have your encryption algo return an encryption key to use for the next chunk. Of course, the latter is much more secure (harder to break), and it usually isn't hard to implement :). The plus is that you get extra speed once you have a good chunk size.


Yup, when I set up an encryption scheme for some of my files, I used a hash of the password for my 64-bit encryption key, then used the unencrypted 64 bits as the key for the next 64 bits and stepped through the file. This method is slow, but it helps stop pattern searches aimed at determining the original key value, something that is necessary when dealing with mostly text files. I originally set it up for RC6 but now I'm using TEA.
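
For anyone curious, the reference TEA encryption routine (Wheeler and Needham, public domain) is tiny. This is plain TEA, not the password-chaining scheme described above:

    #include <windows.h>   /* DWORD */

    /* TEA: 64-bit block in v[0..1], 128-bit key in k[0..3], 32 rounds.
       Decryption runs the same rounds in reverse, with sum starting
       at 0x9E3779B9 * 32. */
    void tea_encrypt(DWORD v[2], const DWORD k[4])
    {
        DWORD v0 = v[0], v1 = v[1], sum = 0;
        const DWORD delta = 0x9E3779B9;
        int i;

        for (i = 0; i < 32; i++) {
            sum += delta;
            v0  += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
            v1  += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        }
        v[0] = v0;
        v[1] = v1;
    }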
Posted on 2006-06-01 06:49:32 by donkey
I guess I should note that I prefer RC6 over TEA but I was unable to reach an agreement with RSA for a license to use the RC6 algorithm.
Posted on 2006-06-01 06:56:37 by donkey
Hm, why TEA and not something like AES? Hopefully XTEA, if you insist on TEA. And too bad you couldn't get an RC6 license; it's a pretty sweet cipher.


I used a hash of the password for my 64-bit encryption key

If at all possible, you should use hash(password, randomvalue) for the key. This is called "adding salt", and it means that two identical files encrypted with the same passphrase will not produce the same ciphertext. It requires some (fast) way to check whether the correct key has been found, and it turns opening the file into basically a brute-force search (so you want to limit the range of your random value a bit as well). But it's good.
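
A sketch of that idea. The hash is whatever you already use, passed in as a callback here; the function name and signature are made up:

    #include <windows.h>
    #include <string.h>

    /* Derive the cipher key from hash(password, salt). 'hash' stands in for
       whatever hash the program already uses; 'keyOut' receives the digest. */
    typedef void (*hash_fn)(const BYTE *data, SIZE_T len, BYTE *digest);

    void derive_salted_key(const char *password, const BYTE salt[8],
                           hash_fn hash, BYTE *keyOut)
    {
        BYTE buf[256];
        SIZE_T plen = strlen(password);

        if (plen > sizeof(buf) - 8)        /* keep it inside the scratch buffer */
            plen = sizeof(buf) - 8;

        memcpy(buf, password, plen);       /* password ...          */
        memcpy(buf + plen, salt, 8);       /* ... followed by salt  */
        hash(buf, plen + 8, keyOut);       /* key = hash(password, salt) */
    }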


then used the unencrypted 64 bits as the key for the next 64 bits and stepped through the file.

The common way, called CBC (Cipher Block Chaining), involves using a so-called "Initialization Vector". This IV is one block in size, filled with random data, and safe to store in your encrypted file.

For each block, things go like this:
Encryption:  output = encrypt(input XOR InitVector, key),
Decryption:  output = decrypt(input, key) XOR InitVector.

and then InitVector is set to that block's ciphertext (which is the output when encrypting, but the input when decrypting). Of course, things can be done in place so you don't need to shuffle data around as much. This is off the top of my head, so double-check to be sure.
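
In code, CBC around an arbitrary 64-bit block cipher looks roughly like this. The cipher itself (TEA, RC6, whatever) is passed in, and the data length is assumed to be padded to a multiple of the block size:

    #include <windows.h>
    #include <string.h>

    #define BLOCK 8   /* 64-bit blocks, as with TEA or a 64-bit RC6 setup */

    typedef void (*block_fn)(BYTE block[BLOCK], const BYTE *key);

    /* CBC encryption: each plaintext block is XORed with the previous
       ciphertext block (the IV for the first one) before being encrypted. */
    void cbc_encrypt(BYTE *data, SIZE_T len, const BYTE *key,
                     const BYTE iv[BLOCK], block_fn encrypt_block)
    {
        BYTE prev[BLOCK];
        SIZE_T i;
        int j;

        memcpy(prev, iv, BLOCK);
        for (i = 0; i + BLOCK <= len; i += BLOCK) {
            for (j = 0; j < BLOCK; j++)
                data[i + j] ^= prev[j];          /* chain in previous ciphertext */
            encrypt_block(data + i, key);        /* encrypt in place */
            memcpy(prev, data + i, BLOCK);       /* this ciphertext feeds the next block */
        }
    }

    /* CBC decryption: decrypt, then XOR with the previous *ciphertext* block. */
    void cbc_decrypt(BYTE *data, SIZE_T len, const BYTE *key,
                     const BYTE iv[BLOCK], block_fn decrypt_block)
    {
        BYTE prev[BLOCK], saved[BLOCK];
        SIZE_T i;
        int j;

        memcpy(prev, iv, BLOCK);
        for (i = 0; i + BLOCK <= len; i += BLOCK) {
            memcpy(saved, data + i, BLOCK);      /* remember ciphertext for chaining */
            decrypt_block(data + i, key);
            for (j = 0; j < BLOCK; j++)
                data[i + j] ^= prev[j];
            memcpy(prev, saved, BLOCK);
        }
    }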
Posted on 2006-06-01 11:22:26 by f0dder