I've been playing around with arrays in MASM, and I have noticed that MASM behaves really strangely when I try to create arrays with a size of a few hundred KB or so.

I use the following declaration:

big_array db 500000 dup (?)

Now, the compiler takes forever to compile the program as soon as such a big array is in it! Not to mention that the compilation time seems to increase quadratically with the size of the array. It takes about 30 seconds to compile if I have a 500 KB array (on my 500 MHz Pentium III), and above that it's practically unbearable due to the apparent quadratic time complexity of the operation.

Does anyone know why this happens? Is there any way around it, or is it just impossible to create such "large" arrays in MASM programs? :confused: :(

Any tips would be greatly appreciated.

Posted on 2003-01-12 18:42:16 by dELTA
yes it does that unfortunately :/

allocate the memory yourself via HeapAlloc or LocalAlloc/GlobalAlloc and then just fill it in with your data structures.
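
A minimal sketch of the HeapAlloc route, assuming the MASM32 package with its default \masm32 include and lib paths. ARRAY_SIZE and pArray are made-up names for illustration, and error handling is reduced to a single check:

```masm
; Sketch only: ARRAY_SIZE and pArray are illustrative names,
; and the include paths assume a default MASM32 install.
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

ARRAY_SIZE equ 500000           ; same size as the slow "db 500000 dup (?)"

.data?
pArray dd ?                     ; holds the pointer returned by HeapAlloc

.code
start:
    invoke GetProcessHeap       ; eax = handle of the default process heap
    invoke HeapAlloc, eax, HEAP_ZERO_MEMORY, ARRAY_SIZE
    test eax, eax
    jz done                     ; allocation failed, nothing to free
    mov pArray, eax

    mov byte ptr [eax], 1       ; use the block like any other byte array

    invoke GetProcessHeap
    invoke HeapFree, eax, 0, pArray
done:
    invoke ExitProcess, 0
end start
```

Since the source only contains a 4-byte pointer, the size of the block no longer affects assembly time.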
Posted on 2003-01-12 19:02:21 by Hiroshimator
Ok, I found some info about it in this thread (which in turn linked to this thread). Sorry about not finding it before posting. :o

Apparently there is no way around having to wait for this long time at least once (after that, you can link to the object file containing the "compiled" array), which is a little sad. :(

Anyway, just to avoid getting too flamed about my question:
I am very well aware of dynamic memory allocation and all that, but I'm just playing around with testing the limits of MASM a bit.
Posted on 2003-01-12 19:06:51 by dELTA
Thanks for your reply too Hiroshimator (it was not there yet when I posted my follow-up post), it's always nice to have things confirmed by the wizards around here. ;)
Posted on 2003-01-12 19:09:07 by dELTA
I don't understand either why it does what it does :|

seems weird that they always overlooked this obviously unwanted behaviour.
Posted on 2003-01-12 19:09:13 by Hiroshimator
Convenience, I suppose.

You would get this by using the same code for initialized and uninitialized data, simply suppressing data generation in the uninitialized case. Whoever wrote that code probably felt that malloc() style memory allocation was the best way for dealing with large blocks of memory.

There might also be a disk thrashing issue because of this code sharing.

Here's another suggestion to try: use a REPEAT macro.

REPEAT number_of_entries ; REPEAT (also spelled REPT) takes a count; IRP iterates over a list
DB size_of_entry DUP(?)
ENDM ; repeat blocks are closed with ENDM
Posted on 2003-01-14 14:55:00 by tenkey
tenkey, I thought about that too, but when I realized that it would generate a 1 million line source listing (for a 1 MB array), I didn't even dare try it. That just can't have a good ending (not to mention be fast), I figured. ;)

I tried to split up the big array into adjacent smaller arrays, hoping that the quadratic time complexity was local to each array, but that was not the case, it took the exact same time. :( And this was also a reason for me to believe that the repeat macro method wouldn't work (since it does effectively only split up the big array into many "1-byte arrays").
Posted on 2003-01-14 18:59:14 by dELTA

As a matter of fact, the assembler cannot handle large arrays in the uninitialised data section without very long delays or locking up so you have actually found a BUG at last. What it forces you to do is code it the right way and dynamically allocate the memory you need at runtime.


Posted on 2003-01-15 02:15:23 by hutch--
Yes, dynamic memory allocation will be the right way to go in most cases, and many newbie programmers overuse static memory structures, but there are certainly cases where dynamic memory allocation is not applicable/suitable.

E.g. if I want to embed a large array of pre-calculated data into my exe. Sure, there are always ways around this kind of situation too (like putting the data into an external file or a resource block in the exe), but that would increase the complexity (with file I/O or resource loading code), the total size, and the loading and execution time of the program, compared to simply directly accessing an array inside the program. I for one wouldn't categorize a design that degrades a program in all these aspects as "coding it correctly", but maybe that's just me.
Posted on 2003-01-15 06:06:36 by dELTA
including binary data: use my bin2o, or a bin2inc and "assemble once, link many".
Posted on 2003-01-15 06:20:59 by f0dder

Depending on just how large the data you want to initialise is, work out how long it would take at load time to dynamically load the data you want to include.

Depending on the size, if it's only a meg or so, the load time will be no big deal and you can load what you want into it.

It means a bit more code but it has nearly unlimited capacity, whereas trying to load it at assembly time will keep running into the data size limit you have experienced.

If it's data that does not need assembly time calculation, give f0dder's tool a try, it seemed to work OK.


Posted on 2003-01-15 06:44:18 by hutch--
Thanks for the tip f0dder!

One question though:
Will I be able to place this array at any exact point in the code section of my program? I'm experimenting with building a minimal executable packer, and hence I need all my code and data to be in one single section, and not to be split up in code and data segments having the MASM linker deciding where to put them. :(

I could not figure out how to accomplish this by using the bin2o tool, but if there is a way to do it, it would be really great if you could tell me how!

Posted on 2003-01-15 07:23:14 by dELTA
sorry, the bin2o tool is not flexible enough for that (though you might have some success
by hacking the .obj file with a hexeditor). nasm has "incbin" which is useful, but that's not
going to be of very much help to you when you're using masm :). for stuff like stubs for
exe packers, I tend to find fasm or nasm more useful than masm btw... it's easier to get
them to do what you want (though manual segment definitions for masm gets you a long way).
Posted on 2003-01-15 07:33:24 by f0dder
Ok, thanks for the info. I haven't dared look at any other assembler yet, since I'm still quite new to MASM, but maybe in a while then. :)

Is NASM or FASM much unlike MASM btw? And do they have a macro language relatively equal in power to the one in MASM (or maybe even better)?
Posted on 2003-01-15 09:30:55 by dELTA
I have used MASM, TASM and NASM, and I personally like NASM the most, as it allows more control over things, and I think its macro processor and preprocessor are even better than MASM's. It also allows you to do things like in the C language (single line macros and so on).

The only negative side I could think of is that it doesn't include internal support for some things like STRUC or invoke, but that's not a big issue, at least for me, because the macro processor is so powerful that you can implement those things in macros and customize almost everything.

NASM also supports a wide range of output formats (OMF, Microsoft Win32 obj, aout, elf, rdoff, and also plain binary).

And third.. It's open source.

I nowadays use it for everything, also use it to develop my own operating system (along with other tools).

Ok, now this is starting to look like a marketing speech :grin:
So, if you want to get it and form your own opinion 'bout it, here's the link: http://nasm.sourceforge.net/

P.S. I tested NASM's ability to allocate big 'arrays', and it works without delays, even for things like >50 MB.
Posted on 2003-01-15 11:09:08 by Stealth
Ok, thanks for the info Stealth! I will take a look at NASM when I get some time, and after I've gotten a bit more used to MASM first.
Posted on 2003-01-15 18:22:44 by dELTA

If you need to accurately specify the location of data and do not mind embedding it into the .CODE section, store it in DB format and either jump over it or place it at either the beginning of the code before the "start" label or at the end after the ExitProcess API call.
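
A rough sketch of the "jump over it" variant, assuming the MASM32 package with its default include paths. lookup_table and over_table are made-up names, and the table is kept tiny on purpose:

```masm
; Sketch only: lookup_table and over_table are illustrative names,
; and the include paths assume a default MASM32 install.
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code
start:
    jmp over_table                  ; skip the embedded bytes
lookup_table db 01h, 02h, 03h, 04h  ; data stored inside the code section
over_table:
    movzx eax, lookup_table[2]      ; read the third byte of the table
    invoke ExitProcess, eax         ; hand it back as the exit code
end start
```

The code section is normally read-only, so this suits constant tables. Writing to such data would need the section marked writable at link time, for example with link.exe's /SECTION option.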

f0dder's tool produces library format data which is linked near the end of the file like normal in MASM PE files. There is a tool in MASM32 for converting any file to DB format but it is limited to about a 1 meg binary file.


Posted on 2003-01-15 23:50:44 by hutch--
object format, not library format - but that's just me being pedantic :).
The location of the embedded data in the final executable will depend
on the object link order given on the link.exe commandline - in typical
small "one asm file, one data file" projects, this indeed would place it
at the end of the .data section.
Posted on 2003-01-16 01:42:20 by f0dder

Yes, that is exactly what I want, but won't this result in the exact compilation speed problems I had to begin with? Or do those only apply to dup:ed arrays?

With "DB format", I assume you mean like this? :

large_array db 01h, 02h, 03h, 04h, 05h, 06h, 07h, 08h, 09h, 0Ah
db 0Bh, 0Ch, 0Dh, 0Eh, ...
Posted on 2003-01-16 05:30:29 by dELTA
you will get speed problems with initialized as well as DUPed arrays.
I would imagine initialized data is slower, as there's more disk I/O
and more parsing to be done. You can always just assemble the initialized
data once, though (use makefiles or an IDE instead of batch files...)
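
A sketch of that "assemble once" split, under the assumption of a MASM32-style setup. The file names bigdata.asm and main.asm are made up for illustration. The array lives in its own module, which only has to be reassembled when the data itself changes:

```masm
; bigdata.asm (hypothetical file name): assemble once with
;   ml /c /coff bigdata.asm
.386
.model flat, stdcall
option casemap:none

PUBLIC big_array                ; make the symbol visible to other modules

.data?
big_array db 500000 dup (?)     ; the slow-to-assemble declaration
end
```

```masm
; main.asm (hypothetical file name): reassemble as often as needed with
;   ml /c /coff main.asm
; then link both objects:
;   link /SUBSYSTEM:WINDOWS main.obj bigdata.obj
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

EXTERNDEF big_array:BYTE        ; defined in bigdata.obj

.code
start:
    mov byte ptr big_array[0], 1    ; use the array as if it were local
    invoke ExitProcess, 0
end start
```

A makefile rule that depends only on bigdata.asm would then keep bigdata.obj from being rebuilt on every change to main.asm.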
Posted on 2003-01-16 05:41:04 by f0dder