I'm used to programming in VB using the syntax below for arrays and would like to know how to do the same in assembler, such as how to declare an array variable, how to write data to the array, and how to read from it, thanks:

VB Code Logic Example:

'Put 3 text items into string array.
strArray(0) = "Item 1"
strArray(1) = "Item 2"
strArray(2) = "Item 3"

'Put 3 number items into number array.
intArray(1) = 1
intArray(2) = 2
intArray(3) = 3

'Read string from string array.
msgbox strArray(1),vbOKOnly,"String Read"

'Read number from number array.
msgbox intArray(1),vbOKOnly,"Number Read"
Posted on 2003-06-03 01:27:06 by Knight Chat X
Well, if you consider how VB probably does it the answer is clear. Store your strings in a memory buffer and save the offset of the beginning of the string. The offset is saved to an array of DWORDs. For example :

; allocate enough memory for your strings and get the base pointer
invoke HeapCreate,HEAP_NO_SERIALIZE,4096,NULL
mov hHeap,eax
test eax,eax
jnz @F
invoke ExitProcess,eax
@@:
invoke HeapAlloc,hHeap,HEAP_ZERO_MEMORY,4096
mov pHeap,eax
mov baseptr,pHeap

; baseptr is used to keep track of where the end of the array is.

NumArray would be like a DIM statement (in this case 256 elements max):

NumArray dd 256 DUP (0)

Adding strings
; copy a string into the array
invoke lstrcpy,baseptr,ADDR String
; Save the base pointer into the current var
mov edi,OFFSET NumArray
mov [edi+ArrayElementNumber*4],baseptr
;Set the new base ptr
invoke lstrlen,ADDR String
add baseptr,eax
inc baseptr

Reading strings
mov edi,pHeap
invoke MessageBox,NULL,[edi+ArrayElementNumber*4],NULL,MB_OK

There are garbage collection problems associated with this method, however, so it is best used with arrays that will not change a lot.
Posted on 2003-06-03 01:49:08 by donkey
Number arrays are indexed as follows :

Just create an array of DWORDS using DUP

Array DWORD 1024 DUP (0)

then use this addressing mode to assign and read values :

mov edi,OFFSET Array
mov [edi+0*4],1
mov [edi+1*4],1
mov [edi+2*4],1

The second parameter (0,1,2...) is the element number that you wish to access, the 4 is for 4 bytes (DWORD)

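I think you're allowed to do this in MASM as well. Note that the number in the brackets is a byte offset, not an element number, so it has to be scaled by 4 here too:

mov Array[0*4],1
mov Array[1*4],1
mov Array[2*4],1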
Posted on 2003-06-03 01:52:23 by donkey


.data
string1 db "item1",0
string2 db "item2",0
string3 db "item3",0
string4 db "item4",0
stringtable dd offset string1, offset string2, offset string3, offset string4

.code
lea edi, stringtable
mov eax, [edi] ;first string
mov eax, [edi][1*4];second string
mov eax, [edi][2*4];third string
mov eax, [edi][3*4];fourth string




.data?
array dd 4 dup (?)

.code
mov array, eax
mov [array+4],eax
mov [array+8],eax
mov [array+12],eax

Should work me thinks. :rolleyes:
Posted on 2003-06-03 03:12:19 by roticv
I like the string method roticv but it is only useful if you have fixed compile-time strings. With run-time strings you would be required to set the maximum size possible for each entry, an incredible waste of space. Consider if you were storing paths, MAX_PATH is 260 characters, for the most part your array would contain NULLs. Also declaring a large array in the data section makes for really big executables. The garbage collect problem is solved as there is no expansion of the array possible and therefore no contraction either.

The number array is essentially just a syntax change, in MASM I would prefer the Array syntax, it seems clearer and would lend itself more easily to debugging later.
Posted on 2003-06-03 05:07:52 by donkey
Actually, you can also store the array on the stack.


sub esp, 4*4
mov esi,esp
mov [esi],eax
mov [esi+4],eax
mov [esi+8],eax
mov [esi+12],eax
....
add esp, 4*4


Well, creating a string table has its uses. It works as a lookup table: if you know the index of a string, you can fetch its address directly.
Posted on 2003-06-03 07:16:27 by roticv
Ok, both of those methods seem to have a good use, and since I mostly program with arrays for things such as hash/ID/comparison tables, not being able to work with arrays in MASM would be a pain, so thank you both.

There are a few more questions I have on these methods, but I will post them a bit later.

For now, here's one: why does the multiplier start at 4 bytes and get doubled every time the index is incremented?

Like below:
*4
*8
*12
Posted on 2003-06-03 11:24:06 by Knight Chat X
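That is because the data is stored as dwords and each dword = 4 bytes. The offset is not doubled; it goes up by 4 for each element (offset = index*4), so you get 4, 8, 12, 16 and so on. :rolleyes: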
Posted on 2003-06-03 11:26:19 by roticv
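
To make the index*4 rule concrete, here is a small sketch that lets the CPU do the scaling with a register index. It is a fragment meant to sit inside a procedure or after your start label; the label FillLoop and the 4-element size are just picked for the example.

.data
Array DWORD 4 DUP (0)

.code
xor ecx,ecx                      ; ecx = element index, starting at 0
FillLoop:
mov DWORD PTR Array[ecx*4],ecx   ; byte offset = index*4: 0, 4, 8, 12
inc ecx
cmp ecx,LENGTHOF Array           ; LENGTHOF gives the element count (4)
jb FillLoop

mov ecx,2
mov eax,Array[ecx*4]             ; reads element 2, which lives at Array+8

The *4 inside the brackets is the scale factor of the addressing mode, so the index register can hold a plain element number like a VB subscript.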
I've been thinking about a good garbage collect routine and though the routine is easy enough, you just allocate another heap and copy all valid strings to it contiguously updating the pointers as you go, it is difficult to come up with an algorithm that will tell you the level of granularity or fragmentation in your array. You essentially have 2 choices, conserve memory by collecting often or collecting less often and have higher granularity. At a certain point if you wait too long the collection time could become prohibitive.

I think that you could custom tailor one to your app depending on how many deleted strings you have, say after 50 deleted array elements you collect the garbage. It would simply be a matter of copying the strings and then destroying the old heap. You may also want to set up your heap so it can grow, so that the array is expandable.
Posted on 2003-06-03 12:35:18 by donkey
Ok, that's great, already have made functions for doing all of that in VB, sometimes when you have program that requires scanning a file with a variable number of lines, such as for an options file, you need to be able to adjust the size of the array as the number of lines in the file can vary, in that case you just ReDim to change the size.

If I were to wanna cut down on the empty spaces in an array if spaces existed, and to get an accurate count of the lines in the file minus the spaces, I simply use a special CopyArrayToArray function that has a flag option of ignoring the empty array index items so when the array is copied into a new array it does not include the empty spaces. There is much more to it than that, such as doing a comparison on the first array minus the empty index items in order to properly set the new array size to match the number of items that will be placed into the new array, but all of that is pretty easy.

In MASM, however, I still see a difference in the array pattern. For one, what exactly causes this fragmentation in an array? Are you talking about the empty index items left whenever an item in the array is deleted?

If so, when that happens, can new data be placed in that empty index item space after it is deleted, or would this cause further fragmentation and that's the reason for dumping the allocated memory and redoing it like you mentioned by allocating new space and basically rebuilding the array again?

Do all the item sizes in the array have to be the same size, etc., or can it vary?

And what do you mean by rebuilding the array within a certain amount of time if the array items are changed a lot?
Posted on 2003-06-03 23:05:17 by Knight Chat X
The index isn't the problem, that has a fixed length of 4 bytes. The problem is that you have variable length data, so each time you allocate a string to replace an existing one it must be put at the end of the array and its index pointer changed. This does not delete the original string, which remains in memory taking up space. When you collect the garbage you are removing the dead strings from the array. For example :

; copy a string to the buffer
mov Index[12], pString

; Replace string 12

; copy a new string to the end of the buffer
mov Index[12],pNewString

The original string is not deleted. To delete it you must allocate a new buffer and copy all valid strings to it in order to remove the dead ones. For example, if you had strings 0-100 it would go something like this; I haven't checked it but it should be close. You can save a lot more time by using DWORDs to transfer the strings, but that is a little more complicated; see the StrLen source for MASM32 to find out how.

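mov ecx,100              ; highest element index (strings 0-100)
mov edi,pnewbuffer       ; edi = write position in the new buffer
NextString:
mov esi,Array[ecx*4]     ; esi = old address of string number ecx
mov Array[ecx*4],edi     ; point the index entry at the new copy
@@:
mov al,[esi]             ; copy one byte, including the terminating 0
mov [edi],al
inc esi
inc edi
cmp al,0
jne @B
dec ecx
jns NextString           ; loop until element 0 has been copied too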

This is something that every language does in the background, however ASM requires that you do it yourself. Whenever you use dynamic data arrays you must do a garbage collect to conserve memory. You may notice that when you move a lot of data in an array using VB, the program can stall sometimes; this is the GC cycle.
Posted on 2003-06-03 23:32:04 by donkey
Quote (Knight Chat X): Do all the item sizes in the array have to be the same size, etc., or can it vary?

Regarding this topic, you have to ask yourself one question: why do you want the item size in the array to vary? Yes, it could be done (you could do almost anything with asm). However, when the item size varies, it is hard to find the item you want, because you can no longer compute its address as base + index*size.

Quote (Knight Chat X): If I were to wanna cut down on the empty spaces in an array if spaces existed, and to get an accurate count of the lines in the file minus the spaces, I simply use a special CopyArrayToArray function that has a flag option of ignoring the empty array index items so when the array is copied into a new array it does not include the empty spaces. There is much more to it than that, such as doing a comparison on the first array minus the empty index items in order to properly set the new array size to match the number of items that will be placed into the new array, but all of that is pretty easy.

Why not code the algorithm yourself? Anyway, since you are coding in asm, you should approach programming from a new point of view and not be 'stuck' in vb's methodology. Just remember that it is easier to create bugs in asm, but harder to find them as compared with HLL. Nothing is limiting you when you code in asm.

Quote (Knight Chat X): Ok, that's great, already have made functions for doing all of that in VB, sometimes when you have program that requires scanning a file with a variable number of lines, such as for an options file, you need to be able to adjust the size of the array as the number of lines in the file can vary, in that case you just ReDim to change the size.

View the file, whether mapped into memory or copied into a buffer, as an array of bytes. With a pointer to that memory, a lot of things can be done. To limit the number of lines, just scan for CRLF and count each one you find. When the number of lines you need is reached, do whatever you want with them.
Posted on 2003-06-04 04:01:21 by roticv
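
The CRLF scan described above could be sketched like this. CountLines, pBuf and cbBuf are names invented for the example, and it assumes the usual MASM32 setup (.386, .model flat,stdcall) so that PROC parameters work.

; CountLines: hypothetical helper, counts CR/LF pairs in a buffer.
; pBuf = address of the data, cbBuf = length in bytes, result in eax.
CountLines PROC USES esi pBuf:DWORD, cbBuf:DWORD
mov esi,pBuf
mov ecx,cbBuf
xor eax,eax                  ; eax = number of CRLF pairs found so far
cmp ecx,2
jb Done                      ; fewer than 2 bytes cannot hold a pair
dec ecx                      ; a 2-byte pair can start at cbBuf-1 positions
Scan:
cmp WORD PTR [esi],0A0Dh     ; bytes 0Dh,0Ah read as one little-endian word
jne @F
inc eax
@@:
inc esi
dec ecx
jnz Scan
Done:
ret
CountLines ENDP

Called as something like invoke CountLines,pFile,fileSize (both hypothetical variables), the result could then be used to size the pointer array before filling it, which is roughly what ReDim does for you in VB. A last line with no trailing CRLF is not counted.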
roticv,

Quote (roticv): Regarding this topic, you have to ask yourself one question. Why do you want the item size in the array to vary? Yes, it could be done (You could do almost anything with asm). However when using item size in the array varies, it is hard to access the data that you wish to access.


Must be a failure to communicate, I asked this:
Quote (Knight Chat X): Do all the item sizes in the array have to be the same size, etc., or can it vary?


The question was asked because I needed to know why, in the examples, the size of each index item/layer appeared as if it had to be the same size, that's all. Whether it is harder or not to access the data doesn't matter, as you said:
Quote (roticv): Yes, it could be done (You could do almost anything with asm)


Quote (roticv): Why not code the algorithm yourself? Anyway since you are coding in asm, you should approach programming from a new point of view and not be 'stuck' in vb's methodology. Just remember that it is easier to create bugs in asm, but harder to find them as compared with HLL. Nothing is limiting you when you code in asm.


First of all, I use this messageboard as a last resort, only after there's a big stop point in development, or when there's an idea and I might be able to possibly help someone. You can look at the number of posts I've done and see it's not the largest. I already have created the algorithm in VB and in other programming languages, that doesn't matter; yea this is MASM, and nobody's perfect. Regardless, every programmer programs differently. I started on an old government 286 dell computer with an orange monochrome monitor when I was 12, and have since then tried over 14 languages, mostly stayed away from the common languages. Whether it's batch file programming, GWBASIC, QBASIC, VB, HTML, DOS Console, MASM, NASM, TASM, Borland, Perl, CGI or any other language, I program on inspiration and with reason. I'm not stuck at all on VB, and hopefully that won't be assumed, as I only use VB in examples to break things down so the communication gap in my posts isn't so big. Like any mechanic that chooses his brand of tools, or artist that chooses paint colors, I've chosen mine. I even found a problem in the Windows.inc file once long ago and submitted my findings to SH. But guess that doesn't matter too much now huh. No matter what views I had back then, I still found the problem and saw the solution. The bottom line is, I don't tell you how you should program or do things, so please respect me equally as a fellow programmer. Thank you for your comments.

If you do not ask, you do not learn...

Here's a good bit of reading for those that have MASM installed, it's a nice article:
C:\masm32\HTML\warriors.htm
Posted on 2003-06-04 07:09:54 by Knight Chat X
Quote (Knight Chat X): The question was asked because I needed to know why in the examples the size of each index item/layer, appeared as if it had to be the same size, that's all. Weither it is harder or not to access the data doesn't matter, as you said:

Well, because if the items are of different sizes, people use structs instead.


mystruct struct
dword1 dd ?
byte1 db ?
word1 dw ?
dword2 dd ?
dword3 dd ?
mystruct ends

.data ; initialized data belongs in .data, not .data?
mystruct1 mystruct <1,1,1,1,1>

.code
lea edi,mystruct1
assume edi: ptr mystruct
mov [edi].dword1, eax
mov [edi].dword3,eax
assume edi:nothing
Posted on 2003-06-04 07:23:20 by roticv
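
If you want an array of such records rather than a single one, the same idea extends as in the fragment below. It is only a sketch: MYITEM, Table and the field names are made up, and the record index is turned into a byte offset with SIZEOF instead of the fixed 4 used for DWORD arrays.

MYITEM STRUCT
itemId   DWORD ?
itemKind BYTE  ?
MYITEM ENDS

.data?
Table MYITEM 16 DUP (<>)     ; 16 records, SIZEOF MYITEM bytes each

.code
mov ecx,3                    ; record index
imul ecx,SIZEOF MYITEM       ; byte offset = index * record size
lea edi,Table[ecx]           ; edi = address of record 3
assume edi:PTR MYITEM
mov [edi].itemId,eax
mov [edi].itemKind,1         ; size comes from the field, no BYTE PTR needed
assume edi:nothing

Each record is still a fixed size, which is what keeps the index arithmetic simple; only the fields inside a record differ in size.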
k Donkey,

.data
Array DWORD 1024 DUP (0)

.code
mov edi,OFFSET Array
mov [edi+0*4],1
mov [edi+1*4],1
mov [edi+2*4],1


Trying to compile this gives an "Invalid Instruction Operands" error for each of these lines:
mov [edi+0*4],1
Posted on 2003-06-04 07:34:43 by Knight Chat X
.data

Array DWORD 1024 DUP (0)

.code
mov edi,OFFSET Array
mov dword ptr [edi+0*4],1
mov dword ptr[edi+1*4],1
mov dword ptr[edi+2*4],1

This is because the size of the data to write was not specified. When you move an immediate to memory, neither operand tells the assembler how many bytes to write, so you have to say so with BYTE PTR, WORD PTR or DWORD PTR. When you move memory to a register or a register to memory, the register gives the size.
Posted on 2003-06-04 07:37:50 by roticv
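
As a quick illustration of that rule, the fragment below shows which forms need a size override and which do not. It assumes the same Array DWORD 1024 DUP (0) declaration as above and is only meant as a sketch.

mov edi,OFFSET Array
mov BYTE PTR [edi],1     ; writes 1 byte
mov WORD PTR [edi],1     ; writes 2 bytes
mov DWORD PTR [edi],1    ; writes 4 bytes
mov [edi],eax            ; no override needed, eax is 32 bits
mov Array[4],1           ; no override needed, Array was declared as DWORD

The last line works because the assembler remembers the type the variable was declared with; a bare register in brackets carries no such type.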
Using a structure, hmmm, well that's pretty cool, lol
Posted on 2003-06-04 07:40:19 by Knight Chat X
Donkey, I couldn't get this to work. Sometimes it'll compile but crash; other times, when changed a bit, I'll receive a baseptr error saying invalid operand etc., or a problem with pHeap saying the same thing.

invoke HeapCreate,HEAP_NO_SERIALIZE,4096,NULL
mov hHeap,eax
test eax,eax
jnz @F
invoke ExitProcess,eax
@@:
invoke HeapAlloc,hHeap,HEAP_ZERO_MEMORY,4096
mov pHeap,eax
mov baseptr,pHeap ;<<<<PROBLEM 1

; baseptr is used to keep track of where the end of the array is.

NumArray would be like a DIM statement (in this case 256 elements max):

NumArray dd 256 DUP (0)

Adding strings
; copy a string into the array
invoke lstrcpy,baseptr,ADDR String
; Save the base pointer into the current var
mov edi,OFFSET NumArray
mov [edi+ArrayElementNumber*4],baseptr ;<<<<<PROBLEM 2
;Set the new base ptr
invoke lstrlen,ADDR String ;<<<<<<PROBLEM 3
add baseptr,eax
inc baseptr

Reading strings
mov edi,pHeap ;<<<<<<PROBLEM 4
invoke MessageBox,NULL,[edi+ArrayElementNumber*4],NULL,MB_OK
Posted on 2003-06-05 06:27:00 by Knight Chat X
It is not donkey's error. Look up the mov opcode in the Intel manual and you will see the ModR/M byte, which can encode at most one memory operand. With a few exceptions such as push, pop and the string instructions, no instruction can read from memory and store the result back to memory, so the assembler generates an error. Replace it with


mov baseptr,eax

or


push pHeap
pop baseptr

mov edi,OFFSET NumArray

mov [edi+ArrayElementNumber*4],baseptr ;<<<<<PROBLEM 2

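Read my statement above: no instruction can take two memory operands (source and destination). Also, ArrayElementNumber can only be used inside the brackets like that if it is a constant/immediate; if it is a variable, load it into a register first.
Use something like
lea edi, NumArray

mov ecx,ArrayElementNumber
mov eax,baseptr ; go through a register, no memory-to-memory mov
mov [edi+ecx*4],eax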

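Problem 4 comes down to the same thing as Problem 2. Furthermore, the MessageBox API expects its lpText parameter to be a pointer to a string and not the string data itself, so edi has to point at NumArray (the table of pointers), not at pHeap.
mov edi,OFFSET NumArray
invoke MessageBox,NULL,DWORD PTR [edi+ArrayElementNumber*4],NULL,MB_OK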

Not sure about Problem 3.
Posted on 2003-06-05 06:37:28 by roticv
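
Putting the fixes above together, the heap-backed string array from earlier in the thread might look roughly like the sketch below. It assumes a standard MASM32 install (the include paths are the MASM32 defaults); AddString, Count, String1, String2 and the Quit label are names made up for the example, and there is no check that the 4096-byte block or the 256-entry table is full.

.386
.model flat,stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.data
String1  db "Item 1",0
String2  db "Item 2",0
NumArray dd 256 DUP (0)      ; one string pointer per element
Count    dd 0                ; number of elements in use

.data?
hHeap    dd ?
pHeap    dd ?
baseptr  dd ?                ; next free byte in the heap block

.code
; AddString: hypothetical helper. Copies a string to the end of the block
; and stores its address in the next free element of NumArray.
AddString PROC pStr:DWORD
invoke lstrcpy,baseptr,pStr
mov ecx,Count
mov eax,baseptr              ; go through a register, no mem-to-mem mov
mov NumArray[ecx*4],eax
inc Count
invoke lstrlen,pStr
inc eax                      ; include the terminating zero
add baseptr,eax
ret
AddString ENDP

start:
invoke HeapCreate,HEAP_NO_SERIALIZE,4096,0
mov hHeap,eax
test eax,eax
jz Quit
invoke HeapAlloc,hHeap,HEAP_ZERO_MEMORY,4096
test eax,eax
jz Quit
mov pHeap,eax
mov baseptr,eax              ; eax still holds the block address

invoke AddString,ADDR String1    ; like strArray(0) = "Item 1"
invoke AddString,ADDR String2    ; like strArray(1) = "Item 2"

mov ecx,1
mov eax,NumArray[ecx*4]      ; eax = pointer to element 1
invoke MessageBox,NULL,eax,NULL,MB_OK
invoke HeapDestroy,hHeap
Quit:
invoke ExitProcess,0
end start

The pointer is loaded into eax before the invoke so that MessageBox receives the address of the text and not the text bytes themselves. Replacing an existing string would still leave the old copy in the block, so the garbage collection discussed above applies here as well.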