Hello, I've just started learning ASM at school. It's fun, but the downside is that we use a simulator for our ASM code and don't run it on a real processor like the Intel 80x86.
The simulator is called "Zep2 Proccesor simulator". I don't think anyone here has ever heard of it. I think it's only used here in the Netherlands, where I'm from.

But we are also allowed to use ASM for the 80x86 processor, so I've chosen to do that. All I need to do now is port the code I have for the Zep2 simulator to MASM code, and that's where I need a little help.

I'll show you a piece of code for the Zep2 and try to explain what it does. R1 and R2 are the only registers that are available.



##

begin load R2, -2;  //laad R2 met -2 -> Load R2 with -2

      load R1,R2;  //inhoud R2 naar R1 -> content of R2 to R1

      inc R1;      //tel 1 op bij inhoud R1 -> add +1 to R1

      store R1,0xf; //bewaar inhoud R1 op adres 0xf -> Save content of R1 to address 0xf

      nop;          //doe niets (no operation) -> no operation

      halt;        //end of program..?

##


The Dutch comments are on the left (you can ignore those), the English ones on the right.
Now, this is a program we had to write; nothing special, as you can see.
But I wanted to change this code so it can be assembled with the MASM assembler.
I'm not sure, but I think the 'load' instruction here is the equivalent of 'mov' in 'real' ASM.
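To make the semantics of the Zep2 program concrete, here is a small C model of what it does (C because it is a language the asker already knows). It is a sketch under assumptions: the 16-word memory and the struct/function names `Zep2State` and `zep2_run_example` are hypothetical, not taken from any Zep2 documentation, and the operand order follows the English comments above (`load` writes its first operand, `store` writes its first operand to the address given second).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the Zep2 machine: two registers and a tiny
   word-addressed memory. The size of 16 words is an assumption. */
#define ZEP2_MEM_WORDS 16

typedef struct {
    int32_t r1;
    int32_t r2;
    int32_t mem[ZEP2_MEM_WORDS];
} Zep2State;

/* Runs the example program; each statement mirrors one Zep2 instruction. */
void zep2_run_example(Zep2State *s)
{
    memset(s, 0, sizeof *s);  /* start with cleared registers and memory */
    s->r2 = -2;               /* load R2, -2   : destination is the first operand */
    s->r1 = s->r2;            /* load R1, R2   : copy R2 into R1                  */
    s->r1 += 1;               /* inc R1        : R1 is now -1                     */
    s->mem[0xf] = s->r1;      /* store R1, 0xf : source first, address second    */
    /* nop does nothing; halt simply ends the run. */
}
```

After `zep2_run_example(&s)`, `s.mem[0xf]` holds -1, the same value the MASM translation has in EAX right before the store.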

So my question is: what would this piece of code look like if it were written for MASM?
Posted on 2006-09-21 13:35:28 by vivendi
Here is a direct translation of the instructions... but please note that the program will not behave the same way when run under an actual OS like Windows.


;EAX replaces R1
;EBX replaces R2

; begin load R2, -2;  //laad R2 met -2 -> Load R2 with -2
mov ebx, -2

; load R1,R2;  //inhoud R2 naar R1 -> content of R2 to R1
mov eax,ebx

; inc R1;      //tel 1 op bij inhoud R1 -> add +1 to R1
inc eax

; store R1,0xf; //bewaar inhoud R1 op adres 0xf -> Save content of R1 to address 0xf
mov dword addr [0xf],eax

; nop;          //doe niets (no operation) -> no operation
nop

;halt;        //end of program..?
hlt


If you assembled this into a flat binary, copied it to a floppy disk (made bootable) and booted from it, that is about as close as you could get to what the processor simulator does.

Since you want to use MASM, I will have to assume you wish to program under DOS/Win32. In that case, accessing memory address "0xf" won't mean much of anything... and might actually cause the program to crash.

As you can see, "mov" covers both loading immediate values and reading or writing data at memory addresses. So in this case, "mov" replaces both "load" and "store", and MASM does the work of translating each form into the appropriate opcode.

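Finally, "hlt" is not a good way to end a program. It is a privileged instruction, so under Windows it raises an exception instead of stopping your program, and under DOS it only pauses the CPU until the next interrupt arrives. I would suggest simply returning ("ret") from the program to end it (Win32 API or DOS COM file).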

PS: Please confirm that the "load" and "store" instructions work as you described, because if so, "load" (load destination,source) takes its operands in the opposite order from "store" (store source,destination)... which is unusual and might create confusion when working with Intel-syntax ASM.
Posted on 2006-09-21 15:15:02 by SpooK
Thanks a lot for the useful explanation and code :)
And indeed, the line that has to place the value in a memory address fails.
It gives me this error:


error A2008: syntax error : addr


But besides that (like you already said), I don't think 0xf is a real memory address in Windows, is it?
What would be a real memory address that I could place data in?
Posted on 2006-09-21 16:23:51 by vivendi
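My MASM is a little rusty. It should be something along the lines of "mov dword ptr ds:[0Fh], eax". So that is "dword ptr" and not "dword addr". (MASM also writes hex numbers with an "h" suffix rather than a "0x" prefix, and it wants a segment override such as "ds:" in front of a bare numeric address; otherwise it treats the bracketed number as an immediate constant.)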

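0x0F is a real address in that it exists. With paging enabled, 0x0F (0x0000000F, since we are talking about 32-bit addressing) is a virtual address that could be mapped to any part of physical RAM, or not mapped at all, in which case accessing it causes a page fault. Windows deliberately leaves the lowest 64 KB of every process's address space unmapped, so writing to 0x0F there ends in an access violation. Read up on the paging mechanism to understand how this works.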
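The idea of a virtual address that may or may not be mapped to physical RAM can be sketched in C. This is a deliberately simplified, single-level page table with 4 KB pages and a toy 64 KB address space; real x86 paging walks a two-level page directory/page table and checks permission bits, so `PageTable`, `translate` and the `present` flag are hypothetical illustrations only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16                 /* toy address space: 16 pages = 64 KB */

typedef struct {
    bool     present;                 /* is this virtual page mapped? */
    uint32_t frame;                   /* physical frame number if present */
} PageTableEntry;

typedef struct {
    PageTableEntry entries[NUM_PAGES];
} PageTable;

/* Translates a virtual address to a physical one.
   Returns false (our stand-in for a page fault) if the page is not mapped. */
bool translate(const PageTable *pt, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;        /* upper bits pick the page  */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* lower 12 bits are kept    */

    if (page >= NUM_PAGES || !pt->entries[page].present)
        return false;                             /* page fault */

    *paddr = (pt->entries[page].frame << PAGE_SHIFT) | offset;
    return true;
}
```

With page 0 left unmapped, `translate(&pt, 0x0F, &p)` fails, which is the toy version of the crash you get when writing to address 0x0F under Windows.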

When you have pre-established APIs and modern-day assemblers, like DOS INTs or the Win32 API provided by the underlying OS along with MASM, you rarely address memory directly (unless you know *exactly* what you are doing). Memory is usually addressed through the use of labels and pointers.

Example of addressing a label...


.data
my_variable dd 0 ;A DWORD variable in memory

.code
mov dword ptr [my_variable], eax ;Store value of EAX to address indicated by my_variable label
;In MASM, the above instruction can also be written like this...
mov my_variable, eax


Labels are resolved at assembly/link time, so every label is replaced with the real/virtual address the program "expects". MASM is C-like in that when you name a label in an instruction, it treats it as the value held at that label's address, not the address itself... which is why the second instruction example also works (MASM hides the extra work from you).
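For a C programmer, a MASM data label behaves much like a global variable: its address is fixed when the program is built, and naming it refers to the value stored there. The following rough C analogy is mine, not something from the MASM documentation: `mov my_variable, eax` is like assigning the variable by name, and `mov dword ptr [my_variable], eax` is like storing through its address explicitly. The function names are hypothetical.

```c
#include <stdint.h>

/* Like "my_variable dd 0" in .data: the address is fixed at build time
   and the initial value is 0. */
uint32_t my_variable = 0;

/* Analogous to MASM's "mov my_variable, eax": the name stands for the value. */
void store_via_name(uint32_t eax)
{
    my_variable = eax;
}

/* Analogous to "mov dword ptr [my_variable], eax": take the label's address,
   then store a 32-bit value through it. Same effect as store_via_name. */
void store_via_address(uint32_t eax)
{
    uint32_t *addr = &my_variable;
    *addr = eax;
}
```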

There are also instances where you allocate a chunk of memory and save the pointer to that chunk.

Example of addressing a pointer...


.data
my_pointer DD 0 ;DWORD Sized Pointer

.code
invoke _malloc, 4096 ;Allocate 4KB, result will be in EAX
mov my_pointer, eax ;Store the address pointer of the allocated memory
mov dword ptr [eax], 'tset' ;Store the 4-byte ASCII string "test" into the first 4 bytes of the new memory (MASM encodes multi-character constants first-character-high, so it is written reversed)

;do some other stuff with EAX...
mov eax,1234

;oh no, we lost our pointer... let's get it back!!!
mov eax, my_pointer ;Load the address pointer back into EAX
mov dword ptr [eax+4], 'enod' ;Store the 4-byte ASCII string "done" into the next 4 bytes (reversed for the same reason)


_malloc refers to the standard C function malloc, which attempts to allocate a chunk of memory of the requested size and returns its address if successful; otherwise it returns zero (NULL) to indicate an error. After we store the pointer value at the address of "my_pointer", we use the same pointer, still in EAX, to address the memory chunk and store the string "test" into it. This is the other way to address memory on the x86: besides a direct address number (or label), you can use a register that holds the address. The example also shows why you would want to "save" pointers and other data generated at run-time.
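Here is the same sequence written in C, which may be easier to follow if you know C. It is a sketch: the helper name `fill_chunk` is hypothetical, `my_pointer` plays the role of the `my_pointer` variable, and the local `reg` stands in for EAX. `memcpy` puts the bytes into memory in string order, so the byte-order quirk of assembler character constants does not arise here.

```c
#include <stdlib.h>
#include <string.h>

static unsigned char *my_pointer = NULL;   /* plays the role of "my_pointer DD 0" */

/* Returns the allocated chunk (the caller frees it), or NULL if malloc failed. */
unsigned char *fill_chunk(void)
{
    unsigned char *reg = malloc(4096);      /* invoke _malloc, 4096 -> EAX        */
    if (reg == NULL)
        return NULL;                        /* malloc returns NULL on failure     */

    my_pointer = reg;                       /* mov my_pointer, eax                */
    memcpy(reg, "test", 4);                 /* store 4 bytes at [eax]             */

    reg = NULL;                             /* EAX reused for something else      */
    reg = my_pointer;                       /* mov eax, my_pointer: pointer back  */
    memcpy(reg + 4, "done", 4);             /* store the next 4 bytes at [eax+4]  */
    return reg;
}
```

Afterwards the first 8 bytes of the chunk read "testdone"; in C you would release it with `free` when you are done.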

As you can see, a label is simply defined in the source and converted at assembly time... while a pointer is usually an unknown value obtained at run-time.

I know this isn't the "best" explanation, but I hope it is one you will understand and connect with :)
Posted on 2006-09-21 19:06:40 by SpooK
Thanks a lot for the explanation! So if I get this right, it's better not to assign a value directly to a hard-coded address, and better to use labels/pointers for this, right?
I tried that, and the file assembles fine, except for one warning that I get.

This is the code:

.386
.model small
.stack
.data
my_var dd 0

.code

main proc

mov eax, -2
mov ebx, eax
inc eax
mov dword ptr [my_var], eax
nop

main endp
end

BTW: my_var is set to '0'. Does this mean it has the value 0 (just as it could have been 34), or does it mean that the variable is defined but doesn't have a value yet?

And this is the Warning:

LINK : warning L4038: program has no starting address


Posted on 2006-09-22 00:23:13 by vivendi
That is correct: do not specify memory addresses as hard-coded numbers (i.e. rely on labels instead) unless you are certain of what you are doing and why (e.g. operating system development is riddled with fixed PC memory locations). The reason is that labels are resolved at assembly/link time, which helps prevent errors, amongst other advantages that you will soon realize when you start developing larger and more complex programs.

On to the program. Generic program basics 101: a program has 3 basic sections: Code, Initialized Data and Uninitialized Data.

Code (.code) is the obvious one, where your program logic is kept in its machine/byte code format.

Initialized Data (.data) is set to whatever you specify at assembly/compile time and is included in the program executable file. This is what guarantees that "my_var" is indeed set to an initial value of "0" in your example.

Uninitialized Data (.bss in most assemblers; in MASM the directive is .data?) is *not* stored in the program executable file. Instead, the executable specifies how much memory should be reserved (a bit like using malloc), and that amount is reserved at load time. Any worthy OS/loader will make sure this Uninitialized Data section is zeroed-out by default, but play it safe and don't depend on the initial values of any Uninitialized Data.
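The same split shows up in C. A global with a non-zero initializer is normally placed in the initialized data section, while one without an initializer is normally placed in .bss. Unlike raw assembly, C guarantees the latter starts at zero, but which section each object ends up in is the compiler's and linker's choice, so read the section comments below as typical, not guaranteed. All names are hypothetical.

```c
/* Typically emitted into .data: the value 34 is stored in the executable file. */
int initialized_var = 34;

/* Typically emitted into .bss: only its size is recorded in the file.
   C guarantees objects with static storage and no initializer start at 0. */
int uninitialized_var;

/* Large arrays show the benefit: on typical toolchains this adds no bytes
   to the file, yet it is zero-filled when the program is loaded. */
static int big_buffer[65536];

int sum_first_and_last(void)
{
    return big_buffer[0] + big_buffer[65535];
}
```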

As for your linker warning (it seems you are targeting DOS), you have to tell us which version of MASM/LINK you are using, what type of binary you are trying to produce with the linker (COM/EXE/etc...) and which OS you are targeting (DOS/WinXP/etc...) before I can give you any advice. It would also help if you posted the command-line options you are passing to MASM and LINK.

There is a bit of a learning curve with x86 ASM programming principles, but it is easy to adapt to once you understand the architecture. To help you further, you should read the x86 Book or some of Iczelion's Tutorials (I think there are even a few in Dutch) to familiarize yourself with the programming basics.

PS: Also, please tell us your current programming experience/level so we can assist you further.
Posted on 2006-09-22 01:49:11 by SpooK
Thanks a lot for your explanation, I really appreciate it!
About the info you want: I'm using Windows XP, and the linker is making .exe files for me. When I open the .exe file a DOS window pops up, so I guess I'm targeting DOS.
About the linker I'm using: I haven't downloaded MASM itself, but I have a tool which is the MASM linker (or something like that).
You can still get it from http://www.coderz.nl/_downloads/ASM.zip if you want to check it out.
I'm assembling my code with this command:

ml.exe mycode.asm

That's all. It creates an .obj file and an .exe file.
I've seen Iczelion's tutorials, but those are aimed at real Win32 programming. I would rather stay with the basics so I can play around with values in memory, kind of like I'm doing now. So yeah, I guess I want to stay with DOS programming for now.

I've been programming in C/C++ for some time now; I've learned basic console programming and slightly advanced Win32 apps. I also know some things about Winsock.
I also know J2EE, but just the basics.

That's basically it. I hope I haven't forgotten to mention anything...
Posted on 2006-09-22 03:55:54 by vivendi
The short and easy answer
You just need to change "end" to "end main" at the end of "mycode.asm" so the linker can resolve the starting address. This marks main not only as a procedure but also as the program's entry point.

The long and tough answer
Playing around with DOS is going to *force* you to use 16-bit programming techniques such as segmentation. On top of that, you will need to use DOS interrupts for program control. Between those two, you are probably going to create more work for yourself than anything else. I would highly suggest sticking to Win32 console programming, as it offers a flat address space and all the other wonders of 32-bit protected mode programming.

For example, here is the first quirk about DOS programming... how to exit a program...


mov ax, 4C00h ;AH = 4Ch ("Terminate Program"), AL = 00h (return code)
int 21h ;Execute DOS Interrupt


In Win32 console programming, it can be as simple as "ret" or a call to "ExitProcess". In DOS, you have to rely on obscure interrupt numbers and function codes.
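The value loaded into AX packs two things: AH (the high byte) selects DOS function 4Ch, "terminate with return code", and AL (the low byte) is the return code, 0 here. A small C sketch of that byte packing follows; `dos_exit_ax`, `ah_of` and `al_of` are hypothetical helper names, and nothing here actually calls DOS.

```c
#include <stdint.h>

/* DOS INT 21h function 4Ch: terminate program with a return code. */
#define DOS_FN_TERMINATE 0x4C

/* Builds the AX value: function number in AH (high byte), return code in AL. */
uint16_t dos_exit_ax(uint8_t return_code)
{
    return (uint16_t)((DOS_FN_TERMINATE << 8) | return_code);
}

/* Split AX back into its two 8-bit halves. */
uint8_t ah_of(uint16_t ax) { return (uint8_t)(ax >> 8); }
uint8_t al_of(uint16_t ax) { return (uint8_t)(ax & 0xFF); }
```

So `mov ax, 4C00h` does the same job as `mov ah, 4Ch` followed by `mov al, 0`.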

As for your source code example... "mov dword ptr [my_var], eax" would no longer work as-is. You will have to load a segment register yourself (e.g. "mov ax, @data" then "mov ds, ax") so that addresses use the standard segment:offset format. DS is the default segment for most data accesses, and in 16-bit addressing only BX, BP, SI and DI can be used as base/index registers. Strings printed with the DOS print-string function are terminated with '$' and not '\0'. You rely on the 16-bit registers (ax/bx/cx/dx/etc...) rather than the 32-bit ones (eax/ebx/ecx/edx/etc...).
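Two of those quirks are easy to model in C. In 16-bit real mode the CPU forms a 20-bit physical address as segment × 16 + offset, and the DOS print-string function (INT 21h, AH=09h) stops at the first '$' rather than at a NUL. Both helpers are hypothetical illustrations; the address calculation ignores the A20 wrap-around that real hardware may apply above 1 MB.

```c
#include <stddef.h>
#include <stdint.h>

/* Real-mode address: shift the segment left 4 bits (x16) and add the offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Length of a '$'-terminated string, as DOS function 09h sees it.
   The string must contain a '$', or this runs past the end. */
size_t dollar_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '$')
        n++;
    return n;
}
```

Note that different segment:offset pairs can name the same byte: 1234h:0010h and 1235h:0000h both resolve to 12350h.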

The list goes on... and on top of that, you cannot depend on the DOS boxes in Windows (actually "command" boxes); an emulator like DOSBox would be more appropriate.

In the end, you will probably learn more quirks of an obsolete operating system and a deprecated operating mode (16-bit real mode) than actual assembly language programming. If you insist, though, I would suggest heading over to Randy Hyde's place and picking up a copy of AoA (The Art of Assembly Language), DOS/16-bit Edition. If you plan on switching to Win32 programming, head over to the same place and pick up the Windows/32-bit Edition of AoA... just beware that the 32-bit Edition centers around Randy Hyde's HLA and not MASM. ;)
Posted on 2006-09-22 12:58:53 by SpooK
Wow, thanks for clearing that up. I think I'm going for Win32 ASM then.
So I'm going to stick to Win32 console programming instead :)

The book I have is only about 16-bit, and you said the one at the link you gave is about HLA and not MASM. I don't know how much those two differ from each other..?
I'm going to start reading it; the beginning looks familiar. But if you know of more resources on the net about Win32 console ASM, or maybe a good book about it, please let me know.
Posted on 2006-09-23 05:02:14 by vivendi
Console programming using the Win32 API only differs from window programs in that you probably won't use any GUI functions. Practically everything you need that is external to your program will be an API call to one of the many libraries, utilizing the STDCALL convention.

Your best friend will probably be the MSDN website; it lists all the Win32 API functions, what they do and how to use them.

Since you have programmed in C, you should probably look up the MSVCRT functions... those are the Standard C Library equivalents in Windows. MSVCRT functions use the C calling convention (cdecl, where the caller cleans up the stack), as opposed to most other Win32 API calls, which use STDCALL.

Just be aware: when you use assembly language in a particular OS, it is usually more about the API calls (especially for user interfacing) and less about the assembly language programming itself. Console programming offers the best way to learn the actual assembly language, since the I/O is rather limited (i.e. printf/scanf). Where you do have control is in algorithm development and other techniques for making your program more efficient. Usually you first work out the most efficient way to store and manipulate your data, then worry about which instructions you will use to do so.
Posted on 2006-09-23 12:46:45 by SpooK