Hi everybody,
just starting to learn assembly and having problems with my hello world program (it won't assemble in 'coff' and won't link in 'bin'). If I could get some help it would be great; the code is below.




I attempted to assemble it under Nasm with: nasm -f coff -o hello.o hello.asm


segment .data
msg db "Hello World of Assembly language!!!", 0 ; Create a string to print

segment .text
global _main ;Required for the linker

_main:
mov dx, msg
mov ah, 9
int 21h
mov ah, 4ch
int 21h
leave
ret


I then used this command to link: gcc hello.o hello.c -o hello.exe


#include "cdecl.h"

int PRE_CDECL _main( void ) POST_CDECL;

int main()
{
  return _main();
}


Any help you can give me is greatly appreciated.


Sari
Posted on 2010-05-29 18:37:57 by sari
What are you trying to do, Sari? "-f coff", despite being able to use int 21h, is a 32-bit format. You'd want "mov edx, msg", not dx. That'll get it to assemble. There may be an issue with calling your procedure "main" or "_main" - I would suggest calling it "asm_main" or something. That should build and run under DJGPP gcc. My experience has been that the executable will be quite large.

"-f bin" is not a linkable format. It shouldn't need linking if it's assembled as a ".com" file. You would need to add "org 100h" (or "org 256" if you don't like hex). This doesn't "cause" your program to be loaded at 100h (into whatever segment dos loads it); it merely informs Nasm that it *will* be loaded at 100h (if it's named ".com"). You will need to specify the output file name: "nasm myprog.asm -o myprog.com" (otherwise Nasm defaults to just "myprog"). Also note that int 21h function 9 prints up to a "$" terminator, not a zero, so the string needs to end with "$". That should get your message on screen, but may not be what you want to do...

The "leave" and "ret" are redundant - you've exited the program at that point, and they won't be executed. Good thing, because "leave" undoes the effects of "enter", which you haven't got.

If you're not running DJGPP, "-f coff" is not what you want. Get back to us, in that case...

Best,
Frank

Posted on 2010-05-30 02:11:51 by fbkotler
I changed dx to edx and tried again. This time it assembles, but it crashes. I am learning 32-bit asm, so correct me if I am wrong, but ".com" is not an option. Also, I don't have DJGPP; I have MinGW32, but I think they are mostly compatible (again, correct me if I'm wrong). I also changed _main to _asm_main.
Posted on 2010-05-30 22:41:32 by sari
sari,

For what target are you compiling? DOS services can be used from protected mode if you use a DOS extender. A plain console PE will fault on int 21h.
Posted on 2010-05-31 00:08:41 by baldr
Okay, you're wrong. :)

The two object formats, DJGPP coff (what Nasm calls "-f coff"), and MScoff (what Windows uses, and what Nasm calls "-f win32"), are quite similar. But DJGPP provides a "runtime" with a 32-bit version of int 21h. It is "32-bit extended dos". Without that, forget int 21h.

You're right that .com is out for 32-bit code. Well, if you start in "real, real mode dos", you can go into 32-bit mode and code 32 bits... but you don't want to. :)

Lemme see...


global _main
extern _printf

section .data
  msg db "Hello, Win32!", 13, 10, 0

section .text
_main:
  enter 0, 0
  pusha

  push msg
  call _printf
  add esp, 4

  popa
  leave
  ret


That's just off the top of my head, and may have errors. I think it's close, see what the CPU thinks. :)

Assemble that as "nasm -f win32 hello.asm". That should give you "hello.obj". Then link it (there's nothing to "compile", really, gcc just calls ld for us) with "gcc -o hello.exe hello.obj". If it complains about not being able to find "_printf", add "-lc" to the command line, but I don't think you need it.

If that works, you're on your way! Well, you can call printf, at least. More like a C program than assembly language, but it's a start...

I cut it down to one program - trying to keep it as simple as possible - but you can call your asm program from a C "main" like you were doing. Dr Paul Carter uses this technique - "driver.c" calls "asm_main" in your asm program - to produce a tutorial that can be used on many platforms. He's got example files for Cygwin, says "might work for MinGW too". I'll bet it does, and if it doesn't, I'll bet we can get it to work! That might be something to look at...

http://www.drpaulcarter.com/pcasm/

There's a post over on the Nasm forum that explains how to cross-compile a file like you've got for Linux and Windows. I consulted it to confirm that the command line to MinGW gcc was what I thought. It might interest you, but... on second thought, maybe not too much...

http://forum.nasm.us/index.php?topic=810.0

If you want to use the Windows APIs directly, rather than letting libc call them for you (which is what happens when you use printf), there's the NASMX package, here:

http://www.asmcommunity.net/projects/nasmx

It uses a different linker, but there may be a way to use it with ld from MinGW. Dunno. I used Cygwin briefly (horrid slow thing) when I was running Windows, but never tried MinGW, so I don't know what it'll do. I suspect that telling it where to find the libraries is the "trick". I think I got Cygwin's ld to work by feeding it Hutch's MASM32 library...

Anyway, see if my attempt above will work for ya, and then see where you want to go from there...

Best,
Frank


Posted on 2010-05-31 00:48:43 by fbkotler
Thanks a lot Frank, that worked perfectly. But I have a few questions on how it works.

First: msg db "Hello, Win32!", 13, 10, 0
I know 13 is the length but what are 10 and 0 for?

Second: add esp, 4
What are we doing here?
Posted on 2010-05-31 19:20:49 by sari



13 and 10 produce a new line (carriage return, line feed), and 0 terminates the string.

Posted on 2010-05-31 19:54:40 by skywalker
Ok, so what does add esp, 4 do?
Posted on 2010-05-31 21:06:29 by sari



1. 13 is an ASCII code for CR (carriage return), 10 for LF (line feed), 0 is string terminator (by definition of C strings).

2. That's part of the cdecl calling convention: the called function doesn't remove its arguments from the stack, so the caller has to.
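To make point 1 concrete, here is a minimal C sketch of the same bytes the `db` line produces. The helper name `visible_length` is hypothetical, written only for illustration. Note that 13 here is the CR character code; that "Hello, Win32!" also happens to be 13 characters long is a coincidence.

```c
#include <stddef.h>

/* Same bytes as: msg db "Hello, Win32!", 13, 10, 0
   '\r' is ASCII 13 (CR), '\n' is ASCII 10 (LF); the compiler appends the final 0. */
static const char msg[] = "Hello, Win32!\r\n";

/* Hypothetical helper: count bytes up to (not including) the 0 terminator,
   which is how C string functions such as printf and strlen find the end. */
size_t visible_length(const char *s) {
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

So the string occupies 16 bytes: 13 characters of text, CR, LF, and the terminating zero.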
Posted on 2010-06-01 00:50:02 by baldr
Right.

1) I perhaps should have written this as


%define CR 13 ; or 0xD, if you like
%define LF 10 ; or 0xA

; perhaps even
%define NL CR, LF
; or for Linux
; %define NL LF

msg db "hello", CR, LF, 0


I don't think we actually need both of them, do we? I used both "just to be sure". We do need the zero - C calls it "NULL", I guess... or "NUL"? Maybe should have "%define"d that, too...

There's a fairly recent Nasm feature... Nasm will do "\n" (and other Cish stuff), if the string is enclosed in "back apostrophes" or "back ticks"... (under the '~'... on my keyboard)


msg db `hello\n`, 0


if you want to do it like that... There are Windows and Linux APIs that require the length to be passed as a parameter instead of relying on the zero terminator, but all(?) C string functions and some Win/Lin APIs want the zero.

2) Might be worth discussing the "other" calling convention - "stdcall" - where the callee *does* remove parameters from the stack, since the Windows APIs use it. In fact, it might be worth discussing "the stack" a little, since beginners don't know about it automatically. The stack is just memory that ss:esp points to. In dos executables, we were expected to declare a stack - "segment stack stack" - or the linker would complain. In 32-bit code, the OS tells *us* where the stack is. We can just start using it.

The "obvious" instructions that use the stack are "push" and "pop". "push xxx" does essentially "sub esp, 4"/"mov [esp], xxx". "pop xxx" does essentially "mov xxx, [esp]"/"add esp, 4". (where "xxx" can be a register, an "immediate" number including an address (push only, of course), or contents of memory, "dword [foo]" - we have to say "dword" if it's contents of memory - dunno why, it's - almost - always a dword...)

Less obvious is that the "call" instruction uses the stack. "call foo" essentially pushes the "return address" - the address of the instruction immediately following the call - and jumps to the label "foo:". When "foo" ends with "ret", we essentially "pop" the return address off the stack, and jump there. If esp doesn't point to the return address when "ret" is encountered, we jump off into the tall weeds instead of back where we came from. That's why it's important that somebody - calling function (caller) or called function (callee) - "balance" or "clean up" the stack. This need not be done immediately after every function call - it can be "deferred"...
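The push/pop/call/ret descriptions above can be modeled in a few lines of C. This is a toy sketch, not how the CPU is implemented: `Stack`, `stack_push`, `stack_pop`, `sim_call` and `sim_ret` are hypothetical names, `esp` is a byte offset into a small array, and return addresses are just numbers.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* Toy model: mem stands in for memory, esp for the stack pointer
   (a byte offset that starts at the top and grows downward). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp;
} Stack;

void stack_init(Stack *s) { s->esp = STACK_BYTES; }

/* push xxx  ~  sub esp, 4 / mov [esp], xxx */
void stack_push(Stack *s, uint32_t value) {
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* pop xxx  ~  mov xxx, [esp] / add esp, 4 */
uint32_t stack_pop(Stack *s) {
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}

/* call foo  ~  push the return address, then jump to foo */
void sim_call(Stack *s, uint32_t return_address) { stack_push(s, return_address); }

/* ret  ~  pop whatever esp points at and jump there */
uint32_t sim_ret(Stack *s) { return stack_pop(s); }
```

If something leaves an extra dword on the stack before the "ret", `sim_ret` hands back that dword instead of the return address, which is the "tall weeds" case.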

I used "enter 0, 0" and "leave"... let me rewrite that with more explicit instructions that do the same thing...


  push ebp ; save caller's ebp - they were using it!
  mov ebp, esp
  pusha

  push msg
  call _printf
  add esp, 4

  popa
  mov esp, ebp
  pop ebp
  ret


As you can see(?), the "mov esp, ebp" would have restored esp to its proper value, even if we hadn't done it. The "pusha" and "popa" are overkill, too. The calling convention, besides specifying who cleans up stack, specifies that certain registers are preserved - besides ebp (the caller was presumably using it for the same thing we are), ebx, esi, and edi are expected to retain their values across a call. Since we didn't change any of them (besides ebp), we don't have to do anything here. "pusha"/"popa" is just a quick and dirty way to cover it. It has the disadvantage that we lose the value in eax. The calling convention specifies that the return value is in eax (or edx:eax or top of FPU stack, depending on return type). Since "main" returns "int", I really should have had "xor eax, eax" in there. (zero indicates "no error", generally) Oh, well...

The "stdcall" convention is much the same, except that callee cleans up stack. Windows APIs use it. (this is not intended to be working code!)


_main:

  push 0 ; MB_OK
  push caption
  push string
  push 0
  call MessageBoxA

  ret


We didn't have to "add esp, 4 * 4" (I write it that way to indicate 4 parameters at 4 bytes each). The code for MessageBoxA, which we don't get to see, ends with "ret 16" (or "ret 4 * 4" perhaps), which "removes" the 4 parameters from the stack - same as "add esp, 16" - after fetching the all-important return address from the stack(!).
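Here is a hedged toy model of who cleans up in each convention. Only esp is tracked; the function names are hypothetical and the numbers illustrative. In both cases esp ends up where it started; the difference is only whether the caller's "add esp, ..." or the callee's "ret n" does the work.

```c
#include <stdint.h>

/* Toy model: only the stack pointer is tracked. */
typedef struct { uint32_t esp; } Cpu;

/* Caller pushes nargs 4-byte arguments, then "call" pushes the return address. */
void push_args_and_call(Cpu *c, uint32_t nargs) {
    c->esp -= 4 * nargs;   /* push arg ... push arg */
    c->esp -= 4;           /* call: return address  */
}

/* cdecl callee ends with plain "ret": pops only the return address. */
void cdecl_ret(Cpu *c) { c->esp += 4; }

/* cdecl caller then does "add esp, 4 * nargs". */
void cdecl_caller_cleanup(Cpu *c, uint32_t nargs) { c->esp += 4 * nargs; }

/* stdcall callee ends with "ret n": pops the return address, then adds n to esp. */
void stdcall_ret(Cpu *c, uint32_t n) { c->esp += 4 + n; }
```

For MessageBoxA's four parameters, the stdcall case corresponds to `stdcall_ret(&cpu, 4 * 4)`.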

So that's what the "add esp, 4" is for... doesn't really need to be there at all... :)

Best,
Frank

Posted on 2010-06-01 04:06:46 by fbkotler

I don't think we actually need both of them, do we? I used both "just to be sure". We do need the zero - C calls it "NULL", I guess... or "NUL"? Maybe should have "%define"d that, too...


Actually, we don't need both. The I/O library distinguishes text and binary mode files on platforms where the newline is not a single LF (unlike the various Unix flavors). As a consequence, under DOS/Windows that code outputs "hello", 13, 13, 10 (stdout is opened in text mode by default).
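A rough C sketch of that text-mode translation, assuming the runtime simply turns every LF written to stdout into CR LF and passes every other byte through. `text_mode_translate` is a hypothetical helper; a real C runtime does this inside its own write path.

```c
#include <stddef.h>

/* Hypothetical helper: copy in to out, expanding each '\n' to "\r\n".
   Existing '\r' bytes pass through unchanged. out_size must be at least 1;
   output is truncated (and still 0-terminated) if it does not fit.
   Returns the number of bytes written, not counting the final 0. */
size_t text_mode_translate(const char *in, char *out, size_t out_size) {
    size_t n = 0;
    for (; *in != '\0'; in++) {
        size_t need = (*in == '\n') ? 2 : 1;   /* LF grows to CR LF */
        if (n + need >= out_size)              /* keep room for the final 0 */
            break;
        if (*in == '\n')
            out[n++] = '\r';
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it "hello\r\n" (that is, "hello", 13, 10) yields "hello", 13, 13, 10, while plain "hello\n" yields the single CR LF you actually want.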

NULL often refers to a pointer of value 0; NUL is the mnemonic for ASCII code 0.

The "obvious" instructions that use the stack are "push" and "pop". "push xxx" does essentially "sub esp, 4"/"mov [esp], xxx". "pop xxx" does essentially "mov xxx, [esp]"/"add esp, 4". (where "xxx" can be a register or an "immediate" number, including an address, or contents of memory, "dword [foo]" - we have to say "dword" if it's contents of memory - dunno why, it's - almost - always a dword...)


It also should be noted that if esp is used in an effective address or as the source for push, the value used in the address calculation or as the source is esp from before the instruction is executed (thus the above example should be written like mov tmp, xxx / sub esp, 4 / mov [esp], tmp). For pop with an esp-based destination, the address is computed after esp has been incremented. E.g. push dword [esp] copies the value on the top of the stack, and pop dword [esp] discards the value just below the top of the stack (for those familiar with Forth, it's DUP and NIP ;)).

Pre-80286 processors handle push sp differently, though: they push the value of sp after it has been decremented by 2.
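Baldr's esp-as-operand cases can be modeled in a few lines of C. This is a hedged toy sketch with hypothetical names: esp/sp are byte offsets into small arrays, the pop case follows Intel's documented rule that an esp-based pop destination is addressed after the increment, and the 8086 case uses a separate 16-bit toy stack.

```c
#include <stdint.h>

/* Toy 32-bit stack: esp is a byte offset into mem. */
typedef struct { uint32_t mem[16]; uint32_t esp; } Stack32;

/* push esp: the value stored is esp from BEFORE the decrement. */
void push_esp(Stack32 *s) {
    uint32_t tmp = s->esp;       /* mov tmp, esp   */
    s->esp -= 4;                 /* sub esp, 4     */
    s->mem[s->esp / 4] = tmp;    /* mov [esp], tmp */
}

/* push dword [esp]: the address uses the old esp, so the top is copied (DUP). */
void push_top(Stack32 *s) {
    uint32_t tmp = s->mem[s->esp / 4];
    s->esp -= 4;
    s->mem[s->esp / 4] = tmp;
}

/* pop dword [esp]: the destination is addressed AFTER esp is incremented,
   so the old top overwrites the value below it (NIP). */
void pop_into_top(Stack32 *s) {
    uint32_t tmp = s->mem[s->esp / 4];
    s->esp += 4;
    s->mem[s->esp / 4] = tmp;
}

/* Toy 16-bit stack for the pre-80286 case. */
typedef struct { uint16_t mem[16]; uint16_t sp; } Stack16;

/* 8086/8088 push sp: the value stored is sp AFTER it was decremented by 2. */
void push_sp_8086(Stack16 *s) {
    s->sp -= 2;
    s->mem[s->sp / 2] = s->sp;
}
```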
Posted on 2010-06-01 06:14:14 by baldr


First: msg db "Hello, Win32!", 13, 10, 0
I know 13 is the length but what are 10 and 0 for?

Second: add esp, 4
What are we doing here?


1. 13 is an ASCII code for CR (carriage return), 10 for LF (line feed), 0 is string terminator (by definition of C strings).

2. That's a part of cdecl calling conventions. Called function doesn't remove its arguments from stack.


So that's how C does that the only time I've used cdecl is as a pointer to a function.

Frank,


Thank you very much for that post It was very informative and I think I understand that now.

Could someone show me how I would use the parameters to print out formatted text, and scanf too?
Posted on 2010-06-01 06:48:23 by sari

So that's how C does that the only time I've used cdecl is as a pointer to a function.


Can you rephrase this sentence? English isn't my native language, I didn't quite understand what you've meant.


Could someone show me how I would use the parameters to print out formatted text, and scanf too?


MSDN, POSIX or ISO/IEC 9899:201x (.PDF, subchapter 7.20.6) can help. For particular usage you may ask here. ;)
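As a starting point, here is a small C sketch of the parameter rules. The helper names `parse_person` and `format_person` are hypothetical, and `sscanf`/`snprintf` stand in for `scanf`/`printf` so nothing touches the console. The scanf family wants the *address* of each variable to fill in; the printf family wants the values themselves, except that `%s` takes the address of the string. Calling from 32-bit asm with cdecl, the arguments are pushed right to left (so the format string is pushed last) and the caller removes them afterwards, e.g. "add esp, 4 * 3" for three dword arguments.

```c
#include <stdio.h>

/* Parse "name age". name must point to at least 25 bytes (an assumed size that
   matches the 24-character limit in the format); age is passed as an address.
   Returns how many fields were converted. */
int parse_person(const char *line, char *name, int *age) {
    return sscanf(line, "%24s %d", name, age);
}

/* Format a line: the int is passed by value, %s gets the string's address.
   Returns the number of characters the full output needs. */
int format_person(char *out, size_t out_size, const char *name, int age) {
    return snprintf(out, out_size, "%s is %d years old\n", name, age);
}
```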
Posted on 2010-06-01 08:05:13 by baldr
Yes, I can; sorry for the late reply.


I simply did not know how C converted that to assembly.

The other one means that I want to know how to use scanf in assembly and the parameters for printf.


Also, I just started on Paul Carter's tutorial on PC ASM and decided to try to write a simple program. Can someone help correct it?

Here it is:



%include "asm_io.inc"

segment .data
endl db " ", 13, 10, 0
same db "The first name = second name", 0
diff db "They are not equal", 0
outp1 db "The first name is: ", 0
outp2 db "The second name is: ", 0
name1 db  "Keith", 0
name2 db "Daniel", 0

segment .text

global _asm_main

_asm_main:

enter 0, 0
pusha

mov eax, outp1
call print_string

mov eax, name1
call print_string

call own
call own

mov eax, outp2
call print_string

mov eax, name2
call print_string

mov ebx, name1
mov eax, name2
cmp ebx, eax
jne not_equal

equal:
mov eax, same
call print_string
call exit


own: ;prints a newline
mov eax, endl
call print_string


not_equal:
mov eax, diff
call print_string
call exit

exit:
popa
mov eax, 0
leave
ret


;What It's supposed to do in C.

;int main() {
; char szFirstName[25] = "Keith";
; char szSecondName[25] = "Daniel";
;
; printf("The first name is: %s\n", szFirstName);
; printf("The second name is: %s\n", szSecondName);
;
; if (!strcmp(szFirstName, szSecondName)) {
; printf("They are equal\n");
; } else {
; printf("Not equal\n");
; }
;
; return 0;
;}


Here is my current output:

The first name is: Keith
They are not equal


Then it crashes.

Thanks
Posted on 2010-06-04 19:04:01 by sari
The main problem seems to be that you "call own"... but "own" doesn't return!

Put a "ret" at the end of this subroutine, and you'll get a little farther. There are a couple of good places to put your subroutines - in the middle of your code is not one of them (although it'll work in this case). I prefer to put 'em at the end of my code, after the "exit". Many programmers put their subroutines first, and leave "main" for dessert :) After "section .text" but before "asm_main" would be good. The idea is that you want this code to execute *only* when it's called - don't "fall through" into it (usually).

"call exit" is an error, too. The "key" here is that "call" puts the return address on the stack, and "ret" removes it, and goes there. If esp points to something other than the return address when "ret" is executed, we go someplace else instead. This almost never works!

I told you something incorrect earlier. I said that the "add esp, 4" wasn't really necessary, that "mov esp, ebp" (or "leave") would "save our asm". Well, it would have, if we hadn't altered ebp. We didn't explicitly alter ebp, but we outsmarted ourselves with "pusha"/"popa"! That's a "quick and dirty" way to preserve the registers that need to be preserved (ebx, esi, edi) - if we alter them - and a bunch of registers that don't need to be preserved... including ebp! (ebp *does* need to be preserved, but we did it separately) This would not be a problem if esp had pointed to the same place as it did when we "pusha"ed... but with "call exit", there's a return address on the stack that "popa" wasn't expecting, so all the registers are filled with the wrong values. ebp is the real killer, but... it's just plain wrong... So just use "jmp exit", not "call exit". (and do use "add esp, 4" or whatever is correct)
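To see why the leftover return address wrecks "popa", here is a toy C model. The names are hypothetical, registers are just array slots, and the values 1 to 8 are arbitrary. It relies on the documented order: "pusha" pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, EDI, and "popa" pops them back in reverse, discarding the saved ESP.

```c
#include <stdint.h>

/* Register slots, in pusha order. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NREGS };

/* Toy stack of dwords; sp counts down from 32 (index of the current top). */
typedef struct { uint32_t slot[32]; int sp; } Stack;

static void push(Stack *s, uint32_t v) { s->slot[--s->sp] = v; }
static uint32_t pop(Stack *s) { return s->slot[s->sp++]; }

/* pusha: push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, EDI. */
void pusha(Stack *s, const uint32_t r[NREGS]) {
    for (int i = EAX; i <= EDI; i++)
        push(s, r[i]);
}

/* popa: pop in reverse order; the saved ESP value is popped and thrown away. */
void popa(Stack *s, uint32_t r[NREGS]) {
    for (int i = EDI; i >= EAX; i--) {
        uint32_t v = pop(s);
        if (i != ESP)
            r[i] = v;
    }
}
```

With one extra dword on top (the return address pushed by "call exit"), every register comes back one slot off: EDI gets the return address, ESI gets the old EDI, EBP gets the old ESI, and one saved value is left behind on the stack.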

With those problems fixed, I think your code will run to completion, and correctly report that the names are not equal. It will report that they're not equal even if they are, though. Your comparison routine is comparing the addresses of the two names, and since they're stored at different addresses, they'll never be equal! You want to compare the "[contents]" of memory... "cmp [eax], [ebx]" won't work - we only have one address bus, so an x86 instruction can't take two memory operands - you'll have to:


  mov ebx, name1
  mov eax, name2
compare_byte:
  mov cl, [ebx]
  cmp [eax], cl
  jne not_equal

; equal so far, try next byte
  inc eax
  inc ebx

;  jmp compare_byte
; oops, not yet - maybe we're done?

  cmp cl, 0 ; zero terminated string, yes?
  jne compare_byte
; they are equal...


The "cmpsb" instruction compares the byte at [esi] with the byte at [edi], and increments both esi and edi (or decrements them, if the direction flag is set). In conjunction with the "repe" prefix (with ecx holding the string length), it can be used to code a very short string compare. I think it's easier to understand the "naive" version, but you might want to try it that way, too...
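For comparison, here is the same byte-at-a-time loop in C, with comments mapping each step back to the asm above. `names_equal` is a hypothetical name; it returns 1 for equal strings (the fall-through to "they are equal" in the asm) and 0 otherwise (the jump to not_equal).

```c
/* Hypothetical helper mirroring the asm loop: ebx -> a, eax -> b, cl -> c. */
int names_equal(const char *a, const char *b) {
    for (;;) {
        char c = *a;        /* mov cl, [ebx]                       */
        if (*b != c)        /* cmp [eax], cl / jne not_equal       */
            return 0;
        a++;                /* inc ebx                             */
        b++;                /* inc eax                             */
        if (c == 0)         /* cmp cl, 0 / jne compare_byte        */
            return 1;       /* both strings ended together: equal  */
    }
}
```

Checking for the zero only after the bytes have matched is what lets one test cover both strings: if the byte in one string is 0 and the other isn't, the compare fails first.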

For usage and parameters to printf and scanf... the net must be full of C references... (asm_io.asm might provide examples, too) One thing I've found out: printf thinks all floats are doubles, even if it says "%f"! Dunno if this is true of scanf. The parameters to scanf want to be *addresses* of the string, int, float, whatever. K&R says this is "the most common error"! I think this fact may be a little more intuitive in asm, actually. I consider scanf to be an "unruly beast" (and less printable things), and avoid it at all cost. Dr. Carter's "read_*" functions use it. See his "array1c.c" for an attempt to get it under control. Works okay if the user behaves...
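A small C sketch of both points, with hypothetical helper names and `sscanf`/`snprintf` standing in for `scanf`/`printf`. The printf behavior is C's default argument promotion: a float passed to a variadic function arrives as a double. scanf gets no such promotion because it receives pointers, so it needs `%f` for a float and `%lf` for a double. In 32-bit cdecl terms, each `%f` argument to printf is an 8-byte double on the stack, while each argument to scanf is a 4-byte address.

```c
#include <stdio.h>

/* scanf needs the exact pointer type: %f stores a float, %lf stores a double.
   Returns how many fields were converted. */
int read_both(const char *text, float *f, double *d) {
    return sscanf(text, "%f %lf", f, d);
}

/* printf never receives a float: f is promoted to double when passed,
   so %f here is consuming a double. */
int show_float(char *out, size_t size, float f) {
    return snprintf(out, size, "%f", f);
}
```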

In the course of looking at your program, I commented out a couple of "suspicious" instructions, and it segfaulted without printing anything. Seemed to be getting *less* far before it segfaulted. But this was just printf messin' with me. printf does "buffered i/o" (this is a Good Thing) and doesn't actually print anything until the buffer is flushed. Printing a newline will flush the buffer, as will reading buffered input, as will exiting the program. But if we segfault before printing a newline, nothing at all gets printed. Something to keep in mind if you're getting "mysterious" results from printf. I never would have known that, without making errors! :)
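If buffering is hiding output while you hunt a crash, one workaround is to flush explicitly after each debug message. `print_now` is a hypothetical helper; `fflush(stdout)` is standard C, and calling `setvbuf(stdout, NULL, _IONBF, 0)` once at startup would turn stdout buffering off altogether.

```c
#include <stdio.h>

/* Hypothetical helper: write msg and flush stdout right away, so the text
   is not lost in the buffer if the program crashes afterwards.
   Returns 0 on success, EOF on error. */
int print_now(const char *msg) {
    if (fputs(msg, stdout) == EOF)
        return EOF;
    return fflush(stdout);
}
```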

You might want to try fixing the problems in your code one at a time. First, just put a "ret" at the end of the "own" routine. Just a matter of "style", but I really would move it to "first" or "last", even though it'll work where it is. Then get rid of the "call exit". Then tackle the "compare" routine - you might want to make that a subroutine, too...

Maybe you can answer a question for me. In your first posted code, you used the "PRE_CDECL and POST_CDECL macros. I'm ASSuming that those simply keep C++ from "decorating" (mutilating) the name. Do you know if that's correct?

Best,
Frank


Posted on 2010-06-04 23:52:02 by fbkotler
Thank you very much Frank, that was very informative. I'll post it when I get it to work ;)

I'm not completely sure about PRE_CDECL; I hadn't used those macros much before learning assembly, but that would make perfect sense. I'm sure it adds its little decorations to the code. Maybe try disassembling a piece of code with them and one without to see the difference. I'll get back to you on that one.

Edit: it looks like it may just allow C to prototype it outside of the '.c' file itself.

BTW, C does separate floats and doubles with scanf (%f vs. %lf); I ran into this a couple of months back.
Posted on 2010-06-05 01:11:20 by sari
Here is the remastered version of our code:


%include "asm_io.inc"

segment .data
endl db " ", 13, 10, 0
same db "The first name = second name", 0
diff db "They are not equal", 0
outp1 db "The first name is: ", 0
outp2 db "The second name is: ", 0
name1 db  "Keith", 0
name2 db "Daniel", 0

segment .text

equal:
mov eax, same ;print our 'same' string
call print_string
jmp exit

not_equal:
mov eax, diff ;print our 'different' string
call print_string
jmp exit

newl: ;previously own
mov eax, endl
call print_string
ret

exit:
popa
mov eax, 0
mov ebx, 0
leave
ret


global _asm_main ;Required for linker

_asm_main: ;Main routine

enter 0, 0 ; Enter the program
pusha

mov eax, outp1 ;print out our first name string
call print_string

mov eax, name1 ;print out our first name
call print_string

call newl
call newl
call newl

mov eax, outp2 ;print our second name string
call print_string

mov eax, name2 ;print our second name
call print_string

call newl
call newl
call newl

mov ebx, name1 ;move then compare our names
mov eax, name2
compare_byte:
mov cl, [ebx]
cmp [eax], cl
jne not_equal

inc eax
inc ebx

cmp cl, 0
jne compare_byte

jmp equal


;What It's supposed to do in C.

;int main() {
; char szFirstName[25] = "Keith";
; char szSecondName[25] = "Daniel";
;
; printf("The first name is: %s\n", szFirstName);
; printf("The second name is: %s\n", szSecondName);
;
; if (!strcmp(szFirstName, szSecondName)) {
; printf("They are equal\n");
; } else {
; printf("Not equal\n");
; }
;
; return 0;
;}


I think next will be a program where the user enters three numbers and I order them from greatest to least.
Posted on 2010-06-05 01:58:50 by sari
Okay, equal, not_equal, and exit aren't what I'd call subroutines... but it'll work...

Best,
Frank

Posted on 2010-06-06 12:18:01 by fbkotler