.model small
.stack 100h
.data

num1 DW 127
num2 DW 55
msg DB 'The NULL message', 0
sum DW ?

.CODE
main proc
   .STARTUP
inst1: push number1
inst2: push number2
        call donothing
done:
.EXIT
main ENDP

We assume the following registers have the following values: CS = 0F00, SS = 0C00 and DS = 0E00

Now I need to know the physical address (in hex) of number 2. The back of the book says 0E002 which clearly goes with DS. I know how we got 0E002 ((Segment value * 16) + Offset value) but I don't know why we're looking at DS for number2, what is the connection?

Can somebody please explain this? Thanks.
 
Posted on 2009-10-23 20:43:56 by dre
why we're looking at DS for number2, what is the connection?


DS = Data Segment
Posted on 2009-10-23 21:25:06 by JimmyClif
If you could disassemble your code, or look at it in a debugger, you'd see that the ".STARTUP" macro expands to something like:

mov ax, data
mov ds, ax
mov es, ax ;??? dunno if it does this or not

That will explain how ds gets to point to your data segment - it isn't "automatic" (commonly, the OS takes care of this for us, but not in MZ executables). Addresses generally default to "ds:something" - there are exceptions, such as bp-based addresses, which default to ss.

Best,
Frank

Posted on 2009-10-24 09:31:51 by fbkotler
So let me get this straight...anything under the data segment is covered under DS? For the same example would the physical address of num1 be 0E000 ? Suppose we were asked the physical address of inst1, what register would we refer to for that and why?

Thanks again.
Posted on 2009-10-24 13:16:16 by dre
anyone? lol
Posted on 2009-10-25 20:47:59 by dre
I guess so. num1 is at offset 0000, as it's the first word at the beginning of the data segment. With DS = 0E00, as you stated at the beginning, the physical address is 0E00h * 16 + 0000h = 0E000h.
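To make the arithmetic concrete, here is a minimal Python sketch of the real-mode calculation. It uses DS = 0E00 from the question and offsets 0 and 2 for num1 and num2; the function name physical_address is made up for illustration, and the 20-bit wrap assumes an 8086-style address bus.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset.

    The result is masked to 20 bits, which assumes an 8086-style
    address bus (an assumption, not something stated in the thread).
    """
    return ((segment << 4) + offset) & 0xFFFFF


DS = 0x0E00           # value assumed in the question
NUM1_OFFSET = 0x0000  # first word in the data segment
NUM2_OFFSET = 0x0002  # second word, right after num1

num1_phys = physical_address(DS, NUM1_OFFSET)
num2_phys = physical_address(DS, NUM2_OFFSET)

print(f"num1: {num1_phys:05X}")  # num1: 0E000
print(f"num2: {num2_phys:05X}")  # num2: 0E002
```

The second line matches the 0E002 given in the back of the book.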

For the code segment the same rules apply, although different instructions assemble into different numbers of bytes. A push is usually a few bytes - the byte code for push and the address to push.

Disassemble it and have a peek :)
Posted on 2009-10-25 21:23:11 by JimmyClif
Sorry, I missed that one...

The CPU executes instructions at cs:ip (or eip or rip). I mentioned that ds may not point to our data segment unless we (or the OS) make it so. We can assume that cs points to our code segment - or we'd be executing some other code!

However, that ".STARTUP" macro generates some code. Best disassemble it as JimmyClif suggests. (or look at a "list" file... How's Masm do that? "/l myfile.lst"?) Or RTFM and see what that macro expands to. In spite of the fact that "inst1" is the first instruction "showing", I don't think it's the first instruction...

Best,
Frank

Posted on 2009-10-25 21:38:05 by fbkotler

So let me get this straight...anything under the data segment is covered under DS? For the same example would the physical address of num1 be 0E000 ? Suppose we were asked the physical address of inst1, what register would we refer to for that and why?


Segments are two-fold: there are compiler/linker segments and CPU segments (we're talking about real-address mode, right?). The compiler and linker use segments to arrange pieces of code/data in a particular order to build the executable image; the CPU uses segment registers to access these pieces.

Let's examine your (slightly modified) source MASM listing:


.model small
.stack 100h
0000 .data

0000 007F num1 DW 127
0002 0037 num2 DW 55
0004 54 68 65 20 4E 55 msg DB 'The NULL message', 0
      4C 4C 20 6D 65 73
      73 61 67 65 00
0015 0000 sum DW ?

0000 .CODE
0000 main proc
   .STARTUP
0000   *@Startup:
0000  BA ---- R   *    mov    dx, DGROUP
0003  8E DA   *    mov    ds, dx
0005  8C D3   *    mov    bx, ss
0007  2B DA   *    sub    bx, dx
0009  D1 E3   *    shl    bx, 001h
000B  D1 E3   *    shl    bx, 001h
000D  D1 E3   *    shl    bx, 001h
000F  D1 E3   *    shl    bx, 001h
0011  FA   *    cli    
0012  8E D2   *    mov    ss, dx
0014  03 E3   *    add    sp, bx
0016  FB   *    sti    
0017  FF 36 0000 R inst1: push num1;;; was number1
001B  FF 36 0002 R inst2: push num2;;; was number2
001F  E8 0004         call donothing
0022 done:
.EXIT
0022  B4 4C   *    mov    ah, 04Ch
0024  CD 21   *    int    021h
0026 main ENDP

0026 donothing PROC
0026  C2 0008 ret 8
0029 donothing ENDP

END


As you can see, the .STARTUP directive expands to some code that loads ds with the value of the DGROUP symbol (and fiddles with ss:sp). DGROUP is a segment group; the .MODEL SMALL directive creates it, containing the _DATA (from .DATA) and STACK (from .STACK) segments (the listing won't tell you that, just believe me ;-).
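As for the earlier question about inst1: a rough Python sketch of the same calculation for code. It assumes CS = 0F00 as in the original question and takes the offsets 0017h and 001Bh for inst1 and inst2 from the listing above, which only holds if .STARTUP expands exactly as shown and the code segment starts at offset 0. The function name is made up for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset (20-bit wrap assumed)."""
    return ((segment << 4) + offset) & 0xFFFFF


CS = 0x0F00            # value assumed in the original question
INST1_OFFSET = 0x0017  # from the listing: first byte after the .STARTUP code
INST2_OFFSET = 0x001B  # from the listing: inst1 is 4 bytes long

inst1_phys = physical_address(CS, INST1_OFFSET)
inst2_phys = physical_address(CS, INST2_OFFSET)

print(f"inst1: {inst1_phys:05X}")  # inst1: 0F017
print(f"inst2: {inst2_phys:05X}")  # inst2: 0F01B
```

If the book ignores the code generated by .STARTUP, it would put inst1 at offset 0 instead, so treat 0F017 as what this particular listing implies rather than the textbook answer.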


Segments and Groups:

               N a m e                 Size     Length   Align   Combine Class

DGROUP . . . . . . . . . . . . . GROUP
_DATA  . . . . . . . . . . . . . 16 Bit 0017  Word  Public  'DATA'
STACK  . . . . . . . . . . . . . 16 Bit 0100  Para  Stack  'STACK'
_TEXT  . . . . . . . . . . . . . 16 Bit 0029  Word  Public  'CODE'


Now it's the fun part: the linker builds the .Exe placing segments in the following order: the _TEXT segment (it's your .CODE), then the word-aligned _DATA segment, because num1 and num2 are referenced from _TEXT, then the STACK segment (this is the catch: _DATA is grouped with STACK, the stack is uninitialized data, and the only way to put uninitialized data in an MZ .Exe is to stuff it behind EOF ;-). _TEXT is 29h bytes in size, so the next word-aligned offset is 2Ah, or 2:0Ah in seg:off-speak. Hence mov dx, DGROUP will become mov dx, 2 (plus a relocation record in the .Exe header), and push num1/push num2 will become push [000Ah]/push [000Ch] respectively.
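The layout arithmetic can be checked with a few lines of Python. This is only a sketch of the numbers quoted above (29h bytes of _TEXT, word alignment, 16-byte paragraphs); align_up and the variable names are hypothetical helpers, not anything the linker exposes.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


TEXT_SIZE = 0x29  # size of _TEXT from the "Segments and Groups" table

# _DATA is word-aligned, so it starts at the next even offset after _TEXT.
data_start = align_up(TEXT_SIZE, 2)

# Split the image offset into a paragraph number and an offset within it.
dgroup_segment, dgroup_offset = divmod(data_start, 16)

num1_offset = dgroup_offset + 0x0000  # num1 is the first word of _DATA
num2_offset = dgroup_offset + 0x0002  # num2 follows num1

print(f"_DATA starts at image offset {data_start:02X}h")       # 2Ah
print(f"seg:off = {dgroup_segment:X}:{dgroup_offset:02X}h")    # 2:0Ah
print(f"num1 at [{num1_offset:04X}h], num2 at [{num2_offset:04X}h]")
```

The segment value 2 is relative to the start of the image; the loader adds the actual load segment, which is why a relocation record is needed.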

Isn't MASM's simplified segment control somewhat oversimplified? Let's try full-blown segmentation:


0000 _DATA SEGMENT PARA PUBLIC USE16 "DATA"
0000 007F num1 DW 127
0002 0037 num2 DW 55
0004 54 68 65 20 4E 55 msg DB 'The NULL message', 0
      4C 4C 20 6D 65 73
      73 61 67 65 00
0015 0000 sum DW ?
0017 _DATA ENDS

0000 _TEXT SEGMENT PARA PUBLIC USE16 "CODE"
0000 main proc
0000  B8 ---- R mov ax, _DATA
0003  8E D8 mov ds, ax
ASSUME ds:_DATA
0005  FF 36 0000 R inst1: push num1;;; was number1
0009  FF 36 0002 R inst2: push num2;;; was number2
000D  E8 0004         call donothing
0010 done:
0010  B4 4C mov ah, 4Ch
0012  CD 21 int 21h
0014 main ENDP

0014 donothing PROC
0014  C2 0008 ret 8
0017 donothing ENDP

0017 _TEXT ENDS

0000 STACK SEGMENT PARA STACK USE16 "STACK"
0000  0100 [ db 100h dup (?)
       00
      ]
0100 STACK ENDS

END main


Looks similar, but links differently: _DATA goes to the beginning of the image, mov ax, _DATA will be mov ax, 0 (+reloc) and the num1/num2 offsets will be 0 and 2.

The crucial point is the ASSUME ds:_DATA directive: it tells the assembler that symbols in the _DATA segment can be accessed via ds. Look at this:


0000 _DATA SEGMENT PARA PUBLIC USE16 "DATA"
0000 007F num1 DW 127
0002 0037 num2 DW 55
0004 54 68 65 20 4E 55 msg DB 'The NULL message', 0
      4C 4C 20 6D 65 73
      73 61 67 65 00
0015 0000 sum DW ?
0017 _DATA ENDS

0000 _TEXT SEGMENT PARA PUBLIC USE16 "CODE"
0000 main proc
0000  B8 ---- R mov ax, _DATA
0003  8E D8 mov ds, ax
ASSUME es:_DATA;;; here I'd changed ds to es
0005  26: FF 36 0000 R inst1: push num1;;; notice es: segment override prefix? It's 26:
ASSUME ds:_DATA;;; now _DATA is addressable thru ds too
000A  FF 36 0002 R inst2: push num2;;; no prefix, as expected
000E  E8 0004         call donothing
0011 done:
0011  B4 4C mov ah, 4Ch
0013  CD 21 int 21h
0015 main ENDP

0015 donothing PROC
0015  C2 0008 ret 8
0018 donothing ENDP

0018 _TEXT ENDS

0000 STACK SEGMENT PARA STACK USE16 "STACK"
0000  0100 [ db 100h dup (?)
       00
      ]
0100 STACK ENDS

END main


Examine it carefully; it's almost self-explanatory. I didn't even set up es before ASSUME ;-).
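Here is a simplified Python model of what the assembler does with the two push instructions in the last listing. It is only a sketch: the function name and the single "assumed register" argument are made up, and a real assembler tracks ASSUME for all segment registers at once. The prefix bytes (26h for es, 2Eh for cs, 36h for ss, 3Eh for ds) and the FF 36 encoding for push word ptr [disp16] are the standard 8086 ones.

```python
# Standard 8086 segment override prefix bytes.
SEGMENT_OVERRIDE_PREFIX = {"es": 0x26, "cs": 0x2E, "ss": 0x36, "ds": 0x3E}


def encode_push_mem16(offset: int, assumed_register: str) -> bytes:
    """Encode `push word ptr [offset]`.

    assumed_register is the segment register that ASSUME says reaches the
    symbol's segment. ds is the default for this addressing form, so it
    needs no prefix; any other register gets an override prefix byte.
    """
    # FF /6 with ModRM 36h = direct 16-bit address, stored little-endian.
    body = bytes([0xFF, 0x36, offset & 0xFF, (offset >> 8) & 0xFF])
    if assumed_register == "ds":
        return body
    return bytes([SEGMENT_OVERRIDE_PREFIX[assumed_register]]) + body


inst1 = encode_push_mem16(0x0000, "es")  # only es is assumed for _DATA
inst2 = encode_push_mem16(0x0002, "ds")  # ds is assumed too, so no prefix

print(inst1.hex(" "))  # 26 ff 36 00 00
print(inst2.hex(" "))  # ff 36 02 00
```

The extra byte on inst1 is why inst2 sits at 000Ah in this listing instead of 0009h. Note the listing prints the address word as 0002, while the bytes in memory are 02 00.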

In a hypothetical situation (as if MASM could assemble absolute code/initialized data and the loader could place them appropriately in memory), mov ax, _DATA would become mov ax, 0E00h.

Are you getting the gist of it? I can explain it further, but the conclusion is simple: "Set up the segment register and tell the assembler about it".
Posted on 2009-10-26 05:11:38 by baldr