I just looked in the Intel Instruction Set Reference, and according to that, there is a 16-bit relative call-instruction like this:

Opcode   Instruction   Description
E8 cw    CALL rel16    Call near, relative

There is also an identical call-instruction, except that the offset-part is 32 bits instead, like this:

Opcode   Instruction   Description
E8 cd    CALL rel32    Call near, relative

The opcode is the same, but the processor knows which one it is by looking at the operand-size attribute of the instruction (if any).

Anyway, when I look at the disassembly of my program, I see that MASM always generates the 32-bit type instruction, even if the offset fits into 16 bits (or even 8 bits!).

Why is that?

Is there any way to make MASM generate the shorter 16-bit version of the instruction instead? (I guess that one byte will be lost to the operand-size attribute, but I will still gain one byte in space, right?)

If not, is there anyone who can tell me the exact value of the operand-size attribute for the 16-bit version of the instruction, so that I can build it myself with a macro?
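
To make the size question concrete, here is a rough Python sketch of the two encodings. It assumes a 32-bit code segment, where 66h is the operand-size override prefix (the same byte used in the test program further down this thread). The function names are made up for illustration.

```python
import struct

def encode_call_rel32(disp):
    """E8 cd: opcode byte plus a signed 32-bit little-endian displacement."""
    return b"\xE8" + struct.pack("<i", disp)

def encode_call_rel16_in_32bit_code(disp):
    """66 E8 cw: operand-size prefix, opcode, signed 16-bit displacement."""
    return b"\x66\xE8" + struct.pack("<h", disp)

print(len(encode_call_rel32(0x12)))                # 5
print(len(encode_call_rel16_in_32bit_code(0x12)))  # 4
```

So the prefixed form is one byte shorter (4 bytes against 5), which matches the guess above. Whether it is usable is a separate matter.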

Any help would be greatly appreciated.

Posted on 2003-01-13 19:40:29 by dELTA
I'm not sure about the CALL instruction, but normally you don't want to use 16-bit operands in the 32-bit environment if you can avoid it. They may require an operand size prefix (which takes another decode clock), and may stall the CPU while it switches modes. I think Agner Fog talks about it, but I don't have his ref handy at the moment...

Posted on 2003-01-13 20:06:47 by S/390
Also, this is not really a "big" area for optimization. Even if there are no pipeline issues, you'd only be saving a handful of bytes. Since 90% of your function calls are to Windows APIs, you'd need the 32-bit version anyway.

Posted on 2003-01-13 20:10:42 by NaN
Thanks for the info guys.

I do know that it would probably take a little longer to execute it with the 16-bit opcode type, and that it's not such a huge optimization opportunity, but I'm still interested in exactly how this opcode prefix would look, just for fun if nothing else.

So, it would be great if any knowledgeable person could submit an 8-bit post with the value of the opcode-prefix (it would of course also be nice to know how to derive this prefix myself, but I would still settle for only the sole 8-bit value if you can't be bothered with the extra typing). ;)

Posted on 2003-01-13 20:31:21 by dELTA
NaN has a good point. Addresses in CALL are relocated by the linker/loader. Even if the .OBJ code is 00000012, it's gonna be something like 00401012 when it runs. :)
Posted on 2003-01-13 20:50:24 by S/390
I agree with S/390 on this issue. CALL is designed around calling an ADDRESS, and in 32-bit code you call a 32-bit address. In normal code design, saving a BYTE here and there almost never makes the final EXE any smaller. The only people I have seen who continually chase the absolute minimum code size are those who wish to insert it into someone else's code, i.e. viral payloads.

Even EXE compressor stub files that truly need to be small do not use this type of code so I don't see the point of it.


Posted on 2003-01-13 23:39:14 by hutch--
Try making a call to a NEAR PTR, and trace through it in a debugger. It will quickly be obvious why it won't work.

The way it executes the instruction, you will only be able to call addresses between 0x0000 and 0xFFFF.

I wondered about this once too, so I tried :grin:
Posted on 2003-01-14 00:08:03 by ThoughtCriminal
Hmm, very interesting...

So you are all saying that there are not really any relative calls at all in x86 code? My debugger seems to confirm this too (I found the value of the prefix, and added it manually myself).

But why on earth would Intel want to classify this instruction as:

"Call near, relative, displacement relative to next instruction"

in their Instruction Set Reference then, when it's a complete and total lie?! It's really just a "Call near, absolute", and has nothing to do with the next instruction, right?!?
Posted on 2003-01-14 06:39:15 by dELTA
It is relative -- to the current location.

You take the current value of EIP after advancing past the CALL, and add the 32-bit "immediate" displacement value to get the jump address.

Some debuggers will give the calculated destination address, so that you don't need to add the numbers yourself. DEBUG does this for 16-bit code.
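
A minimal sketch of that calculation in Python, assuming the plain 5-byte E8 rel32 form. The addresses are invented for the example and the function name is hypothetical.

```python
def call_rel32_target(call_addr, disp32):
    """Destination of a 5-byte E8 rel32 call located at call_addr."""
    next_eip = call_addr + 5                 # EIP after advancing past the CALL
    return (next_eip + disp32) & 0xFFFFFFFF  # wrap to 32 bits

# Forward call: displacement 0Bh from a call at 00401000h
print(hex(call_rel32_target(0x00401000, 0x0B)))   # 0x401010
# Backward call: negative displacement
print(hex(call_rel32_target(0x00401000, -0x15)))  # 0x400ff0
```

This is the same addition a debugger does when it shows the destination address instead of the raw displacement.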
Posted on 2003-01-14 15:12:27 by tenkey
The address in the call instruction is relative, but the return address (EIP pointing behind the call) will be pushed onto the stack as an absolute address (only a 16-bit absolute address in the 16-bit form). And that makes it virtually impossible to use the small 16-bit version of the call in 32-bit environments.


Just made a little test program to verify my theory:

.386
.MODEL FLAT, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.data
szWorked db "It worked",0

.code
simpleflatroutine proc
    mov eax,12345678
    ret                     ; 32-bit near return
simpleflatroutine endp

main proc
    xor eax, eax
    call xxx                ; push the current EIP ...
xxx:
    pop edx                 ; ... and fetch it into edx
    shr edx,16
    push dx                 ; high word of EIP, sits above the 16-bit return address
    db 66h,0E8h             ; that's a 16-bit call
    dw 0 - distance
distance equ $ - offset simpleflatroutine
    .if (eax == 12345678)   ; eax is only set if the routine was reached
        invoke MessageBox, 0, addr szWorked, 0, MB_OK
    .endif
    ret
main endp

start:
    call main
    invoke ExitProcess, 0

END start

This program should have worked according to my theory. It does not, simply because the 16-bit call instruction clears the HIWORD of EIP.
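
A small Python model of what the prefixed call does, as I understand the operand-size-16 case in the Intel reference: the new EIP is masked to 16 bits and only IP is pushed. The addresses below are invented and the function name is hypothetical.

```python
def call_rel16_in_32bit_code(call_addr, disp16):
    """Model of 66 E8 cw executed in a 32-bit code segment.

    Returns (new_eip, pushed_return_address).
    """
    next_eip = call_addr + 4               # prefix + opcode + 2-byte displacement
    new_eip = (next_eip + disp16) & 0xFFFF # HIWORD of EIP is cleared
    pushed = next_eip & 0xFFFF             # only the low 16 bits are pushed
    return new_eip, pushed

# A call at 00401020h aiming at a routine at 00401000h (disp = -24h)
new_eip, pushed = call_rel16_in_32bit_code(0x00401020, -0x24)
print(hex(new_eip))  # 0x1000
print(hex(pushed))   # 0x1024
```

The displacement itself is fine, but execution continues at 00001000h instead of 00401000h, which is not where the routine lives.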

Posted on 2003-01-14 15:47:41 by japheth
I think it's safe to say that the short versions of CALL are a "left over" from the 16-bit days.

Posted on 2003-01-14 16:55:54 by S/390
too bad highword of eip is cleared. then again, if only a 16bit retaddr is saved,
this probably isn't useful for anything. why is it that size-reducing people must
always be compared with virus writers? there are other mad people in the world,
like <=4k intro writers ;)
Posted on 2003-01-14 17:22:44 by f0dder
Thanks for the info guys!

I have been informed by a friend that the high 16 bits of the EIP are taken from somewhere in the local descriptor table (LDT) for the process, and this needs to be set with privileged instructions? Well, this 16-bit relative call instruction obviously seems to be quite a mess to use in 32-bit programs anyway, so I guess it had better be left alone, but it was an interesting insight in any case. ;)
Posted on 2003-01-14 18:50:33 by dELTA
I tried to see if I could set my program entry point below 0xFFFF, but LINK won't let me do it.

The lowest I could go was BASE:0x0010000. Maybe a FASM user could try to get it lower.
Posted on 2003-01-15 05:23:06 by ThoughtCriminal
program base, iirc, has to be a multiple of 64k.
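
Putting the two observations together in a quick Python sketch. It assumes the image base must be a multiple of 64K and that 0x10000 is the lowest base that can be used, as reported above. The names are hypothetical.

```python
GRANULARITY = 0x10000        # 64K alignment of the image base
LOWEST_BASE = 0x10000        # assumption: lowest base the linker accepted above

def is_valid_base(base):
    """True if base is 64K-aligned and not below the lowest usable base."""
    return base >= LOWEST_BASE and base % GRANULARITY == 0

def reachable_by_16bit_call(addr):
    """A 66 E8 call always lands in 0000h..FFFFh."""
    return 0 <= addr <= 0xFFFF

print(is_valid_base(0x00400000))             # True
print(is_valid_base(0x00008000))             # False
print(reachable_by_16bit_call(LOWEST_BASE))  # False
```

Under those assumptions even the first byte of the lowest possible image is already out of reach of the 16-bit call.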
Posted on 2003-01-15 06:19:52 by f0dder
Well then, I guess near calls are no use to 4k coders under Windows :tongue:
Posted on 2003-01-15 06:48:45 by ThoughtCriminal