Before we start learning the bit fields in opcode blocks
and discussing what advantages and disadvantages
await us in using them, I find it useful to practice
memorizing one of the most used and most important bit fields - the reg field.
This field is used in the modr/m block, the sib block and also
inside several 1 byte opcodes.
The reg bit field has 3 bits and therefore can contain
2^3=8 possible values referring to the 8 general purpose registers.

000 = EAX
001 = ECX
010 = EDX
011 = EBX
100 = ESP
101 = EBP
110 = ESI
111 = EDI

The reg field can also mean a different set of registers - the partial registers.
We discuss when that happens later; for now, about this second set of meanings
it is enough to say that the first 4 values refer to the low partial registers in the same
order as they refer to the full registers:
000 = AL
001 = CL
010 = DL
011 = BL
and the second 4 values refer to the high partial registers in the same order:
100 = AH
101 = CH
110 = DH
111 = BH

First we learn the situation where only full registers can be used - a one byte opcode
with a REG field.
In such an opcode the high 5 bits are the CODE bit field
and the low 3 bits are the REG bit field.
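The 5:3 split just described can be tried out in a few lines of Python. This is only a rough sketch: the names REG32 and split_code_reg are made up for illustration and have nothing to do with the training app.

```python
# Full 32-bit register names, indexed by the 3-bit reg field value.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_code_reg(opcode):
    """Split a one-byte opcode into (code, reg): upper 5 bits, lower 3 bits."""
    return opcode >> 3, opcode & 0b111


code, reg = split_code_reg(0x41)
# Print the byte grouped 5:3 instead of the usual 4:4 nibbles.
print(f"{code:05b} {reg:03b} -> reg field names {REG32[reg]}")
```

Shifting right by 3 drops the reg field, and masking with 111b keeps only the reg field.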

I wrote a simple training application for practicing decoding\encoding of 1 byte opcodes
with a reg field.

A detailed article will follow soon.
I hope you find using the app more productive than trying to memorize
all these bit fields by simply looking at some documentation text.
Let me know if you want the source.
Posted on 2003-01-31 19:45:21 by The Svin
Before, in the opcode articles, it was enough to use OllyDbg
to practice with the opcode issues discussed.
I was planning to move on to the most important issue regarding
opcodes - addressing.
But that issue involves understanding bit fields whose size
in bits is not a multiple of 4.
The format of the two most important blocks, byte modr/m and byte sib,
has bit fields that are not a multiple of four, and that means
that you cannot simply associate a bit field in your head with
one or more hex digits.
Their format in bits is 2:3:3.
That means that the 2 highest bits in the byte mean one thing, the next 3
another thing, and the last 3 a third.
But when you see it in hex you see two hex digits.
Each of the digits is built from bits of two adjacent bit fields.
Each hex digit corresponds to a nibble - 4 bits.
So the fields 2:3:3 in hex put 2 bits of the first field and two bits
of the second field in the high nibble, and the last bit of the second field
with all 3 bits of the third field in the low nibble.
For example, the value of byte modr/m that says that both operands
are registers and the registers are ecx and edi
in hex is F9 (for example mov edi,ecx is 8B F9,
sub edi,ecx is 2B F9; pay attention to the second byte of
both opcodes - it's the same and it's the modr/m byte).
In bits separated into nibbles it is 1111 1001
and in the actual bit field format it is 11 111 001
where 11 - mod (the value 11 in mod means both operands are registers)
111 - reg (111 stands for edi in the reg field)
001 - reg or mem indirect reg pointer (001 stands for ecx in the reg field)
What is more, the same meaning of modr/m (both operands are registers
and the registers are ecx and edi) could be coded as 11 001 111,
in nibbles 1100 1111, in hex CF, which makes it all more complicated to understand.
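To see the 2:3:3 grouping without doing the nibble juggling by hand, here is a small Python sketch (split_modrm is a hypothetical helper name, not something from any tool mentioned in this thread). It regroups the two example bytes F9 and CF:

```python
def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a byte laid out as 2:3:3."""
    mod = byte >> 6            # 2 highest bits
    reg = (byte >> 3) & 0b111  # 3 middle bits
    rm = byte & 0b111          # 3 lowest bits
    return mod, reg, rm


for value in (0xF9, 0xCF):
    mod, reg, rm = split_modrm(value)
    print(f"{value:02X} = {mod:02b} {reg:03b} {rm:03b}")
```

The same function works for byte sib, since it has the same 2:3:3 layout.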
Well, I hope I've said enough to scare you off reading, and you have probably
decided that all this mumbo-jumbo is not for you.

For the rest who are still reading, I continue :)
A few words on the importance of being able to construct and extract bit
fields, and of understanding addressing in opcodes, can't
list even 1% of the advantages that knowing it gives you in coding.
But I'll try.
To construct ANY address (in byte modr/m and byte sib) all you need to
know is the meaning of:
1. 8 possible values in the reg field
2. 4 possible values in the mod field of byte modr/m
3. 4 possible values in the S (scale) field of byte sib.
Then you need to know a few simple general rules and a few exceptions.
Of course if you work with HEX you also need to fluently extract \ construct
bit fields from/in HEX. We'll eventually learn it step by step; as an additional
bonus it will let us practice positional number systems and
arithmetic, which is extremely useful for those who create and optimize
algorithms at low level.
If you tried to memorize it all in HEX you would need 2^10 values :)
(256 for modrm + 3 * 256 in )

Now about the importance of knowing how addressing is constructed in an opcode:
after we are finished you'll be able to determine the size of any opcode
on the fly while typing mnemonics. The addressing part of an opcode is the most size
consuming; we always deal with operands and that means we always deal
with specifying operands, in other words - their addressing (including
registers). The exception is mnemonics where the operands are predefined,
for example string (chain) opcodes; that also explains why those opcodes are
short - they don't have an address part.

Without knowing the addressing part of the opcode, it's practically impossible to
figure out (looking at just the mnemonics) which code is shorter or what
the size of an opcode is.
An example for those who don't know address coding.
Look at two different codes that do the same:
1. mov edx,[ecx*4]

2. xor eax,eax
mov edx,[eax+ecx*4]

Looking at the mnemonics it seems that the second code has more lines,
and even its last line alone has more blocks, characters, etc., and therefore
seems to refer to a longer opcode.









It's an illusion :)
The second version's opcodes are 2 bytes shorter than the first:

8B 14 8D 00 00 00 00 MOV EDX,[ECX*4]

33 C0 XOR EAX,EAX
8B 14 88 MOV EDX,[EAX+ECX*4]
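The byte counts above can be checked with a rough Python sketch. It covers only 32-bit addressing, and addressing_length is an invented helper name. It relies on three rules of the format: the value 100 in the last modr/m field announces a sib byte, mod selects the displacement size, and with mod = 00 a base value of 101 in sib (or 101 in the last modr/m field) means a 32 bit address follows instead of a register.

```python
def addressing_length(modrm, sib=None):
    """Bytes taken by modr/m + optional sib + displacement (32-bit addressing)."""
    mod = modrm >> 6
    rm = modrm & 0b111
    length = 1                          # the modr/m byte itself
    if mod == 0b11:
        return length                   # register operand, nothing follows
    if rm == 0b100:                     # 100 flags a sib byte
        if sib is None:
            raise ValueError("this modr/m byte requires a sib byte")
        length += 1
        if mod == 0b00 and (sib & 0b111) == 0b101:
            length += 4                 # no base register: 32 bit address follows
    elif mod == 0b00 and rm == 0b101:
        length += 4                     # direct address: 32 bit address follows
    if mod == 0b01:
        length += 1                     # 8 bit displacement
    elif mod == 0b10:
        length += 4                     # 32 bit displacement
    return length


# Version 1: opcode 8B + modr/m 14 + sib 8D + 4 address bytes.
first = 1 + addressing_length(0x14, 0x8D)
# Version 2: xor eax,eax (2 bytes) + opcode 8B + modr/m 14 + sib 88.
second = 2 + 1 + addressing_length(0x14, 0x88)
print(first, second)
```

The 4 extra bytes in the first version come from the missing base register, which forces a full 32 bit address into the opcode.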

I don't think that Privalov or betov or anyone else who knows the specifics
of the opcode address part would hesitate a second about the right answer.
Those of us who don't yet know what size the addressing part
takes and want to know it in the future will be able to do it in a short
while after some not very hard work.

About encoding bit fields: though it is not so easy in hex with formats
5:3 or 2:3:3, it is easy when coding in asm source in binary.
For example the code field in the "inc reg" instruction is 01000, followed by
the 3 bit reg field.
So with eax=000, inc eax is 01000 000.
The only problem here is that the assembler does not accept a
space between two bit fields; it needs us to write 01000000b, and
written that way it is again hard to see the fields.
We can write a simple macro that lets us easily write bit fields with a separator.
We give our macro the name bcr, which stands for "a Byte opcode with Code and Reg
fields"
bcr macro _code,_reg
db _code&_reg&b
endm
now instead of db 01000000b we can write
bcr 01000,000 ;inc eax
bcr 01000,001 ;inc ecx and so on...

bcr 10010,000 ;xchg eax,eax or nop
bcr 10010,001 ;xchg eax,ecx
....
bcr 10010,111 ;xchg eax,edi

the same way we can make very simple macros that help us
code the modr/m and sib bytes, which have the bit field format 2:3:3
modrm macro _mod,_regcode,_rm
db _mod&_regcode&_rm&b
endm
and because byte sib has the same format we can just define
bsib equ modrm
now we can code while easily seeing the values of all fields:
modrm 11,000,101

for example the code for mov reg1,reg2 is 8Bh, and it is followed by
byte modr/m with 11 in the mod field.
examples:
db 8bh
modrm 11,000,001 ; mov eax,ecx (eax=000,ecx=001)
db 8bh
modrm 11,111,001 ; mov edi,ecx (edi=111)
db 8bh
modrm 11,001,111 ; mov ecx,edi
I gave the examples just to show that there is no problem coding
bit fields in binary.
We discuss byte modr/m in detail later.
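For those who want to check such bytes outside the assembler, the two macros have a straightforward arithmetic counterpart. The Python functions below are only an illustrative sketch; they reuse the macro names bcr and modrm but take numbers instead of pasting digits together.

```python
def bcr(code, reg):
    """One byte in 5:3 format: 5-bit code field, 3-bit reg field."""
    if not (0 <= code < 32 and 0 <= reg < 8):
        raise ValueError("field value out of range")
    return (code << 3) | reg


def modrm(mod, reg, rm):
    """One byte in 2:3:3 format (the same layout also fits byte sib)."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("field value out of range")
    return (mod << 6) | (reg << 3) | rm


# db 8bh / modrm 11,000,001 -> mov eax,ecx
code = bytes([0x8B, modrm(0b11, 0b000, 0b001)])
print(code.hex(" ").upper())
```

Writing the arguments as binary literals (0b11, 0b000, 0b001) keeps the fields as visible as they are in the macro calls.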

For now, enough propaganda about the importance of bit fields :)

Let's actually start studying the meaning of those fields.

We start studying with the REG field, using it in 1 byte opcodes
that have the format 5:3 (5 upper bits for the code field and 3 low bits
for the register field).
We do it for several reasons.
1. Why start with the REG field?
The reg field is the most used field in addressing; getting used to
its values makes it a lot easier to go further with the addressing
parts of opcodes. Reg field values can be used in 2 fields of byte modr/m and 2 fields
of byte sib. Bytes modr/m and sib are used for specifying addresses in
the overwhelming majority of opcodes. Together they have 6 fields and
4 of them can hold reg field values.
2. Why start with 1 byte opcodes of the CODE:REG 5:3 format?
First of all, it is the simplest format that uses a bit field that is not
a multiple of 4.
So we start getting used to such a format in both binary and hex, and
we start with the easiest of them, so that once we are used to it we can more easily
proceed to the 2:3:3 format.
In addition, in the 5:3 format there is only one operand, which also
makes decoding\encoding easier.

Start the training program regfield.exe.
It has 3 tabs.
The first tab is called reference and lets you generate 1 byte opcodes
in two formats: binary (where the bits are separated into two bit fields: code (5 upper
bits) and reg (3 lower bits)), and hex with 2 hex digits.
You can see the current opcode on your right.

You can see buttons for choosing instructions and operands.
I'll explain something that could serve you as additional hints when you go
to the testing (decoding\encoding) parts.

Those buttons are arranged in order of increasing code values of the instructions they
refer to.
Look at the first column - the instruction column.
Click the buttons of this column from top to bottom while looking at the code bit field.
As you can see they refer to the CODES:
INC - 01000
DEC - 01001
PUSH- 01010
POP - 01011
-----------
XCHG -10010

The first four start from 01000 and go up in steps of 1
from 01000 to 01011.
The code of the last button (xchg), though, has a "gap" after the button above
it (pop): it is not 01100 but 10010. You may memorize it as an exception.
As for the xchg code itself, in my time I memorized it by noting
that the upper four bits are mirrored bits (1001), and of course in hex
it's very easy to remember that it makes the first digit 9
(as the biggest decimal digit, or as the first digit of the opcode "nop",
which originally is the opcode for xchg eax,eax = nop).

The same goes for the reg operand buttons -
they are placed in order of increasing codes:
the first column from 000 to 011,
the second column from 100 to 111.

Play with it and pay attention to how the value of the last bit in the code field affects
the second hex digit - if it's 0, the last hex digit = reg field value;
if it's 1, the last hex digit = reg field value + 8. That's because the last
bit of the code value is bit 3 in the low nibble of the byte, and bit 3 = 2^3 = 8.
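The rule about the second hex digit can be checked for all five instructions and all eight registers at once. A small Python sketch (hex_digits and the CODES table are illustrative names; the code values are the ones listed above):

```python
# 5-bit code fields of the instructions from the reference tab.
CODES = {"INC": 0b01000, "DEC": 0b01001, "PUSH": 0b01010,
         "POP": 0b01011, "XCHG": 0b10010}


def hex_digits(code, reg):
    """Return (high, low) hex digits of the byte built from code:reg in 5:3 format."""
    opcode = (code << 3) | reg
    return opcode >> 4, opcode & 0xF


for name, code in CODES.items():
    for reg in range(8):
        high, low = hex_digits(code, reg)
        # High digit: the upper 4 bits of the code field.
        assert high == code >> 1
        # Low digit: reg value, plus 8 when the last code bit is set.
        assert low == reg + 8 * (code & 1)

print("%X%X" % hex_digits(CODES["DEC"], 0b001))  # DEC ECX
```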

...to be continued.
Posted on 2003-02-01 14:24:28 by The Svin
Can you create some slogan to memorize the increasing order
of the registers?
A first try in the ancient language of Atlantis:
A(ei) C(see) D(dee) B(bee) SuP BuP SI DIe
(eAx,eCx,eDx,eBx,eSP,eBP,eSI,eDI)
:)
which in free translation means:
Hey, look! A bee fell dead in your soup!

Can you offer something better?
Posted on 2003-02-01 15:37:06 by The Svin
I made changes to the app.
Now it displays test results.
Posted on 2003-02-02 13:30:43 by The Svin
A few more words about the testing program.
The first tab is kind of a small and primitive assembler,
which knows only five instructions and uses
them only with the full general purpose registers, but it doesn't
encode machine code from source text - it encodes it from the buttons pressed.
The other two tabs ask you to be:
- Disassembler (tab "tell mnemonic")
- Assembler (tab "tell opcode")

In the "tell mnemonic" tab you can see the same interface as in the "reference" tab: buttons with instructions and registers on your left and the opcode in binary
(where the binary byte is grouped not 4:4 but 5:3, giving
you more hints on how to read it)
and HEX.
After buttons from both groups (instructions and registers)
are pressed, the program checks whether you correctly disassembled (decoded)
the opcode. If you are right, it writes a "correct" message and generates a new
opcode for you to decode; if you are wrong, it informs you of
it, releases the buttons and waits for your next try.

A few little hints:
if the first hex digit is 4 - it's inc or dec reg
if the first hex digit is 5 - it's push or pop reg
if it is 9 - it's xchg eax,reg
Which one of the 2 possible opcodes it is can be seen from whether the second hex digit is >= 8.
If you still have problems with binary to hex conversion on the fly,
there is a thread "test yourself for fast hex2bin convertion"
with several training programs for hex to bin and dec to bin
conversions, along with some explanation of the "mind algorithms".

The last tab "Tell opcode" asks you to encode (or assemble) mnemonics.
It asks you to encode mnemonics both in binary and in hex.
It's important for us to do it in both systems; we need to
learn to "see" bit fields in a hex opcode.
To encode in bits press the buttons; in hex - write 2 hex digits in the lower black
edit window. You can press the test button after that, or
space while in the hex edit window.

Memorizing those opcodes is the least important purpose of it, though I think it's
useful to remember that those extremely used opcodes are 1 byte. The main
purpose is to work fluently with the reg field. This field, as you will see, is
used in addressing, and some values in some combinations might mean something
other than a register; knowing well which value refers to which register
will help you know, without additional work, what opcode size you get
in a given addressing version with particular registers.
This will help you not only when coding in hex but first of all when
coding in asm mnemonics, when you choose one register or another.

Decode\encode until you have at least 100 right answers (in the latest
version you can see your results). In a couple of days you will never forget
the bit values for the registers, nor the opcodes and sizes of the instructions used
in the training programs.

IMHO, though, you may say that you have learned it well
if you can get 100 right results without a single wrong one in no more than 5 minutes.
Posted on 2003-02-02 18:41:43 by The Svin
inc dec push pop xchg eax,reg
are the only 1 byte opcodes that use an arbitrary reg operand.
But they are not the only opcodes that use just one reg operand.
For example the mnemonic bswap reg refers to a 2 byte opcode:
00001111:11001reg
The first byte 00001111 (0Fh) has the sole purpose of
telling the processor that the following opcode is from the "new"
instruction set. Remember this byte, you will see it in other
opcodes as the same sign.
The actual "useful" part for the processor here is also 1 byte
in the format 5:3, where the last part is the reg field you already know.
So
bswap eax = 0FC8 (11001 000)
bswap ecx = 0FC9 (11001 001)
....
bswap edi = 0FCF (11001 111)

You can also use your ability to decode\encode the reg field
on the last bit field of opcodes that actually use modr/m
but have just one reg operand.
Examples are MUL REG, IMUL REG.
But first let's make things clear about 16 bit registers.
I'm almost sure that all of you remember the heavily discussed
prefix 66h.
All opcodes that refer to 16 bit registers are absolutely
the same as the opcodes that refer to extended registers,
except that in 32 bit mode they have the prefix 66h leading the opcode.
So
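INC EAX = 40h         INC AX = 66h 40h
DEC ECX = 49h         DEC CX = 66h 49h
...
BSWAP EDI = 0F CF     BSWAP DI = 66 0F CF
(The last line only illustrates the prefix rule: Intel documents the result of
BSWAP on a 16 bit register as undefined.)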
Posted on 2003-02-02 19:30:08 by The Svin
Let's discuss partial registers.
We take for example the MUL, IMUL instructions; their
formats are:
MUL 1111011w:11 100 reg
IMUL 1111011w:11 101 reg
First of all we can see that the 1st bytes of both
instructions are identical.
And we can see the symbol "w" in the least significant
bit position. Depending on its value, the first byte in hex
will look like F7 (w=1) or F6 (w=0).
Discussing the partial register set, we are most interested
in this particular bit.
If w = 1 it tells the processor that the value in the reg field
needs to be decoded as a full register;
if w = 0 - the reg field means a partial register.


w = 1             w = 0
reg     value     reg
EAX     000       AL
ECX     001       CL
EDX     010       DL
EBX     011       BL

ESP     100       AH
EBP     101       CH
ESI     110       DH
EDI     111       BH
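The table is simply two lookup lists sharing one index. A minimal Python sketch (reg_name, FULL and PARTIAL are invented names for illustration):

```python
FULL = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
PARTIAL = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def reg_name(reg, w):
    """Name of the register selected by a 3-bit reg value under the given w bit."""
    return FULL[reg] if w else PARTIAL[reg]


for reg in range(8):
    print(f"{reg:03b}  w=1: {reg_name(reg, 1)}  w=0: {reg_name(reg, 0)}")
```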

We have 8 general purpose registers but only 4 of them
have 2 partial registers each; that gives a perfect opportunity
to code 8=2*4 registers of both the partial and the full set with the same
3 bit values and to distinguish which set is used by the value of the w bit.
You might ask: since we use an additional bit anyway, why didn't they give
partial and full registers different values using 4 bits -
in that case they would have had 16 different values, enough
to encode all 16 registers, both partial and full?
It's a good question - the answer is that the value of bit w affects
only the meaning of a register as a register operand and doesn't affect
a register used as a pointer, and a pointer can only be a full register.
Now in addressing like reg,
all four reg fields are used and there is only one operand
that can be a partial register. Whether a register is used as a register
or as an indirect pointer, the code specifying the register is the same;
there are fields like mod which specify whether the register in
a reg field is a pointer or a register.
In the current encoding system all these 4 fields take 3*4=12 bits
and can be included in two bytes along with
the scale and mod fields. Had they given 4 bits to each reg
field, the reg fields alone would have taken 16 bits, and for scale
and mod they would have had to find additional space, increasing
the size of every instruction.
Posted on 2003-02-02 20:39:54 by The Svin
Nice little app, I like it. It's fun and educational (in conjunction with your "opcode threads" :)).
I'll read the entire thread in more detail as soon as my head stops aching, which I hope is soon.
Posted on 2003-02-03 12:09:13 by scientica
Thank you,scientica.

I continue.

Now you may understand how the assembler encodes an opcode when it sees
statements in our source like:

byte ptr ...
word ptr ...
dword ptr ...

In most cases (with 32 bit default size):
dword ptr - the assembler sets bit w in the opcode to 1
word ptr - also sets bit w to 1 and adds prefix 66h before the opcode
byte ptr - sets bit w to 0

(Again take note of the opcode size when using 16 bit operands - it always
leads to an additional 66h prefix byte)
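The three cases can be put together in a small Python sketch that builds MUL reg for 32 bit code from the format 1111011w:11 100 reg shown earlier. The function names size_bits and mul_reg are hypothetical; this is an illustration of the rule, not how any particular assembler is written.

```python
def size_bits(operand_size):
    """Return (prefix bytes, w bit) for an 8, 16 or 32 bit operand in 32 bit code."""
    if operand_size == 8:
        return b"", 0          # byte ptr: w = 0
    if operand_size == 16:
        return b"\x66", 1      # word ptr: w = 1 plus prefix 66h
    if operand_size == 32:
        return b"", 1          # dword ptr: w = 1
    raise ValueError("operand size must be 8, 16 or 32")


def mul_reg(operand_size, reg):
    """Encode MUL reg: 1111011w followed by modr/m 11 100 reg."""
    prefix, w = size_bits(operand_size)
    opcode = 0b11110110 | w
    modrm = (0b11 << 6) | (0b100 << 3) | reg
    return prefix + bytes([opcode, modrm])


for size in (32, 16, 8):
    # reg = 001 selects ECX, CX or CL depending on the operand size.
    print(size, mul_reg(size, 0b001).hex(" ").upper())
```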

Back to MUL/IMUL reg opcode:


MUL 1111011w:11 100 reg
IMUL 1111011w:11 101 reg

The bytes here are separated by a colon.
The only difference you can see is in the second byte, and only
in its middle 3 bit field.
A little introduction to the "nature" of this byte.
The second byte is what the documentation calls byte "modr/m".
It has 3 bit fields in the format 2:3:3 bits.


bits 7,6 :mod
bits 5,4,3:code or reg
bits 2,1,0:mem or reg

Before further discussion, enter several instructions
in OllyDbg in any of the following formats:


mov reg,reg
sub reg,reg
add reg,reg
and reg,reg

They will all be 2 byte opcodes, and the second byte (byte[1]) of
any of those opcodes is the modr/m byte.
For a better illustration enter several different instructions
but with the same reg operands.
For example:


mov ecx,edi
add ecx,edi
sub ecx,edi
....

You will probably see the same 2nd byte in the opcodes of all these
instructions (I said probably, because it's possible to
encode instructions of such a format in two ways - soon
you'll understand why).

Take the second byte of any of these opcodes and convert it to binary
format.
Now separate the 8 bits into parts 2:3:3 (** *** ***)
In the last two 3 bit fields you can see the codes of the registers used in your
mnemonics.
For example in the case of mov ecx,edi
8BCF MOV ECX,EDI
the second byte is CF in HEX, in binary 1100 1111, separated in 2:3:3
11 001(ecx) 111(edi)

In this example the second field (code or reg) is used to specify a reg.
But not always: with some codes it is used as extension bits of the code
bit field.
As it is in the MUL/iMUL reg opcodes.



MUL 1111011w:11 [100] reg
IMUL 1111011w:11 [101] reg

Together with 1111011w, the extension 100 means MUL, 101 - IMUL.
Remember that all my words will mean almost nothing to
you without practice:
code in OllyDbg, in hex, several instructions of the MUL/IMUL reg
format.
Using F7 (bit w = 1) as the first byte leads to treating the reg field
as one of the full register set, F6 (w=0) - of the partial register set.
Using a last hex digit < 8 in the second byte (E*) means
that the last bit of the code/reg field will be 0 and the field itself = 100;
that creates a MUL instruction, and the last hex digit will be = reg field value.
Using values >= 8 sets the last bit of the code/reg field to 1 (101 in the code/reg field);
that creates an iMUL instruction, and the value of the last hex digit will be
the reg field value + 8.
Don't be afraid - it's difficult to describe in words but easy to see
in a debugger; if you get disoriented - read this again, and go back to practicing
in OllyDbg.

After you feel comfortable with encoding MUL/iMUL reg - try the similar
instructions DIV/iDIV reg.
Their formats are:


DIV 1111011w:11 110 reg
iDIV 1111011w:11 111 reg


If you are a careful low level coder you've noticed that they are not only
almost identical to each other but also almost identical to MUL/iMUL.
It's true indeed.
The difference between the four opcode formats is only in the code\reg field.
This field specifies which one of the four operations (MUL,iMUL,DIV,iDIV)
is used.


code/r instruction
100 MUL
101 iMUL
110 DIV
111 iDIV
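The whole group with a register operand can be decoded by a few lines of Python. This is a sketch under the assumptions stated in the text (first byte 1111011w, mod = 11); decode_group and the table names are made up for illustration.

```python
FULL = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
PARTIAL = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
# Code extension (middle field of modr/m) -> instruction.
GROUP = {0b100: "MUL", 0b101: "IMUL", 0b110: "DIV", 0b111: "IDIV"}


def decode_group(first, second):
    """Decode 1111011w:11 ext reg into a mnemonic string."""
    if first & 0b11111110 != 0b11110110:
        raise ValueError("first byte is not 1111011w")
    w = first & 1
    mod, ext, reg = second >> 6, (second >> 3) & 0b111, second & 0b111
    if mod != 0b11 or ext not in GROUP:
        raise ValueError("only mod = 11 with extensions 100..111 is handled")
    return GROUP[ext] + " " + (FULL if w else PARTIAL)[reg]


print(decode_group(0xF7, 0xE3))
```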

This applies not only to MUL,iMUL,DIV,iDIV reg
but also to MUL,iMUL,DIV,iDIV with a memory operand.
Enter in OllyDbg:


MUL EBX
iMUL EBX
DIV EBX
iDIV EBX
MUL [EBX]
iMUL [EBX]
DIV [EBX]
iDIV [EBX]

Take the second byte of each opcode.
Convert it to binary, separate the binary digits in the format 2:3:3 (** *** ***)
and compare the differences.
For example in a format like this:


Second byte instruction
11 100 011 MUL EBX
11 101 011 iMUL EBX
... and so on.

Do it with your own hands.
It is much more useful than staring at my text ;)
Even better if you can write a training program that formalizes
this system with MUL,iMUL,DIV,iDIV,
similar to the one you have seen in this thread, with a reference
in the first tab and training to decode\encode in the 2 other tabs.
It might be fun for you and become the first step in writing
your own assembler\debugger.
I wish you luck.
Posted on 2003-02-03 13:13:38 by The Svin
I took the liberty (I hope you don't mind, Svin) of translating the macros to fasm syntax (it took me a while, since I RTFM, but now I've learned yet another thing today :))
The macros are used just like the original ones.
macro bcr _code, _reg

{
db _code#_reg#b
}

macro modrm _mode, _reg, _rm
{
db _mode#_reg#_rm#b
}

macro bsib _scale, _index, _base
{
db _scale#_index#_base#b
}
Posted on 2003-02-04 11:49:09 by scientica
I hope you don't mind Svin

Of course, I don't :)

First of all, for the future exercises I'd recommend
using Hiew. You may of course use OllyDbg or any
other tool that has an assembler\disassembler, but many
things that involve hex coding with the ability to see
the "mnemonic results" on the fly are more comfortable to do
in a hex editor such as Hiew.
For those who have read the "Opcode" articles from the beginning
and have the premade "testopcode" app with nops:
if you are working hard on studying opcodes and often
need an opcode test\reference, you may consider the next
steps.
Make a cmd or bat file with the single line
\hiew.exe \testopcode.exe
Then put a link to the bat file in some out-of-the-way place of your Start menu
and assign it a shortcut key, for example ctrl+shift+h.
So any time you have doubts about an opcode and want to make
some test to check them, you can bring up Hiew with the nops app loaded
with one keystroke.
Hiew has three modes for displaying the contents of the loaded file:
ASCII\HEX\ASSEMBLY
You can switch between them by pressing Enter.
If you loaded a PE file into Hiew, press F8 and then F5 to get to
the program entry point.
So the usual combination after Hiew is loaded with a PE file is:
double Enter, F8, F5.
To start entering opcodes press F3.
If you want to save what you entered into the file - press F9,
if not - Esc twice. If you just want to discard the changes but still want
to continue editing - press Esc just once.
Editing in Asm mode you can see both HEX and MNEMONICs
as you type. If instead of HEX bytes you want to
enter asm mnemonics - press Tab while in editing mode; it
brings up a window where you can enter mnemonics.
The next subject to discuss will be a detailed explanation of
addressing in opcodes related to bytes modr/m and sib.
Posted on 2003-02-04 17:33:04 by The Svin
the RTA assembler is nice too

-> http://www.anticrack.de/modules.php?op=modload&name=Downloads&file=index&req=getit&lid=3834

using Hiew is cool, but for those who have 2k/xp it isn't =) since the fullscreen dos is pretty small
Posted on 2003-02-05 07:34:38 by wizzra
Introduction to modr/m and sib blocks.

I hope you have used the training program and now
remember well all the register values for
both sets (full and partial), and also remember
how to code 16 bit registers in 32 bit code where
the default operand size = 32.
If not - you'd better stop reading this and read the previous
posts while training yourself with those values.
In the 2:3:3 bit format it becomes more complicated
to encode reg fields without having their values in mind.

Don't worry if you have questions while reading the
introduction - every point mentioned in it will be discussed
below with examples and exercises until it is fully understood.

Bytes modr/m and sib are used to specify operands.

1st thing we should learn:
There might be just byte modr/m, or byte modr/m + byte sib.
There can NOT be byte sib without byte modr/m.
You can logically treat byte sib as an extension of byte modr/m.

2nd thing we should learn:
Both bytes modr/m and sib have the bit field format 2:3:3;
that means that the 2 high bits of the byte, the 3 middle and the 3 last mean
different things.
The 2 upper bits in modr/m are called mod and may have 2^2=4 possible
values, which mean one of the following:
11 - there are no memory operands, all operand(s) is (are) registers
00,01,10 - one of the operands is a memory operand.
(we discuss the difference between modes 00,01,10 later)
The 2 upper bits in byte SIB (Scale Index Base) may have 2^2=4 possible
values, which mean one of 4 possible scales (multipliers) used with
the index register (1,2,4,8).
The remaining two 3 bit fields in both modr/m and sib usually
are fields for registers used in addressing, but may also mean something
else, which we will learn shortly.
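Byte sib splits the same way as byte modr/m; only the meaning of the top field differs. A Python sketch follows (split_sib is a hypothetical name). The scale is computed as 2 to the power of the 2 bit field, which gives 1, 2, 4, 8 for 00, 01, 10, 11. The special meanings some index and base values can have are deliberately left out here, so the function returns the raw field values.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_sib(sib):
    """Return (scale, index, base): the multiplier and the two raw 3-bit fields."""
    scale = 1 << (sib >> 6)        # 00, 01, 10, 11 -> 1, 2, 4, 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    return scale, index, base


scale, index, base = split_sib(0x88)
print(f"scale {scale}, index field {index:03b} ({REG32[index]}), "
      f"base field {base:03b} ({REG32[base]})")
```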

3rd thing we should learn:
What kind of operands bytes modr/m and sib specify and what
they don't.
They don't specify predefined operands. (For example there
can be no modr/m or sib byte in instructions like the string (chain) ones, and
the place for the result in MUL\DIV opcodes is also predefined.)
They don't specify imm operands; from the processor's point of view
an imm operand is not an operand but part of the opcode.
They specify only registers that are not predefined and the places holding values
used to calculate the address of a memory operand,
along with the displacement (if any)
which needs to be included in the calculation of the address.
(The displacement imm value (if any) follows immediately after
the modr/m (and sib) part of the opcode; its size is specified in the mod field of modr/m:
00-no displacement
01-8 bit displacement value
10-32 bit displacement value)
Now let's answer 3 simple questions:
How does the processor know:
1. Whether 1 or 2 operands are used?
2. Which set of registers, full or partial, the values in the reg fields
of bytes modr/m and sib specify?
3. If there are 2 operands, which one is the "destination" and which one is the "source"?

There is a common answer to all three questions: the processor takes
all this info from the {code} block of the opcode, not from modr/m and sib.

To observe it in real opcodes let us forget for a moment about the existence
of memory operands.
If only reg operands are used (in other words - if there are no mem operands)
then the value of the mod field in modr/m will be 11.
In bits it looks like
[8bits of code field]:11 *** ***
for example mov reg, reg:
100010dw:11 reg reg

Answer to the first question: the processor knows whether there are 2 operands or 1
from the code block.
If there is only one operand then its reg code is placed
in the mem/r field of the modr/m byte (the last 3 bits)
and the code or reg field (the middle 3 bits) is used as a code extension.

Example:
MUL EBX
1111 011 1: 11 100 011

Look at the second byte 11 100 011
11 - mod field - 11 in mod means "registers only"
100 - code or reg field - in this case we have only 1 operand
and that means this field is a "code extension" here
011 - the code for the EBX register. In the case of one operand this
operand is always placed in the last field of the modr/m byte.

To get used to this info:
construct opcodes by changing the value
1. in the field "code or reg", with the rest of the bits unchanged.
2. in the field "mem or reg"

If there are two operands, both fields "code or reg" and "mem or reg"
are used for placing the operands.


Example:
MOV EAX,EBX
1000 1011: 11 000 011
Last byte (modr/m)
11 - only registers used
000 - code for EAX
011 - code for EBX


Answer to the second question
2. Which set of registers, full or partial, the values in the reg fields
of bytes modr/m and sib specify?

It is also specified in the code block, not in modr/m and sib.
For this purpose the code block has the bit "w".
But we should remember that a partial register can be used only
as a register and never as a pointer.

If bit w = 1 then the reg fields specify full registers,
if it is 0 - partial ones.

Example with one register.


Mul reg has general format
1111 011w:11 100 reg

with bit w = 1
1111 0111:11 100 001 = MUL ECX
with bit w = 0
1111 0110:11 100 001 = MUL CL

As you see, the last byte (byte modr/m) is identical in both
cases; the difference is in the first byte (the code block),
namely in the value of its last bit, bit "w":
if w = 1 we have MUL ECX, if w = 0 - MUL CL.
Play with the opcodes "DIV,iDIV,MUL,iMUL reg", changing the last
bit of the first byte, and look at how it changes the interpretation
of the operand value between a full and a partial register.


Example with 2 reg operands:
one opcode for MOV reg,reg has the following format:
1000 101w: 11 reg reg
with w=1
1000 1011: 11 000 001 means MOV EAX,ECX
with w=0
1000 1010: 11 000 001 means MOV AL,CL

As you can see, byte modr/m is again identical; the only
difference is in the code byte, in the value of the w bit.

Play with other instructions that use two reg operands.
For example sub reg,reg ; add reg,reg etc.
Enter such an instruction, then change the last bit of the first
byte and see how it affects the operands, switching them from
the full to the partial register set.


And the last, third question:
3. If there are 2 operands, which one is the "destination" and which one is the "source"?

It is also defined by the "code" block, not by the "modr/m" block.
For this there is the bit "d", which stands for "direction".


Example with mov reg,reg
1000 10dw:11 reg reg
1000 1011:11 000 001 = MOV EAX,ECX
1000 1001:11 000 001 = MOV ECX,EAX

Again you can see that there may be no difference in modr/m,
and yet without knowing the value of bit "d" we cannot say
which of the two registers is being copied to which.

This gives the possibility to encode MOV EAX,ECX in two possible
ways:


1000 1011:11 000 001
1000 1001:11 001 000

Both these codes do the same.
Now try it for yourself - enter some instruction of the format INSTR REG,REG
and try to encode the same instruction in a different way than your assembler did.
For example I am now entering in OllyDbg:
add eax,ecx
OllyDbg generates the opcode: 03 C1
in binary 0000 0011: 11 000 001
Now I try to encode it a different way.
First I change the value of the d bit from 1 to 0 and get in the first byte
0000 0001
then I exchange the bit values in the second byte between the last two 3 bit fields
that are for the registers and get
11 001 000
The full opcode now:
0000 0001:11 001 000 in hex 01C8
I enter it in OllyDbg and it shows the same mnemonics, add eax,ecx
So both 03 C1 and 01 C8 are the same instruction.
Look again at their opcodes in binary:


ADD EAX,ECX
0000 0011:11 000 001 ;EAX=000 ECX=001
0000 0001:11 001 000
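The two steps (flip bit d, swap the two register fields) are mechanical enough to write down as a Python sketch. The name alternate_encoding is invented for illustration, and the function assumes the caller passes an opcode of the form xxxxxxdw, such as MOV 100010dw or ADD 000000dw, with a register-only modr/m byte.

```python
def alternate_encoding(opcode, modrm):
    """Return the other (opcode, modr/m) pair encoding the same reg,reg instruction."""
    if modrm >> 6 != 0b11:
        raise ValueError("only register-to-register forms (mod = 11) are handled")
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    flipped = opcode ^ 0b00000010                # toggle bit d (bit 1)
    swapped = (0b11 << 6) | (rm << 3) | reg      # exchange the two register fields
    return flipped, swapped


first, second = alternate_encoding(0x03, 0xC1)   # add eax,ecx
print(f"{first:02X} {second:02X}")
```

Applying the function twice gives back the original pair, which is a quick sanity check that nothing but the direction and the field order was changed.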
Posted on 2003-02-05 15:51:14 by The Svin
Specifying memory operands in bytes modr/m and sib.
A few simple things to keep in mind.
Whether there are 1 or 2 operands, only one of them can be a memory operand.
Specifying a memory operand means pointing to the places that
are used to calculate the ADDRESS of the operand.

Again, the format of the modr/m byte (the main byte used to
specify operands, which follows immediately after the code block):
2:3:3
2 upper bits - mod
3 middle bits - reg or code extension
3 low bits - reg or mem
Let's talk about the last bit field.
What does "reg or mem" mean?
What actual meanings can the value in it have?
There may be different interpretations of the last
3 bits of byte modr/m


1. If mod = 11 then the value in "reg or mem" means a register.
2. If mod = 00, 01 or 10 the value in the "reg or mem" field may mean
- a register as a pointer to memory
- a "flag" that the memory operand is specified by a following sib byte
- a "flag" that the memory operand is specified by a direct address value.

Examples:
(in all example byte modr/m is second byte of opcode)


-----------------------------------------
MOV EAX,[EBX];
8B03 byte modr/m here is 03; in binary
00 000 011
00 in the mod field means "using a mem operand without displacement"
000 - code for eax
011 - code for ebx
look at difference between:
MOV EAX,EBX modr/m 11 000 011
MOV EAX,[EBX] modr/m 00 000 011
the only difference is in field "mod"
for MOV EAX,EBX it is 11
for MOV EAX,[EBX] it is 00
----------------------------------------
MOV EBX,dword ptr [400000h]
8B 1D 00 00 40 00
the modr/m byte is 1D, in binary
00 011 101; mod 00, codr 011 (ebx), memr 101.
Code 101 (ebp) in the "mem or reg" field with mod = 00
means that no register is used to calculate the address of the operand
and that the address is in the dword that follows the modr/m byte.
Note: 101 does not mean [ebp] here!
------------------------------------------
MOV EBX,[eax*4][ecx][3]
8B 5C 81 03
modr/m byte 5C; 81 - SIB; 03 - displacement.
In binary:
01 011 100
mod 01 - means that a displacement is used and it is 1 byte in size.
codr 011 - code for the EBX register
memr 100 - if code 100 is used in the memr field it means
that a SIB byte is present and it follows the modr/m byte.
This relates only to the "mem" modes (00,01,10)
and only to the memr field.
Note: 100 does not mean [esp] here!
------------------------------------------
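
The three examples above can be checked with a small Python sketch that splits a modr/m byte into its 2:3:3 fields and names the meaning of the last field. The helper names split_modrm and describe_rm are hypothetical, and only 32-bit addressing is assumed.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a modr/m byte, format 2:3:3."""
    return byte >> 6, (byte >> 3) & 7, byte & 7

def describe_rm(mod, rm):
    """Say what the low 3 bits mean for the given mod value."""
    if mod == 0b11:
        return "register " + REGS[rm]
    if rm == 0b100:
        return "SIB byte follows"            # 100 is a flag, not [esp]
    if mod == 0b00 and rm == 0b101:
        return "direct 32-bit address follows"  # 101 is a flag, not [ebp]
    return "pointer register " + REGS[rm]

for modrm in (0xC3, 0x03, 0x1D, 0x5C):
    mod, reg, rm = split_modrm(modrm)
    print(f"{modrm:02X}: mod={mod:02b} reg={REGS[reg]} -> {describe_rm(mod, rm)}")
```

The order of the tests matters: the two "flag" values have to be checked before the field is read as an ordinary register code.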

Now we discuss all of it in detail.
For a start we take the construction
INSTR reg,[reg]
or INSTR [reg],reg; what is important for us here is
that the instruction uses two operands and one of them is a pointer to
memory.
Let's say that the register used as reg may be any general purpose
register, full or partial, and the register used as pointer
may be any full general purpose register except esp and ebp.

Under these conditions the opcodes for


instr reg1,reg2
and
instr reg1,[reg2]

are identical except for the value of the mod bit field.
In the case of instr reg1,reg2 mod = 11
In the case of instr reg1,[reg2] mod = 00
Example:


mov eax,ebx     8BC3   1000 1011:11 000 011 ;mod = 11
mov eax,[ebx]   8B03   1000 1011:00 000 011 ;mod = 00


At last we can see some sense in the "d" bit of the code field.
Indeed, a register as memory pointer can be only in the memr bit field,
but along with
instr reg,[reg]
we also need
instr [reg],reg
In both of these cases the modr/m byte is the same;
which of the two operands is the source
and which is the destination is specified by the value
of the "d" bit in the code block.
Example:


mov eax,[ebx]   8B03   1000 1011:00 000 011 ;d = 1
mov [ebx],eax   8903   1000 1001:00 000 011 ;d = 0


Now again about the size of the operand.
What is the difference between


dword ptr [ebx]
word ptr [ebx]
byte ptr [ebx]


We must remember 2 simple things:
- the memory operand specified in the opcode is not the operand itself
but the address of the operand, or to be more accurate, the values in registers
and the displacement immediate value (if any) that are used to calculate
the address of the operand.
- the address of an operand is always the address of the lowest byte of the operand.
In other words, byte ptr addrx, word ptr addrx and an operand of any size
ptr addrx all have the same address.
In 32 bit addressing mode the address offset is always a 32 bit value
(if prefix 67h is not present) and the registers used to calculate
the address are always taken as 32 bit registers.
To specify the size of the operand, the "w" bit in the code block,
already known to us, and prefix 66h are used.
Example:


(bit D = 1; from mem to reg)
mov eax,dword ptr [ebx]  8B 03     1000 1011 00 000 011 ;w = 1
mov ax,word ptr [ebx]    66 8B 03  (all the same, but prefix 66 is present)
mov al,byte ptr [ebx]    8A 03     1000 1010 00 000 011 ;w = 0


Now encode all 3 of these opcodes with bit D = 0 and look at the results.
(Bit "d" is bit[1] of the code byte; in the present examples the code byte
has the value 8B in the first two opcodes and 8A in the last one.)
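
To check your answers, here is a small Python sketch that builds the MOV opcode from the d bit, the w bit and the 66h prefix. It assumes the accumulator (eax/ax/al) on one side and [ebx] on the other, exactly as in the examples above; the function name is made up for illustration.

```python
def encode_mov_via_ebx(size, d):
    """MOV between eax/ax/al and [ebx]; size is 8, 16 or 32 bits.

    d = 1 moves from memory to the register, d = 0 the other way.
    """
    w = 0 if size == 8 else 1            # w = 0 only for byte operands
    opcode = 0x88 | (d << 1) | w         # code byte 1000 10dw
    modrm = 0x03                         # 00 000 011: accumulator and [ebx]
    prefix = [0x66] if size == 16 else []  # 66h selects the 16-bit operand
    return bytes(prefix + [opcode, modrm])

for d in (1, 0):
    for size in (32, 16, 8):
        print(d, size, encode_mov_via_ebx(size, d).hex())
```

With d = 1 this prints 8b03, 668b03 and 8a03, the three opcodes listed above; the d = 0 lines are the answer to the exercise.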
Posted on 2003-02-06 18:34:34 by The Svin
The advantage of knowing the opcode format is mostly
that you can get, on the fly, an impression of what size
of opcode will be produced from your mnemonics.
The size of most opcodes can be calculated as the size of
the code block (which in the overwhelming majority of
the most used instructions = 1 byte) plus
the size of the operand specification.
With the detailed examples of these articles up to
now, you can assume that in most constructions like
instr reg,reg
instr = 1 byte, and the address part is also 1 byte.
Altogether 2 bytes.
If reg is 16 bits, add 1 more byte for the prefix.

In formats like:
instr reg,[reg]
instr [reg],reg
the address part is also 1 byte if [reg] doesn't mean
[esp] or [ebp]. So you can use ebp and esp as registers
without any size penalty, but using them as pointers
without displacement costs you an extra byte.
For the rest of the registers all calculations are the same,
whether you use them as registers or as pointers.
Using word operands also costs you an extra byte for the prefix.
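
These rules of thumb fit in a few lines of Python. This is only a rough sketch, assuming a 1-byte code block, no displacement and no immediate operand; estimated_size is a hypothetical name.

```python
def estimated_size(pointer=None, word_operand=False):
    """Estimate the opcode size in bytes for instr reg,reg or instr reg,[pointer].

    pointer is None for the reg,reg form, else the name of the pointer register.
    """
    size = 1 + 1                      # code byte + modr/m byte
    if pointer in ("esp", "ebp"):
        size += 1                     # [esp] and [ebp] cost one extra byte
    if word_operand:
        size += 1                     # prefix 66h for 16-bit operands
    return size

print(estimated_size())                      # mov eax,ebx   -> 2
print(estimated_size("ebx"))                 # mov eax,[ebx] -> 2
print(estimated_size("ebp"))                 # mov eax,[ebp] -> 3
print(estimated_size("esp", True))           # mov ax,[esp]  -> 4
```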
Posted on 2003-02-06 19:35:15 by The Svin
Displacement, or what mod = 00,01,10 is about.
We know that if we have two operands as registers,
all we need to specify them is the single modr/m byte,
which has 11 in the "mod" field and the codes for the two registers
in the remaining two fields.
For example, byte 11 000 001 specifies the two register operands
eax and ecx, 11 011 111 specifies the two register operands ebx and edi,
and so on...
We also know that if we have to specify two operands, one of them
as register and the other as register-pointer (any one but esp or ebp),
the single modr/m byte will also be enough.
In this case mod = 00, the "code or reg" field contains the code of
the register operand used as register, and the "mem or reg" field contains
the code of the register used as pointer.
For example, byte 00 000 001 specifies eax as register and [ecx] as
pointer; 00 011 111 - ebx as register and [edi] as pointer.

Let's say at last why [esp] or [ebp] cannot be specified
by just the one address byte - modr/m.
The code for ebp placed in the "mem or reg" field with mod = 00 has
a special "flag" meaning - it means that no register is
used as pointer and the address is given directly by the following dword.
The code for esp with any of the 00,01,10 mods has a special meaning too -
it tells that a SIB byte is present and the addressing registers are in the SIB,
not in the "mem or reg" field.

Of course, you can write the mnemonics mov reg,[esp] or mov reg,[ebp],
but the format that encodes them will be different from that of any
other register used as register pointer ("mem or reg" field).
We discuss how to encode mov reg,[esp] and mov reg,[ebp] shortly.

For now we turn our attention to the meaning of mod = 00,01,10.
We come here to a very interesting point for optimization:
in many instructions an immediate dword operand can be encoded using
a single byte.
We are talking here of a signed byte and a signed dword respectively.
If a value is negative and >= -128, the signed dword has the same value
in its least significant byte as the byte of the same value,
and all upper bits are set to 1.

Example:


-2=
FFFFFFFEh as dword
FEh as byte.

If a value is non-negative and <= 127, the signed dword has the same value in
its least significant byte as the byte of the same value, and all upper
bits are set to 0.
Example:


2=
00000002h as dword
02h as byte

This allows the processor to perform "signed byte to signed dword" extension,
in other words, to use a signed byte as a dword operand by "extending"
the value of the most significant bit of the byte to all upper bits of
the dword.
"Byte to dword extension" can significantly decrease size: encoding
a dword imm. value this way takes 4 times less space than otherwise.
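
The extension itself is easy to model. A minimal Python sketch, with hypothetical function names, that copies bit 7 of a byte into the upper 24 bits and tests whether a value qualifies for the short form:

```python
def sign_extend_byte(b):
    """Extend a byte (0..255) to a 32-bit value by copying bit 7 upward."""
    return (b | 0xFFFFFF00) if (b & 0x80) else b

def fits_in_signed_byte(value):
    """True if a signed value can be stored in one byte and extended back."""
    return -128 <= value <= 127

print(hex(sign_extend_byte(0xFE)))   # 0xfffffffe, i.e. -2 as a dword
print(hex(sign_extend_byte(0x02)))   # 0x2
print(fits_in_signed_byte(-2), fits_in_signed_byte(200))  # True False
```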

To understand how the opcode tells that there is a "byte extended to dword",
we need to differentiate two types of imm. values.


1. Imm. value used as operand:
and eax,03 ; 03 is an imm. value as operand.
2. Imm. value as displacement:
and eax,[ecx][3] ;3 is an imm. value as displacement that has
;to be counted in the calculation of the address

These two types are encoded in different ways.
First we'll learn how a "displacement" is encoded.
From now on, by the term "membytes" I mean the part of the opcode that holds
the modr/m byte and the sib byte (if sib is present).
Whether there is a displacement is specified by the value in the mod field.
mod = 00: there is no displacement (exception: the code for ebp in the mem/r field,
which means "only a 32 bit displacement and no register pointers")
mod = 01: there is a displacement and it is coded in a single byte.
mod = 10: there is a displacement and it is coded in a dword.

Examples:


mov eax,[ebx]
8B03.
03 is byte modrm = 00 000 011 ;00(mod) - no displacement
;000 - eax
;011 - [ebx]
mov eax,[ebx][-2]
8B43 FE
43 is byte modrm = 01 000 011 ;01(mod) - displacement in one single byte
;000 - eax
;011 - [ebx]
FE - is the displacement as an "extended byte"; the processor extends it
to FFFFFFFEh

mov eax,[ebx][410000h]
8B 83 00 00 41 00
83 is byte modr/m = 10 000 011 ;10(mod) - displacement in dword
;000 - eax
;011 - [ebx]

And the last four bytes are the dword 00410000h - the imm. value of the
displacement coded in all 32 bits.
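
Reading the displacement back out of the bytes follows the same table. A minimal Python sketch, assuming the bytes start at the modr/m byte and that no sib byte is present; read_displacement is a hypothetical helper name.

```python
def read_displacement(membytes):
    """Return (displacement, size in bytes) for bytes starting at modr/m.

    The case with a sib byte (rm = 100) is not handled in this sketch.
    """
    modrm = membytes[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod != 0b11 and rm == 0b100:
        raise ValueError("sib byte present, not handled here")
    if mod == 0b01:                                   # signed byte, extended
        return int.from_bytes(membytes[1:2], "little", signed=True), 1
    if mod == 0b10 or (mod == 0b00 and rm == 0b101):  # full dword
        return int.from_bytes(membytes[1:5], "little", signed=True), 4
    return 0, 0                                       # no displacement

print(read_displacement(bytes([0x03])))                          # (0, 0)
print(read_displacement(bytes([0x43, 0xFE])))                    # (-2, 1)
print(read_displacement(bytes([0x83, 0x00, 0x00, 0x41, 0x00])))  # (4259840, 4)
```

4259840 is 410000h in decimal; the dword is stored with the lowest byte first, which is why the bytes read 00 00 41 00.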

Now we can say how the assembler encodes [ebp] as pointer.
Instead of mod 00 it uses mod 01 (displacement as byte),
and instr reg,[ebp] is actually encoded as instr reg,[ebp][0]


You can enter two instructions like:
mov eax,[ebp]
mov eax,[ebp][0]
and any like them, to see that they give the same opcode.

Take note that:
instr reg,[ebp]
is 1 byte longer than:
instr reg,[any reg but ebp or esp]

At the same time:
instr reg,[ebp][displacement <> 0]
has the same size as:
instr reg,[any reg but esp][displacement <> 0]

Though in the example reg is the first operand and the pointer
to memory the second, size and encoding will be the same
if you exchange their places. The only difference will be
in the value of bit "d" of the code block, not in the size and values of the membytes.
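
The choice an assembler makes here can be sketched in Python. This is an illustration under stated assumptions, not the code of any real assembler: the displacement is a signed 32-bit value, esp as pointer is rejected because it needs a sib byte, and the name encode_mem_operand is made up.

```python
import struct

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def encode_mem_operand(reg, base, disp=0):
    """Return the modr/m byte plus displacement bytes for reg,[base][disp]."""
    if base == "esp":
        raise ValueError("[esp] needs a sib byte")
    if disp == 0 and base != "ebp":
        mod, tail = 0b00, b""                      # no displacement at all
    elif -128 <= disp <= 127:
        mod, tail = 0b01, struct.pack("<b", disp)  # one signed byte
    else:
        mod, tail = 0b10, struct.pack("<i", disp)  # full dword, low byte first
    modrm = (mod << 6) | (REGS.index(reg) << 3) | REGS.index(base)
    return bytes([modrm]) + tail

print(encode_mem_operand("eax", "ebx").hex())            # 03
print(encode_mem_operand("eax", "ebx", -2).hex())        # 43fe
print(encode_mem_operand("eax", "ebx", 0x410000).hex())  # 8300004100
print(encode_mem_operand("eax", "ebp").hex())            # 4500
```

The last line shows the extra byte: [ebp] cannot use mod 00, so a zero displacement byte is emitted.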
Posted on 2003-02-08 08:17:59 by The Svin
SIB byte.
Here we'll discuss:
- What tells the processor that there is a sib byte?
- Format of the sib byte
- ESP cannot be an index register. What happens if the esp code
is in the index field.
- Again about EBP - we can't get short opcode
for without displacement.
- How instr reg,[esp] is encoded.
- Address formula - line equation.
-----------------------------------------------------------
- What tells the processor that there is a sib byte?
Remember that a sib byte may be present only if there is a modr/m byte
preceding it.
In other words, there may be a modr/m without sib, but
there cannot be a sib without modr/m.
A sib byte is present only if there is a need to calculate
a memory address.
So the processor knows about the presence of sib by two things:
* the mod field says that there is a mem operand (mods 00,01,10)
* the mem or reg field has the special "flag" value 100 (code for esp)
so modr/m bytes with the format:
00 *** 100
01 *** 100
10 *** 100
all tell that there is a sib byte which follows the modr/m byte.
------------------------------------------------------------
- Format of sib byte:
SS:III:BBB


In general the format of the sib byte is:
SS: the two upper bits contain the code for the scale
(multiplier) of the index register
00 = 1
01 = 2
10 = 4
11 = 8
for example sib
for [reg][reg] is 00 *** ***
[reg*2][reg] 01 *** ***
[reg*4][reg] 10 *** ***
[reg*8][reg] 11 *** ***
III: 3-bit code for the index register -
any general purpose register but esp
for example
[eax][reg] 00 000 ***
[ecx*8][reg] 11 001 ***
[edx][reg] 00 010 ***
[ebx*2][reg] 01 011 *** (01 = scale 2; 011 - ebx; *** code for base reg)
BBB: 3-bit code for the base register.
[ecx*4][eax] 10 001 000 (10 = scale 4; 001 - ecx as index; 000 - eax as base)
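
Splitting a sib byte into its three fields takes only shifts and masks. A minimal Python sketch with a hypothetical name decode_sib; it deliberately ignores the special meanings of code 100 in the index field and 101 in the base field.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_sib(sib):
    """Return (scale, index register, base register) for a sib byte SS:III:BBB."""
    scale = 1 << (sib >> 6)          # 00 = 1, 01 = 2, 10 = 4, 11 = 8
    index = REGS[(sib >> 3) & 7]
    base = REGS[sib & 7]
    return scale, index, base

print(decode_sib(0b10_001_000))   # (4, 'ecx', 'eax')  [ecx*4][eax]
print(decode_sib(0x81))           # (4, 'eax', 'ecx')  [eax*4][ecx]
```

The scale code is simply a power of two, so 1 << SS gives the multiplier without a lookup table.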



I'm curious if anybody read this :).
Make some noise, please :)
Posted on 2003-02-08 18:15:11 by The Svin

I'm reading it, and learning it (at least I think I'm learning it :)).
Posted on 2003-02-09 05:59:21 by scientica
Very good.
I have a good company then :)
Posted on 2003-02-09 10:01:44 by The Svin
- ESP cannot be an index register. What happens if the esp code
is in the index field.

If the code for esp (100) is placed in the index field, the index is ignored,
and whatever is in the scale field of the sib doesn't matter.
Only the base field is taken into account to calculate the address, and
the result of the address calculation will be the same as if
there were no sib byte at all and the value of the base field were placed
in the "mem or reg" field of the modr/m byte.
In other words these two addresses are identical:


modr/m sib
** *** 100 ** 100 reg
** *** reg (no sib byte)
Example with a real opcode:
[eax] can be coded two ways, with and without sib:
modr/m
00 reg 000
modr/m sib
00 reg 100 ** 100 000
Whatever you place in the scale field, the result will be the
same: addressing reg,[eax]

If you ask what this could be needed for, the answer
is: to make it possible to code addressing like reg,[esp]
So it is the answer to our next question:
- How instr reg,[esp] is encoded
It also answers how [esp] with a displacement is coded.


modr/m byte sib
mod rcode memr ss iii bbb
00 reg 100 any 100 100
With mod 01 or 10 the above format would mean
mod 01 reg,[esp][displacement byte]
mod 10 reg,[esp][displacement dword]

(the displacement imm. value follows immediately after the sib
in that case).
So the only reason to use SIB while not using an index register
is to encode [esp] as base pointer.

What if we have the reverse situation?
What if we have only an index with scale and no base register?
The answer to this question also explains one of our questions:
- Again about EBP - we can't get short opcode
for without displacement.

Placing the code for ebp (101) in the "base" field with mod = 00 has
the same effect as placing the ebp code in the "mem or reg" field
of the modr/m byte with (again!) mod = 00.
It means 2 things:
- there is no "base" register.
- there is an imm. 32 bit displacement used as base.

This leads to a paradoxical result:

a scaled index with no base register looks as if it needs just the sib byte,
but the encoding takes the sib + 4 bytes
for the displacement!


Example:
mnemonic: mov eax,[ebx*4][ecx]
opcode: 8B0499
where 8B is "code"; 04 modr/m; 99 sib
mnemonic: mov eax,[ebx*4]
opcode: 8B049D 00 00 00 00
where 8B is "code"; 04 modr/m; 9D sib
plus 4 bytes 00 00 00 00 - displacement!
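
Both special cases of the sib byte, esp code in the index field and ebp code in the base field with mod = 00, can be put together in one Python sketch. The function name describe_address and the disp8/disp32 placeholders are made up for illustration; the function takes the mod field of the modr/m byte and the sib byte.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def describe_address(mod, sib):
    """Describe the address selected by a sib byte for mod = 00, 01 or 10."""
    scale = 1 << (sib >> 6)
    index, base = (sib >> 3) & 7, sib & 7
    parts = []
    if index != 0b100:                    # 100 in the index field: no index
        parts.append(f"{REGS[index]}*{scale}")
    if mod == 0b00 and base == 0b101:     # 101 with mod 00: no base, dword instead
        parts.append("disp32")
    else:
        parts.append(REGS[base])
        if mod == 0b01:
            parts.append("disp8")
        elif mod == 0b10:
            parts.append("disp32")
    return "[" + "+".join(parts) + "]"

print(describe_address(0b00, 0x99))   # [ebx*4+ecx]
print(describe_address(0b00, 0x9D))   # [ebx*4+disp32]
print(describe_address(0b00, 0x24))   # [esp]
print(describe_address(0b01, 0x24))   # [esp+disp8]
```

The second line is the paradox from the example: dropping the base register does not shorten the opcode, it adds a 4-byte displacement.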
Posted on 2003-02-09 12:44:59 by The Svin