This is an attempt to write a book for those who are always
chasing knowledge but never quite reach it.
For people like me. Born dummies.

Main style of this material: make theory from
examples, not examples from theory.
In other words: it's full of examples and exercises,
and the only way to understand the theory inside is to
do all the exercises and try all the examples.

It is written for absolute beginners on the "opcode" issue,
though not for beginners in assembler programming.

What is an "opcode"?
--------------------
It's the main question of all our lessons.
We'll answer it now in short:
Example:
when you write in your source code:
lodsb
then at the assembling stage your assembler (for example ml.exe),
when it meets "lodsb",
will place a byte of value ACh inside your exe (or first in the .obj) file.
This ACh is the "opcode".
lodsb is the "mnemonic".
The mnemonic "lodsb" is saying to the assembler ml.exe:
"I want you to place byte ACh here in this exe"
The processor has no idea what the characters "lodsb" mean.
But when its register EIP (which points to the command bytes
of your loaded exe file)
holds the address of this byte containing the value ACh,
the processor loads the byte into its decoder,
sees that its value = ACh,
and by this value it knows that the program wants it to
load the byte pointed to by register esi into register al.

That simple.
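
If you want to play with this idea outside a debugger, here is a toy sketch in Python. It is not a real x86 emulator: the memory layout, the register dictionary and the function name step are all made up for illustration, and the only opcode it knows is ACh. The esi increment is something the text above doesn't mention; it is what lodsb does when the direction flag is clear.

```python
# Toy model of the fetch/decode step described above (NOT a real emulator).
# The only instruction it understands is opcode ACh (lodsb).

def step(memory, regs):
    """Fetch the byte EIP points to and execute it if it is ACh."""
    opcode = memory[regs["eip"]]      # the processor reads the byte at EIP
    regs["eip"] += 1                  # EIP moves on to the next command byte
    if opcode == 0xAC:                # ACh -> load byte at [esi] into al
        regs["al"] = memory[regs["esi"]]
        regs["esi"] += 1              # assumes the direction flag is clear
    else:
        raise ValueError(f"opcode {opcode:02X}h is not handled by this toy")
    return regs

memory = bytearray(16)
memory[0] = 0xAC                      # the byte the assembler put in for "lodsb"
memory[8] = 0x41                      # some data byte (hypothetical)
regs = {"eip": 0, "esi": 8, "al": 0}

step(memory, regs)
print(f"al = {regs['al']:02X}h, esi = {regs['esi']}, eip = {regs['eip']}")
# prints: al = 41h, esi = 9, eip = 1
```

Notice that the string "lodsb" appears nowhere in the executed part: the toy processor reacts only to the value ACh.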
So maybe we can say:
There are mnemonics and opcodes,
and each mnemonic is an ALIAS for some opcode?
Hey, slow down here :)
It's not quite that easy. In this particular case with the mnemonic
"lodsb"
we can indeed say:
Yes, the opcode for mnemonic "lodsb" is ACh
and opcode ACh identifies mnemonic "lodsb".
But it's not always so. We'll see examples of "why not" very soon.
For now I'll just say that the primary (real) thing is the
opcode, and mnemonics are names for these things.
And just as in real life there is not always a unique name
for a unique object, it is the same in the relation between opcodes
and mnemonics:
some real things (different opcodes) may have the same name (mnemonic),
and one opcode value may have several names (mnemonics).
Plus there is something that can scare you if you're a beginner,
though it's very simple stuff,
so I'll tell you about it later, but soon.
Before we come to the examples we need to set up an easy way to
look up either:
1. which mnemonic stands for some opcode value
2. which opcode value may be generated from some mnemonic.

SETTINGS FOR REFERENCE, TEST AND EXERCISES
------------------------------------------
There are many ways to look up opcodes\mnemonics.
I recommend the fastest and simplest: use a debugger.
I'll do all the examples in OllyDbg.exe.
If you don't have it, download it; it's free and worth having.
It's all about typing commands - don't run them!
Don't press F7, F8, F9!!! After you finish typing the
exercises, use ALT-X to exit the debugger without running the program.

Exercise 1. Insert mnemonics and opcodes right in the debugger.
Typing mnemonics:
1 Run OllyDbg.exe
2 Open ANY win32 program (exe) in it.
3 Start typing: lodsb
You'll immediately see a dialog box with what you are typing.
4 After you have typed "lodsb", press Enter
You will see in the program code window, in the line above the grey cursor,
something like:
0040108C > AC LODS
The first column (0040108C) is the address of the command bytes in memory
(the command will be executed when this address is
in the EIP register),
the second column (AC) is the OPCODE of the command,
the third column (LODS ) is the MNEMONIC for the OPCODE.

Come on - get used to it, type some other mnemonics and
watch their opcodes.

Typing opcodes:
If you are not already in OllyDbg.exe with some program loaded in it,
do steps 1 and 2 of the previous exercise.
3. Press ctrl-e
You'll see a dialog box with bytes; those bytes are the real opcodes
of the code
4. Type the opcode you already know:
AC
5. Press Enter
You'll see that where the cursor was, columns 2 and 3
are identical to what we saw when we typed the mnemonic lodsb:
AC LODS

Get used to typing hex opcodes in the debugger.
If you are not familiar with opcodes, just look at the opcodes in the loaded
code and type them on different lines to see that OllyDbg recognizes
them.
Don't run the mess you typed in :) Not now.
Exit OllyDbg using ALT-X without running the debuggee.

EXERCISE 2. Is there always 1 mnemonic for 1 opcode?

1. Type the OPCODE (pressing ctrl-e):
90
Press Enter.
Look: it is recognized as the NOP mnemonic
0040108E 90 NOP
2. Type the MNEMONIC (just type) NOP.
You see the same 90 NOP
3. Type the MNEMONIC xchg eax,eax
What the heck!!!
OllyDbg doesn't insert our mnemonic!
It placed the NOP mnemonic instead!!!
Yes, that's a question for the author - OllyDbg always shows NOP when it sees
opcode 90, never xchg eax,eax.
But what is important to us is that 2 mnemonics:
"xchg eax,eax" (or xchg ax,ax in real 16-bit mode)
and "nop"
both give opcode 90h.
So we can see how one real thing (opcode) may have different names (mnemonics).
I'll repeat it again:
We should always remember that the only real thing in computer life is the
opcode. Mnemonics are just names for those things, and this system of mnemonics
is a language, and it's not perfect 'cause there are no perfect languages.
Real life is always a little bit different from its description in
any language.
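
To make the "several names for one thing" point concrete, here is a small Python sketch. The table is hypothetical and holds only the three mnemonics we have typed so far; it is nothing like a complete opcode map, and the names MNEMONIC_TO_OPCODE and names_for are made up for this example.

```python
# Hypothetical lookup table built only from the examples in this lesson.
MNEMONIC_TO_OPCODE = {
    "lodsb": "AC",
    "nop": "90",
    "xchg eax,eax": "90",
}

def names_for(opcode_hex):
    """Return every mnemonic in the table that produces the given opcode."""
    return sorted(m for m, op in MNEMONIC_TO_OPCODE.items() if op == opcode_hex)

print(names_for("AC"))   # ['lodsb']
print(names_for("90"))   # ['nop', 'xchg eax,eax']
```

Going from mnemonic to opcode is a plain dictionary lookup here, but going back from opcode 90 gives two names. A disassembler has to pick one of them to display, and OllyDbg picks NOP.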
Before any conclusion let's do the second part of the exercise.
1. Type the MNEMONIC add eax,1. Press Enter.
You'll see:
0040108E 83C0 01 ADD EAX,1
2. Press ctrl-e (to bring up the OPCODE window)
and type 5 bytes in it:
05 01 00 00 00
Surprise, surprise!
We can see
0040108E 05 01000000 ADD EAX,1

The same mnemonic but different opcodes!
They are not only different in size.
They have different structures.
The first one (83C0 01) is 3 bytes long, and is a "three blocks" opcode.
The second one (05 01000000) is 5 bytes long, but is a "two blocks" opcode.
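
Here is a rough Python sketch that builds both byte sequences for "add eax, immediate". The function names are made up for illustration. One fact goes beyond what we have seen in the debugger so far: in the short form the immediate is a single signed byte, so that form only fits values from -128 to 127.

```python
import struct

def add_eax_short(imm):
    """3-byte form: 83 C0 followed by the immediate as ONE signed byte."""
    return bytes([0x83, 0xC0]) + struct.pack("<b", imm)   # fails if imm doesn't fit

def add_eax_long(imm):
    """5-byte form: 05 followed by the immediate as a full dword."""
    return bytes([0x05]) + struct.pack("<I", imm & 0xFFFFFFFF)

def show(code):
    """Spell the bytes in order of increasing address, as the debugger does."""
    return " ".join(f"{b:02X}" for b in code)

print(show(add_eax_short(1)))   # 83 C0 01
print(show(add_eax_long(1)))    # 05 01 00 00 00
```

Both results mean "add 1 to eax"; which one lands in your exe is up to the assembler.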
Before we start learning what those blocks are, let's make some notes.

1. As you probably guessed looking at the two opcodes
83C0 01
05 01000000
01 in the first opcode and 01000000 in the second are the immediate value
added to eax.
Though OllyDbg groups the bytes of an operand together, it doesn't change
the byte order of the value. Bytes are shown from left to right in order of
increasing address.
In other words, an example (type it in):
in the second opcode version the mnemonic
add eax,01020304h
will look like
05 04030201 (type 05 several times with different following 4 bytes,
trying to spell different dwords in byte order).
The dword is grouped in the opcode display (the bytes of 01020304h aren't
separated by spaces), and its bytes are placed from left to right
in order of increasing address. And that's the right way to do
it.
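
If you have Python at hand you can check the byte order without the debugger. This is only a sketch: struct.pack with the "<I" format stores a 32-bit value lowest byte first, which is the order we have just seen in OllyDbg.

```python
import struct

value = 0x01020304
imm = struct.pack("<I", value)      # the dword, lowest byte at the lowest address
encoded = bytes([0x05]) + imm       # opcode 05 followed by the immediate dword

for offset, byte in enumerate(encoded):
    print(f"address +{offset}: {byte:02X}")
# address +0: 05
# address +1: 04
# address +2: 03
# address +3: 02
# address +4: 01
```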

Frankly, IMHO, one of the greatest mistakes and confusions in all books
about assembly is statements like "bytes in a dword are in reversed order".
Reversed relative to what axis?
Memory doesn't have right and left coordinates.
It has only addresses - integer values.
And dword bytes are not placed from left to right or right to left :)
Treat the 4 bytes in a dword as four digits in a radix 256 numeric system.
Then the more significant a digit of the dword is, the higher its address.
For example, dword ABCD1234h
you may represent as:
256^3*ABh + 256^2*CDh + 256^1*12h + 256^0*34h
Let's now place those four terms of the sum that represents the value ABCD1234h
in order of growing power of 256.

256^0*34h - first digit of dword. Byte index = power of 256 = rva address = 0
256^1*12h - second digit of dword. Byte index = power of 256 = rva address = 1
256^2*CDh - third digit of dword. Byte index = power of 256 = rva address = 2
256^3*ABh - fourth digit of dword. Byte index = power of 256 = rva address = 3
The rva address here is the address of the byte minus the address of the dword in memory,
and you see that the address of the dword is the address
of the least significant digit, and that is the first byte. Or byte[0].
And the address of any digit of the dword minus the address of the dword is at once
the index of the byte, the power of 256, and the rva relative to the address of the dword.
No left or right stuff. Only numbers, powers, their relations etc.
The other thing is how numbers are SPELT.
We spell the most significant digits on the left and the least on the right.
So spelling from left to right we spell digits in order of decreasing
significance. If we typed them from right to left we would do it in increasing
order. But what does the direction of our writing or reading have to do with
the addressing system in computers? Nothing.
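
The radix 256 view can be checked with a few lines of Python. This is just a sketch of the arithmetic above: digit number i is the byte at offset i from the address of the dword, and it is multiplied by 256 to the power i.

```python
value = 0xABCD1234

# digit i (power of 256 = i) is the byte at offset i from the dword's address
digits = [(value >> (8 * i)) & 0xFF for i in range(4)]
print([f"{d:02X}" for d in digits])          # ['34', '12', 'CD', 'AB']

# the same bytes, as they would lie in memory at increasing addresses
in_memory = value.to_bytes(4, "little")
print(list(in_memory) == digits)             # True

# rebuild the value from its digits: sum of digit * 256^index
rebuilt = sum(d * 256**i for i, d in enumerate(digits))
print(f"{rebuilt:08X}")                      # ABCD1234
```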
If this is too complex for you, we'll discuss it in another tutorial,
"Positional numeric system in depth for asm programmer" :)
Where can you get it? You can't. It's not written yet.
But if you think you know it - think again.
One of the first European discussions of the positional numeric system was written
by Leonardo of Pisa (Fibonacci) in his "Book of the Abacus" in the 13th century.
The whole book (456 pages) is all about the positional numeric system.
And yet Europe needed ~200 years to get used to it after the book was written.
Unfortunately our ancestors were too stupid to take into account that Arabs read and
write from right to left.
So if you think you have read a couple of pages of R. Hyde's book and you
know all about it - think again.
As well as think again before calling Arabs or Indians uncivilized or barbaric:
they invented it, and used and understood it many centuries before us.
Now we use it in our sciences to make our superweapons to bomb them,
calling them not civilized enough for us. Strange way to show gratitude :)

OK, to opcode now :)

So if there are alternative ways to convert the same mnemonic to an opcode,
which opcode will actually be inserted in the exe in place of the mnemonic?
And who decides?
The answer is: your assembler program does (ml for masm).
It decides for you, and some low-level coders may not agree that
assemblers always do it in the most optimal way.
An assembler always has an author, the author may have his own opinion of what
is optimal, and another coder's opinion may differ.
We are very lucky here to have real assembler authors (such as Privalov,
bitov, and I reckon Maveric was speaking of his compiler\optimizer).
So when you have questions regarding how some assembler treats mnemonics
you are lucky to have the opportunity to ask the assembler writers directly,
our friends here.

2. You know how to place opcode values directly in the debugger,
but what if you want to code some part in hex directly in the source?
I think you know the answer even if you never did it.
When you need to place some particular values in the data section (for example
some bytes)
you probably write
somevar db 0Ah,0Dh
and you'll have the bytes 0Ah,0Dh placed inside your app.
Do the same in the code section; you don't need to type a name for those
bytes, just declare their values
(a hex number that starts with a letter needs a leading 0 in masm):
db 0ACh
You may code everything in hex or just part of it,
for example instead of
mov esi,offset somedata
lodsb
you may write:
mov esi,offset somedata
db 0ACh

OK, it's that time again: I'm about to be cut off from the inet, so I must send what I've already written
or I won't be able to send anything.
Next time we'll start with the structure of an opcode:
the building blocks from which an opcode is created.
Only an introduction to it, and a few important general rules
that you'll find simple and useful to know.
Posted on 2002-11-15 23:31:42 by The Svin
Nice tutorial!
And OllyDbg helps a lot.

"So spelling from left to right we spell digits in decreasing segnificance
order. If we would type them from right to left we would do it in decreasing
order."

Shouldn't it be "If we would type them from right to left we would do it in increasing
order."

for example: (twentyfive)
..
.5
25

Either it's just a typo or I didn't understand it :)
Maybe you just mean that either way we would start typing with the most significant digit.

I just point this out because it could be confusing for beginners.
Posted on 2002-11-16 01:59:39 by nyook
Right you are!
I fixed the typo. Thanx!
Posted on 2002-11-16 03:11:44 by The Svin
In my humble opinion, this 'sample' of your's look's pretty good. Thats a book
I would defintly consider reading. Since it could quite possibly still some of
my everlasting hunger for knowledge.

NOTE: Maybe I would have worked a little bit on the formatting and typo's tho. :tongue:

EDIT: The Svin: I took the liberty to rewrite some of your writing a little bit. No offence?
Just a little something something on how you could have written some of it.
Again, I dont mean to offend you in anyway(I am norwegian so english is not my mother tongue either).

Here goes nothing:
__________________________________________________
\ The Svin's tutorial on Opcode's part #1
/?????????????????????????????????????????????????
This is my legacy to others, to write a book for those who are always seeking
for more knowledge. But get's lost in all the mumbo jumbo. People like myself
who where born a crash-test-dummy.

Main style of this material is to make theory from examples, not examples
from theory. In other words: It's full of examples and exersizes, and the
only way to understand the theory inside, is to do all exersizes, and to
try all examples.

This was written for the absolute beginners concerning the "opcode" issue,
though not beginners to assembler programming.

__________________________________________________
\ What "opcode" really is?
/?????????????????????????????????????????????????
This will be the main question resolved over a number of lesson's. We will
answer it now in short terms.

Example:
When you write this in your source code: "lodsb"
In the compiling stage of your assembler(f.e 'ml.exe'), as it read's "lodsb",
it will place a file byte value of ACh, inside your exe(or first in .obj).

The value 'ACh' is our "opcode", and "lodsb" is our "mnemonic".

The mnemonic "lodsb" is telling the assembler 'ml.exe': "I want you
to place byte 'ACh' here in this .exe". A Processor has no idea of what the
word "lodsb" mean.

But when it's register EIP(that points to command bytes of your loaded .exe file)
has the address of this byte consisting of value ACh. It loads the byte inside the
processor's decoder, and understands that the value of it = ACh, and by this value,
it knows that the program wants him to 'load byte pointed by register esi',
into register al. It's actually that simple.

So maybe we can say: There are mnemonics and opcodes, and each mnemonic
is ALIAS for some opcode? Hey, slow down here! It's not quite that simple. In this
particular case with mnemonic "lodsb" we indeed can say: Yes, opcode for mnemonic
"lodsb" is ACh and opcode ACh identifies mnemonic "lodsb". However that
is not always the case.

We'll see examples of "why not" soon enough. For now I'll just say that the
primary subject(real-thing) is that opcode's and mnemonic's are names for these
thing's.

And just as in real life there are not always unique name's for unique object's,
in relation to opcode's and mnemonic's, they are the same: some real things
(different opcodes) may have the same name(mnemonic), and some opcode
value may have several names (mnemonics).
Posted on 2002-11-16 03:52:13 by natas
natas,
You are most welcome to do the kind of job you did above.
It's really helpful, and of course, no offense is taken.
On the contrary, I'm very grateful.
In the past I asked Steve to correct my English :)
Posted on 2002-11-16 07:05:45 by The Svin
The Svin,
Good to hear that you didnt get offended. I was a little bit worried there for a sec. :grin:
Anyway, maybe I should overlook the whole thing and also the other lessons?

I dont mind rewriting parts of it to clearer syntax. Since what you have to tell me is
things I didnt know about. And when im rewriting this and that, I need to really understand
what your trying to say. Therefore, I learn even more from doing it then I normally would have
by just reading it. :)

But just say the word and ill start analyzing and converting your words a little bit. :alright:
Posted on 2002-11-16 07:17:58 by natas
Good read, Svin. :alright:

Natas, you did a good job proofreading. Your English is very good. :) Here are some suggestions for you. I hope they can help you polish your English a bit:



"But get's lost in all the mumbo jumbo."

This is not a sentence. You need a subject, like 'they' (which refers to those 'who seek more knowledge' from the previous sentence.) Also, there's no need for the apostrophe in 'gets'. Suggestion: "...for more knowledge, but they get lost in all the mumbo jumbo."



"People like myself who where born a crash-test-dummy."

This isn't a sentence because of the 'who' word in there. 'where' should be 'were'. You might also want to change "crash-test-dummy." This typically means somebody that gets beat up a lot. I don't think Svin was trying to say that. ;) Suggestion: "People like myself were born crash test dummies." or a way that makes more sense in the context of the paragraph: "This is for people such as myself that were born a dummy."



"It's full of examples and exersizes, and the only way to understand the theory inside, is to do..."

'exersizes' should be 'exercises'. No comma needed between 'inside' and 'is'.



Be careful with your apostrophes. Never use them to pluralize a word. Apostrophes often connote possessiveness when they aren't used in contractions. "Bob's code" means the code belongs to Bob. "Bob's code" can also be a contraction and means 'Bob is code'. "Bob's code's" doesn't mean anything, it should be "Bob's codes". You did this with "lesson's", "thing's", "opcode's" (2 times) and "mnemonic's."

English isn't as easy as people think it is. ;)
Posted on 2002-11-16 09:29:45 by iblis
The Svin, Great work!!! :)
(I've read #1 and #2, and I'm going to spend the most of the night playing with OllyDbg...)
Posted on 2002-11-16 11:33:06 by scientica
btw: thank you for the explanation of the reversed byte thing. I thought always that one has to be a dumb nut to write a number in reverse order :D
Posted on 2002-11-16 12:43:33 by nyook
iblis,
What? you're correcting me? :tongue: Thanks for correcting some of
my writing. I think you would have been a more qualified candidate
to do any proofreading. ;)

Apostrophes will always get abused.( they need there own support group :grin: )
Posted on 2002-11-16 17:19:11 by natas