I am trying to write a disassembler.
I can parse a PE file in code, but I have some problems with the disassembly part. I searched the internet and found some libraries and code for disassembly ( http://pvdasm.reverse-engineering.net/ , http://www.geocities.com/~sangcho/disasm.html ), but the code is hard to understand and follow. I need some help.

1-) To disassemble a PE file, we must read the opcodes in the code segment. How are they arranged there? I mean, is every instruction one byte long, with its operands coming right after it (I hope I am right)? So must I read it byte by byte? I mean:
-one byte instruction
-one byte operand
-one byte instruction
-one byte operand, etc. Is that true?
I want to be sure how a disassembler must read the code segment and what its algorithm is, in theory. Is the code segment arranged in the same order as the assembly language source?

2-) Which manual must I use for such a project, the 80386 or the 8086 one? I ask because I wonder: if there is an instruction specific to the Pentium 4 and I don't include it in my source code, what must the program do in that situation?

I hope I have explained my problems clearly.
I look forward to your answers and any advice.

Posted on 2007-09-07 23:04:35 by sawer
Not all opcodes are 1 byte; they can be up to 3 bytes. The operand size also depends on other things; it isn't always 1 byte.

You can get the Intel manuals from here - http://www.intel.com/products/processor/manuals/index.htm
Volume 2A and 2B will be useful.
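
To make the point about variable sizes concrete, here is a small sketch listing a few hand-decoded encodings and their lengths. It assumes 32-bit mode; the list `EXAMPLES` and the helper `encoded_length` are names made up for this illustration, and the byte sequences should be double-checked against the opcode tables in the manuals.

```python
# Hand-decoded x86 encodings, assuming 32-bit mode.
# Each entry: (raw bytes in hex, assembly text).
EXAMPLES = [
    ("90",                            "nop"),
    ("C3",                            "ret"),
    ("B8 78 56 34 12",                "mov eax, 0x12345678"),
    ("0F AF C3",                      "imul eax, ebx"),
    ("81 05 00 10 40 00 78 56 34 12", "add dword ptr [0x401000], 0x12345678"),
]


def encoded_length(hex_text):
    """Return how many bytes the encoded instruction occupies."""
    return len(bytes.fromhex(hex_text))


for hex_text, asm in EXAMPLES:
    print(f"{encoded_length(hex_text):2d} byte(s)  {hex_text:<30} {asm}")
```

The same opcode family can therefore take anything from one byte to ten or more, so a reader that assumes "one byte of instruction, one byte of operand" loses track of the instruction boundaries almost immediately.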
Posted on 2007-09-08 01:03:23 by lone_samurai5
Sawer, the opcode of a mnemonic can be up to 3 bytes long. A whole instruction can be up to 15 bytes long.

There are two ways to organize a disassembler engine: table-driven and code-driven. In both cases, the logical organization looks like a set of 256 trees, each one rooted at the first byte encountered and mirroring the x86 instruction manuals that you will find at the above address.
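
As a rough illustration of the table-driven idea, here is a minimal sketch. It is not how PvDasm or RosAsm is implemented; the tables `ONE_BYTE` and `TWO_BYTE` and the functions `decode_one` and `disassemble` are hypothetical names, and only a handful of opcodes without ModRM bytes or prefixes are covered. The first byte selects an entry, the `0x0F` escape byte selects a second table (one more level of the tree), and any opcode missing from the tables is emitted as a `db` data byte, which is one possible answer to what to do with instructions the tables do not know.

```python
# opcode byte -> (text template, size of the immediate operand in bytes)
ONE_BYTE = {
    0x90: ("nop", 0),
    0xC3: ("ret", 0),
    0x05: ("add eax, {imm:#x}", 4),
    0x68: ("push {imm:#x}", 4),
    0xB8: ("mov eax, {imm:#x}", 4),
    0xCD: ("int {imm:#x}", 1),
}

# Second level of the tree, reached through the 0x0F escape byte.
TWO_BYTE = {
    0x0B: ("ud2", 0),
    0x31: ("rdtsc", 0),
    0xA2: ("cpuid", 0),
}


def decode_one(code, offset):
    """Decode one instruction at offset; return (length, text)."""
    table = ONE_BYTE
    pos = offset
    if code[pos] == 0x0F and pos + 1 < len(code):
        table = TWO_BYTE
        pos += 1
    entry = table.get(code[pos])
    if entry is None:
        # Unknown opcode: emit the first byte as data and move on.
        return 1, f"db {code[offset]:#04x}"
    template, imm_size = entry
    pos += 1
    end = pos + imm_size
    if end > len(code):
        # Immediate runs past the end of the buffer: treat as data.
        return 1, f"db {code[offset]:#04x}"
    imm = int.from_bytes(code[pos:end], "little")
    return end - offset, template.format(imm=imm)


def disassemble(code):
    """Linear sweep: decode instructions back to back from offset 0."""
    out = []
    offset = 0
    while offset < len(code):
        length, text = decode_one(code, offset)
        out.append((offset, length, text))
        offset += length
    return out


if __name__ == "__main__":
    sample = bytes.fromhex("90 B8 78 56 34 12 0F A2 CD 21 F4 C3")
    for offset, length, text in disassemble(sample):
        print(f"{offset:04x}  {sample[offset:offset + length].hex():<12} {text}")
```

A code-driven engine would replace the dictionaries with branching code (for example one handler per first byte), but the tree it walks is the same. A real engine also has to handle prefixes, ModRM/SIB bytes and displacements, which is where most of the table-building work goes.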

Before starting such a project, take a serious look at the existing open-source disassemblers. PvDasm and RosAsm are good examples, showing the two methods (table-driven vs. code-driven).

Note: keep in mind that, inside a disassembler, the disassembler engine is just... nothing, even if it represents a fair amount of work. The main job of a disassembler is telling apart what is code and what is data. Data and code are very often found mixed inside a code section. For example, almost all compiled C programs include lists of pointers inside the code. An engine can be written in a couple of weeks, but be aware, before starting, that implementing the code vs. data analysis is a never-ending job. Expect at least 2 or 3 years of work for something acceptable.
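
To give an idea of what the code vs. data analysis involves, here is a sketch of one common heuristic, usually called recursive traversal: start at the entry point, follow jumps and calls, and treat every byte that is never reached as data. The thread does not say which analysis any particular disassembler uses, so this is only an assumption for illustration; `OPS`, `find_code` and `IMAGE` are made-up names, and the opcode table is tiny on purpose.

```python
# opcode -> (mnemonic, operand size in bytes, kind)
# kind: "next" falls through, "jump" leaves unconditionally,
#       "call" branches and also falls through, "stop" ends the path.
OPS = {
    0x90: ("nop", 0, "next"),
    0xC3: ("ret", 0, "stop"),
    0xEB: ("jmp short", 1, "jump"),
    0xE8: ("call", 4, "call"),
}


def find_code(image, entry):
    """Return the set of byte offsets reached by following control flow."""
    code_bytes = set()
    pending = [entry]  # start addresses still to explore
    while pending:
        pos = pending.pop()
        while 0 <= pos < len(image) and pos not in code_bytes:
            op = OPS.get(image[pos])
            if op is None:
                break  # unknown opcode: stop this path rather than guess
            _, size, kind = op
            end = pos + 1 + size
            if end > len(image):
                break
            code_bytes.update(range(pos, end))
            if kind == "stop":
                break
            if kind in ("jump", "call"):
                # Relative targets are measured from the next instruction.
                rel = int.from_bytes(image[pos + 1:end], "little", signed=True)
                target = end + rel
                if kind == "jump":
                    pos = target
                    continue
                pending.append(target)
            pos = end
    return code_bytes


# jmp over a 4-byte pointer, call a small routine, return.
IMAGE = bytes.fromhex(
    "EB 04"           # 0:  jmp short 6
    "00 10 40 00"     # 2:  data (a pointer) embedded in the code
    "E8 01 00 00 00"  # 6:  call 12
    "C3"              # 11: ret
    "90 C3"           # 12: nop / ret
)

if __name__ == "__main__":
    reached = find_code(IMAGE, 0)
    data = sorted(set(range(len(IMAGE))) - reached)
    print("data offsets:", data)
```

Decoding the same bytes straight through from offset 0 would try to interpret the embedded pointer as instructions. Even this approach breaks down quickly in practice (indirect jumps, jump tables, code that is only reached through pointers), which is why the analysis never really ends.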


< http://rosasm.org >
Posted on 2007-09-09 04:09:41 by Betov
Ask yourself what you want to write: an instruction disassembly engine, or a disassembler - or both. Even the instruction disassembly engine by itself is some work, as there are a lot of tables to build, etc. There are plenty of disassembly engines already available, one of the more frequently updated ones being diStorm.

As Betov stated, that's just a minor part of a complete disassembler... input format handling, analysis heuristics (data vs. code differentiation, possibly handling anti-disassembly tricks, etc.) and even more - it's not an easy job, and you won't be able to write something that can automatically handle everything.

There used to be a "How to write a disassembler" article on www.spiralspace.com, but it was taken down a while ago - dunno if there's any mirrors.
Posted on 2007-09-09 11:10:51 by f0dder
> There used to be a "How to write a disassembler" article on www.spiralspace.com, but it was taken down a while ago - dunno if there's any mirrors.
you don't know this magic trick? :)
Posted on 2007-09-09 12:19:15 by drizz
I do know the "magic trick", but it doesn't always work, and it seems my local copy was updated in early 2005, while the last version archive.org has is from 2001... perhaps somebody should try to mail the guy and ask where the disasm stuff went?
Posted on 2007-09-09 13:46:59 by f0dder
Ok. Understood.

Thanks for all the answers.
Posted on 2007-09-09 14:08:34 by sawer