Hello everyone, I'm trying to write an assembler for use on different platforms, so I wonder if anyone knows where I can get some information on how to start (how to process symbols, etc.)? The NASM source code seems a bit confusing to me.

Thanks in advance.
Posted on 2002-05-11 08:07:25 by [KSC]

-Betov's SpASM source code (a tricky one, since the source code is embedded in the main EXE), found here: http://betov.free.fr/SpAsm.html

-FASM (Flat Assembler), found here: http://omega.im.uj.edu.pl/~grysztar/

-TMA (The Macro Assembler)

Posted on 2002-05-11 08:22:21 by BogdanOntanu
Thank you, but I would really prefer some documents about how a two-pass assembler works, for example.
I need to know how to process symbols and forward references.

Thanks anyway
Posted on 2002-05-11 08:38:18 by [KSC]
Well, that is general compiler/interpreter theory...

A very simple explanation:

When you find a symbol in the source code, look it up in the "symbol table" (preferably a hash table). If it is found, replace it with its value; if not, push the symbol and the address where you found it onto an "unresolved reference" stack, and add the symbol to the symbol table as not yet defined.

Later, when you find its definition, go back to the unresolved-reference stack, pop the entries for that symbol along with their addresses, and patch in the value as appropriate...
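A minimal Python sketch of that one-pass, backpatching approach, assuming a hypothetical toy syntax that is not any real assembler's: `name:` defines a label, `db N` emits one byte, and `dw name` emits the label's 16-bit little-endian address. A Python `dict` plays the hash-table symbol table. For simplicity the unresolved references live in one list that is scanned when a label is defined, rather than a per-symbol chain; `assemble_one_pass` is an illustrative name.

```python
import struct


def assemble_one_pass(lines):
    code = bytearray()
    symbols = {}     # symbol table (a dict is a hash table): name -> address, None if undefined
    unresolved = []  # pending references: (symbol name, offset of the 2-byte placeholder)

    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.endswith(":"):  # label definition
            name = line[:-1]
            if symbols.get(name) is not None:
                raise ValueError(f"duplicate label {name!r}")
            symbols[name] = len(code)
            # Go back over the pending references and patch the ones for this label.
            pending = []
            for ref, offset in unresolved:
                if ref == name:
                    struct.pack_into("<H", code, offset, symbols[name])
                else:
                    pending.append((ref, offset))
            unresolved = pending
        else:
            op, arg = line.split(None, 1)
            if op == "db":
                code.append(int(arg, 0) & 0xFF)
            elif op == "dw":
                value = symbols.get(arg)
                if value is None:  # forward reference
                    symbols[arg] = None  # name is known, value is not yet
                    unresolved.append((arg, len(code)))
                    value = 0  # placeholder, patched when the label appears
                code += struct.pack("<H", value)
            else:
                raise ValueError(f"unknown directive {op!r}")

    if unresolved:
        missing = sorted({ref for ref, _ in unresolved})
        raise ValueError(f"undefined symbols: {missing}")
    return bytes(code)
```

For example, `assemble_one_pass(["start:", "dw end", "db 0x90", "end:", "dw start"])` writes a zero placeholder for `end`, then patches it to 3 once `end:` is seen.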
Posted on 2002-05-11 09:43:33 by BogdanOntanu
The classic 2-pass assembler defines label values in the first pass (storing them in a table), then uses those values (from the first pass) to generate finished code in the second pass. That solves the forward reference problem, because no code is generated in the first pass. (You still need to figure out how long each instruction will be during the first pass.) This works well if there is only one encoding for each instruction.
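The classic two-pass scheme can be sketched in Python like this, assuming a tiny instruction set with exactly one encoding per mnemonic, using simplified x86 encodings in 32-bit mode: `nop` is `90` and `jmp label` is always the 5-byte `E9 rel32`, with the displacement counted from the end of the instruction. The function names are illustrative only.

```python
import struct

# One fixed size per mnemonic, so pass 1 always knows how long each instruction is.
SIZES = {"nop": 1, "jmp": 5}


def clean(lines):
    """Strip comments and blank lines; labels are lines ending in ':'."""
    stripped = (raw.split(";", 1)[0].strip() for raw in lines)
    return [line for line in stripped if line]


def assemble_two_pass(lines):
    program = clean(lines)

    # Pass 1: generate no code, only advance the location counter so that
    # every label gets its address.
    symbols, pc = {}, 0
    for line in program:
        if line.endswith(":"):
            if line[:-1] in symbols:
                raise ValueError(f"duplicate label {line[:-1]!r}")
            symbols[line[:-1]] = pc
        else:
            pc += SIZES[line.split()[0]]

    # Pass 2: every label value is known, so forward references are no problem.
    code = bytearray()
    for line in program:
        if line.endswith(":"):
            continue
        op = line.split()[0]
        if op == "nop":
            code.append(0x90)
        else:  # jmp: E9 followed by a signed 32-bit displacement
            target = symbols[line.split()[1]]
            end = len(code) + SIZES["jmp"]
            code.append(0xE9)
            code += struct.pack("<i", target - end)
    return bytes(code)
```

Every jump costs 5 bytes here, even when the target is only a byte away; that is the price of having a single encoding per mnemonic.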

Some processors have long and short forms of instructions. For instance, some processors call short jumps "branches" and long jumps "jumps", and give them different mnemonics. Because the programmer picks the form, the assembler already knows each instruction's size in the first pass, which makes it relatively easy to build an assembler that doesn't fill your code with NOPs.

The conventional Intel/MASM/TASM syntax doesn't provide separate mnemonics or syntax for the short and long forms of many instructions. As a result, during the first pass the assembler assumes the worst case (the longest instruction form) for forward references. Then, in the second pass, it uses a shorter form where possible and fills the unused bytes with NOPs.
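A rough Python sketch of that worst-case strategy, assuming a single `jmp` mnemonic that can be encoded either as the 2-byte short form `EB rel8` or the 5-byte near form `E9 rel32` (x86, 32-bit mode). Backward references are sized exactly in pass 1; forward references get a 5-byte slot. In pass 2 the short form is used when it fits, and the leftover 3 bytes are filled with `90` (NOP) so no label address changes. The names are hypothetical.

```python
import struct

SHORT, LONG = 2, 5  # EB rel8 and E9 rel32 (32-bit mode)


def fits_rel8(disp):
    return -128 <= disp <= 127


def assemble_worst_case(lines):
    stripped = (raw.split(";", 1)[0].strip() for raw in lines)
    program = [line for line in stripped if line]

    # Pass 1: assign addresses. A jump to an already-defined label (backward
    # reference) can be sized exactly; a forward reference cannot, so reserve
    # the worst case (the 5-byte long form).
    symbols, sizes, pc = {}, [], 0
    for line in program:
        if line.endswith(":"):
            symbols[line[:-1]] = pc
            size = 0
        elif line == "nop":
            size = 1
        else:  # "jmp target"
            target = line.split()[1]
            known = target in symbols
            size = SHORT if known and fits_rel8(symbols[target] - (pc + SHORT)) else LONG
        sizes.append(size)
        pc += size

    # Pass 2: addresses are frozen. Where a long slot was reserved but the short
    # form fits, emit the short jump and pad with NOPs so no label moves.
    code = bytearray()
    for line, size in zip(program, sizes):
        pc = len(code)
        if line.endswith(":"):
            continue
        if line == "nop":
            code.append(0x90)
            continue
        target = symbols[line.split()[1]]
        disp = target - (pc + SHORT)
        if fits_rel8(disp):
            code += bytes([0xEB, disp & 0xFF])
            code += b"\x90" * (size - SHORT)  # filler NOPs (zero for exact slots)
        else:
            code.append(0xE9)
            code += struct.pack("<i", target - (pc + LONG))
    return bytes(code)
```

The forward `jmp done` in `["jmp done", "nop", "done:", "jmp done"]` becomes `EB 04` plus three filler NOPs, while the backward jump is sized exactly as `EB FE`.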

Multi-pass (n-pass) x86 assemblers instead recalculate the label values on each pass after the first, for as long as they keep discovering instructions that can be reduced to shorter forms.
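A minimal Python sketch of such an n-pass loop, under the same assumptions as a single `jmp` mnemonic with `EB rel8` / `E9 rel32` forms. It starts with every jump long, and on each later pass shrinks the jumps whose short form fits, then recomputes all label values, stopping when a pass changes nothing. Since sizes only ever shrink, distances only shrink too, so the loop terminates and no NOP filler is needed. The function names are hypothetical.

```python
import struct

SHORT, LONG = 2, 5  # EB rel8 and E9 rel32 (32-bit mode)


def fits_rel8(disp):
    return -128 <= disp <= 127


def layout(program, sizes):
    """Recompute every line's address and the label table from the current sizes."""
    symbols, addrs, pc = {}, [], 0
    for line, size in zip(program, sizes):
        addrs.append(pc)
        if line.endswith(":"):
            symbols[line[:-1]] = pc
        pc += size
    return symbols, addrs


def assemble_relaxing(lines):
    stripped = (raw.split(";", 1)[0].strip() for raw in lines)
    program = [line for line in stripped if line]

    # First pass: assume the worst case, every jump long.
    sizes = [0 if ln.endswith(":") else 1 if ln == "nop" else LONG for ln in program]

    # Later passes: shrink each long jump whose short form fits, recompute
    # the label values, and repeat until a pass changes nothing.
    while True:
        symbols, addrs = layout(program, sizes)
        changed = False
        for i, line in enumerate(program):
            if sizes[i] == LONG:
                target = symbols[line.split()[1]]
                # Measured against the current layout; shrinking only ever
                # shortens distances, so a jump that fits now keeps fitting.
                if fits_rel8(target - (addrs[i] + SHORT)):
                    sizes[i] = SHORT
                    changed = True
        if not changed:
            break

    # Final pass: emit code with the settled sizes.
    code = bytearray()
    for line, size, pc in zip(program, sizes, addrs):
        if line.endswith(":"):
            continue
        if line == "nop":
            code.append(0x90)
            continue
        target = symbols[line.split()[1]]
        if size == SHORT:
            disp = target - (pc + SHORT)
            assert fits_rel8(disp)
            code += bytes([0xEB, disp & 0xFF])
        else:
            code.append(0xE9)
            code += struct.pack("<i", target - (pc + LONG))
    return bytes(code)
```

A jump can become shortenable only after another jump between it and its target has shrunk, which is why more than two passes may be needed.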
Posted on 2002-05-11 18:46:07 by tenkey
Thank you very much, I'll try.
Posted on 2002-05-12 08:29:09 by [KSC]