With regard to XASM, I've just written a new parser which can detect various text encodings, and can even detect and handle encodings that switch midstream. So far UTF-8, Unicode and 8-bit ASCII are supported, which is probably enough, I think.
That project is not dead, and this would be its new front end should I decide to jumpstart it.
If I do, I'll be looking for a third-party open-source assembler core such as NASM's because I'm incredibly lazy (why reinvent the wheel? let's concentrate on the reasons why we're doing it in the first place).
My revised version of XASM would concentrate on the front and back ends, and eliminate the need for an internal bytecode / intermediate representation.

Posted on 2010-01-22 01:06:13 by Homer
With regard to XASM, I've just written a new parser which can detect various text encodings, and can even detect and handle encodings that switch midstream. So far UTF-8, Unicode and 8-bit ASCII are supported, which is probably enough, I think.
What do you mean by "unicode"? :) - UTF-16 or UCS-2? Personally I don't see much use for non-ASCII in source code files, but then I am a pretty firm believer that string resources should be externalized and source code kept in English :)
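
To make the question concrete, here is a minimal Python sketch of BOM-based detection. It is not XASM's actual parser: the name `sniff_encoding`, the fallback order and the choice of Latin-1 for the 8-bit case are all assumptions made for illustration, and it does not attempt mid-stream switching. Note that a byte-order mark alone cannot tell UTF-16 from UCS-2, since the two differ only in whether surrogate pairs are allowed.

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of a source buffer (hypothetical helper)."""
    # A byte-order mark, when present, settles the question.
    # (UTF-32 is deliberately not handled in this sketch.)
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM: anything that decodes as UTF-8 (plain 7-bit ASCII included)
    # is treated as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: everything else is an 8-bit code page, here Latin-1.
        return "latin-1"
```

The returned name can be passed straight to `bytes.decode`, so `data.decode(sniff_encoding(data))` yields the source text with any BOM stripped.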

My revised version of XASM would concentrate on the front and back ends, and eliminate the need for an internal bytecode / intermediate representation.
Hm, so you'd basically want to emit x86 opcodes as soon as you see an instruction?
Posted on 2010-01-22 07:13:51 by f0dder
Yes, and no.
I would emit logical blocks of opcodes, associated with symbols where appropriate, and geared for OMF or ELF.
The front end is the parser, macro expander, symbol generator, etc.
The middle is just the assembler proper.
The back end is the object emitter.
The middle and back end can be pluggable.
And the front can be naive, excluding macro expansion.
This will allow me to generate code for machine platform x, without needing to virtualize everything to quite the extent I had previously proposed (it started looking like jvm and that made me feel bad).
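
The split described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not XASM code: `Block`, `parse`, `toy_x86_core`, `flat_emitter` and `assemble` are hypothetical names, the "core" knows just three single-byte x86 opcodes, and the back end writes a flat image plus a symbol table rather than real OMF or ELF.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Block:
    """A logical run of opcodes, optionally named by a symbol."""
    symbol: Optional[str]
    code: bytearray = field(default_factory=bytearray)

def parse(source: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Front end: naive line parser, no macro expansion."""
    items = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            items.append((line[:-1], None))     # label
        else:
            items.append((None, line))          # instruction text
    return items

def toy_x86_core(instruction: str) -> bytes:
    """Middle: pluggable assembler core (toy: three one-byte x86 opcodes)."""
    table = {"nop": b"\x90", "ret": b"\xc3", "int3": b"\xcc"}
    return table[instruction]

def flat_emitter(blocks: List[Block]) -> Tuple[bytes, Dict[str, int]]:
    """Back end: pluggable emitter (toy: flat image plus symbol offsets)."""
    image, symbols = bytearray(), {}
    for block in blocks:
        if block.symbol is not None:
            symbols[block.symbol] = len(image)
        image += block.code
    return bytes(image), symbols

def assemble(source: str, core=toy_x86_core, emitter=flat_emitter):
    """Wire the three stages together; core and emitter can be swapped."""
    blocks = [Block(None)]                      # code before the first label
    for label, instruction in parse(source):
        if label is not None:
            blocks.append(Block(label))
        else:
            blocks[-1].code += core(instruction)
    return emitter(blocks)
```

Calling `assemble("start:\n nop\n nop\nexit:\n ret\n")` returns the three opcode bytes together with a table mapping `start` and `exit` to their offsets. Targeting another platform would mean passing a different `core`, and another object format a different `emitter`, with the front end untouched.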

Posted on 2010-01-22 08:12:24 by Homer
So, basically emitting x86 opcodes right away (after front end is done), along with list of symbols/fixups.
Sounds reasonable enough, as far as I can tell it should work... you'd probably lose the ability to do short/near jump optimization, though? Apart from that, I can't think of any immediate reasons to go with an intermediate representation, since we're talking WYCIWYG Assembly and not a HLL.
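
A rough Python sketch of what I mean, assuming 32-bit x86 and a made-up input format (the `emit` function and its tuple layout are hypothetical, not anything from XASM): every jump is emitted in its near form (`E9` plus a 32-bit displacement) the moment it is seen, and the displacement is patched from the fixup list once all symbols are known.

```python
import struct

def emit(program):
    """program: list of ("label", name), ("jmp", target) or ("raw", bytes)."""
    code = bytearray()
    symbols = {}
    fixups = []                                 # (offset of rel32 field, target)
    for kind, arg in program:
        if kind == "label":
            symbols[arg] = len(code)
        elif kind == "jmp":
            code += b"\xe9"                     # JMP rel32, always the near form
            fixups.append((len(code), arg))
            code += b"\x00\x00\x00\x00"         # placeholder displacement
        else:
            code += arg                         # already-encoded bytes
    for offset, target in fixups:
        # The displacement is relative to the end of the jump instruction.
        rel = symbols[target] - (offset + 4)
        struct.pack_into("<i", code, offset, rel)
    return bytes(code), symbols
```

For a forward jump over two `nop`s this produces `E9 02 00 00 00`, where the short form `EB 02` would have done. Picking the short form shrinks the instruction by three bytes and moves every later symbol, which is why that optimization needs a second pass (or a relaxation loop) over something that still knows where the instruction boundaries are.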

This will allow me to generate code for machine platform x, without needing to virtualize everything to quite the extent I had previously proposed (it started looking like jvm and that made me feel bad).
JVM isn't such a bad idea imho, but over-virtualization (trying to make a "generic assembly" syntax that can target multiple architectures) isn't a good idea... you might as well be coding in C, then :)
Posted on 2010-01-22 10:30:46 by f0dder

That project is not dead, and this would be its new front end should I decide to jumpstart it.


That's good to hear, I hate when good ideas fade away due to the constraints of time/reality.


If I do, I'll be looking for a third-party open-source assembler core such as NASM's because I'm incredibly lazy (why reinvent the wheel? let's concentrate on the reasons why we're doing it in the first place).


NASM is (recently) BSD licensed, so that would be fairly easy to do and with no strings attached :)

However, YASM (also BSD licensed) is doing something similar to XASM via modular design (libyasm) and perhaps this would be a better choice of code base to utilize?


Yes, and no.
I would emit logical blocks of opcodes, associated with symbols where appropriate, and geared for OMF or ELF.
The front end is the parser, macro expander, symbol generator, etc.
The middle is just the assembler proper.
The back end is the object emitter.
The middle and back end can be pluggable.
And the front can be naive, excluding macro expansion.
This will allow me to generate code for machine platform x, without needing to virtualize everything to quite the extent I had previously proposed (it started looking like jvm and that made me feel bad).


Perhaps borrowing from (or contributing to) LLVM's code base (also BSD licensed) would be more appropriate considering XASM's design?
Posted on 2010-01-22 10:34:53 by SpooK
That does look interesting :)
Posted on 2010-01-23 06:07:27 by Homer
Perhaps borrowing from (or contributing to) LLVM's code base (also BSD licensed) would be more appropriate considering XASM's design?
Still chasing the "universal assembly" dream? :P
Posted on 2010-01-24 09:07:47 by f0dder

Perhaps borrowing from (or contributing to) LLVM's code base (also BSD licensed) would be more appropriate considering XASM's design?
Still chasing the "universal assembly" dream? :P


Perhaps more the "acceptably higher than assembly" language :P
Posted on 2010-01-24 10:43:46 by SpooK
Why, though? We already have C? :)

I'm expecting such a thing to pretty much end up with a lot of the disadvantages of both languages, and lose some of the advantages each has on its own...

e.g., you lose the ability to use CPU-specific instructions (otherwise there's not much point in trying to make it generic), but you don't get the very wide portability of C. And you'll probably be working on a low abstraction level, which means the same development time as assembly, and less semantic value for the LLVM optimizer to go from. You'll also be losing "what you code is what you get" which is arguably one of the biggest benefits of assembly.

But perhaps I'm misunderstanding the scope, and/or missing the vision entirely? :)
Posted on 2010-01-24 11:17:37 by f0dder

Why, though? We already have C? :)


If we are talking strictly about C99, then sure :)

As for "universal assembly", I think it was more along the lines of a clean separation of the front-end and back-end, and not so much an attempt at inventing yet another virtual machine.
Posted on 2010-01-24 15:48:08 by SpooK
Not normally being the one to dredge up old topics... :P

I've played around with LLVM and like its purpose - a "portable assembly language" if you will. You can compile directly to native or use their JIT for bytecode interpretation. I can see application programs written using this to gain portability to other OSes and systems. It's quite intriguing.

However, as f0dder noted previously, optimization is a major obstacle. I doubt the binaries created will ever match hand-optimized code on a specific CPU architecture. I think Apple's involvement with the project keeps the dream alive (most of the key developers work for Apple). And their Clang project to replace gcc is also interesting.

A lot of conspiracy theories can be formulated on this approach. With BSD looking to use Clang to compile the kernel, it would be quite amazing to see LLVM get to the point where you can truly run BSD on every hardware platform that has an LLVM back end implemented. Once they figure out the optimization part, of course ;)
Posted on 2010-07-24 15:22:25 by p1ranha
Look no further than the ARM architecture, which is the second most common. Read/write unaligned = GPF. First 4 func-args are not on stack. There are instructions to push/pop up to 16 registers at once, not to mention the 4 different ways and directions of stack management. Then, the conditional execution. Etc etc
You really have to describe your code at a higher level, like C, for it to be optimizable on the two architectures.

Hmm, deja vu
Posted on 2010-07-26 00:11:26 by Ultrano