This is my first formal request for expressions of interest in my CROSS ASSEMBLER project. If enough users are willing to help where they can, I will ask Spook to create a discussion space here, under Hosted Projects, and use it to share and maintain the project source code, which is public domain under the MoreBeerWare license (not GPL'd).

Since I have already offered the use of the name XASM to an interested party, I will and do consider this a WORKING TITLE ONLY, subject to said party getting off their posterior and using it before I go Beta :P

Who:
You, me and Bobby McGee

What:
XASM is a multi-syntax, multi-platform compiler/assembler hybrid ('compembler'), which abstracts all notions of a Physical Machine and allows the user more control: for example, to define entities such as Registers and OpCode Encodings.

When:
Tokenizer and Lexical Parser written over 2 years ago, all other code written within the past 6 months, at my leisure.

Where:
So far, mostly in my kitchen, since that's where the coffee is.

Why: (my pet hates)
I hate porting code.
I hate switching from one assembler to another, and having to remember/learn the various subtle differences in Grammar and Syntax.
I hate that no one assembler can produce output for arbitrary hardware platforms, including those which don't yet exist.
I hate that I can't always easily port my macros between assemblers.
I hate that I can't use assembler macros within INLINE ASM blocks in C/C++.
I hate that the corporate universe is determined to kill off my language of choice.

I felt that VM concepts were sound, but had been aimed at the highlevel aspects of programming (by highlevel programmers) for too long.
With the rebirth of OOPASM as a formal programming paradigm, the line between assembler and compiler becomes blurred.
I decided to wash that pesky line away altogether, and to apply VM concepts beginning with most lowlevel aspects of programming - to do away with the notion of a Physical Machine entirely.

I've written XASM to allow for loose syntax - I intend it to support the free mixing of all major assembler, C and C++ syntaxes, and to support output to arbitrary hardware platforms and operating systems.

XASM is based on VM concepts, and is written for MASM/OA32.

If you are interested in joining the XASM Development Team, please respond within this thread and/or via the PM facility on this board.

H.
Posted on 2007-05-18 01:19:44 by Homer
Just random thoughts:

Won't little-endian vs. big-endian byte order introduce trouble when doing low-level byte-addressing of dwords, for instance?
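
To make the byte-order concern concrete, here is a minimal Python sketch (not XASM code; the helper name `byte_at` is made up for illustration). It lays out the same 32-bit value in both byte orders and reads the byte at offset 0, which is what byte-addressing a dword amounts to:

```python
import struct

value = 0x11223344

# The same 32-bit value as it would sit in memory under each byte order.
little = struct.pack("<I", value)  # least significant byte first (x86 style)
big = struct.pack(">I", value)     # most significant byte first

def byte_at(memory: bytes, offset: int) -> int:
    """Read one byte at an offset, as low-level byte addressing would."""
    return memory[offset]

print(hex(byte_at(little, 0)))  # 0x44
print(hex(byte_at(big, 0)))     # 0x11
```

Code that assumes "byte 0 is the low byte" therefore only ports unchanged between targets that share a byte order.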

Syntax... there's already one such "assembler" or preparser for nasm - I can't remember/find its name.

The VM integration is interesting - but managed memory is what springs to mind as the only viable reference to that idea. So a language like D, but accessible in arbitrary syntax, could be a good vector :). I feel the need for managed memory sometimes (especially around complex undo/redo)... Just make sure to do automatic garbage collection when a large memory chunk is dereferenced completely. Non-cooperative, memory-hogging Java applications are what everyone's sick of.

Why base the assembler on VM, instead of compilation? Portability of Java "binaries" is a sick joke of marketing, it really takes 100 hours instead of 10 hours to port real Java software to another platform successfully. And in those 100 hours one can hear developers scream and break things in the office :).

A flaw in the design of XASM imho is that you can attain portability only of simple code, that doesn't rely on features of platforms. Unless you make huge supporting libraries, each with versions for each platform to support. Same "flaws" as C/C++ - and same way of fixing them.

I can see (and would use) XASM only as an alternative to C/C++/D, as a compiler. Where speed matters, I'd always use the appropriate assembler (though I usually change the syntax a bit to be comfortable for me). The modern instruction sets aren't that many: x86, x64, ARM, PIC, AVR, PowerPC, CellSPE; and few people have to learn more than 2 of them.
Syntax-differences in two assemblers for the same cpu is what could be bothersome, as you said. But I find it much more frustrating when there's no macro preprocessor at least as powerful as the one in MASM (as you noted).


I hate that no one assembler can produce output for arbitrary hardware platforms, including those which don't yet exist.

Wouldn't such an "assembler" then be called a "compiler"? No matter how one looks at different instruction sets, the huge incompatibilities between CPUs become obvious. Operations like "stmdb sp!, {r4-r12,r14}" (ARM) and "btfsc STATUS,C" (PIC) are often used on these CPUs. The abstractions to use are so high-level that the assembler is no longer an assembler. And the common denominator of possible computations after such an abstraction is small - just like in C. You can only tilt that just a bit by choosing a dominating instruction set (where XASM will compile best), and then conforming it for other instruction sets. But if you end up having to use a state variable in order to conform to some specific operation, thread safety is impossible without TLS (thread-local storage) and often-repeated slow access to the TLS data. Heck, even registers will need TLS - and guess what performance follows. (Imagine you decide to provide 32 registers in XASM: you'd need either globals in single-threaded apps, or TLS access on every instruction *gulp*.)

my 2 cents
Posted on 2007-05-20 03:41:23 by Ultrano

Where:
So far, mostly in my kitchen, since that's where the coffee is.


Homer,
You really need to put a small table or something similar next to the computer desk,
to set the coffee maker on. Works for me. :)


Rags
Posted on 2007-05-20 06:11:29 by rags
Everyone - I've just finished implementing "infinite recursion prevention within the context of massively nested macro expansion" :)
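
The post does not say how the recursion guard works. One common approach, sketched here in Python purely as an assumption, is to carry a nesting depth through the expander and refuse to go past a limit. The names `expand`, `MacroRecursionError` and the default limit of 64 are all hypothetical:

```python
class MacroRecursionError(Exception):
    """Raised when macro expansion nests deeper than the allowed limit."""

def expand(tokens, macros, max_depth=64, _depth=0):
    """Expand macro names in a token list, refusing to nest past max_depth."""
    if _depth > max_depth:
        raise MacroRecursionError(f"macro nesting exceeded {max_depth} levels")
    out = []
    for tok in tokens:
        if tok in macros:
            # A macro body may itself contain macro names, so recurse.
            out.extend(expand(macros[tok], macros, max_depth, _depth + 1))
        else:
            out.append(tok)
    return out

macros = {"A": ["x", "B"], "B": ["y"]}
print(expand(["A", "z"], macros))  # ['x', 'y', 'z']
```

A depth limit still permits deliberate deep recursion (by raising the limit) while turning a runaway self-referencing macro into a clean error instead of a hang.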

Ultrano - I am supporting user-defined Types, which allow the user to define Types as "an Array of N elements of Previously Defined Type", where the most Primitive element is 'Bit'.
That allows for end-orientation of datatypes.
Syntax-wise, I simply extended the behaviour of our old friend TypeDef:

NewName TypeDef ExistingTypedEntity

The current implementation uses a dedicated 'internal virtual primitive'; I intend to change it to use my Struct implementation, which would allow for more complex declarations, such as describing RAX and its SubRegisters using unions.
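
As a rough model of "an Array of N elements of Previously Defined Type" with Bit as the only primitive, here is a small Python sketch. The `TypeTable` class and its method names are illustrative assumptions, not XASM internals:

```python
class TypeTable:
    """Toy type registry where BIT is the only built-in type."""

    def __init__(self):
        self.bits = {"BIT": 1}  # type name -> size in bits

    def typedef(self, new_name, base, count=1):
        """Define new_name as an array of `count` elements of `base`."""
        if base not in self.bits:
            raise KeyError(f"unknown base type: {base}")
        self.bits[new_name] = self.bits[base] * count

    def size_in_bits(self, name):
        return self.bits[name]

types = TypeTable()
types.typedef("BYTE", "BIT", 8)
types.typedef("WORD", "BYTE", 2)
types.typedef("DWORD", "WORD", 2)
types.typedef("VEC4", "DWORD", 4)  # a user-defined type built on top

print(types.size_in_bits("DWORD"))  # 32
print(types.size_in_bits("VEC4"))   # 128
```

Because every size is derived from BIT, a header for a machine with, say, 9-bit bytes would only need to change one definition.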

It's not a VM integration, it's an assembler based on VM concepts, and with a macro engine strong enough to support highlevel language directives introduced  via macro headers.

The current implementation's memory requirements are handled by OA32, and are thus Heap-oriented. Virtualizing this means adding a switch to OA32 itself, since I probably won't be reimplementing OA32 internally but rather loading its macros; i.e., XASM might not implement OA32 internally, but it will still support it.
Bootstrapping of XASM will be the litmus test.

Portability will be determined, like everything else, by standard header files.
If I support the parsing of C/C++ headers, I'm over the moon.
I'll probably use libC when it comes time to self-port the x86-bootstrapped XASM binary, but what others wish to use in THEIR sources to solve this issue is totally up to them, it's not a buildtime issue.

I don't call XASM an assembler, or a compiler, I call it a 'compembler'.
Assemblers at least know one opcode set.. XASM is not so presumptuous.
Compilers don't let you define opcodes and their binary output.
Optimizing the output code is a curly one, I'll have to study that more, but I have faith it can be done reasonably well, perhaps it only requires some variant macros, some decisionmaking logic and a dumb switch..

I do not have to solve the problems of opcode ambiguities across hardware platforms.. the user does.. guess how? Via macro header files :P
XASM allows for user-defined OpCodes and their Encodings.
You can define new OpCodes, and describe the Binary output expression, with a syntax that is very similar to a macro definition.
I am not hardcoding one single opcode.
Even the x86 opcode set is introduced via a header.

The user is free to write their own interpretations of any opcodes that appear in their sourcecode, either as a softwired OpCode+Encoding(s), or as a Macro which expresses one or more OpCodes+Encodings.
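
The idea of opcodes arriving as data rather than being hardcoded can be sketched in Python as a table that the "header" fills in. The table layout and the names `define_opcode` and `assemble` are assumptions for illustration; only the three x86 byte encodings are real:

```python
import struct

# Hypothetical opcode table: (mnemonic, operand pattern) -> encoder function.
OPCODES = {}

def define_opcode(mnemonic, pattern, encoder):
    """Register an encoding, as a header file might do at build time."""
    OPCODES[(mnemonic, pattern)] = encoder

# Real x86 encodings, supplied as data instead of being built in.
define_opcode("nop", (), lambda: b"\x90")
define_opcode("ret", (), lambda: b"\xC3")
define_opcode("mov", ("eax", "imm32"),
              lambda imm: b"\xB8" + struct.pack("<I", imm))

def assemble(mnemonic, *operands):
    """Look up the encoding by operand shape and emit its bytes."""
    # Integers are treated as imm32; register names are matched literally.
    pattern = tuple("imm32" if isinstance(op, int) else op for op in operands)
    encoder = OPCODES[(mnemonic, pattern)]
    immediates = [op for op in operands if isinstance(op, int)]
    return encoder(*immediates)

print(assemble("mov", "eax", 0x12345678).hex())  # b878563412
```

Swapping in a different table retargets the same `assemble` routine, which is the point being made about not hardcoding a single opcode.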

Everything physical is definable at buildtime, with the current exception of the back-end's file format, which I also hope to template.. currently I only support COFF obj file output, via an overloadable class.

Rags - I'm thinking of adding a can dispenser to my case (seriously) with a peltier device that cools one can while heating the coffee cup.
Posted on 2007-05-22 08:37:43 by Homer
At a later time, I'd be interested in having this sort of functionality integrated into PwnIDE.  Of course, I've got lots of work to do before then, 'cause the current state of PwnIDE is somewhat humorous and somewhat embarrassing at the same time.

Anyway, I would like to eventually have integrated compilation and execution, including with C, and this sort of thing sounds similar to what I was thinking of for it.  In the short term, however, I might be more interested in how you do the syntactic analysis, 'cause the way I do it now sucks (which is why it's largely unimplemented, resulting in the many red underlines in the screenshots).  I like my exceedingly brute-force approach to type-checking and instruction documentation, though: Here's my "grammar" file  ;)  Man, it was a lot of typing, and it's maybe half done, hehe.

Cheers
Posted on 2007-06-06 03:50:27 by hackulous
OI:
Thank you for your expression of interest.
I can, and would like to, set up XASM as an IDE-aware assembler that builds projects in realtime, as you type them, and can tell the IDE what address a given line of sourcecode is assembled to, which would greatly assist debugging.
It's possible, if we can only agree and decide on how they should communicate.
It can't be via Window Messages, or any other platform-specific mechanism, so that narrows the range.
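
No protocol has been agreed in this thread. Purely as one platform-neutral possibility, the assembler could write one JSON object per line to a pipe or stdout, and the IDE could rebuild a line-to-address table from that stream. The field names (`file`, `line`, `addr`) and both function names below are invented for this Python sketch:

```python
import json

def emit_line_map(entries):
    """Assembler side: one JSON object per line of output text."""
    return "\n".join(
        json.dumps({"file": f, "line": ln, "addr": addr})
        for f, ln, addr in entries
    )

def load_line_map(text):
    """IDE side: rebuild a (file, line) -> address lookup from the stream."""
    table = {}
    for row in text.splitlines():
        rec = json.loads(row)
        table[(rec["file"], rec["line"])] = rec["addr"]
    return table

stream = emit_line_map([("main.asm", 10, 0x401000), ("main.asm", 11, 0x401005)])
lookup = load_line_map(stream)
print(hex(lookup[("main.asm", 11)]))  # 0x401005
```

A text stream like this works over pipes, sockets or files, so it avoids tying either side to one operating system's messaging mechanism.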

KETILO:
This is an idea I've held close ever since I saw Ketil Olsen's RADASM IDE highlighting errors in response to assembler feedback.
If you read this, KetilO, tell me how masm does it: simply a console feed to stderr?

ALL:
I can understand many of my projects are aimed at very narrow audiences, but I am surprised at the lack of general interest in this project.

HEADS UP:
It has come to my attention that the zip support in RADASM is slightly broken: anyone who has downloaded the very early sourcecode attachment will not see the FOLDER it contains in WINZIP, or in the Windows Zip Explorer shell.
You can see it from WINRAR, and probably many other tools.
Don't be too disappointed if and when you find it, that sourcecode is very early stuff, development has been rapid since I posted it.
Posted on 2007-06-06 04:50:55 by Homer
wouldn't it be enough to pick one opensource assembler you like, and add support for processors you need? like "revolution" did for FASM/ARM, for example...
Posted on 2007-06-06 05:59:33 by vid
Show me an open sourced assembler with a STRONG macro engine that is not under GPL or derivative license, and I'll pack this project up in mothballs and cheesecloth :)

Most assemblers have relatively weak macro engines, with the major limiting factor being the length and complexity of general statements.
In the past, assembler macros have been relatively simplistic, and not required substantial support. Times have changed.

XASM currently supports symbol names of up to 512 bytes, single line statements of up to 8192 bytes, and no limit on the total length of multi-line statements. It supports massively deep nesting of macro statements by default, inheritance of macro locals within a nested context, etc.
I'm sick to death of running into glass ceilings...if something else was up to the task, I'd be using it already, and recommending it.

Although most asm programmers will never push their assembler to the extremes that I do, I know I am not alone.

More to the point, existing assemblers are based around fixed machine concepts, and reworking them for a new physical machine architecture is worse than writing a new assembler, since you're stuck with all kinds of redundant functionality, syntax nuances and other headaches.

I hope to expose more of the assembler internals to the macro engine, essentially giving the user the power to modify the internal behaviours of the assembler at a level usually only seen in machinecode monitors ('mons') and in virtual machine systems.

Posted on 2007-06-07 02:02:59 by Homer
hmmm... FASM? :lol:
but it's written in x86-32 assembly, that will probably bother you...

XASM currently supports symbol names of up to 512 bytes, single line statements of up to 8192 bytes, and no limit on the total length of multi-line statements. It supports massively deep nesting of macro statements by default, inheritance of macro locals within a nested context, etc.

FASM does the same, except for a 256-byte limit on symbol names and no limit on single-line statements.

Can you give some excerpt of your macro syntax? I would really love to see it... (and find some problems within it ;) )
Posted on 2007-06-07 07:38:44 by vid
I'm beginning with masm macro syntax, so as to support oa32 immediately (for the first bootstrap).
Some initial support for nasm syntax is also there, but I won't complete it until the masm directives are all implemented.
Already, I've extended some of masm's directives, one example is typedef.
My alternative syntax is to allow an optional COUNT argument, thus it is possible to describe any Type as an Array of any existing Type, including your own.
Only BIT has been hardcoded as a physical Type, all other Types are provided via a Header, which the user can mess with directly or indirectly.
For example, BYTE typedef 8 BIT.
Another thing I did is allow the ENDP directive to work like ENDM, i.e. a bare ENDP with no procedure name in front of it.
Another example is the loose typing of immediate numerical values.
Hex values can use the 0x prefix instead of or as well as the h suffix.
Floats don't need to have an f suffix provided they contain a decimal point.
That sort of thing.
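
The relaxed literal rules described above can be sketched as a small Python parser. This is an illustration, not XASM's tokenizer: the function name `parse_literal` is made up, and treating a bare integer with an `f` suffix (such as `3f`) as a float is an assumption.

```python
import re

def parse_literal(text):
    """Parse a numeric literal using the loose rules described above."""
    t = text.strip()
    # Hex with 0x prefix, optionally also carrying the h suffix.
    m = re.fullmatch(r"0[xX]([0-9a-fA-F]+)[hH]?", t)
    if m:
        return int(m.group(1), 16)
    # MASM-style hex: leading decimal digit, h suffix.
    m = re.fullmatch(r"([0-9][0-9a-fA-F]*)[hH]", t)
    if m:
        return int(m.group(1), 16)
    # Float: a decimal point is enough, the f suffix is optional.
    m = re.fullmatch(r"([0-9]+\.[0-9]*|\.[0-9]+)[fF]?", t)
    if m:
        return float(m.group(1))
    # Assumption: an integer with an f suffix is also a float.
    m = re.fullmatch(r"([0-9]+)[fF]", t)
    if m:
        return float(m.group(1))
    if re.fullmatch(r"[0-9]+", t):
        return int(t)
    raise ValueError(f"not a numeric literal: {text!r}")

print(parse_literal("0x1F"), parse_literal("1Fh"), parse_literal("1.5"))  # 31 31 1.5
```

The order of the checks matters: the hex forms are tried before the float forms so that a trailing `h` always wins over an embedded `f` digit.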
I've tried to relax some of the unnecessary grammatical restrictions, because I hope to leave enough flexibility to implement entire highlevel languages via the macro engine (user-defined macro functions) and via the systemic header (which softwires the assembler behaviours via the macro engine, thus behavioural modifiers can be placed inline within your sourcecode if you want/need to.)
As you can see, with a little flexibility on the part of the assembler, it becomes possible to manipulate the interpreter VIA the interpreter.. this is nothing new, but it's never been a major design consideration for assemblers generally, outside of some switches such as the OPTION directive.
Ultimately, I hope to have similar flexibility in the back-end (which is currently only supporting COFF file output).

Posted on 2007-06-07 08:24:36 by Homer
Be sure not to copy recursive macros from MASM ;)

these are very bad design, rather use macro overloading.
Posted on 2007-06-08 16:36:48 by vid
Hi
I can not agree with you Vid, recursive macros are a great thing. I use them a lot.
Of course there are some things that could be done better: for instance, the recursion level is currently limited to 20, which in some cases is too low. The parameter passing is also annoying, but all in all it is a nice feature that I don't want to miss in XASM.

Regards

Biterider
Posted on 2007-06-09 01:14:56 by Biterider
One type of thing I'd like to see is something sort of like templates and sort of like method overloading, which could be implemented as something like a cross between a procedure and a macro.  Basically, the procedure would receive macro parameters that could define types, but instead of necessarily inlining the whole procedure by making it a macro, the procedure could be duplicated for each different set of macro parameters used.

Although the following isn't a great example, (since you could pass typeSize as a real parameter), it at least illustrates what I mean:

CopyObject  EXTENDEDPROC    pObject:DWORD,type:REQ
    invoke  LocalAlloc,LMEM_FIXED,sizeof type
    push    eax
    invoke  CopyMemory,eax,pObject,sizeof type
    pop    eax
    ret
CopyObject  ENDP

This could be combined with IF/ELSEIF/ELSE/ENDIF statements on the type where type-specific code is needed inside the function, (of course there'd need to be some type-independent code, otherwise the purpose of simplifying the code is defeated).
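
The "one source copy, one emitted copy per distinct set of type parameters" idea can be modelled in Python as an instantiation cache. Everything here is hypothetical scaffolding: the `ProcTemplate` class, the `SIZES` table and the `Name_TYPE` label scheme are assumptions, and the body lines simply echo the example above as text.

```python
class ProcTemplate:
    """One source-level procedure, duplicated once per set of type arguments."""

    def __init__(self, name, body_builder):
        self.name = name
        self.body_builder = body_builder
        self.instances = {}  # type arguments -> (label, body lines)

    def instantiate(self, *type_args):
        """Return the label for this type set, generating the body only once."""
        if type_args not in self.instances:
            label = "_".join((self.name,) + type_args)
            self.instances[type_args] = (label, self.body_builder(*type_args))
        return self.instances[type_args][0]

SIZES = {"POINT": 8, "RECT": 16}  # hypothetical type sizes in bytes

def copy_object_body(type_name):
    size = SIZES[type_name]
    return [
        f"invoke LocalAlloc,LMEM_FIXED,{size}",
        "push eax",
        f"invoke CopyMemory,eax,pObject,{size}",
        "pop eax",
        "ret",
    ]

copy_object = ProcTemplate("CopyObject", copy_object_body)
print(copy_object.instantiate("POINT"))  # CopyObject_POINT
print(copy_object.instantiate("POINT"))  # CopyObject_POINT (reused, not regenerated)
print(copy_object.instantiate("RECT"))   # CopyObject_RECT
```

Unlike a plain macro, repeated uses with the same type share one emitted body, so code size grows with the number of distinct types rather than the number of call sites.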

There was something else that I was thinking of too, but I can't remember what it is right now, hehe.  :)
Posted on 2007-06-09 01:51:17 by hackulous
What I have done:

I treat procedures in a very similar way to macros.
Both support massively deep recursion.
It's possible to define more than one version of any named macro or procedure, provided that its argument names and/or types are different.
The only problem with multiple procedure definitions is that when they have the same number of parameters, they cannot be exported with the same name.
The workaround for this is to create an Alias for each duplicate procedure, and export the Alias instead.
Since I am not writing my own Linker, I am stuck with the existing name-mangling schemes, and so we're forced to alias our way around this issue.
I may ultimately add code to detect and generate these aliases automagically, but since it's critical that the more capable users understand what's happening (so they can marry the exported aliased symbol names in other modules), I've decided at least for now to leave well enough alone :)
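
A minimal Python sketch of the overload-plus-alias workaround follows. It is an assumption about how automatic alias generation might look, not a description of XASM: the `ProcTable` class and the naive `Name_TYPE_TYPE` alias scheme are invented here.

```python
class ProcTable:
    """Tracks overloaded procedures and derives export names for them."""

    def __init__(self):
        self.overloads = {}  # procedure name -> list of parameter-type tuples

    def define(self, name, param_types):
        sig = tuple(param_types)
        sigs = self.overloads.setdefault(name, [])
        if sig in sigs:
            raise ValueError(f"{name}{sig} is already defined")
        sigs.append(sig)

    def export_names(self, name):
        """Map each signature to an exported symbol, aliasing only overloads."""
        sigs = self.overloads[name]
        if len(sigs) == 1:
            return {sigs[0]: name}  # no clash, export under the plain name
        # Naive alias scheme: append the parameter types to the name.
        return {sig: "_".join((name,) + sig) for sig in sigs}

procs = ProcTable()
procs.define("Init", ["DWORD"])
procs.define("Draw", ["DWORD"])
procs.define("Draw", ["REAL4"])

print(procs.export_names("Init"))  # {('DWORD',): 'Init'}
print(procs.export_names("Draw"))  # {('DWORD',): 'Draw_DWORD', ('REAL4',): 'Draw_REAL4'}
```

Whatever scheme is used, importing modules have to apply the same rule to find the symbol, which is why the aliases need to be predictable and documented.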


Posted on 2007-06-09 02:03:45 by Homer
Good point, I hadn't fully considered the exporting thing, but yes, that'd be an issue for such a procedure.

This is a bit different than just type-checking, in that you only need one copy of the function in source code form, and you can pass extra type information, such as the type of the data held by a Vector object when passing the Vector.  For an example (a rather sad one) of how this could be useful, take a look at my Vector implementation, or even just look at the hilarious line count.  Most of the procedures should be inlined anyway, but some not as much.  The basic idea is that instead of having 5-10 copies of each function, I could just have one copy, with some type-specific stuff separated by IF blocks.  This does run into the exporting issue, since it'd be useful to export the functions, but I'm planning on having PwnIDE automatically generate ".def" and/or external ".inc" and ".h" files anyway (based on which functions you flag to export), so as long as it knows the renaming scheme, it'll work.  The external ".inc" file could even have macros in it to automatically do the name translation on the importer's side.

One really nice hack that PwnIDE has like this already is enumerations.  In the doc comment for a set of constants, it just keeps track of that it's an enumeration with a certain increment (+a,-a,shr,shl), and the constants stay just as if they were regular constants, so no changes to the assembler are needed for it.  Mind you, PwnIDE STILL doesn't have basic code editing, so I better get to work, lol.  ;)
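
The enumeration idea (a start value plus an increment rule of +a, -a, shr or shl) can be illustrated with a short Python generator of plain constants. The function name and the operator keys are assumptions; PwnIDE's actual doc-comment format is not shown in the thread.

```python
import operator

# The four increment kinds mentioned above, mapped to Python operators.
STEPS = {
    "+": operator.add,
    "-": operator.sub,
    "shl": operator.lshift,
    "shr": operator.rshift,
}

def enumerate_constants(names, start, step_op, amount):
    """Assign each name a value, stepping by the given operation each time."""
    step = STEPS[step_op]
    values = {}
    current = start
    for name in names:
        values[name] = current
        current = step(current, amount)
    return values

print(enumerate_constants(["FLAG_A", "FLAG_B", "FLAG_C"], 1, "shl", 1))
# {'FLAG_A': 1, 'FLAG_B': 2, 'FLAG_C': 4}
```

The output is just name/value pairs, so it can be written out as ordinary EQU-style constants without the assembler knowing an enumeration was involved.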
Posted on 2007-06-09 02:43:50 by hackulous
Ah, now I remember what I forgot earlier.  It'd be nice to be able to specify that certain parameters should be passed by certain registers, and to indicate that the value is already in the register, you could pass that register there.  It'd be a bit more complicated for the assembler to figure out because of possible dependencies (e.g. ecx = eax and eax = ecx, or passing something in eax, and the content of ), but thanks to the xchg instructions, it shouldn't affect performance.  I've got a way to specify this in PwnIDE, but no way to enforce it since MASM doesn't support it (and it'd be way too sketchy just to put it through my own preprocessor at this point).
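
The dependency problem mentioned here (for example ecx = eax and eax = ecx at the same time) is the classic "parallel move" problem. Below is a Python sketch of one standard way to order the moves, using xchg to break cycles. It is an illustration under the assumption that all sources are read "at once"; the function name and the tuple output format are made up.

```python
def resolve_moves(moves):
    """Order register moves so each destination receives its source's old value.

    `moves` maps destination register -> source register.
    """
    pending = {d: s for d, s in moves.items() if d != s}
    out = []
    while pending:
        sources = set(pending.values())
        free = [d for d in pending if d not in sources]
        if free:
            # Nobody still needs this destination's old value: plain mov is safe.
            d = free[0]
            out.append(("mov", d, pending.pop(d)))
            continue
        # Only cycles remain: swap one pair to settle one destination.
        d, s = next(iter(pending.items()))
        out.append(("xchg", d, s))
        del pending[d]
        # The old value of d now lives in s, so retarget whoever reads d.
        for other, src in list(pending.items()):
            if src == d:
                if other == s:
                    del pending[other]  # s already holds what it wanted
                else:
                    pending[other] = s
    return out

print(resolve_moves({"eax": "ecx", "ecx": "eax"}))  # [('xchg', 'eax', 'ecx')]
```

A cycle of n registers costs n-1 xchg instructions and no scratch register, which matches the intuition that xchg keeps the overhead small.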
Posted on 2007-06-09 02:55:52 by hackulous

I can not agree with you Vid, recursive macros are a great thing. I use them a lot.
Of course there are some things that could be done better: for instance, the recursion level is currently limited to 20, which in some cases is too low. The parameter passing is also annoying, but all in all it is a nice feature that I don't want to miss in XASM.


You can "emulate" recursive macros with nested macros, but you cannot get the effect of nested macros (macro overloading) with recursive macros.

So nested macros can do anything that recursive macros can, and a lot of extra stuff.

The real beauty of nested macros is that you can slightly change the behavior of a macro/directive without having to rewrite its entire functionality.

For example, say you have a ready-made PROC directive or macro, and you want to add an extra symbol for every procedure. With recursive macros you would have to modify the original PROC macro, or, in the case where PROC is a directive, you are screwed. With nested macros you just overload the original macro/directive, add the bit of extra functionality you need, and let the original PROC do all it wants:

(Simplified) example of how it works in FASM:

macro proc name, [args]
{
  ;define symbol "__is_stdcall_<name of procedure>"
  __is_stdcall_#name = 1

  ;let rest of things be done by original macro
  proc name, args
}
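
The overloading behaviour in the FASM example, where using a macro's own name inside its body reaches the previous definition, can be modelled in Python as a stack of definitions per name. The `MacroTable` class and the `previous` callable are illustrative assumptions, not FASM or XASM internals:

```python
class MacroTable:
    """Each name keeps a stack of definitions; newer ones wrap older ones."""

    def __init__(self):
        self.stacks = {}  # macro name -> list of definitions, newest last

    def define(self, name, func):
        self.stacks.setdefault(name, []).append(func)

    def call(self, name, *args, _level=None):
        stack = self.stacks[name]
        level = len(stack) - 1 if _level is None else _level
        if level < 0:
            raise LookupError(f"no earlier definition of {name}")
        # The body gets `previous`, bound to the definition it overloaded.
        previous = lambda *a: self.call(name, *a, _level=level - 1)
        return stack[level](previous, *args)

table = MacroTable()
# "Original" proc: just emits a label.
table.define("proc", lambda previous, name: [f"{name}:"])
# Overload: add a symbol, then let the original do the rest.
table.define("proc", lambda previous, name:
             [f"__is_stdcall_{name} = 1"] + previous(name))

print(table.call("proc", "main"))  # ['__is_stdcall_main = 1', 'main:']
```

The original definition is never edited, which is the advantage being argued for: behaviour is layered on top rather than patched in.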
Posted on 2007-06-09 04:27:10 by vid
Hi
I see the benefits of the "overloading" as you described it before, but there are other ways to achieve this without losing the recursion functionality.

Maybe you can explain how you can emulate a recursive macro with nested macros without knowing the recursion depth? What about the local symbols of each iteration?

I can imagine a way to overload the macro name and call the original macro from within it using special directives. This way, everything remains possible: recursion and overloading.

macro proc name, [args]
{
  ;define symbol "__is_stdcall_<name of procedure>"
  __is_stdcall_#name = 1

  ;let rest of things be done by original macro
  overloaded proc name, args
}


Regards,

Biterider

Posted on 2007-06-09 06:11:54 by Biterider
Hi Homer
I think you have to do something about the name of your project

http://xasm.webpark.pl/xasm/

Regards

Biterider
Posted on 2007-06-09 14:12:54 by Biterider

Hi
I see the benefits of the "overloading" as you described it before, but there are other ways to achieve this without losing the recursion functionality.

It is possible to directly emulate macro recursion by macro overloading, but not vice versa. You lose nothing with overloading.

Maybe you can explain how you can emulate a recursive macro with nested macros without knowing the recursion depth? What about the local symbols of each iteration?

Here is an example of how macro recursion works in FASM, which has macro overloading:

;this macro defines macro "a"
macro define_a
{
  ;this is body of macro "a"
  macro a \{
    define_a ;redefine "a" inside its body
    a ;and use it
  \}
}
define_a ;define toplevel macro


I can imagine a way to overload the macro name and call the original macro from within it using special directives. This way, everything remains possible: recursion and overloading.

Might be interesting... maybe we can compare when/if it's implemented.

For sure it's an interesting idea for XASM.
Posted on 2007-06-09 18:53:41 by vid