Just interesting questions I have been pondering.

-If all CPU structures and instructions follow the same basic premise of logic and math, how hard would it be to create a uniform/universal assembly language and compilation scheme that generically encompasses these architectures?
-How different would it be from the logical structuring of source code compilation?
-How difficult would it be to optimize for specific architectures, compared to source code?

I would like to see suggestions/expansions of these ideas :)
Posted on 2005-05-23 05:33:48 by SpooK
I think those are the questions that made programmers come up with CPU emulators and cross-compilers. Those two items take care of all your questions.

1. How hard would it be? It's not really that hard nowadays since our hardware speed, memory and storage are sufficient to handle multiple copies of different CPU and OS emulations at the same time.
2. All it takes to compile for a new CPU is adding support to the compiler. gcc has back-ends for many processors.
3. See previous answer.

Thus, the deltas or differences between CPU architectures can easily be seen in the code it takes to accommodate a particular CPU within an existing CPU emulator or compiler.
Posted on 2005-05-23 13:00:28 by Kdr Kane

-If all CPU structures and instructions follow the same basic premise of logic and math, how hard would it be to create a uniform/universal assembly language and compilation scheme that generically encompasses these architectures?


You mean a syntax to use in source code, or a binary bytecode produced by a compiler?
Posted on 2005-05-24 14:25:30 by QvasiModo
Both, really. A generic architecture/bytecode is the main goal, but not necessarily to be interpreted directly. I am also talking about the assembly instructions to represent that architecture.
Posted on 2005-05-24 14:33:00 by SpooK
It's funny, I've been pondering this too... :)

One problem I found with a generic bytecode is the need for some implementation-specific details, such as parameter-passing conventions. That would make the generic architecture *very* abstract, as you'd have to represent functionality at a higher level than a simple sequence of opcodes.
Posted on 2005-05-24 14:38:59 by QvasiModo
Well, most architectures I know of support this through the use of the stack or something similar.
Posted on 2005-05-24 14:54:26 by SpooK
But not all. I understand Alpha processors use registers, and Linux syscalls do too. IMHO, to make it compatible with everything you'd need an opcode for function calls. Then another program would convert this generic bytecode into native code and decide which calling convention to use in each case.
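Just to illustrate what I mean (the opcode names here are invented, and the operands are schematic), a generic call and two possible lowerings:

    ; hypothetical generic bytecode
        ARG   x            ; declare the first argument
        ARG   y            ; declare the second argument
        CALL  func         ; generic call opcode, no convention specified

    ; one possible x86 lowering (stack-based, cdecl-style)
        push  y
        push  x
        call  func
        add   esp, 8       ; caller cleans up the stack

    ; one possible register-based lowering (RISC-style ABI)
        mov   r0, x        ; first argument goes in a register
        mov   r1, y
        bl    func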

I couldn't come up with other such examples (aside from direct hardware access, etc.), but I suppose there could be other problems I haven't thought of yet... :?:

It's an interesting topic. :)
Posted on 2005-05-24 16:06:40 by QvasiModo
Processors aren't all that similar, instruction-set-wise. Implementing a universal assembler won't be optimization-friendly, imho. A C/C++ compiler will always beat it.

I'll just state some major differences in the ARM cpu, compared to x86 (the short snippet after the list shows a few of them in practice):
ARM has a condition field in each instruction, so almost any instruction can execute conditionally.
Also, a bit in each data-processing instruction specifies whether you really want it to update the flags.
16 registers available - but directly manipulating memory is limited to just load/store.
The return address is not on the stack, but in a register.
The stack is a mess. There are 4 types of it (post-increment, pre-increment, post-decrement, pre-decrement); OS vendors decide which to use - and coders have to conform.
Reading data from an unaligned address faults the CPU (or silently hands you a rotated result, depending on the core).
No exception-handling
Procedure arguments: the first 4 go in registers, the rest on the stack.
Push/pop - two types of them
Little-endian/Big-endian troubles.
Constants embedded in an instruction are limited to an 8-bit value, rotated by a 4-bit field in the same instruction.
Branch offsets are similarly limited, which means that with call/jmp (actually named BL/B) instructions you can't directly jump to absolutely any part of your code.
Data-processing instructions are not like "add destination, source", but "add destination, source1, source2" (destination = source1 + source2).
And many more major differences ... T_T . I wonder who once told me ARM is like x86 ^^".
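To show a few of these in practice (illustrative only):

        adds   r0, r1, r2     ; three-operand add; the S suffix requests a flags update
        addne  r3, r3, #1     ; conditional execution: runs only if the Z flag is clear
        ldr    r4, [r5]       ; memory is reached only through load/store
        str    r4, [r6, #4]
        mov    pc, lr         ; return: the address lives in the link register, not on the stack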

I use macros to make my ARM PalmOS code directly portable to ARM WindowsCE, since Microsoft and PalmSource decided on different approaches to the stack, procedure calls and data addressing. But that's the limit of making a portable assembler: macros for conforming to different standards/approaches on the same cpu.
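Something in that spirit (the names and details here are invented, gas syntax):

    @ hypothetical portability macro, just a sketch
    .macro RETURN
    .ifdef PALMOS
        mov   pc, lr          @ plain ARM-state return
    .else
        bx    lr              @ WinCE: Thumb-aware interworking return
    .endif
    .endm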

Anyway, my definite conclusion: assemblers are cpu-specific, and they should stay like that. If you want portability, code in C.
Posted on 2005-05-26 11:08:04 by Ultrano
Like I suspected... the level of abstraction needed to make the code portable would turn this "portable assembler" into a full high-level language. Too bad, it would have been cool...
Posted on 2005-05-26 17:41:02 by QvasiModo
There is no doubt that it would need to "compile" for anything to work; I was just wondering what the difference would be between C and a uniform assembly language used in such a manner. Thanks Ultrano.
Posted on 2005-05-26 17:51:47 by SpooK
Very old topic, I know... just had never seen it before.
I'd like to add two things here.
First... in my circle of programming friends, we always used to refer to C as 'portable assembly'... because in a way that's what it is. It abstracts away architectural differences such as registers and the actual machine code, but still gives you all the operations common to every CPU.

Second... universal bytecode is something that is used by most modern compilers. The compiling is done in two stages. The first stage is generic for all architectures: parsing the source code into an expression tree, doing global optimizations, etc.
This results in a list of universal bytecode, which is then sent to the architecture-specific back-end to generate the native machine code.

In the case of Java and .NET, they simply cut the compiler in half: the source code is compiled to universal bytecode and stored like that in the 'binaries'. The conversion to architecture-specific code is done in the 'virtual machine'.
There are tools to program 'assembly' in Java or .NET bytecode directly, but there is virtually no gain over using a high-level language, since the universal bytecode is so generic. The only reason why assembly can be smaller/faster/etc. than C is that you can exploit architecture-specific features; universal code eliminates this by default.
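For a concrete picture, this is roughly what javap shows for a trivial static int add(int a, int b) { return a + b; } method; note how generic and stack-based the bytecode is:

       iload_0     // push the first int argument onto the operand stack
       iload_1     // push the second int argument
       iadd        // add the two values on top of the stack
       ireturn     // return the int on top of the stack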
Posted on 2010-07-06 04:14:02 by Scali
interesting topic.

i think that having a universal asm that will run on standardized cpus would be a blessing for programmers, eg the "ground troops", but guys who sell software and hardware simply won't wear it...

it's simply not worth it economically (unless it meant a monopolization of hardware or asm by, say, guys like ms).
but would they all agree of their own free will to make things easier for programmers? bloody never.
because their goal is not quality, or speed, or proficiency, but simply large sums of greenbacks.


Big guys up there want to rule. you can't rule if all is the same.
Posted on 2010-07-06 10:58:06 by Turnip

but guys who sell software and hardware simply won't wear it...


For the consumer market, never.

I suppose with PCI Express, FPGAs and the like, you could rig something together that wouldn't be too horribly bad.

However, as Scali said, this is an old topic, and stuff like LLVM trumps this concept through and through.
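For reference, LLVM IR is exactly that kind of universal intermediate form; a trivial add function looks roughly like this:

    define i32 @add(i32 %a, i32 %b) {
    entry:
      %sum = add i32 %a, %b      ; target-independent add, lowered per back-end
      ret i32 %sum
    }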
Posted on 2010-07-06 12:27:51 by SpooK

i think that having a universal asm that will run on standardized cpus would be a blessing for programmers


Why?

Most programmers don't program at the assembly level, and the ones that do (hopefully!) do it for a reason and want to take advantage of what our CPUs can do... instead of writing crap lowest-common-denominator code.
Posted on 2010-07-06 15:43:41 by f0dder
just my opinion.
i would prefer learning one thing and being welcomed as an expert, rather than standing at a crossroads that changes every second.
and porting would be easier, wouldn't it?
Posted on 2010-07-07 01:50:46 by Turnip

i think that having a universal asm that will run on standardized cpus would be a blessing for programmers, eg the "ground troops", but guys who sell software and hardware simply won't wear it...


I suppose you could argue that x86 is so commonplace that it's already 'universal asm'.
Posted on 2010-07-07 01:59:33 by Scali
perhaps you are right. but it doesn't stop companies from personalizing it in any way possible.
there is masm, tasm, fasm, blah-blah-asm, etc.
and is it so important for at&t to have a syntax unlike intel's?
Posted on 2010-07-07 05:59:40 by Turnip
You're confusing dialects of the same language (NASM/MASM/TASM/etc) with different languages (x86/MIPS/ARM/etc). This discussion was originally about unifying the languages, not the dialects.

As for companies, it was less about personalizing the language as a dialect and more about having a working tool-chain designed around your development staff.
Posted on 2010-07-07 11:54:51 by SpooK
i am simply illustrating that if they can't even agree on a standardized version of the x86 dialect (ms has masm, borland has tasm, etc), how can we expect them to join forces voluntarily, when they are bitter enemies on the free market.

personally i am a great admirer of open standards.

i have also seen outsourcing companies quickly adapting their employees (and tool-chain) to whatever technology is in favor this year (month, etc). so i believe that at&t could use intel style, if it wouldn't also mean losing a few kudos )))
Posted on 2010-07-07 12:13:00 by Turnip
AT&T syntax is actually older than x86.
There is something to be said for AT&T syntax. Partly it was designed to make assemblers as simple as possible to write... and part of it is architecture-independent (for example, 68k asm in AT&T syntax is very similar to x86 asm in AT&T syntax). It's also a very unambiguous syntax.
Perhaps not very user-friendly though.
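For example, the same load in both syntaxes:

    ; Intel syntax (MASM/NASM style): destination first, brackets for memory
        mov    eax, [ebx+8]

    # AT&T syntax (gas): source first, % register prefixes, size suffix on the mnemonic
        movl   8(%ebx), %eax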
Posted on 2010-07-08 05:41:27 by Scali