I am back:tongue:
Lately I have been learning OCaml, Eiffel, and Python, and that has given me more ideas about types and asm.
A few months ago I worked out a MACRO-implemented OOP model for asm, but now I am interested in a more general topic - type systems.
Two keywords:
static typing
and type inference
We have all focused on how to do OOP in asm, using MACROs to invent new keywords - new syntax, so to speak. Yes, we have done that. But how about something more? In one model, you might use TEXTEQU to represent compile-time variables, and use those compile-time variables to record the type information related to a class (what it inherits from, what fields it has, ...).
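For example (just a rough sketch using MASM's TEXTEQU and @CatStr; every name below is made up), the compile-time variables might look like:

myclass_BASE       TEXTEQU <baseclass>   ; what myclass inherits from
myclass_FIELD1     TEXTEQU <field1>      ; one field's name
myclass_FIELD1TYPE TEXTEQU <DWORD>       ; ...and its type

Other macros can then rebuild these names with @CatStr - e.g. @CatStr( <myclass>, <_BASE> ) produces the name myclass_BASE, whose value is the parent class - and so walk the recorded type information while assembling.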
I see a trend: MACRO is being overused. I even thought about wrapping every line in a MACRO; it might look like this:
FUNCTION myfunc, arg1:type1, arg2:type2
.MOV eax, 1
.RET
FUNCTION END
Notice that .MOV is a macro that replaces mov (the instruction).
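To make that concrete, such a wrapper could be sketched in MASM roughly as follows (assuming OPTION DOTNAME so identifiers may start with a dot; the type check itself is only hinted at):

OPTION DOTNAME

.MOV MACRO dst:REQ, src:REQ
    ;; a real version would look up the recorded types of dst and src
    ;; here and raise .ERR on a mismatch before emitting the instruction
    mov dst, src
ENDM

.RET MACRO
    ret
ENDM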
What I want to say is that using a stand-alone preprocessor, just as someone has already shown us, will be better, and we could implement the following features:
single inheritance
interfaces (like Java)
templates (like C++)
metaprogramming (access to type information at compile time, which eliminates the need for template specialization and the like)
typed assembly code -> preprocessor -> normal assembly code
Static typing, type inference (via templates) and metaprogramming will enable asm to do more things and make the code safer. Studying the approaches used in other languages (imperative or functional, dynamic or static, typed or untyped) will widen our view.
I totally agree!
There should be a new kind of assembler - an OOP assembler (some would argue a compiler).
First, devise a syntax which does not veer greatly from MASM/NASM syntax.
Second, choose a method to implement code generation for the OOP transformations.
Third, ?
...I am on the third part. :cool:
I am really happy you support my idea :grin:
Let's talk about the implementation details:
I think generating native code directly is expensive and hard to do, so I want the target environment to be MASM, NASM or FASM (although MASM is preferred).
The key concept is static typing; it would be a revolution, just like the one structured programming brought to asm. And static typing is more than OOP - OOP means object-oriented programming, but static typing naturally leads to multiple paradigms (generic programming and metaprogramming).
Types make what the code is doing more precise, and static typing doesn't consume a lot of resources compared to dynamic typing.
Here is some syntax I would suggest (a sketch of what the class definition might expand to follows after these examples):
For a class definition:
class myclass:public baseclass, interface1, interface2
var field1:type1
var normalfield2:dword
method method1(arg1:type1)
class end
For a template procedure:
template T:typename, i:dword
myproc2 proc type1:T
mov eax, i
ret
myproc2 endp
For a template class:
template T1:typename, T2:typename
class myclass2
var field1:T1
var field2:T2
class end
And more... I have a lot of ideas about syntax and features; I just need to experiment to find which works best.
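Just to illustrate what the preprocessor could emit (this is only one possible lowering, and everything here is hypothetical), the class definition above might expand to plain MASM along these lines, assuming baseclass is itself already emitted as a STRUCT:

myclass STRUCT
    parent          baseclass <>      ; single inheritance: embed the base first
    field1          DWORD ?           ; DWORD stands in for type1 here
    normalfield2    DWORD ?
myclass ENDS

myclass_method1 PROTO pThis:PTR myclass, arg1:DWORD   ; arg1 stands in for type1

myclass_vtable LABEL DWORD
    dd OFFSET myclass_method1         ; one slot per virtual method

Whether methods dispatch through a table like this or resolve statically would of course depend on the model chosen, and interfaces would need their own method tables on top of it.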
Have you seen TALC?
Sounds like what you want to create.
How should the transition from a complex type to a machine-dependent type be made?
In all the languages that I know, complex types are built from basic types and other complex types (circular definitions are bad). Most languages handle pointers differently -- assembly seems to be the exception. Of course, most languages offer a mechanism to override the type (casting).
MASM attempts to handle pointers a little differently by introducing the PTR/OFFSET/ADDR keywords and the SEG:OFFSET syntax. IMHO, pointers need to be handled differently due to the introduction of different addressing modes in the x86 processors (and we have 64-bit pointers to include now with x86-64). Most assembly language programmers don't want anything hidden in the code, and MASM goes against this in a bad way, but to offer this abstraction without hiding something is impossible.
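As a quick reminder of what those forms look like (made-up names, fragments only):

mov    eax, OFFSET MyVar          ; compile-time address of a global
invoke MyProc, ADDR LocalVar      ; MASM emits lea/push to take a local's address
mov    al, BYTE PTR [esi]         ; override the size assumed for [esi]
mov    ax, es:[di]                ; old-style segment:offset addressing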
As most people are aware, assembly language is most effective where speed is concerned, and this is an economically valued skill - more so than any other. So, I would like to reduce the language down to the optimization of algorithms and then add a scripting language on top of it. In many ways this abstracts the instruction set from the control structures of the language, enabling greater ease of scaling and global optimizations. Some may think this hypocrisy, but this is the direction I want to take assembly language, and I believe it follows in the spirit of what MASM wanted to be as a high-level assembler.
Combine that with a type resolution system which manages the dependencies of files, objects, functions, macros, symbols, control flow, and instructions, and you get quite an assembler, if I do say so myself. :)
{I'm going to have to continue this...}
Yeah, a high-level assembler! :alright:
To tell the truth, I didn't quite catch everything you said, due to my poor English. But I think we are all trying to make something different - a new kind of assembler, but not HLA. Or, more precisely, a new kind of assembly language. Can you give a more detailed description of the kind of things you want to add?
I quite agree with you on this point. Abstraction needs hiding: with invoke XXXX you cannot see the push/call anymore, but I think most of us prefer invoke to plain call. And, as I see it, you are trying to generalize the structure of asm, so maybe you can make the new asm platform/CPU independent. I want to add static typing/OOP/generic programming/metaprogramming to asm. All of these can make the asm coder's life much easier and make asm much more powerful.
asm has gone:
from plain instruction sequences
to .if/.while
to structured code with proc
You want to make it easier to express how the instructions are organized, and I want to make it easier to structure blocks of code. They may be two different directions, but both head toward one goal: more structured asm.
Maybe I have misunderstood what you said - perhaps you can describe it in plainer English.
Some further notes about what I want to do:
I want to mark symbols with types; the symbol may be the name of a global variable, a local variable, or a procedure.
The type itself is just a compile-time symbol.
There are then two benefits: using typed variables (objects) becomes easier, and any variable (object, register value) passed somewhere must be typed correctly.
Assuming pstruct is a pointer to a struct variable:
load eax, pstruct
mov eax.field1, 0
load itself may be expressed as:
mov eax, pstruct
cast eax, ptr some_struct
The second benefit may be demonstrated like this:
mov eax,pstruct
invoke use_some_struct, eax
No - you have to mark the type of eax first:
mov eax,pstruct
cast eax, ptr some_struct
invoke use_some_struct, eax
or:
load eax,pstruct
invoke use_some_struct,eax
Because pstruct is a local variable that has been marked with the type ptr some_struct, you can use load directly.
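For what it's worth, part of this can already be lowered onto plain MASM with the ASSUME directive - a possible expansion of the load/cast pair above (same made-up names) would be:

mov    eax, pstruct              ; load the pointer
assume eax:PTR some_struct       ; tell the assembler what eax now points to
mov    [eax].field1, 0           ; field access is checked against some_struct
assume eax:NOTHING               ; drop the typing when eax is reused

The difference is that ASSUME is purely local and manual, whereas the preprocessor would track the types of pstruct and eax itself.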
IMHO, adding types to assembly language is a higher-level abstraction, and typed items should not be loaded into registers directly unless the programmer is prepared to do all the work. Your example would reduce to:
mov [pstruct].field1, 0
The brackets mean the structure pointer is indirect - we don't want an offset from pstruct's address, but an offset from the address stored at pstruct. The assembler does the rest. pstruct's type would have to be defined beforehand, or cast in the statement:
mov [pstruct PTR MyStruct].field1, 0
...this would rarely need to be used. Quite simply, when programmers state a register they know what they are doing - they want to work at that level. For example, EAX is used for return values from Windows functions - why ever call it EAX? ...it is the return value. We only call it EAX when we want to do something with EAX. If we just want to store the return value of a function, then we should say that:
invoke MyFunction, MyStruct.item2
mov MyStruct.item3, MyFunction.return
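(Purely as an illustration, a straightforward lowering of those two lines - assuming a stdcall-style function that returns its value in EAX - could be:)

invoke MyFunction, MyStruct.item2     ; call as usual
mov    MyStruct.item3, eax            ; "MyFunction.return" resolves to EAX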
Seems silly at first, but this is a simple example. The assembler could inline the function if it wants and integrate it into the rest of the code - all the information needed to do that is here. The optimizer could even remove the whole thing in rare cases! The interfaces between pieces of code need to be managed by the assembler to provide this functionality.
How about we talk about *all* the things you want to do...
One thing I am sure of: what I am going to do is write a preprocessor which produces nasm/masm/fasm code. No more macros, but a stand-alone preprocessor. It can do global analysis, which macros cannot.
a bunch of source code (*.xxx) -> all fed to the preprocessor -> another bunch of source code (*.inc, *.asm) -> assembled by nasm/masm/fasm -> native executable file
And what?
Forgive my ignorance.
Hi taowen2002,
the "preprocessor" approach for a OOP implementation is surely much better/powerful than using macros.
But, if you know C++, try to consider:
- C++ with templates is very powerful. May your approach achieve this level?
- for many features you will just have to "reinvent the wheel" regarding C++
- C++ has (limited) inline ASM capabilities
So currently IMNSHO such a preprocessor would be nice to have, but it is much work and unless it offers more than C++ with templates its possibly a waste of time.
Japheth
the "preprocessor" approach for a OOP implementation is surely much better/powerful than using macros.
But, if you know C++, try to consider:
- C++ with templates is very powerful. May your approach achieve this level?
- for many features you will just have to "reinvent the wheel" regarding C++
- C++ has (limited) inline ASM capabilities
So currently IMNSHO such a preprocessor would be nice to have, but it is much work and unless it offers more than C++ with templates its possibly a waste of time.
Japheth
Compile-time programming is the knife of asm.
I have just given up my plan, but I am interested in BitRake's idea. A new assembler is always the goal.
Better make it for 64-bit :) If so, count me in - I'll help any way I can.
So currently, IMNSHO, such a preprocessor would be nice to have, but it is a lot of work, and unless it offers more than C++ with templates it is possibly a waste of time.
taowen2002, if you take a look at KetilO's code you will see that he uses registers very little and takes full advantage of MASM's high-level features. This is very productive because the work is more symbolic and the limitations are few - he quickly goes from idea to code, and his understanding of the interfaces is clearly communicated.
Now let us go to the other extreme: ...
{...to be continued...}
... So, the goal becomes to eliminate limitations of ASM compared to C++ (other languages as well) while retaining or amplifying the benifits of ASM. ...
The question is whether that goal gives you the most benefit. As you said, assembly is great for speed, but the cases where it matters are rare. One of C++'s big benefits is that it's a HLL, but still at such a level that it can compile to pretty efficient code. When it comes to speed, you use assembly and you use it at the lowest level. For example, when you want to get the most out of it you don't use MASM's parameter handling, but build your own stack frame. And I think a hand-optimized binary search through some list will be more efficient than one using asm templates. My point is: if you're going for absolute speed, you'll avoid as much HLL as possible (until the point where it just gets silly, like hex programming ;))
So IMHO, the best approach to getting the best of both worlds is not to try to turn assembly into a semi-HLL, but to make it easier to interface the two and use them together. For example, an assembler that compiles on several platforms, automatically writes C(++) headers for the functions it exports, etc. At least for me that would be more useful than being able to use templates in asm.
Thomas
Good point Thomas, but I don't like programming in C++ and don't need my code to run on non-x86 processors. I don't really want to create HLA or templates in ASM. I would like to create more vertical solutions in ASM. I am confident in the long term viability of x86.
Additionally, I don't want to exclude integration with C(++) -- I am just saying that is not what I am going to do.
I should have said OS instead of platform, because that's what I meant. A general assembly language for multiple platforms wouldn't be very useful, since it could never be optimized for both at the same time. But since C++ is often compiled on several OSes, an assembler written in portable C++ that can produce several output formats (COFF, ELF) would probably be a useful thing. I know fasm is already available on Linux, but if one were to write a new assembler, I would certainly think about supporting multiple OSes.
It's okay if they all use x86 (though I would design it so that 64-bit support etc. can easily be added later), since that's going to be around for at least quite some time.
Thomas
A general assembly language for multiple platforms wouldn't be very useful, since it could never be optimized for both at the same time.
This is an invalid assumption - there are the same limitations that any language has in that regard. Please explain how this statement is true.
The difference from other languages is that assembly isn't compiled. C++, for example, can be optimized for two completely different processors because the source is platform-independent.
Assembly isn't platform-independent: it is written for one specific processor, and thus optimized for that processor. Even if you could assemble the same code for a different processor - usually impossible in the first place because of the different opcodes available - it wouldn't be optimized anymore, because every processor has its own specific optimization techniques.
I've seen gcc or some other compiler offer a cross-platform assembly syntax, where the compiler figures out the register usage etc. So it isn't impossible, but choosing registers is not really advanced. Say you have two processors. On one, there's a really fast division opcode that is best suited for all divisions. On the other, division is slow, but there are some other neat opcodes that you can combine in a clever way to perform fast division. How would you express this in assembly? In a HLL, both will just be a / b or something, and the compiler will choose the best implementation. Assembly is on a much lower level, where you can't express this with such abstraction. You could use a general division opcode 'div' that translates to the division opcode on platform 1 and the trick on platform 2, but still, what if the trick has side effects that can't be ignored, like extra registers that are invalidated, or flags that change? Also, if the programmer doesn't know what opcodes the pseudo-opcode 'div' will produce, how can he optimize around it?
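Just to make this concrete with a single-processor illustration: even unsigned division by a constant like 10 on x86 can be done the obvious way or via the well-known reciprocal-multiply trick, and the two leave the machine in different states:

; the obvious way - one DIV instruction (value in eax):
mov ecx, 10
xor edx, edx
div ecx                 ; quotient in eax, remainder in edx

; the "clever" way - multiply by a magic constant instead of dividing:
mov edx, 0CCCCCCCDh
mul edx                 ; edx:eax = eax * 0CCCCCCCDh
shr edx, 3              ; quotient now in edx

Even the register holding the quotient differs, and the flags end up differently - exactly the kind of detail an assembly programmer wants to keep control of.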
The problem with such a pseudo-opcode lies in the strange mix of high-level and low-level coding. On the one hand, you want to work at the lowest level to get the most out of the processor; on the other hand, you want to make the programmer's life easier by letting the assembler/compiler make choices for you.
Optimizing is all about knowing exactly what's going on at the lowest level. Abstraction only reduces this knowledge and thus is bad for optimizing (at the code level, that is, not at the design level of course). That's why I'm a bit hesitant about HLL extensions to assembly language (even though I worked out an object model with NaN :)).
In my opinion, it's best to use either low-level programming or high-level programming, depending on what suits the task. Mixing low-level and high-level programming at the function level is no problem at all either. But creating a language somewhere in between has no real benefits in my view, and only weakens the strengths of assembly. Everyone may have their own opinion on this; this is just how I think about it.
Thomas
Thomas, that's a great outline of the problems - I have thought about this for years and have analyzed all the problems you state. Let's see if I can address them all here:
Following the Zephyr Project, I learned that they were using machine descriptions to translate code from one machine to another. Using a description of each instruction and its effect on machine state, several layers of code generation can be exposed to the programmer if they desire. For example, a general routine could be developed at a semi-high level, and as needed the programmer could drop down into more refined views of the code -- some changes would propagate back to the general view, while others would be tied to the processor-specific view. (Note this isn't limited to processor issues - each layer can have features that only translate to higher abstractions in a very general way, while others will require no translation.)
The same algorithms are used to control the interaction between instruction dependencies and procedure/object/file/etc. dependencies. :)
The idea of being able to choose a specific abstraction level for each part of the code is nice in itself. So you are actually creating a whole range of pseudo-languages, ranging from assembly (if you consider that the lowest level) to some form of HLL? However, I think having such a range of languages might be a bit of overkill. I mean, C is pretty low-level and assembly is even lower, but I'm fine with that choice. Is it really advantageous to have another language in between? I guess it's all a matter of where you set the boundaries, but I think the change from LLL to HLL is a good boundary, and that having extra boundaries leading to a-bit-less-LLs and almost-HLLs doesn't add much advantage (from a practical point of view). I see more in broader boundaries: like low level - high level - functions/methods - classes - modules - applications, or something.
Thomas
I would like the language to mimic reality - boundaries are perceptual. The only real boundaries are at the external interfaces supported. I know it seems really messy, and it is not what I learned about programming 20 years ago, but I want to give it a try.