Out of curiosity... how difficult do you think it would be to create some kind of 'pseudo shader language'?
That is, to define a grammar that is sorta like HLSL/GLSL, which can be parsed and then sent through one of various backends to generate eg:
- D3D-compatible HLSL code.
- OpenGL-compatible GLSL code.
- D3D-compatible assembly code.
- OpenGL-compatible assembly code.
- nVidia Cg-compatible code.

It would solve the 'missing link' between the APIs (and could be expanded to consoles etc aswell).
Posted on 2010-04-23 07:43:50 by Scali
I think it (the grammar) would be a bit of a no-brainer, really.
I've had no major problems describing very complex and expressive grammars, while I'd classify HLSL-like languages as being fairly rudimentary (although it can be elegant in its crudeness).

I did have some trouble describing recursive grammars, but that was due to my own inexperience with EBNF.
My current issues stem from the fact that I'm trying to write an interpreter in the ABSENCE of a grammar (!)  while simultaneously developing the grammar - I'm using a component based model in the hope that the components will be reusable - which is why I'm trying not to get too bogged down with implementing the semantics of one specific grammar.


Posted on 2010-04-23 13:27:19 by Homer
I stripped mention of the IF directive from the baseclass...

I've implemented the generic 'interpreter baseclass', and derived from it an 'Evaluator' class, which implements a dozen or so Reduction Node handlers which implement the Rules of my example grammar. So far, so good.

The interpreter seeks to 'resolve' nonterminal child tokens of reductions.
If successful, the resulting token is passed back up the tree where it replaces its 'imposter', and if nodes are found to contain only one token after being resolved, they are 'collapsed' (the node is destroyed, and its only token is passed up the tree).
Typically, the handler methods then 'solve' using the simplified (resolved) token sequence.

I haven't implemented a whole lot of grammar yet, but it's enough to show to my peers for their consideration.
If all continues to go well and my peers think this solution looks viable, I will post the interpreter class and derived Evaluator class and test grammar.

BTW some small changes were made to the Parser class.
How many people actually looked at that code?
I mean more than a cursory glance?
It would be nice to get some feedback.
I have my own ideas about things that suck about it that need fixing.
Please feel free to be blunt.
Posted on 2010-04-24 09:10:54 by Homer
I am up to processing a <Assign> = <Expression> ... and I've already simplified <Expression> into (in my example case) <Value>.
At this point, I'm ready to set the value of a (possibly new) buildtime variable.
My interpreter class needs to start keeping a list of typed values it's learned.
Milestone, albeit a small one.

I've developed a design pattern for writing new Reduction Node handlers:
We simply look at the Rules to find the handful of possible token sequences, and write a handler that deals with those cases.

Also, I've adopted a mechanism for 'simplifying' the parsetree in a homogenous way: just before returning from recursion, I check if the current Reduction contains exactly one token.
If it does, I return that token.. notifying the caller (Interpret method) that the Reduction was 'simplified', and that it can throw away the reduction, and overwrite the token that represents it with the returned token.

This allows me to pass tokens back up the tree, 'collapsing' it as much as possible.
In turn, this allows each node to perform its function with 'already minimalised' components.
And it also acts as a kind of automatic garbage collection, so our memory usage remains modest during buildtime.

All clear sailing, captain geek!
Posted on 2010-04-24 11:23:03 by Homer
Just wanna say:

My current test program is


x=y=z


Since z is undefined, this should generate an ERROR !!!!!!

But if we had two lines


z=2
x=y=z


this expression is now 'solvable'... we should set x= (y= (z=2) )

I dont currently have any means to hand back an error, but I deliberately chose an erroroneous statement knowing i would have to deal with error cases, this is what 25 years of programming experience has taught me: expect the unexpected.



My parse tree looks something like:
Assign x = Assign y = z

(simplification)
Posted on 2010-04-24 11:35:42 by Homer
Here is an update of the relevant files, old and new.
Enough to show everyone how I am thinking.
Implementation nuances, hell, anything, can change, but here is a workable recursive solver, without any elegance or any context awareness, implementing part of a concrete grammar from a partial concrete parsetree.
Be dazzled.
Be awed.
Say gee, that looks easy.



Posted on 2010-04-24 11:47:05 by Homer
I've been working on my first derived interpreter, which I call Evaluator.
The goal will be to solve complex expressions (ie those which resemble equations).
To spice things up , I am supporting the resolving of statements of such order of complexity as:


a*=b+6/j+=0.5


For this particular case, my solver for such expressions will not be taking advantage of the parsetree structure, and no operator precedence is being implemented for AssignOps (see below)... however operator precedence does apply for 'regular' math operators within complex expressions, as purely math subexpressions ARE resolved according to the tree structure.

I am "flattening" the expression into a linear sequence of tokens, and then simplifying it by repeatedly solving the rightmost subexpression, where a subexpression is something like:
Id <AssignOp> <Value>

and AssignOp is =, +=, -=, *=, /=
and Value is something like a Number or an Id.


This is the same approach I used in the XASM project.
It is appropriate when the complex expression contains one or more linear (directed) dependencies.
It cannot be used to solve expressions containing interdependencies (simultaneous equations).
But it's just fine for a chain of 'x=y=z' style of statement.

Anyway I'm pretty close, its just painstakingly slow with lots of rebuilding and testing.
And everytime I touch the existing grammar, the software needs to be updated to reflect the changes, so I've had to limit myself to try not to meddle in the grammar while simultaneously trying to write an interpreter for it lol.

Posted on 2010-04-25 21:34:00 by Homer

My previous post is utter nonsense - I was heading down the wrong road.
I was having problems because the tree produced by my Grammar was untidy.
The grammar was restructured to produce 'more atomic' parsetree nodes.

I've scrapped my weekend's work and reverted the code to a more complete version of what I started with - a recursive solver which marshals calls to user handler functions based on observation of the type of sentential structure represented by a given node.
I've written several utility functions to assist in manipulating content of Reductions (parsetree nodes), and eliminated the need to pass return values (other than 'syntax error') back through the recursion.
Also a potential memory leak was fixed, and a small bug in the GUI code.
The code is currently very robust although quite naive - all the 'handlers' that are meant to 'resolve' the parsetree contents are currently being redirected to 'DefaultHandler'.

DefaultHandler's behavior is to recursively resolve each NonTerminal child token in a Reduction, and then replace said NonTerminal token with the 'exploded' token(s) which it was representing. This is what I previously called 'flattening a subtree' - we still do it. Now we're meant to resolve the 'flattened' sentential structure(s) at some point during the RETURN FROM RECURSION.

As an example, I'll describe the handling of the IF directive, where we get to implement 'conditional interpretation' by testing a condition before ever looking at the contents of the IF case (<Statements>).


<If Statement> ::= if <Cond> <Statements> endif


We have four tokens in this reduction.
The first and last are Terminals, so we can leave them alone.
But <Cond> and <Statements> are NonTerminals, they represent subtrees to be recursed, at the bottom of which we expect to find some Terminal(s) to be returned.
So we'll recurse the <Cond> subtree , as we process the tokens from left to right.

The <Cond> handler will recursively expand its tokens, and once it has done so, the Condition can be tested, noting that all of its tokens are now Terminals. Having tested the condition, the handler can replace the <Cond> tokens with simply TRUE or FALSE, and return to its caller.

At this point, we've returned to our original handler, and our Reduction might now look like this:

if <TRUE> <Statements> endif


Our current handler can now see that the condition was found to be TRUE, we might continue to process the <Statements> token, which is next in the left-to-right series.
If the condition had been FALSE, we might do nothing.

In either case, this 'if' node should destroy all its tokens before returning to its caller, because whether true or false, we have completely 'SOLVED' this node, and don't need to return any tokens to a parent node.


Attached is a revised grammar file, I'm not included to repost the other files yet, maybe later today.
This one can much better handle nested stuff (braced subexpressions of any kind), linearly dependant assign sequences (x=y=z) and other stuff. Some very primitive support for 32bit x86 asm is also there, but is not important yet.


Attachments:
Posted on 2010-04-27 19:10:17 by Homer
Very good news :) <Add_Exp> and <Assignment> are implemented: and the technique is working flawlessly :)
The interpreter can now solve math expressions that use named variables and/or integers, with Addition operator.

simple example:

x=10
y=1963
x=x+y


is interpreted perfectly:
-Variable x is defined as integer 10.
-Variable y is defined as integer 1963.
-Variable x is redefined as integer 1973.

I'll try to complete the Addition handler for the remaining literal data types (float, string), as I can then quickly spit out the functions for handling the remaining math operators (-, *, /)

Edit]
I've also implemented the <Assign Expr> , which is used for "subsequent assignments, and also to implement "assignment modifiers" (+=, -=, *=, /=) for example:

x = 10
y = x += 15 - x


In order to understand how the calculation is performed,
We can imagine that the Math Operators (+, -, *, /) and the Assign Operators (equ, =, +=, -=, *=, /=) are nested delimiters for math subexpressions... our parsetree structure encodes this 'nesting'.
If we add some braces to better show this, our example becomes:

y = ( x = (x + (15 - (x))))


Now we can see the exact order in which the calculation will be performed, which is dictated by the structure of our parsetree, which is dictated in turn by the constraints that our Grammar imposes apon the input plaintext being interpreted.

We don't have to figure out how to solve the equations, we just have to write handlers for the most primitive math operations of 2 operands and 1 operator, and then let the recursive solver make sense of it all for us, as it walks the parsetree structure.


Posted on 2010-04-28 07:51:46 by Homer
I guess next is to implement the remaining pure math operators, and then do some tests to see how nested braces mess up my existing code (they should not affect anything in theory, we'll see huh).
At that point, the Evaluator will be essentially complete, and I'll be ready to release a new demo, updated sourcecode, and decide what to implement next :)

I've left myself a lot of 'markers' in my sourcecode where missing stuff belongs, that makes it easier to pick my next target, I don't need to think so hard about what is possible if I have a todo list at hand.
But basically, I just need to implement more handlers.
Once I've completely implemented the existing grammar, I'll release a third demo/sourcecode, then it will be time for a formal discussion about general programming grammars (not just asm) - what we like, what we loathe, what we'd have if we could invent it, etc.

Posted on 2010-04-28 09:51:22 by Homer
So far, my grammar only defines 4 literal datatypes: integer, float, hex, and string.
The code responsible for handling Additions now supports all but hex, which I'll add in a few moments.

I have allowed additions of strings, but only with other strings.
Addition of floats with other types will result in a float.
Hex and Integer types are treated synonymously.

Although I'll be taking this opportunity to sweep the code for potential improvements, I believe I'm well on track to implementing a robust expression evaluator and solver ("interpreter"). I will have to be careful what I implement if I intend this to be made available in a generic yet still useful form.
In fact, as soon as the math stuff is in place, I'll probably call it a day on this class and then derive a class which implements more syntax-specific (language-specific) stuff.
I think the expression solver, even just doing maths, is useful as a standalone thing, just in case someone asks me for the millionth time how to write a calculator application :D Talk about overkill eh?

Some examples:

Addition of strings:
a = "hello " + "there, "
b = "Ralf"
c = a + b

Expressions with mixed datatypes:
q = 40
x = 15.704 - 16 * q


The only invalid kind of expression so far is:

Addition of String and Non-String is ILLEGAL !!!
a = "tom"
b = a + 27


About datatype conversions:
In light of that last example, its worth mentioning that we could allow indirect addition of strings and nonstrings, via some new directive such as CHR...
ie b = a + CHR(13) + CHR(10)
That would be perfectly reasonable under a "BASIC" kind of grammar.
And most languages support some variation on this, so its a good example.
Posted on 2010-04-29 03:45:00 by Homer
It was getting late, so I thought I'd tackle a relatively quick and easy one before bed..

I've implemented the <Negate Exp> handler.
This is used to negate the value of an immediate or variable within the context of a math expression.
This is NOT the mathematical Subtraction operation, although it uses the same symbol...
My parser can tell what we're trying to do based on the context it appears in.

Eg

x = -x + s
y = 7 + -J            "Z equals Seven  Plus Negated J"
z = 7 - -J             "Z equals Seven Minus Negated J"


<Negate Exp> appears quite late in the rules, its near the bottom of the <Expression> tree, just before <Value>.
This means that we can negate literals easily, but we need to use braces if we want to negate an entire expression.
eg  -(expression)
And I haven't implemented the 'brace stripper' yet... guess thats next, though I really do wanna complete the remaining math operators.


Posted on 2010-04-29 11:28:01 by Homer

Made some subtle but important changes.
It's now possible for a node handler to destroy all of its own tokens, and return to its caller "an empty reduction".
This allows me to 'clip solved children' from a parent node after expanding and before processing it.
It means that anything we completely dealt with is eliminated from the parsetree, and will not appear in the content of parent nodes, and so will not affect their handler. It is a very nice improvement, just hard to explain exactly how without ranting for pages on the ramifications.

Tonight will be implementing one of the remaining three math operators!

Posted on 2010-04-30 02:16:16 by Homer

Yaknow, everything goes better on Fridays.
I implemented Subtractions in 5 minutes flat.
It's actually NOT a new handler - the Add_Exp handler should handle +, -, &, | ( And , Or ) .
Yay! I can add, I can subtract, and with various datatypes.
Damned if this thing is not beginning to look like more than a whole bunch of debug strings :P
Posted on 2010-04-30 04:05:33 by Homer
Another 5 minutes to implement mul/div... Mult_Exp handler was implemented.
Now the interpreter can solve expressions with add/sub/mul/div for mixed datatypes.
The<Mult_Exp> handler looks pretty much like the <Add Exp> handler, but is a descendant/child node..
The Mult_Exp handler implements multiplication and division, it is located in the grammar just below the Add_Exp node, and just above the Negate_Exp node.
This means that multiplications and divisions are handled before additions and subtractions, which implements the correct mathematical operator precedence !!!
We get correct precedence for free, due to the structure of our grammar - I keep saying how important that structure is, now you can start to see how we benefit from a well structured grammar.


What else do I need? I will sure add POW, AND and OR tomorrow... (^, &, |)
And I'll need to handle braces..
Any math operators I forgot?
I think tomorrow I'll have a full algebraic solver, will be time to post a demo and source update :)

When you get things really RIGHT at the core of your implementation, everything just clicks into place, and its SO sweet!
Posted on 2010-04-30 09:27:50 by Homer
Having correctly implemented and tested with one math operator, it only took ten minutes to implement the other three major math operators - I wish all my projects were so linear in nature lol.
Speaking of linearity, I'm looking forward to implementing IF/ELSEIF/ELSE/ENDIF logic, but I'm not sure it belongs in this, which is meant to be a baseclass... I think I will take my own advice, and finish off the math stuff, and stop work on this class and derive a new one for further work.

Implemented:
=, +=, -=, /=, *=, +, - , /, *

Unimplemented: ^, ()

Almost there :D


Posted on 2010-04-30 09:36:12 by Homer

A new rule <PowExp Expr> was added to the grammar.
This rule implements Powers, eg 2^3, and the rule is located just below the multiplication/division node.
This means that Power operator takes precedence over Mul/Div operator, which is correct.
Floating point powers are supported (!), with the returned datatype recast on demand if necessary.

Eg
x = J^2
x = Q^0.5 <-- this is one way to do a sqrt


I have also introduced the 'engineering exponent' (e) suffix for any kind of Value.
But there's two caveats to using it, so I have not yet implemented it in the solver.
The first caveat is that positive exponents must NOT contain a + sign.
The second is that the 'e' needs to be clearly delimited from the preceding Value by whitespace.
This is the first real difficulty I've had, and it's clearly a problem that needs to be handled another way.... I know Braces would help to clarify to the parser what we intend, but this will do for now.

Eg
x = 1.0e2 will not work
x = 1.0 e 2 will work
x =  20e-2 will not work
x = 20 e-2 will work
x = y

So that really just leaves me braces, and whether I go ahead and implement the 'e' (... *10^n)
Pretty much I think I'm ready to post an updated demo :)

I actually found very little information on the internet regarding the details of implementing a parsetree solver, which was distressing. And any sourcecodes I was able to find were far more obscure and unintelligable than my own.
If anyone would like to ask questions about any part of this interpreter framework, please feel free :)

Posted on 2010-05-01 02:28:08 by Homer
Braces (nested subexpressions) have been implemented.
This was actually VERY easy.. Braced subexpressions are defined under the <Value> rule as follows:

<Value>       ::=   <Literal> ! Could be a simple Literal value (eg integer, string, float etc)
   | '(' <Expression>  ')' ! Could be a braced (sub)expression (ie nesting support for expressions)
   | ID '++' ! Could be Name++
   | ID '--' ! Cound be Name--
   | ID           ! Could be Name


See the second one down? It says that any Value in an Expression can contain an entire (Braced Expression).
When my interpreter reaches a Value node, I check for and eliminate braces, then pass whatever remains naively to the default handler for recursion. That's all! I don't need to deal with the subexpression, or even take any notice that I have one, rather than say, a Literal value... I just let the default handler deal with it. Done.

My test data was:
x= 15 * (2^(1+1))
y = 100^0.5

And the answers were 60 (integer) and 10.0 (float).
Correct.

Note in the second line, that the interpreter has decided that the answer should be a float.
If we operate on a floating point number and an integer (or hex) number, the result will always be a float.
Just remember that these are not real datatypes, these types are just telling us which numerical representation of a number was used. It is in the interest of consistency that the input datatypes should determine the output datatype.
Where possible, the datatype of the leftmost entity will be retained.
For example:
0x1 + 1 = 0x2
1 + 0x1 = 2

Posted on 2010-05-01 03:36:25 by Homer
Conditional operators (<,<=,==,>=,>) are implemented, aka the <Compare Expr> node.

This node attempts to resolve comparison expressions into a single BOOLEAN token.
Although, I have not explicitly defined BOOLEAN as a datatype - I am simply returning integer 0 or 1.
This is compatible with statements such as "repeat until 0".
We can define TRUE and FALSE at buildtime.

Since IF is not implemented, these operators remain practically useless.
I will implement IF logic in my derived class - but I felt the conditional operators themselves should be implemented in the Evaluator class.

Posted on 2010-05-01 06:15:11 by Homer
Attached is an update of relevant files.
I am tempted to add "trig" support to the Evaluator class, but that can wait.
For the time being, I will consider the Evaluator class to be 'reasonably complete'.
Of course that will prove not to be the case, right??

I will now turn my attention to a new derived class called MacroEngine.
Here is where I will implement any 'buildtime directives', which include the IF directive.
After that, I will be starting to exhaust my primitive experimental grammar.
I will be asking for your input to help design a nice clean programming syntax :)

This will also be the final post I make in this thread.
However, I will leave the thread unlocked for people to ask questions.

ok I just had to implement SHL/SHR/<</>> ... you'' have to get the next exciting update from the new thread (Macro Engine)

Attachments:
Posted on 2010-05-01 07:29:30 by Homer