Has anyone come across any decent information on text parsing for syntax and keywords?

Since compilers do this in a couple of passes, I thought this would be a good place to ask. What I'm fishing for is some sort of structured approach that is easy to extend with new syntax. Otherwise I would simply write something like IF 'A' and 'B' and 'C' then syntax #1.
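One common way to get that kind of expandability is a table-driven lexer: keywords and token patterns live in data tables, so new syntax means a new table entry rather than another IF chain. This is a minimal sketch, not something from the thread; the table contents and token names (`KEYWORDS`, `TOKEN_SPEC`, `IF`, `THEN`, ...) are hypothetical.

```python
import re

# Hypothetical keyword table: adding a keyword is one new entry here.
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "while": "WHILE"}

# Token patterns, tried in order; the names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=<>]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, value) pairs, promoting identifiers found in KEYWORDS."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at column {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "IDENT":
            # Keywords are just identifiers that appear in the table (case-insensitive).
            kind = KEYWORDS.get(value.lower(), "IDENT")
        if kind != "SKIP":
            tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize("if a < 10 then b = a"))
# [('IF', 'if'), ('IDENT', 'a'), ('OP', '<'), ('NUMBER', '10'), ('THEN', 'then'), ('IDENT', 'b'), ('OP', '='), ('IDENT', 'a')]
```

Recognizing a new keyword such as `return` only needs `KEYWORDS["return"] = "RETURN"`; the parser that consumes these tokens then decides which syntax a token sequence matches.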

Posted on 2006-08-10 18:46:36 by NaN
Hi Nan,

The two main methods are BNF (Backus-Naur Form) and fully parenthesized parsing.

Most modern compilers use the BNF recursive (stack-based) algorithm or some variant of it. Most parser generators, such as YACC and Bison, take a BNF-style grammar as input and generate a parser from it.
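To make the recursive approach concrete, here is a minimal hand-written recursive-descent parser: each grammar rule becomes one function, and nested parentheses are handled by the functions calling each other, so the call stack does the bookkeeping. The grammar is a hypothetical example, written EBNF-style with `{ }` for repetition. (YACC and Bison build table-driven LALR parsers rather than recursive ones, but the hand-written form is easier to follow.)

```python
import re

# Example grammar, one method per rule:
#   expr   ::= term   { ("+" | "-") term }
#   term   ::= factor { ("*" | "/") factor }
#   factor ::= NUMBER | "(" expr ")"


class Parser:
    def __init__(self, text):
        # Toy tokenizer: characters other than digits and operators are skipped.
        self.tokens = re.findall(r"\d+|[+\-*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'} at token {self.pos}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            value = self.expr()  # recursion handles the nesting
            self.take(")")
            return value
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number at token {self.pos - 1}, got {tok!r}")
        return int(tok)


def evaluate(text):
    p = Parser(text)
    value = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r} at token {p.pos}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

Adding a new construct means adding a grammar rule and its matching method; operator precedence falls out of which rule calls which.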

There are many "hits" for these two terms on Google, but many of the BNF explanations use the term "fully parenthesized" either loosely or in a different context than I am using it. I found one book on the fully parenthesized method over 10 years ago, but cannot remember its name. There are tons of books and articles on the BNF method.

Fully parenthesized is an older form but has several advantages, including better determination of where a syntax error has occurred in an expression. It is also generally easier to generate code from this type of parser. It is mainly out of vogue because it is not inherently recursive.
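The book isn't named, so the following is only one reading of "fully parenthesized": every binary operation is wrapped in its own parentheses, which lets a single explicit stack (no recursion) reduce each group the moment its `)` arrives, report the exact column of a malformed group, and emit code at that same point. This sketch assumes single-character operands, and the instruction names (`PUSH`, `ADD`, ...) are a hypothetical stack machine.

```python
# Hypothetical instruction names for a simple stack machine.
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def translate(text):
    """Translate a fully parenthesized expression such as '((a+b)*c)' into
    stack-machine instructions using an explicit stack instead of recursion.

    Assumes single-character operands and that every binary operation is
    wrapped in its own parentheses, so each ')' closes exactly one
    '( operand op operand' group.
    """
    stack = []  # holds '(' markers, operator characters and "operand" placeholders
    code = []   # instructions, emitted as soon as each group is reduced
    for col, ch in enumerate(text):
        if ch.isspace():
            continue
        if ch == "(":
            stack.append("(")
        elif ch.isalnum():
            code.append(f"PUSH {ch}")
            stack.append("operand")
        elif ch in OPS:
            stack.append(ch)
        elif ch == ")":
            group = stack[-4:]
            if (len(group) < 4 or group[0] != "(" or group[1] != "operand"
                    or group[2] not in OPS or group[3] != "operand"):
                # The error is pinned to the exact ')' that failed to close a group.
                raise SyntaxError(f"malformed group closed at column {col}")
            del stack[-4:]
            code.append(OPS[group[2]])  # code generation happens right at the reduction
            stack.append("operand")
        else:
            raise SyntaxError(f"unexpected character {ch!r} at column {col}")
    if stack != ["operand"]:
        raise SyntaxError(f"incomplete expression at column {len(text)}")
    return code


print(translate("((a+b)*c)"))
# ['PUSH a', 'PUSH b', 'ADD', 'PUSH c', 'MUL']
```

The price of this simplicity is the input format itself: `(a+b)*c` is rejected because the outer multiplication is not parenthesized.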

Most modern compilers do this in a single pass, generating asm code for a backend assembler.
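A single-pass translator can be sketched by having each parsing function emit assembly text as soon as it recognizes its construct, without building any tree. This is a minimal sketch: the grammar is a hypothetical example, and the output is NASM-style 32-bit x86 that evaluates the expression on the machine stack, chosen for readability rather than efficiency.

```python
import re

# Example grammar; each rule emits code as it parses:
#   expr   ::= term   { ("+" | "-") term }
#   term   ::= factor { "*" factor }
#   factor ::= NUMBER | IDENT | "(" expr ")"

OPCODES = {"+": "add", "-": "sub", "*": "imul"}


def compile_expr(text):
    # Toy tokenizer: characters outside these patterns are skipped.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+\-*()]", text)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def emit_binary(op):
        # Both operands are already on the machine stack.
        out.extend(["pop ebx", "pop eax", f"{OPCODES[op]} eax, ebx", "push eax"])

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            out.append(f"push {tok}")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append(f"push dword [{tok}]")  # assumes a 32-bit variable named tok
        else:
            raise SyntaxError(f"unexpected {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            emit_binary("*")

    def expr():
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            emit_binary(op)

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return out


for line in compile_expr("x + 2 * y"):
    print(line)
```

Because `term()` finishes before `expr()` emits its `add`, the `imul` for `2 * y` appears first, so precedence is encoded purely by the order of emission.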

Posted on 2006-08-11 01:07:13 by msmith
Source code -> abstract syntax tree -> optimizers (generic as well as architecture-dependent) -> direct object file output without use of an intermediate assembly step.
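The tree-based pipeline can be sketched in a few stages. This is a minimal sketch under stated assumptions: the AST is built by hand (no parser), the only optimizer is constant folding as one example of a generic pass, and the back end emits tuples for a hypothetical stack machine as a stand-in for real object-file output, which is well beyond a short example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]

FOLD = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
OPNAMES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical instruction names


def fold_constants(node):
    """Generic optimizer pass: replace operations on two constants by their result."""
    if isinstance(node, BinOp):
        left, right = fold_constants(node.left), fold_constants(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(FOLD[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node


def generate(node, out=None):
    """Back end: walk the tree post-order and emit stack-machine instructions."""
    if out is None:
        out = []
    if isinstance(node, Num):
        out.append(("PUSH_CONST", node.value))
    elif isinstance(node, Var):
        out.append(("PUSH_VAR", node.name))
    else:
        generate(node.left, out)
        generate(node.right, out)
        out.append((OPNAMES[node.op],))
    return out


# AST for: x * (2 + 3)
tree = BinOp("*", Var("x"), BinOp("+", Num(2), Num(3)))
optimized = fold_constants(tree)
print(optimized)            # BinOp(op='*', left=Var(name='x'), right=Num(value=5))
print(generate(optimized))  # [('PUSH_VAR', 'x'), ('PUSH_CONST', 5), ('MUL',)]
```

Keeping the tree as the hand-off point is what lets further passes, including architecture-dependent ones, be slotted in between parsing and output.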
Posted on 2006-08-11 04:30:18 by f0dder
Thanks. That pointed me in the right direction, and I found enough info to work with.

Posted on 2006-08-11 21:48:58 by NaN
I have one of those books on how compilers work, Designing Compilers in C or something like that. It has a section on just this topic. You'll probably find the book at the library. It's easier to read than the Aho book.
Posted on 2006-08-11 22:30:00 by drhowarddrfine