I have been coding a small scanner generator, and thought it might be of interest to someone here (I recall some discussion on how to read and parse scripts).

A scanner is a program which uses table lookups to go through a file/buffer and figure out what tokens it contains. Scanners are used in most compilers and many scripting languages because they are fast and (relatively) easy to maintain.
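
For readers who have not seen the table-lookup idea before, here is a minimal sketch in Python. It is not the jasg table format or the code from the zip; the token codes, character classes, and the names `CLASS`, `NEXT`, `ACCEPT`, and `scan` are all made up for illustration. Each byte is mapped to a character class, and a small state table decides whether the current token continues.

```python
# Hypothetical token codes (not jasg's numbering).
T_IDENT, T_NUMBER, T_UNKNOWN = 1, 2, 3

# Character classes used as column indexes into the state table.
C_OTHER, C_LETTER, C_DIGIT, C_SPACE = 0, 1, 2, 3


def build_class_table():
    """Map every byte value to a character class."""
    table = [C_OTHER] * 256
    for c in range(ord("a"), ord("z") + 1):
        table[c] = C_LETTER
    for c in range(ord("A"), ord("Z") + 1):
        table[c] = C_LETTER
    for c in range(ord("0"), ord("9") + 1):
        table[c] = C_DIGIT
    for c in b" \t\r\n":
        table[c] = C_SPACE
    return table


CLASS = build_class_table()

# NEXT[state][class] gives the next state; -1 means "no transition".
# States: 0 = start, 1 = inside identifier, 2 = inside number.
NEXT = [
    # OTHER LETTER DIGIT SPACE
    [-1, 1, 2, -1],   # start
    [-1, 1, 1, -1],   # identifier: letters and digits continue it
    [-1, -1, 2, -1],  # number: only digits continue it
]
ACCEPT = [None, T_IDENT, T_NUMBER]


def scan(buf: bytes):
    """Return a list of (token_code, text), stopping at a zero byte."""
    tokens = []
    pos = 0
    while pos < len(buf) and buf[pos] != 0:
        if CLASS[buf[pos]] == C_SPACE:
            pos += 1  # whitespace only separates tokens
            continue
        state, start = 0, pos
        while pos < len(buf):
            nxt = NEXT[state][CLASS[buf[pos]]]
            if nxt < 0:
                break
            state = nxt
            pos += 1
        if state == 0:
            pos += 1  # no rule matched: emit one unknown byte
            tokens.append((T_UNKNOWN, buf[start:pos]))
        else:
            tokens.append((ACCEPT[state], buf[start:pos]))
    return tokens
```

Calling `scan(b"var1 = 42\0")` walks the buffer once and does only table lookups per byte, which is where the speed comes from. A generator like jasg builds such tables from a specification file instead of having them written by hand.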

The zip contains a program which reads a specification file and produces the scanner tables (like flex but simpler), and some assembler code (NASM style) which uses the tables to scan a buffer.

http://home19.inet.tele.dk/jibz/files/jasg20020722.zip
Posted on 2002-07-22 07:05:02 by Jibz
Jibz,

Thanks for posting this code. When I have a little more brain left and less to do, I will do my best to digest how it works properly, as the idea interests me and will be useful in the future.

Regards,

hutch@movsd.com
Posted on 2002-07-22 09:13:18 by hutch--
I fixed a small bug in the jasg code.

http://home19.inet.tele.dk/jibz/files/jasg20020726.zip
Posted on 2002-07-26 07:49:07 by Jibz
This is a very good algorithm, I guess. I converted it to MASM syntax and tried it, but I saw one weird output. AFAIK a newline means 0dh,0ah. The parser first recognizes newline (3) and then reports unknown (4). Also, what I parse contains a lot of zeros. How can I resume from where it stopped? If the script is like var = xxx, it reads it as three elements: var, =, xxx. In the readme you say it recognizes spaces, but in my test it didn't work. Also, the parser recognized 12h as newline. I guess it is a text-based script parser :(

The jasg output is NASM-oriented; maybe you can add a MASM compatibility option. Instead of
nlex_base:
dw ...

it should be
nlex_base dw ...

Thanks for this great source code.
Posted on 2002-07-26 13:07:12 by LaptoniC

AFAIK a newline means 0dh,0ah. The parser first recognizes newline (3) and then reports unknown (4).


On DOS/Windows a newline is 0dh,0ah; it's different on Mac or Unix systems. But it's not difficult to fix: 0ah should just be added to the whitespace characters.

Also, what I parse contains a lot of zeros. How can I resume from where it stopped?


When it finds a zero-terminator and returns 0, the lex_ptr variable points to where it was. So you can use that to continue after the zero-terminator.
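
To illustrate the idea of continuing after a zero-terminator, here is a rough sketch in Python rather than the assembler from the zip. Only the name `lex_ptr` comes from the scanner; `next_token` and `scan_all` are hypothetical helpers, and the sketch assumes the pointer is left on the zero byte itself, so the caller steps over it before scanning again.

```python
BLANKS = b" \t\r\n"


def next_token(buf: bytes, lex_ptr: int):
    """Return (code, text, new_lex_ptr); code 0 means a zero byte was hit."""
    # Skip separators in front of the token.
    while lex_ptr < len(buf) and buf[lex_ptr] in BLANKS:
        lex_ptr += 1
    # Zero byte (or end of buffer): report 0 and leave lex_ptr where we stopped.
    if lex_ptr >= len(buf) or buf[lex_ptr] == 0:
        return 0, b"", lex_ptr
    start = lex_ptr
    while lex_ptr < len(buf) and buf[lex_ptr] != 0 and buf[lex_ptr] not in BLANKS:
        lex_ptr += 1
    return 1, buf[start:lex_ptr], lex_ptr


def scan_all(buf: bytes):
    """Collect words from a buffer that contains embedded zero bytes."""
    words = []
    lex_ptr = 0
    while lex_ptr < len(buf):
        code, text, lex_ptr = next_token(buf, lex_ptr)
        if code == 0:
            lex_ptr += 1  # step over the zero byte and keep scanning
        else:
            words.append(text)
    return words
```

With this, `scan_all(b"one two\0\0three\0")` keeps going past each zero instead of stopping at the first one. In the assembler version the equivalent would be adjusting lex_ptr before calling the scanner again.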

If the script is like var = xxx, it reads it as three elements: var, =, xxx. In the readme you say it recognizes spaces, but in my test it didn't work.


At the end of SBC_lex, there are a few tests to make it continue on whitespace (as whitespace is ignored except for serving as a separator in most languages). You can just change them so SBC_lex returns on whitespace instead.
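
The difference between the two behaviours can be shown with a small Python sketch. This is not SBC_lex itself; the token codes, the `tokens` function, and the `skip_whitespace` flag are invented for the example.

```python
WHITESPACE = b" \t\r\n\f"
T_WORD, T_SPACE = 1, 2  # hypothetical token codes


def tokens(buf: bytes, skip_whitespace: bool = True):
    """Split buf into runs of whitespace and runs of everything else."""
    out = []
    pos = 0
    while pos < len(buf) and buf[pos] != 0:
        start = pos
        is_space = buf[pos] in WHITESPACE
        # Consume the whole run of the same kind (space or non-space).
        while (pos < len(buf) and buf[pos] != 0
               and (buf[pos] in WHITESPACE) == is_space):
            pos += 1
        if is_space:
            if not skip_whitespace:
                out.append((T_SPACE, buf[start:pos]))  # return on whitespace
            # otherwise: just continue, whitespace acts as a separator only
        else:
            out.append((T_WORD, buf[start:pos]))
    return out
```

With the default, `tokens(b"var = xxx")` gives three elements, matching what was observed. With `skip_whitespace=False` the two spaces come back as tokens of their own, which is roughly what changing the tests at the end of SBC_lex would achieve.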

Also, the parser recognized 12h as newline. I guess it is a text-based script parser :( The jasg output is NASM-oriented; maybe you can add a MASM compatibility option.


Strange .. could you post (or e-mail me) an example of this?

It was written to be text-oriented, because most programming languages are written in text ;-)
Posted on 2002-07-29 04:24:40 by Jibz
Strange .. could you post (or e-mail me) an example of this?

In my example, change szScript to


szScript db 'var=1',0dh,0ah,12h,'load("myfile")',0dh,0ah,'run stop',0dh,0ah,0

I wonder if this parser can be used to parse anything, i.e. binary data. What I parse contains whitespace and newlines that I don't care about. I mean, if you write a general parser, it could help a lot.
Keep up the good work :alright:
Posted on 2002-07-29 10:06:55 by LaptoniC


In my example, change szScript to


szScript db 'var=1',0dh,0ah,12h,'load("myfile")',0dh,0ah,'run stop',0dh,0ah,0


Oh .. guess I should check the documentation better when I update -- I changed it so newline returns 4 and unknown returns 3 :-)

I wonder if this parser can be used to parse anything, i.e. binary data. What I parse contains whitespace and newlines that I don't care about. I mean, if you write a general parser, it could help a lot.


I guess the technique of using tables could be applied to any kind of parsing.

I updated the documentation, added a command line option for masm compatible output, and added formfeed and carriage return to the whitespace class.

http://home19.inet.tele.dk/jibz/files/jasg20020729.zip
Posted on 2002-07-29 15:12:33 by Jibz
I found a serious bug in the scanner code, which could cause it not to recognize tokens when using '-s'. Should be fixed now :-)

http://home19.inet.tele.dk/jibz/files/jasg20020730.zip
Posted on 2002-07-30 01:41:48 by Jibz
Jibz, this is a very powerful building block for little languages.
Thank you very much for sharing.
Posted on 2002-07-30 23:18:04 by bitRAKE