LRSTAR
PARSER / LEXER / COMPILER GENERATOR FOR C++
AMDG
INTRODUCTION Home Grammars Comparison Feedback From Users PRODUCTS PARSERS LEXERS Release Information THEORY Definition Of Terms Theoretical Information COMPANY About Donate Contact


SourceForge Page

LRSTAR 4.0  Parser & Lexer Generator

DOCUMENTATION VERSION
LRSTAR User's Manual 4.0.042
LRSTAR Skeleton Notation 4.0.025
Definition Of Terms
TBNF Paper

DOWNLOAD
LRSTAR 4.0.44 System Contains everything

SOURCE CODE & GRAMMARS
  • Source code license
Open Source Free BSD
  • Source code compiles with
GCC, Visual C++
  • Includes 6 sample working projects
PARSER GENERATOR DETAILS
  • Creates parser tables of type
Matrix
850,000  lines/second
yes
  • Shows traces for conflicts in your grammar
yes
  • Parsers have automatic error recovery
no
  • Parsers have a symbol-table builder
yes
  • Parsers do AST construction and traversal
yes
LEXER GENERATOR DETAILS
  • Creates lexer tables of type
Matrix
  • Lexer speed (in memory, no I/O)
30,000,000  tokens/second
yes
  • Lexers do keyword recognition
yes
  • Unicode support
no



FOOTNOTES:
 

LALR(k) Parsers 

An LALR(k) parser is a finite-state automata that uses a pushdown stack and 'k' symbols of lookahead to solve any ambiguity encountered in the input.  For most computer languages, the input can be handled with one symbol of lookahead and the parser operates as an LALR(1) parser.  However, for some computer languages, one lookahead symbol is not enough and the parser will operate as an LALR(k) nondeterministic parser and use as many lookahead symbols as necessary to resolve ambiguity.  The maximum number of lookaheads allowed (k) is predefined manually in the parser source code (typically 2, 3 or 4).  For complex programming languages the number of nondeterministic states is usually small, so the impact on speed is minimal.

DFA Lexers 

A DFA lexer is a finite-state automata without a pushdown stack (i.e. not a PDA).  DFA is the fastest recognition algorithm (5 times the speed of PDA's).  A DFA algorithm does not use any lookahead characters.  Because it is not a PDA, it cannot handle "nested" comments or any type of "nested" input (e.g. [[a]]).  However, DFA's work fine for most programming languages.  Afterall, the job of handling a "nested" type of input usually belongs to the parser.  Before DFA's, lexical analysis had been known to consume a large percentage of the processing time in compiler front-ends.  But with DFA's that time is reduced to about 15-20% of the compiler front-end time.

Matrix Tables 

Matrix tables provide the fastest table-driven parsers and lexers.  LRSTAR and LRSTAR use compressed-matrix tables, for their finite-state automata, which offer small tables but still very fast.  Matrix tables compile faster than recursive-descent parsers and direct-code lexers.  LRSTAR testing shows that matrix tables can be as fast as direct-code lexers, but are smaller and compile faster.  Fast parsing research shows that recursive-descent parsers can be faster than table-driven parsers.  RD parsers have some disadvantages, however some people prefer RD parsers.

TBNF Grammar Notation 

TBNF is a grammar notation that can be used to define the structure and contents of an abstract-syntax tree (AST) and the generated parser will build an AST automatically.  TBNF goes beyond BNF and EBNF.  It makes the task of building compilers and translators easier, quicker and more reliable.  For more information, see the TBNF Paper.

LBNF Regular Expressions 

LBNF is a Lexical BNF notation plus regular expressions for defining the symbols (or tokens) of a computer language.  It has the potential to be more powerful, by defining a push-down automata (PDA), which regular expressions cannot do.  A future version of LRSTAR may be able to generate PDA lexers.  Here is the lexical grammar for C   c.lgr.txt.

Skeleton Files 

Skeleton files are parser and lexer source code which includes special notation telling LRSTAR what code to generate and how to format the parser and lexer tables.  This skeleton-file concept makes it possible to have LRSTAR output source code in many different programming languages.  This way you are not locked into one language or one style of coding.  You can customize the code to fit your preferred programming language and style of coding.

Compiler Front-End Speed 

The compiler front-end speed was measured by "compiling" an input file of C source code of size 2200 lines.  The input file was read 1000 times, so the "compiling" process was done 1000 times: opening the file, parsing it, building a symbol table and building an AST in memory.  No output files were created.  No preprocessing was done.  The testing computer is a 3 GHz desktop single-CPU machine. 

© Copyright Paul Mann 2013.  All rights reserved.