FOOTNOTES:
LALR(k) Parsers
An LALR(k) parser is a finite-state automata that uses a pushdown stack and 'k' symbols of
lookahead to solve any ambiguity encountered in the input. For most computer languages, the
input can be handled with one symbol of lookahead and the parser operates as an LALR(1) parser.
However, for some computer languages, one lookahead symbol is not enough and
the parser will operate as an LALR(k) nondeterministic parser and use as many lookahead symbols
as necessary to resolve ambiguity. The maximum number of lookaheads allowed (k) is predefined
manually in the parser source code (typically 2, 3 or 4). For complex programming languages
the number of nondeterministic states is usually small, so the impact on speed is minimal.
DFA Lexers
A DFA lexer is a finite-state automata without a pushdown stack (i.e. not a PDA).
DFA is the fastest recognition algorithm (5 times the speed of PDA's). A DFA algorithm does
not use any lookahead characters. Because it is not a PDA, it cannot handle "nested" comments
or any type of "nested" input (e.g. [[a]]). However, DFA's work fine for most programming languages.
Afterall, the job of handling a "nested" type of input usually belongs to the parser. Before DFA's,
lexical analysis had been known to consume a large percentage of the processing time in compiler front-ends.
But with DFA's that time is reduced to about 15-20% of the compiler front-end time.
Matrix Tables
Matrix tables provide the fastest table-driven parsers and lexers. LRSTAR and LRSTAR use compressed-matrix
tables, for their finite-state automata, which offer small tables but still very fast. Matrix tables compile
faster than recursive-descent parsers and direct-code lexers. LRSTAR testing shows
that matrix tables can be as fast as direct-code lexers, but are smaller and compile faster.
Fast parsing research shows that
recursive-descent parsers can be faster than table-driven parsers. RD parsers have some disadvantages,
however some people prefer RD parsers.
TBNF Grammar Notation
TBNF is a grammar notation that can be used to define the structure and contents of an abstract-syntax
tree (AST) and the generated parser will build an AST automatically. TBNF goes beyond BNF and EBNF.
It makes the task of building compilers and translators easier, quicker and more reliable. For more
information, see the TBNF Paper.
LBNF Regular Expressions
LBNF is a Lexical BNF notation plus regular expressions for defining the symbols (or tokens) of a computer
language. It has the potential to be more powerful, by defining a push-down automata (PDA), which regular
expressions cannot do. A future version of LRSTAR may be able to generate PDA lexers. Here is the
lexical grammar for C c.lgr.txt.
Skeleton Files
Skeleton files are parser and lexer source code which includes special notation telling LRSTAR what code to
generate and how to format the parser and lexer tables. This skeleton-file concept makes it possible to
have LRSTAR output source code in many different programming languages. This way you are not locked
into one language or one style of coding. You can customize the code to fit your preferred programming
language and style of coding.
Compiler Front-End Speed
The compiler front-end speed was measured by "compiling" an input file of C source code of size 2200 lines.
The input file was read 1000 times, so the "compiling" process was done 1000 times: opening the file,
parsing it, building a symbol table and building an AST in memory. No output files were created. No
preprocessing was done. The testing computer is a 3 GHz desktop single-CPU machine.