LR(*) PARSER GENERATOR SYSTEM FOR C++
|INTRODUCTION Home Grammars Comparison Feedback PRODUCTS LrStar YaccStar LexStar Release Notes THEORY Terminology Papers COMPANY About Donate Contact LINKS SourceForge Page||
An LR(*) parser is a finite-state automata that uses a pushdown stack and 'k' symbols of lookahead to solve any ambiguity encountered in the input. For most computer languages, the input can be handled with one symbol of lookahead and the parser operates as an LALR(1) parser. However, for some computer languages, one lookahead symbol is not enough and the parser will operate as an LR(*) nondeterministic parser and use as many lookahead symbols as necessary to resolve ambiguity. The maximum number of lookaheads allowed (k) is predefined manually in the parser source code (typically 2, 3 or 4). For complex programming languages the number of nondeterministic states is usually small, so the impact on speed is minimal.
a DFA lexer is a finite-state automata without a pushdown stack (i.e. not a PDA). DFA is the fastest recognition algorithm (5 times the speed of PDA's). a DFA algorithm does not use any lookahead characters. Because it is not a PDA, it cannot handle "nested" comments or any type of "nested" input (e.g. [[a]]). However, DFA's work fine for most programming languages. Afterall, the job of handling a "nested" type of input usually belongs to the parser. Before DFA's, lexical analysis had been known to consume a large percentage of the processing time in compiler front-ends. But with DFA's that time is reduced to about 15-20% of the compiler front-end time.
Matrix tables provide the fastest table-driven parsers and lexers. LrStar and LexStar use compressed-matrix tables, for their finite-state automata, which offer small tables but still very fast. Matrix tables compile faster than recursive-descent parsers and direct-code lexers. LrStar testing shows that matrix tables can be as fast as direct-code lexers, but are smaller and compile faster. Fast parsing research shows that recursive-descent parsers can be faster than table-driven parsers. RD parsers have some disadvantages, however some people prefer RD parsers.
TBNF Grammar Notation
TBNF is a grammar notation that can be used to define the structure and contents of an abstract-syntax tree (AST) and the generated parser will build an AST automatically. TBNF goes beyond BNF and EBNF. It makes the task of building compilers and translators easier, quicker and more reliable. For more information, see the TBNF Paper.
LBNF Regular Expressions
LBNF is a Lexical BNF notation plus regular expressions for defining the symbols (or tokens) of a computer language. It has the potential to be more powerful, by defining a push-down automata (PDA), which regular expressions cannot do. a future version of LrStar may be able to generate PDA lexers. Here is the lexical grammar for C c.lgr.txt.
Skeleton files are parser and lexer source code which includes special notation telling LrStar what code to generate and how to format the parser and lexer tables. This skeleton-file concept makes it possible to have LrStar output source code in many different programming languages. This way you are not locked into one language or one style of coding. You can customize the code to fit your preferred programming language and style of coding.
Compiler Front-End Speed
The compiler front-end speed was measured by "compiling" an input file of C source code of size 2200 lines. The input file was read 1000 times, so the "compiling" process was done 1000 times: opening the file, parsing it, building a symbol table and building an AST in memory. No output files were created. No preprocessing was done. The testing computer is a 3 GHz desktop single-CPU machine.
|(c) Copyright Paul Mann 2013. All rights reserved.|