Top-Down Parsing

home1 home2
 Bib
 Algorithms
 Bioinfo
 FP
 Logic
 MML
 Prog.Lang
and the
 Book

Prog-Langs
 glossary
 Grammar
  Arith-Exp
  Abstract-syn
  Top-Down
  Bottom-Up

Derivations

e.g.,
<Exp> lm=> <Exp> + <Term> lm=> <Term> + <Term> lm=> <Term>*<Factor> + <Term> lm=> <Factor>*<Factor> + <Term> lm=> w*<Factor> + <Term> lm=> w*x + <Term> lm=> w*x + <Term>*<Factor> lm=> w*x + <Factor>*<Factor> lm=> w*x + y*<Factor> lm=> w*x+y*z
 
recall that `lm' indicates left-most derivation.
(left-most)
derivations
 
 Exp<end>
1Exp+Term
2Term
3Term*Factor
4Factor
5w
6x
7Term*Factor
8Factor
9y
10z
sentence:w * x + y * z <end>

Parse tree

  +
 
*
  *
w   x   y   z

Left-recursion

The grammar for expressions that shows the correct binding and associativity of operators is left-recursive and is unsuitable for use in a practical top-down parser which could go into an infinite loop without reading any input.

<Exp> ::= <Exp> + <Term> |
<Exp> - <Term> |
<Term>

<Term> ::= <Term> * <Factor> |
<Term> / <Factor> |
<Factor>

<Factor> ::= x | y | ... |
( <Exp> ) |
- <Factor>

A grammar for the concrete syntax of simple arithmetic expressions

The left-recursion is being used to express (i) that + (and -, *, /) are left-associative, e.g., a-b-c=(a-b)-c, and (ii) that an <Exp> is a sequence of one or more <Term>s separated by + | -. Similarly for <Term>, <Factor>s and * | /.

Left-recursion removal

Repeated sequences can be expressed by right-recursion as well as they can by left-recursion.

<Exp> ::= <Term> <Exp'> |
<Term> <Exp''>
<Exp'> ::= + <Exp> | ε
<Exp''> ::= - <Exp> | ε

<Term> ::= <Factor> <Term'> |
<Factor> <Term''>
<Term'> ::= * <Term> | ε
<Term''> ::= / <Term> | ε

<Factor> ::= x | y | ... |
( <Exp> ) |
- <Factor>

A non-left-recursive grammar for the concrete syntax of simple arithmetic expressions

But beware: Building a parse tree naively with this grammar could suggest that a-b-c = a-(b-c) which is not the case.

Factoring

The previous grammar can be used in a top-down parser but only in one that must look arbitrarily far ahead, e.g., past the <Term>, to determine which alternative to use for <Exp>. This difficulty can be removed by factoring productions with a common prefix.

<Exp> ::= <Term> <Exp'>
<Exp'> ::= + <Exp> | - <Exp> | ε

<Term> ::= <Factor> <Term'>
<Term'> ::= * <Term> | / <Term> | ε

<Factor> ::= x | y | ... |
( <Exp> ) |
- <Factor>

An LL(1) grammar (factored, non-left-recursive) for the concrete syntax of simple arithmetic expressions

(Care is still needed when building a parse tree with this grammar.) The grammar can be used in a top-down parser that uses just one symbol lookahead, in an LL(1) parser.

Some recursion cannot be removed from the grammar. Note that an expression can contain a subexpression in parentheses, <Exp> =>+ ( <Exp> ). This is not left-recursive (nor right-recursive) because some input (of a '(') is necessary before the recursion so this does not cause any problems.

-- 2002 LA, CSSE, Monash
Coding Ockham's Razor, L. Allison, Springer

A Practical Introduction to Denotational Semantics, L. Allison, CUP

Linux
 Ubuntu
free op. sys.
OpenOffice
free office suite
The GIMP
~ free photoshop
Firefox
web browser

key
::= "can be replaced by"
->
| "or by"
=> derivation
lm left most
rm right most

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Thursday, 28-Mar-2024 22:34:27 AEDT.