# Context Free Grammars

A **context-free grammar** (CFG) is a precise description of
language syntax. Each **rule** or **production** of the
grammar defines an interpretation for the named symbol on the left
side of the rule as a sequence of symbols on the right side of the
rule. The right side can be a combination of
**nonterminals** (themselves defined by rules) or
**terminal** symbols defined simply as strings, such as "the",
"a", "cat", "milk", and "drank".

**Parsing** a text string \(S\) according to a given context-free
grammar \(G\) is the algorithm problem of constructing a
**parse tree** of rule substitutions defining \(S\) as a single
nonterminal symbol of \(G\). One of the nonterminals, usually the
one on the left-hand side of the first production, is called the
**start symbol**. It names the construct defined by the overall
grammar.

## Example

Sentence ::= NounPhrase VerbPhrase NounPhrase ::= Article Noun VerbPhrase ::= Verb NounPhrase Article ::= the, a Noun ::= cat, milk Verb ::= drank

## Chomsky Normal Form

**Chomsky normal form** (CNF) requires the right side of
every rule to consist of

- exactly two nonterminals, i.e. \(X -> YZ\), or
- exactly one terminal symbol, i.e. \(X -> \alpha\).

Any context-free grammar can be easily and mechanically transformed into Chomsky normal form by repeatedly shortening long right-hand sides at the cost of adding extra nonterminals and productions. Thus, there is no loss of generality with this assumption.

## Bottom-Up Parsing

The **CYK algorithm** efficiently parses a string using
dynamic programming. The algorithm only works on grammars in
Chomsky normal form. This section is taken from
*The Algorithm Design Manual* by Steven S. Skiena.

Define \(M[i, j, X]\) to be a boolean function that is true iff substring \(S[i, j]\) is generated by nonterminal \(X\). This is true if there exists a production \(X -> YZ\) and breaking point \(k\) between \(i\) and \(j\) such that the left part generates \(Y\) and the right part \(Z\). In other words,

\begin{equation} M[i, j, X] = \bigvee\limits_{(X -> YZ) \in G} \Big( \bigvee\limits_{k=i}^j M[i, k, Y] \wedge M[k + 1, j, Z] \Big) \end{equation}
Where \(\vee\) denotes the logical *or* over all
productions and split positions, and \(\wedge\) denotes the
logical *and* of two boolean values.

The one-character terminal symbols define the boundary conditions of the recurrence. In particular, \(M[i, i, X]\) is true iff there exists a production \(X -> \alpha\) such that \(S[i] = \alpha\).

### TODO Example

### TODO Implementation

## TODO Top-Down Parsing

TODO: LL parser, LL(1)