Created
January 14, 2016 20:49
-
-
Save srcreigh/522bfd4ea971612cf62e to your computer and use it in GitHub Desktop.
notes from my compiler construction course
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
January 14, 2016 | |
Thursday | |
COMPILER CONSTRUCTION | |
1. TOKENS AND KINDS | |
=================== | |
|------|-------------| | |
| kind | RE | | |
|======|=============| | |
| IF | if | | |
| ID | [a-z][a-z]* | | |
|------|-------------| | |
figure 1.1: examples of kinds and the regular expressions for the kinds | |
2. CONTEXT-FREE GRAMMARS | |
======================== | |
2.1 DEFINITIONS | |
--------------- | |
- a context-free grammar is a 4-tuple G = <N,T,R,S> where | |
- set of terminals T | |
- set of non-terminals N | |
- start symbol S ε N | |
- production rules R ⊆ N x V* | |
- symbols are V = T ∪ N | |
- sequences of symbols V* | |
- sequences of terminals T* | |
2.2 CONVENTIONS | |
--------------- | |
- a, b, c ε T | |
- lowercase letters at the start of the alphabet | |
- A, B, C, S ε N | |
- uppercase letters at the start of the alphabet (and S, the start symbol) | |
- A → α ε R | |
- rules have arrows → | |
- X, Y, Z ε V | |
- uppercase letters at the end of the alphabet are symbols (either terminal or | |
non-terminal) | |
- α, β, γ ε V* | |
- lowercase greek letters are sequences of symbols (terminal or non-terminal) | |
- x, y, z ε T* | |
- lowercase letters at the end of the alphabet are sequences of terminal | |
symbols | |
2.3 DERIVING A LANGUAGE FROM A GRAMMAR | |
-------------------------------------- | |
- βAγ ⇒ βαγ if A → α ε R | |
- "⇒" is pronounced "directly derives" | |
- α ⇒* β if α = β or (a ⇒ γ and γ ⇒* β) | |
- "⇒*" is pronounced "derives" | |
- L(G) = { x ε T* | S ⇒* x } | |
! the set of sequences of non-terminals that can be derived from the start | |
symbol | |
2.4 PARSING | |
----------- | |
- want to know for a given x whether x ε L(G) | |
- a "proof" is a derivation of x | |
- or a parse tree (which is just a graphical representation of a derivation) | |
- in a right derivation, each step is of the form βAx ⇒ βαx | |
- expands the rightmost non-terminal | |
- similarly, left derivations are such that each step is of the form xAγ ⇒ xαγ | |
- parse trees, left derivations, and right derivations are all isomorphic | |
- this is by design | |
- we want programs to have unambiguous meanings | |
? we *could* derive rules from the middle, but then ...? the parse trees | |
wouldn't be ambiguous? | |
- a grammar is ambiguous if some word has more than one parse tree (or | |
left-derivation or right-derivation) | |
E → F | F - F | |
F → ID | NUM | |
figure 2.4.1: rules | |
F - F | |
/ \ | |
F F | |
| | | |
NUM - NUM | |
figure 2.4.2: parse tree for "NUM - NUM" | |
E E | |
⇒ F - F ⇒ F - F | |
⇒ NUM - F ⇒ F - NUM | |
⇒ NUM - NUM ⇒ NUM - NUM | |
figure 2.4.3: left and right (respectively) derivations for "NUM - NUM" | |
E → F | E - E | |
F → ID | NUM | |
figure 2.4.4: updated rules (notice the E → E - E rule) | |
|------------------------------------| | |
| E | | |
| /|\ -----E | | |
| / | \ / / \ | | |
| / | E E | \ | | |
| / / /|\ /|\ | | | | |
| / / / | \ / | \ | | | | |
| E | E | E E | E | E | | |
| | | | | | | | | | | | | |
| F | F | F F | F | F | | |
| | | | | | | | | | | | | |
| NUM - NUM - NUM NUM - NUM - NUM | | |
|------------------------------------| | |
figure 2.4.5: two separate parse trees for expression "NUM - NUM - NUM" | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment