Transcripts

Top-Down Parsing I
CS 471, September 12, 2007

Where Are We?
Source code: if (b==0) a = "Hello there";
  Lexical Analysis
Token stream: if ( b == 0 ) a = "Hello there" ;
  Syntactic Analysis
Abstract Syntax Tree (AST)
  [Figure: AST with nodes if, ==, =, ; and leaves b, 0, a, "Hello there"]
  Semantic Analysis

Last Time
Parsing overview
Context-free grammars
  Formal description of language syntax
  Deriving strings using a CFG
  Depicting a derivation as a parse tree

Grammar Issues
Often there is more than one way to derive a string.
Why is this a problem?
  Parsing: is the string a member of L(G)?
  We want more than a yes or no answer.
Key: represent the derivation as a parse tree.
  We want the structure of the parse tree to capture the meaning of the sentence.

Parse Tree
Parse tree for x - 2 * y (right-most derivation)
[Figure: the root expr expands to expr op expr; its left expr expands again to expr op expr over x - 2, with * and y to the right]

Abstract Syntax Tree
The parse tree contains extra junk.
  Eliminate intermediate nodes.
  Move operators up to parent nodes.
Result: abstract syntax tree
[Figure: the parse tree for x - 2 * y beside its AST, in which * is the root with children - and y, and - has children x and 2]
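The two clean-up steps on this slide can be sketched in a few lines of Python. The tuple encoding of tree nodes and the function name to_ast are assumptions made for illustration; they do not come from the slides.

```python
# Parse-tree nodes are tuples (label, child, child, ...); leaves are strings.
# This encoding is an assumption made for the sketch.

def to_ast(node):
    """Collapse a parse tree for expr -> expr op expr | leaf into an AST."""
    if isinstance(node, str):          # a leaf token such as "x" or "2"
        return node
    label, *children = node
    if len(children) == 1:             # intermediate node with one child: drop it
        return to_ast(children[0])
    left, op, right = children         # expr -> expr op expr
    # Move the operator up so that it labels the parent node.
    return (to_ast(op), to_ast(left), to_ast(right))

# Parse tree for x - 2 * y as produced by the right-most derivation.
parse_tree = ("expr",
              ("expr", ("expr", "x"), ("op", "-"), ("expr", "2")),
              ("op", "*"),
              ("expr", "y"))

ast = to_ast(parse_tree)
print(ast)
```

Each AST node is written as (operator, left, right), so the intermediate expr and op nodes disappear and only operators and operands remain.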

Left vs. Right Derivations
Two derivations of "x - 2 * y":
  Left-most derivation
  Right-most derivation

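Derivations
One captures meaning, the other doesn't.
[Figure: two trees. Left-most derivation: - at the root with children x and *, where * has children 2 and y. Right-most derivation: * at the root with children - and y, where - has children x and 2.]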
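A small worked example makes the difference concrete. The sketch below evaluates both tree shapes for sample values x = 5 and y = 3; the values, the tuple encoding, and the evaluate helper are assumptions chosen for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Evaluate a tree of the form (op, left, right) with string leaves."""
    if isinstance(tree, str):
        # A leaf is either a variable name found in env or an integer literal.
        return env[tree] if tree in env else int(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

leftmost_tree = ("-", "x", ("*", "2", "y"))     # x - (2 * y)
rightmost_tree = ("*", ("-", "x", "2"), "y")    # (x - 2) * y

env = {"x": 5, "y": 3}
left_value = evaluate(leftmost_tree, env)       # 5 - (2 * 3)
right_value = evaluate(rightmost_tree, env)     # (5 - 2) * 3
print(left_value, right_value)
```

With these values the first tree gives -1 and the second gives 9, so only the first matches the usual precedence of * over -.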

With Precedence
Last time: ways to force the right tree shape
Add productions to represent precedence

With Precedence
[Figure: parse tree for x - 2 * y under the grammar with precedence, where expr expands to expr - term and the right-hand term expands to term * fact, shown beside the earlier expr op expr tree]

Parsing
What is parsing?
  Discovering the derivation of a string, if one exists
  Harder than generating strings
Two major approaches
  Top-down parsing
  Bottom-up parsing
Don't work on all context-free grammars
  Properties of the grammar determine parse-ability
  Our goal: make parsing efficient
  We may be able to transform a grammar

Two Approaches
Top-down parsers: LL(1), recursive descent
  Start at the root of the parse tree and grow toward the leaves
  Pick a production and try to match the input
  A bad "pick" may need to backtrack
Bottom-up parsers: LR(1), operator precedence
  Start at the leaves and grow toward the root
  As input is consumed, encode possible parse trees in an internal state
  Bottom-up parsers handle a large class of grammars

Grammars and Parsers
LL(1) parsers
  Left-to-right input
  Leftmost derivation
  1 symbol of look-ahead
  Grammars that these parsers can handle are called LL(1) grammars
LR(1) parsers
  Left-to-right input
  Rightmost derivation
  1 symbol of look-ahead
  Grammars that these parsers can handle are called LR(1) grammars
Also: LL(k), LR(k), SLR, LALR, …

Top-Down Parsing
Start with the root of the parse tree
  Root of the tree: node labeled with the start symbol
Algorithm: repeat until the fringe of the parse tree matches the input string
  At a node A, select a production for A
  Add a child node for each symbol on the rhs
  If a terminal symbol is added that doesn't match, backtrack
  Find the next node to be expanded (a non-terminal)
Done when:
  Leaves of the parse tree match the input string (success)
  All productions are exhausted in backtracking (failure)

Example
Expression grammar (with precedence)
Input string: x - 2 * y

Example
Input: x - 2 * y (the slide marks the current position in the input stream)

Rule  Sentential form
-     expr
2     expr + term
3     term + term
6     factor + term
8     <id> + term
-     <id,x> + term

Problem: can't match the next terminal. We guessed wrong at step 2.
[Figure: partial parse tree, expr over expr + term, with factor over x]

Backtracking
Roll back productions
Choose a different production for expr
Continue
Input: x - 2 * y

Rule  Sentential form
-     expr
2     expr + term
3     term + term      (undo all these productions)
6     factor + term
8     <id> + term
?     <id,x> + term

Retrying
Input: x - 2 * y

Rule  Sentential form
-     expr
2     expr - term
3     term - term
6     factor - term
8     <id> - term
-     <id,x> - term
3     <id,x> - factor
7     <id,x> - <num>

Problem: more input to read. This is another cause of backtracking.
[Figure: parse tree, expr over expr - term, with factor over x on the left and factor over 2 on the right]

Successful Parse
Input: x - 2 * y

Rule  Sentential form
-     expr
2     expr - term
3     term - term
6     factor - term
8     <id> - term
-     <id,x> - term
4     <id,x> - term * factor
6     <id,x> - factor * factor
7     <id,x> - <num> * factor
-     <id,x> - <num,2> * factor
8     <id,x> - <num,2> * <id>

All terminals match: we're done.
[Figure: parse tree, expr over expr - term, where the right-hand term expands to term * factor over 2 and y]

Other Possible Parses
Input: x - 2 * y

Rule  Sentential form
2     expr + term
2     expr + term + term
2     expr + term + term + term
2     expr + term + term + term + term

Problem: termination
  A wrong choice leads to infinite expansion (more importantly: without consuming any input!)
  May not be as obvious as this
  Our grammar is left recursive

Left Recursion
Formally, a grammar is left recursive if there is a non-terminal A such that A →* A α (for some string of symbols α).
What does →* mean? It means "derives in zero or more steps" (left recursion needs at least one step), so the recursion may be indirect: A → B x, B → A y
Bad news: top-down parsers cannot handle left recursion.
Good news: we can systematically eliminate left recursion.
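The definition can be checked mechanically. The sketch below finds every non-terminal that can reach itself in leftmost position, under the simplifying assumption that the grammar has no ε productions. The dictionary encoding and the function name are hypothetical.

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals A that derive A alpha in one or more steps.

    Assumes no epsilon productions, so only the leftmost symbol of each
    right-hand side can begin a derived string.
    """
    # reach[A] = non-terminals that can appear leftmost after expanding A
    reach = {a: {rhs[0] for rhs in prods if rhs and rhs[0] in grammar}
             for a, prods in grammar.items()}
    changed = True
    while changed:                     # transitive closure of the relation
        changed = False
        for a in grammar:
            for b in list(reach[a]):
                new = reach[b] - reach[a]
                if new:
                    reach[a] |= new
                    changed = True
    return {a for a in grammar if a in reach[a]}

# Indirect left recursion from the slide: A -> B x, B -> A y
indirect = {"A": [["B", "x"]], "B": [["A", "y"]]}
# Direct left recursion in an expression-style grammar (illustrative)
direct = {"expr": [["expr", "+", "term"], ["term"]], "term": [["id"]]}

print(left_recursive_nonterminals(indirect))
print(left_recursive_nonterminals(direct))
```

For the indirect grammar both A and B are reported; for the second grammar only expr is.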

Notation
Non-terminals: capital letters (A, B, C)
Terminals: lowercase, underlined (x, y, z)
Some mix of terminals and non-terminals: Greek letters (α, β, γ)
Example: α = + x

Eliminating Left Recursion
Consider a grammar of the form: A → A α | β
  Language is β followed by zero or more α
Rewrite as: A → β A′ and A′ → α A′ | ε, where A′ is a new non-terminal
  The first production gives you one β
  The two A′ productions give you zero or more α
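As a rough sketch, the rewrite for a single non-terminal can be automated as follows. The tuple encoding, the primed name for the new non-terminal, and the function name are assumptions; indirect left recursion is not handled here.

```python
EPSILON = ()   # the empty right-hand side

def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon."""
    new_nt = nt + "'"                  # hypothetical name for the new non-terminal
    # The "a" parts: whatever follows the leading A in a left-recursive production.
    recursive = [tuple(rhs[1:]) for rhs in productions if rhs and rhs[0] == nt]
    # The "b" parts: productions that do not start with A.
    others = [tuple(rhs) for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:                  # nothing to do
        return {nt: others}
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [EPSILON],
    }

result = eliminate_immediate_left_recursion(
    "expr", [("expr", "+", "term"), ("expr", "-", "term"), ("term",)])
for head, rules in result.items():
    for rhs in rules:
        print(head, "->", " ".join(rhs) or "epsilon")
```

Applied to expr → expr + term | expr - term | term, this yields expr → term expr' together with expr' → + term expr' | - term expr' | ε.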

Back to expressions
Two cases of left recursion
Transform as follows
[Figure: the left-recursive expression productions and their transformed versions]

Eliminating Left Recursion
Resulting grammar
  All right recursive
  Retains the original language and associativity
  Not as intuitive to read
Top-down parser
  Will always terminate
  May still backtrack
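A brute-force top-down recognizer over a right-recursive grammar might look like the sketch below. The resulting grammar is not included in this transcript, so the GRAMMAR table is the usual textbook form and should be read as an assumption, as should the token kinds "id" and "num".

```python
# Assumed right-recursive expression grammar; [] is an epsilon production.
GRAMMAR = {
    "expr":   [["term", "expr'"]],
    "expr'":  [["+", "term", "expr'"], ["-", "term", "expr'"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["num"], ["id"]],
}

def derive(symbols, tokens, pos):
    """Yield each input position reachable after matching symbols from pos."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try each production in turn; moving on to the next one is the backtrack.
        for rhs in GRAMMAR[head]:
            yield from derive(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Terminal matches the current token: consume it and continue.
        yield from derive(rest, tokens, pos + 1)
    # A mismatched terminal yields nothing, which forces the caller to retry.

def accepts(tokens):
    """True if some sequence of choices consumes the entire input."""
    return any(end == len(tokens) for end in derive(["expr"], tokens, 0))

print(accepts(["id", "-", "num", "*", "id"]))   # token kinds for x - 2 * y
print(accepts(["id", "-"]))
```

Because no production starts with its own left-hand side, every expansion either consumes a token or shortens the list of pending symbols, so the search terminates even though it may still try and discard alternatives.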

Top-Down Parsers
Problem: left recursion
Solution: technique to remove it
What about backtracking?
  The current algorithm is brute force
Problem: how to choose the right production?
  Idea: use the next input token
  How? Look at our right-recursive grammar…

Right-Recursive Grammar
Two productions with no choice at all
All other productions are uniquely identified by a terminal symbol at the start of the RHS
We can choose the right production by looking at the next input symbol
  This is called lookahead
  BUT, this can be tricky…
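One way to picture lookahead-driven choice is a recursive-descent parser with one method per non-terminal. The sketch below assumes the usual right-recursive expression grammar (expr → term expr', expr' → + term expr' | - term expr' | ε, and likewise for term), which is not reproduced in this transcript. The class, the "$" end marker, and the tuple trees are illustrative choices.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected=None):
        tok = self.lookahead
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def expr(self):                  # expr -> term expr' (no choice to make)
        return self.expr_prime(self.term())

    def expr_prime(self, left):      # expr' -> + term expr' | - term expr' | epsilon
        if self.lookahead in ("+", "-"):     # the next token selects the production
            op = self.match()
            return self.expr_prime((op, left, self.term()))
        return left                          # epsilon production

    def term(self):                  # term -> factor term' (no choice to make)
        return self.term_prime(self.factor())

    def term_prime(self, left):      # term' -> * factor term' | / factor term' | epsilon
        if self.lookahead in ("*", "/"):
            op = self.match()
            return self.term_prime((op, left, self.factor()))
        return left

    def factor(self):                # factor -> num | id
        tok = self.lookahead
        if tok.isdigit() or tok.isidentifier():
            return self.match()
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    parser.match("$")                # all input must be consumed
    return tree

print(parse(["x", "-", "2", "*", "y"]))
```

No backtracking occurs: each method inspects only the current token. The ε branches simply return when the lookahead is not one of their operators, which is the tricky case the slide hints at.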

Lookahead
Goal: avoid backtracking
  Look at future input symbols
  Use extra context to make the right choice
How much lookahead is needed?
  In general, an arbitrary amount is needed for the full class of context-free grammars
  Use a fancy algorithm: the CYK algorithm, O(n^3)
Fortunately,
  Many CFGs can be parsed with limited lookahead
  This covers most programming languages

Top-Down Parsing
Goal: given productions A → α | β, the parser should be able to choose between α and β
How can the next input token help us decide?
Solution: FIRST sets
  Informally: FIRST(α) is the set of tokens that could appear as the first symbol in a string derived from α
  Def: x in FIRST(α) iff α →* x γ
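The definition suggests a fixed-point computation. What follows is the standard textbook approach, offered as a sketch and not necessarily the algorithm the course presents later. The grammar table, the "ε" marker string, and the function names are assumptions.

```python
EPS = "ε"   # marker meaning "can derive the empty string"

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})    # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:             # sym cannot vanish: stop here
            return result
    result.add(EPS)                          # every symbol can vanish
    return result

def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

# Assumed right-recursive expression grammar; [] is an epsilon production.
grammar = {
    "expr":   [["term", "expr'"]],
    "expr'":  [["+", "term", "expr'"], ["-", "term", "expr'"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["num"], ["id"]],
}

first = first_sets(grammar)
for nt, tokens in first.items():
    print(nt, sorted(tokens))
```

For this grammar FIRST(expr) is {num, id}, while FIRST(expr') contains +, - and ε.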

Top-Down Parsing
Building FIRST sets
  We'll look at this algorithm later
The LL(1) property
  Given A → α and A → β, we would like: FIRST(α) ∩ FIRST(β) = ∅
  The parser can make the right choice by looking at one lookahead token
  ..almost..

Top-Down Parsing
What about ε productions?
  They complicate the definition of LL(1)
  Consider A → α and A → β, where α may be empty
  In this case there is no symbol to identify α
Example:
  What is FIRST(3)? = { ε }
  What lookahead symbol tells us we are matching production 3?

Top-Down Parsing
If A was empty
  What will the next symbol be?
  It must be one of the symbols that immediately follow an A
Solution
  Build a FOLLOW set for each production with ε
  Extra condition for LL: FIRST(β) must be disjoint from FIRST(α) and FOLLOW(A)

FOLLOW Sets
Example:
  FIRST(1) = { x }
  FIRST(2) = { y }
  FIRST(3) = { ε }
What can follow A?
  Look at the context of all uses of A
  FOLLOW(A) = { z }
Now we can uniquely identify each production:
  If we are trying to match A and the next token is z, then we matched production 3
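The example can be reproduced in code. The grammar itself is not in this transcript, so the sketch assumes S → A z with A → x | y | ε, which gives the same FIRST and FOLLOW sets as the slide. The "$" end marker and all function names are also assumptions.

```python
EPS, END = "ε", "$"

GRAMMAR = {                      # assumed grammar; [] is the epsilon production
    "S": [["A", "z"]],
    "A": [["x"], ["y"], []],
}

def first_of(symbols, first):
    out = set()
    for s in symbols:
        f = first.get(s, {s})    # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    while True:                  # sets only grow, so stop when sizes are stable
        before = sum(len(v) for v in first.values())
        for nt, prods in grammar.items():
            for rhs in prods:
                first[nt] |= first_of(rhs, first)
        if sum(len(v) for v in first.values()) == before:
            return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    while True:
        before = sum(len(v) for v in follow.values())
        for nt, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:       # only non-terminals have FOLLOW
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    follow[sym] |= tail - {EPS}
                    if EPS in tail:              # the rest of the rhs can vanish
                        follow[sym] |= follow[nt]
        if sum(len(v) for v in follow.values()) == before:
            return follow

def choose_production(nt, token, grammar, first, follow):
    """Pick the production for nt that the lookahead token identifies."""
    for rhs in grammar[nt]:
        f = first_of(rhs, first)
        if token in f or (EPS in f and token in follow[nt]):
            return rhs
    return None

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "S", first)
print(follow["A"])
print(choose_production("A", "z", GRAMMAR, first, follow))
```

With lookahead z, the chosen production for A is the empty one, matching the slide's conclusion that production 3 was matched.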

More on FIRST and FOLLOW
Notice:
  FIRST and FOLLOW may be sets
  FIRST may contain ε in addition to other symbols
Example:
  FIRST(1) = { x, y, ε }
  FOLLOW(A) = { z, w }
Question: When would we care about FOLLOW(A)?
Answer:

LL(1) Property
Including ε productions
  FOLLOW(A) = the set of terminal symbols that can immediately follow A
  Def: FIRST+(A → α) is
    FIRST(α) ∪ FOLLOW(A), if ε ∈ FIRST(α)
    FIRST(α), otherwise
  Def: a grammar is LL(1) iff A → α and A → β and FIRST+(A …