Write a program in Java: Parser Generator: In this project option, you are going
ID: 3828043 • Letter: W
Question
Write a program in Java:
Parser Generator: In this project option, you are going to implement lexical/syntax analysis using Stack.
Syntax-Directed Translations
The front end of the compiler constructs a intermediate representation of the source program from which the back end generates the target program. A syntax directed translation scheme is a syntax directed definition in which the net effect of semantic actions is to print out a translation of the input to a desired output form.
Stack
Stack is a LIFO (last in first out) storage with two abstract operations : push, pop. Push will put an item into stack at the top. Pop retrieve an item at the top of stack.
Calculations using stack
Because a stack is LIFO, any operation must access data item from the top. Stack doesn't need 'addressing' as it is implicit in the operators which use stack. Any expression can be transformed into a postfix order and stack can be use to evaluate that expression without the need for explicitly locate any variable. For example:
B + C - D ==>
B C + D - (post fix)
Intermediate Code:
push rvalue B
push rvalue C
add
push rvalue D
sub
Instruction set of stack
1. Instructions fall into three category:
a. Integer arithmetic: add, mul, div ...
b. Stack manipulation:
b.i. Push v: push v onto stack
b.ii. push lvalue L: push address of data location L
b.iii. push rvalue L: push contents of data location L
b.iv. pop: throw away the value on stack top
b.v. := : the rvalue on the top is placed in the lvalue below it and both are popped.
b.vi. copy: push a copy of top value on the stack.
c. Control flow:
c.i. label L: taget of jump to L
c.ii. goto L: nect instruction is taken from statement with label L
c.iii. gofalse L: pop the top value ; jump if it is 0.
c.iv. gotrue L: pop the top value ; jump if it is non zero.
c.v. halt: stop execution
Steps to be followed while working on this project
1. Lexical specification:
To do lexical analysis of a C-routine. The scanner must be able to process contructs like expressions, assignment statements, if statement, if-else-if nesting and while loop. Ensure handling multiple files for scanning.
2. Yacc specification:
Syntax analysis of the above construct. The code can be assumed to be semantically correct, so no semantic checks using yacc action need to be done.
3. Translation:
a. Expression:
a.i. expr -> expr1 ‘op’ expr2 { expr.t := expr1.t || expr2.t || op }
a.ii. expr-> id { expr.t := id.lexeme } Attribute lexeme of an id gives its string representation. Attribute t of a non terminal gives its translation. || is the concatenation operator.
b. Assignment: b.i. stmt -> id := expr {stmt.t := ‘push lvalue ’ id.lexeme || expr.t || ‘:=’}
c. If statement: c.i. stmt -> if expr then stmt1 {out := newlabel(); stmt.t := expr.t || ‘gofalse ’ out || stmt1.t || ‘label ’ out }
d. While statement: d.i. stmt -> while expr then stmt1 {test = newlabel(); out = newlabel(); ‘label ’ test || expr.t || ‘gofalse ’ out || stmt1.t || ‘goto ’ test || ‘label ’ out }
Input guidelines
1. Input is a C-routine.
2. Ignore function calls.
3. The input must be completely realized by the above four translations (for example the binary search routine) Example Input:
int n = 10;
int u = 0 ;
int v = 1 ;
int t ;
int i = 2;
while(i <= n ) { t = u + v; u = v; v = t; }
Output: push lvalue n
push 10
:=
push lvalue u
push 0
:=
push lvalue v
push 1
:=
push lvalue i
push 2
:=
label test
push rvalue i
push rvalue n
<=
gofalse out
push lvalue t
push rvalue u
push rvalue v
+ :=
push lvalue u
push rvalue v
:=
push lvalue v
push rvalue t
:=
goto test
label out
halt
Explanation / Answer
For large programs, the compiler is actually part of a multistep tool chain
[preprocessor] [compiler] [assembler] [linker] [loader]
We will be primarily focused on the second element of the chain, the compiler. Our target language will be assembly language. I give a very short description of the other components, including some historical comments.
Preprocessors
Preprocessors are normally fairly simple as in the C language, providing primarily the ability to include files and expand macros. There are exceptions, however. IBM's PL/I, another Algol-like language had quite an extensive preprocessor, which made available at preprocessor time, much of the PL/I language itself (e.g., loops and I believe procedure calls).
Some preprocessors essentially augment the base language, to add additional capabilities. One could consider them as compilers in their own right, having as source this augmented language (say Fortran augmented with statements for multiprocessor execution in the guise of Fortran comments) and as target the original base language (in this case Fortran). Often the “preprocessor” inserts procedure calls to implement the extensions at runtime.
Assemblers
Assembly code is an mnemonic version of machine code in which names, rather than binary values, are used for machine instructions, and memory addresses.
Some processors have fairly regular operations and as a result assembly code for them can be fairly natural and not-too-hard to understand. Other processors, in particular Intel's x86 line, have let us charitably say more interesting instructions with certain registers used for certain things.
My laptop has one of these latter processors (pentium 4) so my gcc compiler produces code that from a pedagogical viewpoint is less than ideal. If you have a mac with a ppc processor (newest macs are x86), your assembly language is cleaner. NYU's ACF features sun computers with sparc processors, which also have regular instruction sets.
Two pass assembly
No matter what the assembly language is, an assembler needs to assign memory locations to symbols (called identifiers) and use the numeric location address in the target machine language produced. Of course the same address must be used for all occurrences of a given identifier and two different identifiers must (normally) be assigned two different locations.
The conceptually simplest way to accomplish this is to make two passes over the input (read it once, then read it again from the beginning). During the first pass, each time a new identifier is encountered, an address is assigned and the pair (identifier, address) is stored in a symbol table. During the second pass, whenever an identifier is encountered, its address is looked up in the symbol table and this value is used in the generated machine instruction.
Linkers
Linkers, a.k.a. linkage editors combine the output of the assembler for several different compilations. That is the horizontal line of the diagram above should really be a collection of lines converging on the linker. The linker has another input, namely libraries, but to the linker the libraries look like other programs compiled and assembled. The two primary tasks of the linker are
Relocating relative addresses
The assembler processes one file at a time. Thus the symbol table produced while processing file A is independent of the symbols defined in file B, and conversely. Thus, it is likely that the same address will be used for different symbols in each program. The technical term is that the (local) addresses in the symbol table for file A are relative to file A; they must be relocated by the linker. This is accomplished by adding the starting address of file A (which in turn is the sum of the lengths of all the files processed previously in this run) to the relative address.
Resolving external references
Assume procedure f, in file A, and procedure g, in file B, are compiled (and assembled) separately. Assume also that f invokes g. Since the compiler and assembler do not see g when processing f, it appears impossible for procedure f to know where in memory to find g.
The solution is for the compiler to indicated in the output of the file A compilation that the address of g is needed. This is called a use of g. When processing file B, the compiler outputs the (relative) address of g. This is called the definition of g. The assembler passes this information to the linker.
The simplest linker technique is to again make two passes. During the first pass, the linker records in its “external symbol table” (a table of external symbols, not a symbol table that is stored externally) all the definitions encountered. During the second pass, every use can be resolved by access to the table.
I will be covering the linker in more detail tomorrow at 5pm in 2250, OS Design
Loaders
After the linker has done its work, the resulting “executable file” can be loaded by the operating system into central memory. The details are OS dependent. With early single-user operating systems all programs would be loaded into a fixed address (say 0) and the loader simply copies the file to memory. Today it is much more complicated since (parts of) many programs reside in memory at the same time. Hence the compiler/assembler/linker cannot know the real location for an identifier. Indeed, this real location can change.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.