
Write a lexical analyzer for LPAS programs


Question

Write a lexical analyzer for LPAS programs. Input to the scanner (lexical analyzer) is a text file containing an LPAS program. Output consists of one or two files: (1) a file of tokens (called Outfile here) and, optionally (it is up to you), (2) a listing file that echoes the input file with line numbers added and, possibly, lexical error messages.

File handler: The main routine is GetChar, used by the scanner.   It returns the next character from Infile, as long as there is one.   It may do this, however, by taking the character from a buffer instead of the actual Infile.   If the buffer is empty, it refills the buffer before getting a character. Do NOT bring in the entire input file at once.

         You will also need either a backup function to move left one character, or a function that looks at the next character without consuming it. One or the other is necessary for situations where you do not see the end of one token until you have read the first character of the next, as in "ident1<>ident2 + 8". A backup is needed after ident1 (since you have already scanned the "<"), but not after ident2, where you have only scanned a " ".

         Hide all actual file handling in something like a FillBuffer routine, if possible.   This can read one line of the input file, putting it in an array of char for GetChar. It is also easy to produce a listing file here, simply dumping the buffer to the listing file, first writing the line number. Actual file handling can be messy; try to hide the details from GetChar. Remember modularity in program design; no one routine should be doing different tasks (not both processing and i/o).
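The file handler described above could be sketched roughly as follows. This is only a minimal illustration in Java, not the required design: the class name CharSource, the EOF value of -1, the choice to append '\n' to every line, and the in-memory StringBuilder standing in for the listing file are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical file handler: the scanner sees only getChar() and backup(). */
class CharSource {
    static final int EOF = -1;       // assumed end-of-file marker

    private final BufferedReader in;
    private final StringBuilder listing = new StringBuilder(); // stand-in for the listing file
    private String buffer = "";      // current input line, with '\n' appended
    private int pos = 0;             // index of the next unread character in buffer
    private int lineNumber = 0;
    private boolean atEof = false;

    CharSource(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    /** The only routine that touches the input: reads ONE line, never the whole file. */
    private void fillBuffer() {
        try {
            String line = in.readLine();
            if (line == null) {      // no more input
                atEof = true;
                buffer = "";
                pos = 0;
                return;
            }
            lineNumber++;
            buffer = line + "\n";
            pos = 0;
            // Echo the line, prefixed by its number, to the listing.
            listing.append(lineNumber).append(": ").append(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next character, refilling the buffer first if it is used up. */
    int getChar() {
        if (!atEof && pos >= buffer.length()) {
            fillBuffer();
        }
        return atEof ? EOF : buffer.charAt(pos++);
    }

    /** Moves left one character; meant to be called right after a getChar() that returned a character. */
    void backup() {
        if (pos > 0) {
            pos--;
        }
    }

    String listing() {
        return listing.toString();
    }

    public static void main(String[] args) {
        CharSource src = new CharSource(new StringReader("a<b\nc"));
        for (int ch = src.getChar(); ch != EOF; ch = src.getChar()) {
            System.out.println(ch == '\n' ? "\\n" : String.valueOf((char) ch));
        }
        System.out.print(src.listing());
    }
}
```

Because the buffer always holds at least the '\n' of the current line, a single backup immediately after a successful getChar never has to cross a line boundary, which keeps backup to one decrement.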

Scanner: Program based on a deterministic finite automaton -- see text and class discussion. FindToken is any intelligent implementation of a DFA for LPAS. The general form of the main program is simple:

                        Initialize (all global values);
                        do {
                             TokenRec = FindToken(parameters);
                             WriteToScreen(TokenRec);
                             WriteTokenFile(TokenRec);
                        } while (TokenRec.tokenNumber != eof_token);
                        Clean up any details;

//Note: call to FindToken is entirely independent of getting a line, or any other physical attribute of the input file.

//Note: FindToken does no printing (in general, it is bad form to have i/o in a processing routine, except for debugging purposes).

Tokens:   Token number plus the lexeme or associated string (often, not always needed).
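As a rough illustration of a token record and a DFA-style FindToken, here is a small Java sketch. The LPAS token table is not given in the question, so everything specific here is an assumption: the token kinds in TokenKind, the rule that an identifier is a letter followed by letters or digits, and the names Token, MiniScanner, and scanAll. To stay self-contained it reads from a String rather than through GetChar; a real scanner would call the file handler instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token numbers; the real LPAS token table is not shown in the question. */
enum TokenKind { IDENT, INT_LITERAL, LESS, LESS_EQUAL, NOT_EQUAL, PLUS, ERROR, EOF }

/** Token record: token number (an enum here) plus the lexeme. */
final class Token {
    final TokenKind kind;
    final String lexeme;

    Token(TokenKind kind, String lexeme) {
        this.kind = kind;
        this.lexeme = lexeme;
    }

    @Override
    public String toString() {
        return kind + "(" + lexeme + ")";
    }
}

class MiniScanner {
    private final String input;   // stand-in for the buffered input file
    private int pos = 0;

    MiniScanner(String input) {
        this.input = input;
    }

    private int getChar() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }

    /** Un-reads ch, unless ch is the end-of-input marker (which consumed nothing). */
    private void backup(int ch) {
        if (ch != -1) pos--;
    }

    private static boolean isLetter(int ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    private static boolean isDigit(int ch) {
        return ch >= '0' && ch <= '9';
    }

    /** One pass through the automaton: no printing, no file handling. */
    Token findToken() {
        int ch = getChar();
        while (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') ch = getChar(); // start state
        if (ch == -1) return new Token(TokenKind.EOF, "");

        StringBuilder lexeme = new StringBuilder();
        if (isLetter(ch)) {                       // identifier state
            while (isLetter(ch) || isDigit(ch)) { lexeme.append((char) ch); ch = getChar(); }
            backup(ch);                           // ch belongs to the next token
            return new Token(TokenKind.IDENT, lexeme.toString());
        }
        if (isDigit(ch)) {                        // integer state
            while (isDigit(ch)) { lexeme.append((char) ch); ch = getChar(); }
            backup(ch);
            return new Token(TokenKind.INT_LITERAL, lexeme.toString());
        }
        if (ch == '<') {                          // "<", "<=" or "<>"
            int next = getChar();
            if (next == '=') return new Token(TokenKind.LESS_EQUAL, "<=");
            if (next == '>') return new Token(TokenKind.NOT_EQUAL, "<>");
            backup(next);
            return new Token(TokenKind.LESS, "<");
        }
        if (ch == '+') return new Token(TokenKind.PLUS, "+");
        return new Token(TokenKind.ERROR, String.valueOf((char) ch));
    }

    /** Main loop in the shape of the assignment's do/while, collecting tokens instead of writing files. */
    static List<Token> scanAll(String text) {
        MiniScanner scanner = new MiniScanner(text);
        List<Token> tokens = new ArrayList<>();
        Token t;
        do {
            t = scanner.findToken();
            tokens.add(t);
        } while (t.kind != TokenKind.EOF);
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : scanAll("ident1<>ident2 + 8")) System.out.println(t);
    }
}
```

On the example from the question, the backup happens after ident1 (the "<" was already read) and after ident2 (the blank was read and is simply skipped again on the next call); either a backup or a peek would serve equally well here.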

Testing: Easy. Create a file containing an LPAS program (or any sequence of tokens, since grammar does not count here). Send it to the lexical analyzer, and check that the file of tokens is correct.

Explanation / Answer

package java_cup;

import java_cup.runtime.token;
import java_cup.runtime.str_token;
import java.util.Hashtable;

/** This class implements a small scanner (aka lexical analyzer or lexer) for
 *  the JavaCup specification.  This scanner reads characters from standard
 *  input (System.in) and returns integers corresponding to the terminal
 *  number of the next token.  Once end of input is reached the EOF token is
 *  returned on every subsequent call.
 *
 *  Tokens currently returned include:
 *
 *    Symbol        Constant Returned     Symbol        Constant Returned
 *    ------        -----------------     ------        -----------------
 *    "package"     PACKAGE               "import"      IMPORT
 *    "code"        CODE                  "action"      ACTION
 *    "parser"      PARSER                "terminal"    TERMINAL
 *    "non"         NON                   "init"        INIT
 *    "scan"        SCAN                  "with"        WITH
 *    "start"       START                   ;           SEMI
 *      ,           COMMA                   *           STAR
 *      .           DOT                     :           COLON
 *      ::=         COLON_COLON_EQUALS      |           BAR
 *    identifier    ID                    {:...:}       CODE_STRING
 *
 *  All symbol constants are defined in sym.java which is generated by
 *  JavaCup from parser.cup.
 *
 *  In addition to the scanner proper (called first via init() then with
 *  next_token() to get each token) this class provides simple error and
 *  warning routines and keeps a count of errors and warnings that is
 *  publicly accessible.
 *
 *  This class is "static" (i.e., it has only static members and methods).
 *
 * @version last updated: 11/25/95
 * @author  Scott Hudson
 */
public class lexer {

  /*--- Constructor(s) ----------------------------------------*/

  /** The only constructor is private, so no instances can be created. */
  private lexer() { }

  /*--- Static (Class) Variables ------------------------------*/

  /** First character of lookahead. */
  protected static int next_char;

  /** Second character of lookahead. */
  protected static int next_char2;

  /** EOF constant. */
  protected static final int EOF_CHAR = -1;

  /** Table of keywords.  Keywords are initially treated as identifiers.
   *  Just before they are returned we look them up in this table to see if
   *  they match one of the keywords.  The string of the name is the key here,
   *  which indexes Integer objects holding the symbol number.
   */
  protected static Hashtable keywords = new Hashtable(23);

  /** Table of single character symbols.  For ease of implementation, we
   *  store all unambiguous single character tokens in this table of Integer
   *  objects keyed by Integer objects with the numerical value of the
   *  appropriate char (currently Character objects have a bug which precludes
   *  their use in tables).
   */
  protected static Hashtable char_symbols = new Hashtable(11);

  /** Current line number for use in error messages. */
  protected static int current_line = 1;

  /** Character position in current line. */
  protected static int current_position = 1;

  /** Count of total errors detected so far. */
  public static int error_count = 0;

  /** Count of warnings issued so far */
  public static int warning_count = 0;

  /*--- Static Methods ----------------------------------------*/

  /** Initialize the scanner.  This sets up the keywords and char_symbols
   *  tables and reads the first two characters of lookahead.
   */
  public static void init() throws java.io.IOException {
    /* set up the keyword table */
    keywords.put("package",  new Integer(sym.PACKAGE));
    keywords.put("import",   new Integer(sym.IMPORT));
    keywords.put("code",     new Integer(sym.CODE));
    keywords.put("action",   new Integer(sym.ACTION));
    keywords.put("parser",   new Integer(sym.PARSER));
    keywords.put("terminal", new Integer(sym.TERMINAL));
    keywords.put("non",      new Integer(sym.NON));
    keywords.put("init",     new Integer(sym.INIT));
    keywords.put("scan",     new Integer(sym.SCAN));
    keywords.put("with",     new Integer(sym.WITH));
    keywords.put("start",    new Integer(sym.START));

    /* set up the table of single character symbols */
    char_symbols.put(new Integer(';'), new Integer(sym.SEMI));
    char_symbols.put(new Integer(','), new Integer(sym.COMMA));
    char_symbols.put(new Integer('*'), new Integer(sym.STAR));
    char_symbols.put(new Integer('.'), new Integer(sym.DOT));
    char_symbols.put(new Integer('|'), new Integer(sym.BAR));

    /* read two characters of lookahead */
    next_char = System.in.read();
    if (next_char == EOF_CHAR)
      next_char2 = EOF_CHAR;
    else
      next_char2 = System.in.read();
  }

  /** Advance the scanner one character in the input stream.  This moves
   *  next_char2 to next_char and then reads a new next_char2.
   */
  protected static void advance() throws java.io.IOException {
    int old_char;

    old_char = next_char;
    next_char = next_char2;
    if (next_char == EOF_CHAR)
      next_char2 = EOF_CHAR;
    else
      next_char2 = System.in.read();

    /* count this */
    current_position++;
    if (old_char == '\n') {
      current_line++;
      current_position = 1;
    }
  }

  /** Emit an error message.  The message will be marked with both the
   *  current line number and the position in the line.  Error messages
   *  are printed on standard error (System.err).
   * @param message the message to print.
   */
  public static void emit_error(String message) {
    System.err.println("Error at " + current_line + "(" + current_position +
                       "): " + message);
    error_count++;
  }

  /** Emit a warning message.  The message will be marked with both the
   *  current line number and the position in the line.  Messages are
   *  printed on standard error (System.err).
   * @param message the message to print.
   */
  public static void emit_warn(String message) {
    System.err.println("Warning at " + current_line + "(" + current_position +
                       "): " + message);
    warning_count++;
  }

  /** Determine if a character is ok to start an id.
   * @param ch the character in question.
   */
  protected static boolean id_start_char(int ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ... (ch >= '0' && ch
