

Question

Lexical Analyzer in C++

For the lexical analyzer, you will be provided with a description of the lexical syntax of the language. You will produce a lexical analysis function and a program to test it. The lexical rules of the language are as follows:

1. The language has identifiers, which are defined to be a letter followed by zero or more letters or numbers. This will be the token ID

2. The language has integer constants, which are defined to be an optional leading dash (for a negative number), followed by one or more digits. This will be the token ICONST

3. The language has real constants, which are defined to be an optional leading dash (for a negative number), followed by one or more digits, followed by a dot, followed by one or more digits. This will be the token FCONST

4. The language has quoted strings, which are letters enclosed inside of double quotes, all on the same line. This will be the token STRING

5. The language has reserved the keywords print and set. They will be the tokens PRINT and SET

6. The language has several single-character tokens. They are + - * , { } [ ] ( ) ; which will be the tokens PLUS MINUS STAR COMMA LBR RBR LSQ RSQ LPAREN RPAREN SC

7. A comment is all characters from a # to the end of the line; it is ignored and is not returned as a token. NOTE that a # in the middle of a STRING is NOT a comment!

8. An error will be denoted by the ERR token, and EOF by the DONE token
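For intuition, rules 1–3 and the reserved keywords can be checked against a single lexeme with `<regex>`. This is a sketch only: the assignment requires a character-by-character getToken over an istream, and the helper name `classify` is illustrative, not part of the starter code.

```cpp
#include <regex>
#include <string>

// Classify one already-isolated lexeme against rules 1-3 and rule 5.
// Full-match semantics of std::regex_match mean the whole lexeme must
// fit the pattern, so "9abc" is rejected rather than split.
std::string classify(const std::string& lex) {
    static const std::regex id("[A-Za-z][A-Za-z0-9]*");      // rule 1
    static const std::regex iconst("-?[0-9]+");              // rule 2
    static const std::regex fconst("-?[0-9]+\\.[0-9]+");     // rule 3
    if (lex == "print") return "PRINT";   // keywords win over ID (rule 5)
    if (lex == "set")   return "SET";
    if (std::regex_match(lex, fconst)) return "FCONST";
    if (std::regex_match(lex, iconst)) return "ICONST";
    if (std::regex_match(lex, id))     return "ID";
    return "ERR";
}
```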

Note that any error detected by the lexical analyzer will result in the ERR token, with the lexeme value equal to the string recognized when the error was detected. The program for Assignment 2 implements a lexical analyzer. The calling sequence for the lexical analyzer, a definition for all of the tokens, and a class definition for Token are included in the starter code for the assignment. The assignment is to write the lexical analyzer function and some test code around it.

The test code takes several command line arguments:

- -v (optional): if present, every token is printed when it is seen

- -stats (optional): if present, statistics are printed

- -sum (optional): if present, summary information is printed

- filename (optional): if present, read from the filename; otherwise read from standard input

Note that no other flags (arguments that begin with a dash) are permitted. If an unrecognized flag is present, the program should print “Invalid argument {arg}”, where {arg} is whatever flag was given, and it should stop running. At most one filename can be provided, and it must be the last command line argument. If more than one filename is provided, the program should print “Too many file names” and it should stop running. If the program cannot open a filename that is given, the program should print “Could not open {arg}”, where {arg} is the filename given, and it should stop running.

The program should repeatedly call the lexical analyzer function until it returns DONE or ERR. If it returns DONE, the program proceeds to handling the -stats and -sum options, if any, and then exits. If it returns ERR, the program should print “Error on line N ({lexeme})”, where N is the line number in the token and lexeme is the lexeme from the token, and it should stop running. If the -v option is present, the program should print each token as it is read and recognized, one token per line.
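The flag handling described above can be sketched as follows. This is a simplified, testable parse: `ParseResult` and `parseArgs` are illustrative names, the function returns the error message instead of printing and exiting, and it does not enforce that the filename is the last argument.

```cpp
#include <string>
#include <vector>

// Illustrative result of parsing the command line per the rules above.
struct ParseResult {
    bool v = false, stats = false, sum = false;
    std::string filename;  // empty means read from standard input
    std::string error;     // empty means success
};

ParseResult parseArgs(const std::vector<std::string>& args) {
    ParseResult r;
    for (const auto& a : args) {
        if (a == "-v")          r.v = true;
        else if (a == "-stats") r.stats = true;
        else if (a == "-sum")   r.sum = true;
        else if (!a.empty() && a[0] == '-') {
            r.error = "Invalid argument " + a;   // unrecognized flag
            return r;
        } else if (r.filename.empty()) {
            r.filename = a;                      // first filename seen
        } else {
            r.error = "Too many file names";     // a second filename
            return r;
        }
    }
    return r;
}
```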
The output format for the token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of the tokens ID, ICONST, FCONST, and STRING, the token name should be followed by a space and the lexeme in parens. For example, if the identifier “hello” is recognized, the -v output for it would be ID (hello).

If the -stats option is present, the program should, after seeing the DONE token, print out the following report:

Total IDs: N
List of IDs: X

where N is the number of times the ID token was seen and X is a comma separated list of identifier lexemes. If N is 0, then the List of IDs line is not printed.

If the -sum option is present, the program should, after seeing the DONE token and processing the -stats option, print out the following report:

Total lines: L
Total tokens: N
Most frequently used tokens: X

where L is the number of input lines, N is the number of tokens (not counting DONE), and X is a comma separated list of the tokens that appear most frequently. There may be only one item in the list, or there may be many. The list should be in alphabetical order. If N is 0, then the Most frequently used tokens line is not printed.
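For the -sum report, the alphabetical ordering of the most-frequent list falls out naturally if token counts are kept in a std::map, which iterates in key order. `mostFrequent` is an illustrative helper, not part of the starter code.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Given counts keyed by token name, return the most frequently seen
// token names in alphabetical order, as the -sum report requires.
std::vector<std::string> mostFrequent(const std::map<std::string, int>& counts) {
    int best = 0;
    for (const auto& kv : counts) best = std::max(best, kv.second);
    std::vector<std::string> result;
    for (const auto& kv : counts)           // std::map iterates in key order,
        if (best > 0 && kv.second == best)  // so the list is already sorted
            result.push_back(kv.first);
    return result;
}
```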

The program must:

- Recognize invalid command line options (items beginning with -)

- Recognize a file that cannot be opened

- Recognize a case with more than one file name

- Print the correct number of lines for an empty file

- Print the correct number of lines for a file containing nothing but comments

- Recognize a string with a newline in it as an error

- Recognize a string with a # in it as a string, not a comment

- Recognize all token types correctly

- Print output correctly with the -v option

polylex.h contents:

/*
 * polylex.h
 *
 *  Created on: Feb 20, 2017
 *      Author: gerardryan
 */

#ifndef POLYLEX_H_
#define POLYLEX_H_

#include <string>
#include <istream>

extern int currentLine;
// in ONE PLACE in your program, you must have the following line:
// int currentLine = 0;
// each time you see a '\n', add one to currentLine

enum TokenTypes {
 ID, ICONST, FCONST, STRING,
 PRINT, SET,
 PLUS, MINUS, STAR, COMMA, LBR, RBR, LSQ, RSQ, LPAREN, RPAREN, SC,
 ERR, DONE
};

class Token {
private:
 TokenTypes t;
 std::string lexeme;
 int line;

public:
 Token(TokenTypes t = ERR, std::string lexeme = "") {
  this->t = t;
  this->lexeme = lexeme;
  this->line = currentLine;
 }

 TokenTypes getType() const { return t; }
 std::string getLexeme() const { return lexeme; }
 int getLine() const { return line; }

 bool operator==(const TokenTypes& tt) { return t == tt; }
 bool operator!=(const TokenTypes& tt) { return t != tt; }
};

extern Token getToken(std::istream& source);

#endif /* POLYLEX_H_ */
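The calling sequence this header implies looks like the loop below. The `Token`, `TokenTypes`, and `getToken` here are simplified stand-ins so the sketch compiles on its own; in the assignment they come from the starter code, and the real getToken must implement rules 1–8.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Simplified stand-ins for the declarations in polylex.h.
enum TokenTypes { ID, ERR, DONE };
struct Token {
    TokenTypes t;
    std::string lexeme;
};

// Stub lexer: treats every whitespace-separated word as an ID.
Token getToken(std::istream& in) {
    std::string word;
    if (!(in >> word)) return {DONE, ""};
    return {ID, word};
}

// The calling sequence the assignment describes: loop until DONE or ERR.
int countTokens(std::istream& in) {
    int n = 0;
    Token tok = getToken(in);
    while (tok.t != DONE && tok.t != ERR) {
        ++n;                  // a real driver would print/tally here
        tok = getToken(in);
    }
    return n;
}
```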

Explanation / Answer

// Entry point is in lex.cpp
// Source code is attached
// Lexical Analyzer for the C programming language
// lex.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include "DFA.h"
#include <iomanip>

using namespace std;

void analyze(string token, DFA *identifierDFA, DFA *integerDFA, DFA *floatDFA, DFA *stringDFA);
bool isWhitespace(char ch);
bool isSymbol(char ch);
bool isOperator(char ch);
bool isKeyword(string token);
bool isIdentifier(string token, DFA *dfa);
bool isInteger(string token, DFA *dfa);
bool isFloat(string token, DFA *dfa);
bool isString(string token, DFA *dfa);
string stripSpaces(string str);

int _tmain(int argc, _TCHAR* argv[])
{
string input;
cout << "Enter a txt file for lexical analysis: " << endl;
cin >> input;

ifstream file(input);

if (file.is_open()) {
  string line; // the line to be read in from file
  string token; // the token to be analyzed
  char lookahead; // char that looks ahead for spaces and symbols

  DFA identifierDFA("identifier.txt");
  DFA integerDFA("integer.txt");
  DFA floatDFA("float.txt");
  DFA stringDFA("string.txt");


  while (getline(file, line)) {
   
   for (int i = 0; i < line.length(); i++) {
    lookahead = line[i];
    if (isWhitespace(lookahead)) {
     // token can be analyzed since we hit a whitespace
     analyze(token, &identifierDFA, &integerDFA, &floatDFA, &stringDFA);
     token.clear();
    }
    else if (isSymbol(lookahead)) {
     // token can be analyzed since we hit a symbol
     analyze(token, &identifierDFA, &integerDFA, &floatDFA, &stringDFA);
     token.clear();
     cout << setw(24) << left << lookahead << right << "Symbol" << endl;
    }
    else if (isOperator(lookahead)) {
     // token can be analyzed since we hit an operator
     analyze(token, &identifierDFA, &integerDFA, &floatDFA, &stringDFA);
     token.clear();
     cout << setw(24) << left << lookahead << right << "Operator" << endl;
    }
    else if (lookahead == '"'){
     // token can be analyzed since we hit the start of a string
     analyze(token, &identifierDFA, &integerDFA, &floatDFA, &stringDFA);
     token.clear();

     token.push_back(lookahead); // opening quote
     i++;

     // scan to the closing quote, guarding against an unterminated
     // string running past the end of the line
     while (i < line.length() && line[i] != '"') {
      token.push_back(line[i]);
      i++;
     }

     if (i < line.length()) {
      token.push_back(line[i]); // closing quote
     }
    }

    else {
     // didn't hit a delimiter, so append lookahead to token
     token.push_back(lookahead);
    }
   }

   // flush any token left at the end of the line so it does not
   // merge with the first token of the next line
   analyze(token, &identifierDFA, &integerDFA, &floatDFA, &stringDFA);
   token.clear();
  }

} else {
  cout << "Unable to open file " << input << endl;
}

return 0;
}

void analyze(string token, DFA *identifierDFA, DFA *integerDFA, DFA *floatDFA, DFA *stringDFA) {
if (token.length() > 0) {
  if (isKeyword(token)) {
   cout << setw(24) << left << token << right << "Keyword" << endl;
  } else if (isIdentifier(token, identifierDFA)) {
   cout << setw(24) << left << token << right << "Identifier" << endl;
  } else if (isInteger(token, integerDFA)) {
   cout << setw(24) << left << token << right << "Integer" << endl;
  } else if (isFloat(token, floatDFA)) {
   cout << setw(24) << left << token << right << "Float" << endl;
  } else if (isString(token, stringDFA)) {
   cout << setw(24) << left << token << right << "String Literal" << endl;
  } else {
   cout << setw(24) << left << token << right << "Error" << endl;
  }
}
}

// returns true if the char argument is whitespace or a comma (since commas serve as delimiters in C)
bool isWhitespace(char ch) {
if (ch == ' ' || ch == '\t' || ch == '\n' || ch == ',') {
  return true;
} else {
  return false;
}
}

// returns true if the char argument is a special symbol
bool isSymbol(char ch) {
if (ch == '(' || ch == ')' || ch == '{' || ch == '}' || ch ==';' || ch == '[' || ch == ']') {
  return true;
} else {
  return false;
}
}

// returns true if the char argument is an operator
bool isOperator(char ch) {
if (ch == '+' || ch == '-' || ch == '=' || ch == '/' || ch =='*' || ch =='%' || ch =='<' || ch=='>') {
  return true;
} else {
  return false;
}
}

bool isKeyword(string token) {
if (token == "for" || token == "while" || token == "if" || token == "else" || token == "int" || token == "float"
  || token == "short" || token == "do" || token == "char" || token == "return" || token == "auto" || token == "struct" || token == "union"
  || token == "break" || token == "long" || token == "double" || token == "const" || token == "unsigned" || token == "switch" || token == "continue"
  || token == "signed" || token == "void" || token == "case" || token == "enum" || token == "register" || token == "typedef" || token == "default"
  || token == "goto" || token == "extern" || token == "static") {
  return true;
} else {
  return false;
}
}

bool isIdentifier(string token, DFA *dfa) {
return dfa->inLanguage(token);
}

bool isInteger(string token, DFA *dfa) {
return dfa->inLanguage(token);
}

bool isFloat(string token, DFA *dfa) {
return dfa->inLanguage(token);
}

bool isString(string token, DFA *dfa) {
return dfa->inLanguage(stripSpaces(token));
}

string stripSpaces(string str) {
string noSpaces;
for (int i = 0; i < str.length(); i++) {
  if (str[i] != ' ') {
   noSpaces.push_back(str[i]);
  }
}
return noSpaces;
}

// DFA.h is another header, similar to polylex.h

#pragma once
#include <vector>
#include <string>
#include <map>
#include <fstream>

using namespace std;

class DFA
{
public:
DFA(string inputFileName);
~DFA(void);

bool inLanguage(string word) const; // returns true if the word passed in is in the language defined by the DFA

private:
vector<string> finalStates;
vector<string> alphabet;
vector<map<string, string>> transitions;
vector<string> transitionTokens;
ifstream file;
string start;
};

//Defining the main logic in DFA.cpp

#include "stdafx.h"
#include "DFA.h"
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <sstream>
#include <algorithm>

void tokenize(string str, vector<string> &tokens);

DFA::DFA(string inpFName)
{
file.open(inpFName);
if (file.is_open()) {
  string line;
  string alpha;
  string from;
  string to;
  int index;

  getline(file, start);
  getline(file, line);
  tokenize(line, finalStates);
  getline(file, line);
  tokenize(line, alphabet);
  transitions.resize(alphabet.size());
  
  
  while (getline(file, line)) {
   transitionTokens.clear();
   tokenize(line, transitionTokens);
   from = transitionTokens[0];
   alpha = transitionTokens[1];
   to = transitionTokens[2];

   // find which index contains "alpha" in the vector "alphabet"
   for (unsigned int i = 0; i < alphabet.size(); ++i) {
    if (alphabet[i] == alpha) {
     index = i;
     break;
    }
   }

   // then map the string "from" to "to", in the map at index "index" in vector "transitions".
   transitions[index].insert(pair<string,string>(from, to));
  }


  file.close();
} else {
  cout << "DFA: Error opening file " << inpFName << endl;
}
}
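The parsing order in the constructor implies a definition-file layout: line 1 is the start state, line 2 the space-separated final states, line 3 the space-separated alphabet, and each remaining line a "from symbol to" transition triple. An illustrative integer.txt (not the actual course file) might look like:

```text
q0
q2
- 0 1 2 3 4 5 6 7 8 9
q0 - q1
q0 0 q2
q1 0 q2
q2 0 q2
```

with similar transition lines for the digits 1 through 9 out of q0, q1, and q2. Any input with no defined transition (for example, a second dash) is rejected by inLanguage.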


DFA::~DFA(void)
{
}

void tokenize(string str, vector<string> &tokens) {
 istringstream ss(str);
 string x;
 // testing getline in the loop condition avoids the !ss.eof() pitfall,
 // which could push a spurious empty token for some inputs
 while (getline(ss, x, ' ')) {
  tokens.push_back(x);
 }
}

bool DFA::inLanguage(string word) const {
bool valid = true;
string currentState = start;
int index = 0;

for (unsigned int i = 0; i < word.length(); ++i) {
   auto it = find(alphabet.begin(), alphabet.end(), string(1, word[i]));
   if (it != alphabet.end()) {
    index = it - alphabet.begin();
    if (transitions[index].find(currentState) != transitions[index].end()) {
     currentState = transitions[index].at(currentState);
    } else {
     valid = false;
     break; // no transition from this state on this symbol: reject
    }
   }
   else {
    // symbol not in the alphabet: reject
    valid = false;
    break;
   }
  }

  if (valid) {
   // the word is accepted only if the machine halts in a final state
   valid = find(finalStates.begin(), finalStates.end(), currentState) != finalStates.end();
  }

  return valid;
}
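The same acceptance logic can be exercised without an input file by passing hand-built tables. This standalone mirror of inLanguage (with illustrative names, and the transition table keyed directly by state and character) rejects as soon as a transition is missing:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Table-driven DFA acceptance: follow one transition per input
// character, then check whether the halting state is final.
bool accepts(const std::string& word,
             const std::string& start,
             const std::vector<std::string>& finals,
             const std::map<std::pair<std::string, char>, std::string>& delta) {
    std::string state = start;
    for (char c : word) {
        auto it = delta.find({state, c});
        if (it == delta.end()) return false;  // no transition: reject
        state = it->second;
    }
    return std::find(finals.begin(), finals.end(), state) != finals.end();
}
```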

//Adding stdafx.cpp
#include "stdafx.h"

//adding stdafx.h
// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently


#pragma once

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>
