Lexical analysis is the process of converting a sequence of characters (such as


Question

Lexical analysis is the process of converting a sequence of characters (such as a string) into a sequence of tokens: substrings that have an identified "meaning". A program that performs lexical analysis is called a tokenizer. A tokenizer is usually paired with a parser, and together they analyze the syntax of the string according to the rules of the programming language being used. Parsing means analyzing the string in the context of that language to find the substrings that are meaningful. Your assignments will not include writing a parser.

A string tokenizer allows an application to break a string into tokens. A token, as explained above, is a word within a string that may or may not have meaning when it is analyzed by a parser.

A stream tokenizer takes an input stream and splits it into tokens. The stream tokenizer recognizes identifiers, numbers, quoted strings, and various comment styles. Each character is classified as white space, alphabetic, numeric, quote, or comment character, and a character can have zero or more of these characteristics. Since stream tokenizers are often used as the first step in parsing computer programs, they usually have several options for processing or ignoring certain characters, depending on the programming language’s particular rules.
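As a rough illustration of the character classes described above, the sketch below runs Java's built-in java.io.StreamTokenizer with its default settings and labels each token it returns. The class name TokenKinds, the method describe, and the label strings are made up for this example; only the StreamTokenizer calls come from the JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenKinds {
    // Returns one "KIND:text" label per token, using StreamTokenizer's default syntax table.
    // The labels are hypothetical; they only make the token classes visible.
    static List<String> describe(String source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:   // run of alphabetic characters
                    out.add("WORD:" + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER: // numbers are parsed into a double
                    out.add("NUMBER:" + st.nval);
                    break;
                case '"':                       // quoted string; sval holds the body without quotes
                    out.add("STRING:" + st.sval);
                    break;
                default:                        // any other single character
                    out.add("CHAR:" + (char) st.ttype);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String label : describe("count = 42 \"two words\"")) {
            System.out.println(label);
        }
    }
}
```

Note that the default settings split on punctuation and parse numbers, which is not necessarily how a given assignment wants tokens counted.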

These options include the following:

- Whether to treat line breaks as token delimiters or whitespace (e.g., line breaks in VisualBasic indicate the end of a statement; in C++, they are ignored)
- Whether C-style comments are tokenized or skipped
- Whether C++-style comments are tokenized or skipped
- Whether keywords and names of identifiers should be converted to lowercase (e.g., C++ names and keywords are case-sensitive; SQL’s are not)
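For comparison, java.io.StreamTokenizer exposes a switch for each of these options: eolIsSignificant, slashStarComments, slashSlashComments, and lowerCaseMode. The sketch below wires them to four boolean parameters; the class OptionDemo, the method words, and the "<EOL>" marker are assumptions made for this example. One caveat: StreamTokenizer can only skip comments, it has no setting that returns a comment as a token.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class OptionDemo {
    // Hypothetical helper: tokenizes source with the four options set from the parameters.
    static List<String> words(String source, boolean eolTokens, boolean skipCComments,
                              boolean skipCppComments, boolean lowerCase) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        // By default '/' starts a single-line comment; make it an ordinary
        // character so that only the two comment flags below decide.
        st.ordinaryChar('/');
        st.eolIsSignificant(eolTokens);         // line breaks become TT_EOL tokens
        st.slashStarComments(skipCComments);    // skip /* ... */ comments
        st.slashSlashComments(skipCppComments); // skip // ... comments
        st.lowerCaseMode(lowerCase);            // lowercase word tokens only

        List<String> out = new ArrayList<>();
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (type == StreamTokenizer.TT_EOL) {
                out.add("<EOL>");
            } else if (type == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(st.nval));
            } else {
                out.add(String.valueOf((char) type));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String sql = "SELECT Name // pick\nFROM T /* tbl */ X";
        System.out.println(words(sql, true, true, true, true));
    }
}
```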

Using Java, C#, or another object-oriented language of your choice, write a stream tokenizer method with the following signature:

    String[] tokenize(Stream in, bool tokenizeAtLineBreaks, bool ignoreCComments, bool ignoreCppComments)

Some important issues to remember are listed below:

- If one or both of the comment flags is set to TRUE, the entire comment should be treated as a single token. In other words, do not treat whitespace within the comment as a token delimiter. For example, the following lines of code each contain 3 tokens:

    int age; /* This is the person’s age in years */
    String name; // This is the person’s name

- If one or both of the comment flags is set to FALSE, the comment should be ignored and not returned or processed by the tokenizer. For example, if the ignoreCComments flag is set to FALSE, then the following line of code has only 2 tokens (the comment is skipped/ignored):

    int age; /* This is the person’s age in years */

  And the following line of code has only 2 tokens if the ignoreCppComments flag is set to FALSE (the comment is skipped/ignored):

    String name; // This is the person’s name

- If you encounter a double quotation mark character ("), you are within a string literal token. Do not separate the words within the string literal into separate tokens. For example, the following code segment contains 4 tokens:

    String univName = "Northcentral University";
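The string literal rule on its own can be illustrated with a regular expression. This is only a minimal sketch of that one rule, assuming tokens are otherwise separated by whitespace; it does not handle comments, escaped quotes, or a quote that opens in the middle of a word. The class QuoteAwareSplit and the method split are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Either a double-quoted run plus any characters glued to its closing quote,
    // or else a plain run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"\\S*|\\S+");

    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The example line from the assignment text.
        System.out.println(split("String univName = \"Northcentral University\";"));
    }
}
```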

Explanation / Answer

Tokenizing directly from an InputStream is discouraged (the StreamTokenizer(InputStream) constructor is deprecated), so the method takes a Reader as its input.

The code below is in Java.

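package proj1;

import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Tokenizer {

    // java.io.StreamTokenizer can skip comments but cannot return them as tokens,
    // so the input is scanned by hand. As the assignment states, a comment flag set
    // to true returns the whole comment as one token; false skips the comment.
    // tokenizeAtLineBreaks is interpreted here as "emit a "\n" token per line break".
    static String[] tokenize(Reader in, boolean tokenizeAtLineBreaks,
            boolean ignoreCComments, boolean ignoreCppComments) throws IOException {
        PushbackReader reader = new PushbackReader(in);
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '/') {
                int next = reader.read();
                if (next == '*' || next == '/') {
                    flush(word, tokens);
                    boolean block = next == '*';
                    String comment = readComment(reader, block);
                    if (block ? ignoreCComments : ignoreCppComments) {
                        tokens.add(comment);
                    }
                    continue;
                }
                if (next != -1) {
                    reader.unread(next); // a lone '/', not a comment
                }
            }
            if (c == '"') {
                // String literal: copy everything through the closing quote.
                word.append('"');
                int q;
                while ((q = reader.read()) != -1) {
                    word.append((char) q);
                    if (q == '\\') { // keep an escaped character, e.g. \"
                        int escaped = reader.read();
                        if (escaped != -1) word.append((char) escaped);
                    } else if (q == '"') {
                        break;
                    }
                }
            } else if (Character.isWhitespace(c)) {
                flush(word, tokens);
                if (c == '\n' && tokenizeAtLineBreaks) tokens.add("\n");
            } else {
                word.append((char) c);
            }
        }
        flush(word, tokens);
        return tokens.toArray(new String[0]);
    }

    // Reads the rest of a comment whose two opening characters were already consumed.
    private static String readComment(PushbackReader reader, boolean block) throws IOException {
        StringBuilder sb = new StringBuilder(block ? "/*" : "//");
        int c;
        while ((c = reader.read()) != -1) {
            if (!block && (c == '\n' || c == '\r')) {
                reader.unread(c); // leave the line break for the main loop
                break;
            }
            sb.append((char) c);
            if (block && c == '/' && sb.length() > 3 && sb.charAt(sb.length() - 2) == '*') break;
        }
        return sb.toString();
    }

    // Ends the current word, if any, and adds it to the token list.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("int age; /* This is the person’s age in years */ String name; // This is the person’s name");
        String[] tokens = tokenize(in, true, true, true);
        System.out.println("No of tokens : " + tokens.length);
        System.out.println(Arrays.toString(tokens));
    }
}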
