Question
Lexical analysis is the process of converting a sequence of characters (such as a string) into a sequence of tokens (smaller strings, or substrings, that have an identified "meaning"). A program that performs lexical analysis is called a tokenizer. A tokenizer is usually paired with a parser, and together they analyze the syntax of the string according to the rules of the programming language being used. Parsing means analyzing the string within the context of that language to find substrings that are meaningful. Your assignments will not include writing a parser.

A string tokenizer allows an application to break a string into tokens. A token, as explained above, is a word within a string that may or may not have meaning when it is analyzed by a parser. A stream tokenizer takes an input stream and parses it into tokens. The stream tokenizer recognizes identifiers, numbers, quoted strings, and various comment styles. Each character is classified as whitespace, alphabetic, numeric, quote, or comment, and a character can have zero or more of these attributes. Since stream tokenizers are often used as the first step in parsing computer programs, they usually have several options for processing or ignoring certain characters, depending on the programming language’s particular rules.
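To make the difference between the two kinds of tokenizer concrete, here is a minimal Java sketch. The class and method names (`TokenizerKinds`, `split`, `classify`) are hypothetical; the two library classes are the standard `java.util.StringTokenizer`, which only splits on delimiters, and `java.io.StreamTokenizer`, which also classifies each token.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerKinds {

    // String tokenizer: splits on whitespace only (space, tab, newline, carriage return, form feed).
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Stream tokenizer: also classifies each token as a word, number, quoted string, or single character.
    static List<String> classify(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:   tokens.add("word:" + st.sval); break;
                case StreamTokenizer.TT_NUMBER: tokens.add("number:" + st.nval); break;
                case '"':                       tokens.add("string:" + st.sval); break;
                default:                        tokens.add("char:" + (char) st.ttype);
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String text = "age = 42; name = \"Ann Lee\"";
        System.out.println(split(text));
        System.out.println(classify(text));
    }
}
```

On the sample input, `split` yields `[age, =, 42;, name, =, "Ann, Lee"]`, breaking the quoted string apart, while `classify` yields `[word:age, char:=, number:42.0, char:;, word:name, char:=, string:Ann Lee]`.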
These options include the following:
- Whether to treat line breaks as token delimiters or as whitespace (e.g., line breaks in Visual Basic mark the end of a statement; in C++ they are ignored)
- Whether C-style comments are tokenized or skipped
- Whether C++-style comments are tokenized or skipped
- Whether keywords and identifier names should be converted to lowercase (e.g., C++ names and keywords are case-sensitive; SQL’s are not)
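Java's own `java.io.StreamTokenizer` happens to expose one setter for each of these options, which makes it a convenient way to see them in action. In the sketch below, `TokenizerOptions` and `words` are hypothetical names. It maps the options onto `eolIsSignificant`, `slashStarComments`, `slashSlashComments` (where `true` means *skip* the comment), and `lowerCaseMode`. It keeps only word tokens plus an `<EOL>` marker so the effect of each option is easy to see. Because `StreamTokenizer` treats `/` as a line-comment character by default, the sketch first makes `/` ordinary.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenizerOptions {

    // Collects the word tokens (and line-break markers) produced under the given options.
    static List<String> words(String source, boolean eolSignificant, boolean skipCComments,
                              boolean skipCppComments, boolean lowerCase) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');                    // undo the default "'/' starts a line comment"
        st.eolIsSignificant(eolSignificant);     // line breaks become TT_EOL tokens
        st.slashStarComments(skipCComments);     // skip /* ... */ when true
        st.slashSlashComments(skipCppComments);  // skip // ... when true
        st.lowerCaseMode(lowerCase);             // lowercase word tokens, e.g. for SQL
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == StreamTokenizer.TT_EOL) {
                out.add("<EOL>");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String sql = "SELECT Name /* all rows */\nFROM People // table";
        System.out.println(words(sql, true, true, true, true));
    }
}
```

For the SQL-like input in `main`, the result is `[select, name, <EOL>, from, people]`. The comments are skipped, the line break is reported, and the words are lowercased.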
Using Java, C#, or another object-oriented language of your choice, write a stream tokenizer method with the following signature:

String[] tokenize(Stream in, bool tokenizeAtLineBreaks, bool ignoreCComments, bool ignoreCppComments)

Some important issues to remember are listed below.

If one or both of the comment flags is set to TRUE, the entire comment should be treated as a single token. In other words, do not treat whitespace within the comment as a token delimiter. For example, each of the following lines of code contains 3 tokens:

int age; /* This is the person’s age in years */
String name; // This is the person’s name

If one or both of the comment flags is set to FALSE, the comment should be ignored and not returned or processed by the tokenizer. For example, if the ignoreCComments flag is set to FALSE, the following line of code has only 2 tokens (the comment is skipped):

int age; /* This is the person’s age in years */

Likewise, the following line of code has only 2 tokens if ignoreCppComments is set to FALSE (the comment is skipped):

String name; // This is the person’s name

If you encounter a double quotation mark character ("), you are inside a string literal token. Do not split the words within the string literal into separate tokens. For example, the following code segment contains 4 tokens:

String univName = "Northcentral University";
Explanation / Answer
The StreamTokenizer(InputStream) constructor is deprecated, so this answer takes a java.io.Reader as input instead of a stream.
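If the caller really has a byte stream, as the question's `Stream in` parameter suggests, it can still be handed to `StreamTokenizer` by wrapping it in an `InputStreamReader`. The sketch below shows that adapter; `ReaderAdapter` and `countTokens` are hypothetical names, and decoding as UTF-8 is an assumption about the input's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;

public class ReaderAdapter {

    // Accepts bytes but decodes them to characters first, so the non-deprecated
    // StreamTokenizer(Reader) constructor can be used. UTF-8 is an assumed encoding.
    static int countTokens(InputStream bytes) throws IOException {
        Reader reader = new InputStreamReader(bytes, StandardCharsets.UTF_8);
        StreamTokenizer st = new StreamTokenizer(reader);
        int count = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("int age = 42;".getBytes(StandardCharsets.UTF_8));
        System.out.println(countTokens(source)); // int, age, =, 42, ; -> 5
    }
}
```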
The code below is in Java.
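package proj1;

import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Tokenizer {

    // Returns the tokens read from 'in'. java.io.StreamTokenizer can only SKIP comments, so when a
    // comment flag is TRUE the comment text is split into ordinary tokens rather than kept whole.
    // StreamTokenizer also splits "age;" into "age" and ";", so counts differ from the question's.
    static String[] tokenize(Reader in, boolean tokenizeAtLineBreaks,
                             boolean ignoreCComments, boolean ignoreCppComments) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(in);
        tokenizer.ordinaryChar('/');                      // '/' starts a line comment by default; disable that
        tokenizer.slashStarComments(!ignoreCComments);    // per the question, FALSE means skip /* ... */
        tokenizer.slashSlashComments(!ignoreCppComments); // per the question, FALSE means skip // ...
        tokenizer.eolIsSignificant(tokenizeAtLineBreaks); // report line breaks as TT_EOL

        List<String> tokens = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_WORD:
                    tokens.add(tokenizer.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    tokens.add(String.valueOf(tokenizer.nval)); // numbers are parsed as double: 42 -> "42.0"
                    break;
                case StreamTokenizer.TT_EOL:
                    tokens.add("\n");                           // line break kept as its own token
                    break;
                case '"':
                case '\'': {
                    char quote = (char) tokenizer.ttype;        // sval holds the text between the quotes
                    tokens.add(quote + tokenizer.sval + quote);
                    break;
                }
                default:
                    tokens.add(String.valueOf((char) tokenizer.ttype)); // single character such as ';' or '='
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader(
                "int age; /* This is the person’s age in years */ String name; // This is the person’s name");
        // Both comment flags FALSE: the comments are skipped
        String[] tokens = tokenize(in, true, false, false);
        System.out.println("No of tokens : " + tokens.length);
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}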
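java.io.StreamTokenizer, which this answer relies on, can skip /* */ and // comments but cannot return a comment as one token. It also splits `age;` into `age` and `;`, so it cannot reproduce the token counts the question asks for. One alternative is a small hand-written scanner. The sketch below (the class name `SpecTokenizer` is hypothetical) splits on whitespace, keeps comments and double-quoted literals whole, and treats the comment flags as the question words them: TRUE keeps the comment as one token, FALSE skips it. Emitting a `"\n"` token when `tokenizeAtLineBreaks` is TRUE is an assumption about what that option should mean, and only backslash escapes such as `\"` are handled inside literals.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SpecTokenizer {

    // Whitespace-delimited tokens; comments and "..." literals are kept whole.
    // TRUE comment flag keeps the comment as a token, FALSE skips it (per the question's wording).
    // Emitting "\n" as a token when tokenizeAtLineBreaks is TRUE is an assumption.
    static String[] tokenize(Reader in, boolean tokenizeAtLineBreaks,
                             boolean ignoreCComments, boolean ignoreCppComments) throws IOException {
        PushbackReader r = new PushbackReader(in, 1);
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            if (c == '/') {
                int next = r.read();
                if (next == '*' || next == '/') {
                    flush(cur, tokens);                        // a comment ends the current token
                    boolean isC = next == '*';
                    String comment = isC ? readCComment(r) : readCppComment(r);
                    if (isC ? ignoreCComments : ignoreCppComments) tokens.add(comment);
                    continue;
                }
                if (next != -1) r.unread(next);                // plain '/', e.g. a division operator
                cur.append('/');
            } else if (c == '"') {
                cur.append('"');
                while ((c = r.read()) != -1) {                 // copy through the closing quote
                    cur.append((char) c);
                    if (c == '\\') {                           // keep an escaped char such as \" in the literal
                        int esc = r.read();
                        if (esc == -1) break;
                        cur.append((char) esc);
                    } else if (c == '"') {
                        break;
                    }
                }
            } else if (c == '\n') {
                flush(cur, tokens);
                if (tokenizeAtLineBreaks) tokens.add("\n");
            } else if (Character.isWhitespace(c)) {
                flush(cur, tokens);
            } else {
                cur.append((char) c);
            }
        }
        flush(cur, tokens);
        return tokens.toArray(new String[0]);
    }

    // Called after "/*" has been consumed; returns the comment including both delimiters.
    private static String readCComment(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder("/*");
        int prev = -1, c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
            if (prev == '*' && c == '/') break;
            prev = c;
        }
        return sb.toString();
    }

    // Called after "//" has been consumed; stops before the line break so it can still be tokenized.
    private static String readCppComment(PushbackReader r) throws IOException {
        StringBuilder sb = new StringBuilder("//");
        int c;
        while ((c = r.read()) != -1 && c != '\n' && c != '\r') sb.append((char) c);
        if (c != -1) r.unread(c);
        return sb.toString();
    }

    // Adds the pending token, if any, and clears the buffer.
    private static void flush(StringBuilder cur, List<String> tokens) {
        if (cur.length() > 0) {
            tokens.add(cur.toString());
            cur.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String code = "int age; /* This is the person's age in years */\n"
                + "String name; // This is the person's name\n"
                + "String univName = \"Northcentral University\";";
        for (String token : tokenize(new StringReader(code), false, true, true)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With both comment flags TRUE, the three sample lines yield 3, 3, and 4 tokens, matching the question's counts. With both flags FALSE, the first two lines drop to 2 tokens each.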