


Question

JAVA PROGRAM:

File Operations

You are to write a program that reads in a body of text and performs some queries on it. You will write the results of the queries to an output file. You will submit the output file along with your code and pseudocode.

Details

The purpose of this exercise is to give you familiarity with file operations. You will read in a lengthy text file: Bram Stoker's Dracula, available as plain text from Project Gutenberg at http://www.gutenberg.org/ebooks/345.txt.utf-8. If you have problems, search for 345 on Gutenberg.org and select the Plain Text UTF-8 link. As you read the text, map all words to lower case and remove all punctuation characters (only numbers and letters should remain). Once the words are pre-processed, determine the items listed below. Do not modify the text file you download; it contains headers and footers from Gutenberg, so please leave them intact.
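The pre-processing step described above can be sketched as follows. This is only one reading of the requirement: the hypothetical `TextNormalizer.tokenize` helper (not part of the assignment) treats punctuation as a separator, so `to-night` becomes `to` and `night`. Deleting punctuation outright, which would give `tonight`, is an equally defensible interpretation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TextNormalizer {
    // Lower-cases a line, then splits it on every run of characters
    // that are not ASCII letters or digits (accented letters are also treated as separators).
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String piece : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // split() can return an empty first element when the line starts with punctuation
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [harker, s, journal, 3, may, bistritz]
        System.out.println(tokenize("\"Harker's Journal -- 3 May. Bistritz.\""));
    }
}
```

Note that the apostrophe in `Harker's` leaves a stray one-letter token `s`, which counts toward the total number of words under this interpretation.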

When the file is completely loaded you will perform queries to exhibit the following things:

• How many times do the following words appear in the text?
   o transylvania
   o harker
   o renfield
   o vampire
   o expostulate
   o fangoriously

• What is the length of the longest word?

• How many total words are there in the processed text?

Write the output from the questions above to a file named “project3.out”. No user interface is required for this project.
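Writing the results to “project3.out” could look roughly like the sketch below. The class name `ReportWriter`, the method `writeReport`, and the line format are assumptions made for illustration; the assignment does not prescribe an output layout. The values passed in `main` are placeholders, not real counts.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class ReportWriter {
    // Writes one line per queried word, then the two summary figures.
    static void writeReport(Path out, Map<String, Integer> counts,
                            int longestLength, int totalWords) throws IOException {
        // try-with-resources flushes and closes the file even if a write fails
        try (PrintWriter writer =
                 new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                writer.println(entry.getKey() + ": " + entry.getValue());
            }
            writer.println("Longest word length: " + longestLength);
            writer.println("Total words: " + totalWords);
        }
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the words in the order they were queried
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("example", 0); // placeholder entry, not a real result
        writeReport(Paths.get("project3.out"), counts, 0, 0);
    }
}
```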

Suggestions (Use at your discretion)
• Download the text file to your local machine.
• Write a program to open the text file and display it verbatim before you start adding it to the table.
• The individual chains can contain information to help with the queries (length of chain, etc).
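The "table" and "chains" in the suggestions appear to refer to a hash table with separate chaining. A minimal sketch of such a table is shown here, assuming a fixed number of buckets and no resizing. `ChainedWordTable` and all of its members are hypothetical names, not something the assignment defines.

```java
class ChainedWordTable {
    // One link in a bucket's chain: a word and how often it has been seen.
    private static final class Node {
        final String word;
        int count = 1;
        final Node next;
        Node(String word, Node next) { this.word = word; this.next = next; }
    }

    private final Node[] buckets;
    private int totalWords;
    private int longestLength;

    // capacity must be positive; this sketch never grows the bucket array
    ChainedWordTable(int capacity) { buckets = new Node[capacity]; }

    // floorMod keeps the index non-negative even for negative hash codes
    private int indexFor(String word) {
        return Math.floorMod(word.hashCode(), buckets.length);
    }

    // Records one occurrence of an already normalized word.
    void add(String word) {
        totalWords++;
        longestLength = Math.max(longestLength, word.length());
        int i = indexFor(word);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.word.equals(word)) { n.count++; return; }
        }
        buckets[i] = new Node(word, buckets[i]); // new word: prepend to the chain
    }

    // Returns the number of occurrences, or 0 if the word was never added.
    int count(String word) {
        for (Node n = buckets[indexFor(word)]; n != null; n = n.next) {
            if (n.word.equals(word)) return n.count;
        }
        return 0;
    }

    int totalWords() { return totalWords; }
    int longestLength() { return longestLength; }

    public static void main(String[] args) {
        ChainedWordTable table = new ChainedWordTable(1024);
        for (String w : new String[] {"the", "count", "the", "castle"}) table.add(w);
        System.out.println(table.count("the"));     // 2
        System.out.println(table.longestLength());  // 6
        System.out.println(table.totalWords());     // 4
    }
}
```

Tracking the total and the longest length while inserting means all three queries can be answered without a second pass over the text.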

Explanation / Answer

Word.java stores a word's value and its frequency. TestFile.java reads the text, processes it, and prints the results to the console.

Word.java

import java.util.Comparator;

public class Word implements Comparator<Word>{
   private String value;
   private int frequency;
   public String getValue() {
       return value;
   }
   public void setValue(String value) {
       this.value = value;
   }
   public int getFrequency() {
       return frequency;
   }
   public void setFrequency(int frequency) {
       this.frequency = frequency;
   }
   // Override toString to print the word followed by its frequency
   @Override
   public String toString(){
       return " "+value+" "+frequency;
   }
  
   // Orders words by length, longest first
   @Override
   public int compare(Word w1, Word w2) {
       return w2.value.length()-w1.value.length();
   }
}

TestFile.java

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

public class TestFile {
  
   public static void main(String[] args) {
       List<String> words = new ArrayList<String>();
       File file = new File("Guetenberg.txt");
       words = inputFromFile(file);
       List<Word> uniqueWords = setFrequencies(words);
       System.out.println("Number of times word transylvania appeared in text "+findFrequency(uniqueWords, "transylvania"));
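       System.out.println("Number of times word harker appeared in text "+findFrequency(uniqueWords, "harker"));
       System.out.println("Number of times word renfield appeared in text "+findFrequency(uniqueWords, "renfield"));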
       System.out.println("Number of times word vampire appeared in text "+findFrequency(uniqueWords, "vampire"));
       System.out.println("Number of times word expostulate appeared in text "+findFrequency(uniqueWords, "expostulate"));
       System.out.println("Number of times word fangoriously appeared in text "+findFrequency(uniqueWords, "fangoriously"));
       // uniqueWords is sorted by word length, longest first, so its first element is the longest word
       System.out.println(" Longest Word "+ uniqueWords.get(0).getValue()+" length: "+uniqueWords.get(0).getValue().length());
       // Total words in the file, counting every repeat
       System.out.println("Total Words processed in text: "+words.size());
       // Total number of distinct words
       System.out.println("Total unique Words processed in text: "+uniqueWords.size());
      
   }

   /*
   * Reads the file and returns its words, lower-cased and stripped of punctuation
   */
   public static List<String> inputFromFile(File myFile) {
       List<String> words = new ArrayList<String>();
       // try-with-resources closes the scanner even if reading fails
       try (Scanner in = new Scanner(myFile, "UTF-8")) {
           // Scanner.next() returns one whitespace-delimited token at a time
           while (in.hasNext()) {
               // Map to lower case, then replace every character that is not a letter or digit with a space
               String nextWord = in.next().toLowerCase().replaceAll("[^a-z0-9]", " ");
               for (String piece : nextWord.split(" ")) {
                   // Skip the empty strings split() leaves behind where punctuation was removed
                   if (!piece.isEmpty()) {
                       words.add(piece);
                   }
               }
           }
       } catch (FileNotFoundException e) {
           System.out.println("Input file not found : " + myFile);
           System.exit(1);
       }
       return words;
   }

   public static List<Word> setFrequencies(List<String> words) {
       // To store List of words with their frequencies
       List<Word> newList = new ArrayList<Word>();
       List<String> addedList = new ArrayList<String>();
       for (String word : words) {
           // Add each distinct word only once, the first time it is seen
           if (!addedList.contains(word)) {
               Word w = new Word();
               w.setValue(word);
               // Collections.frequency scans the whole list, which is simple but slow on a long text
               w.setFrequency(Collections.frequency(words, word));
               newList.add(w);
               addedList.add(word);
           }
       }
       // Sort by word length, longest first (see Word.compare)
       Collections.sort(newList, new Word());
       return newList;
   }

   /*
   * Returns how many times the given word occurs in the list (0 if it is absent)
   */
   public static int findFrequency(List<Word> wordsList, String word) {
       int frequency=0;
       for (Word word1 : wordsList) {
           // Sum every case variant so the lookup is case-insensitive
           if (word1.getValue().equalsIgnoreCase(word)) {
               frequency+=word1.getFrequency();
           }
       }
       if(frequency==0){
           System.out.println("Word "+word+" Non found in the text file.");
       }
       return frequency;
   }

}

Sample output (exact counts depend on the downloaded edition and on how tokens are split and lower-cased):

Number of times word transylvania appeared in text 16
Number of times word harker appeared in text 17
Number of times word vampire appeared in text 14
Number of times word expostulate appeared in text 1
Word fangoriously Non found in the text file.
Number of times word fangoriously appeared in text 0

Longest Word straightforwardly length: 17
Total Words processed in text: 170688
Total unique Words processed in text: 10487
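The setFrequencies method above rescans the full word list for every distinct word, which gets slow on a novel-length text. As an alternative, the counts can be built in a single pass with a HashMap. The sketch below assumes the words have already been lower-cased and stripped of punctuation; `WordStats`, `countWords`, and `longestLength` are illustrative names, not part of the posted answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordStats {
    // Builds a word -> occurrence count map in one pass over the list.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the length of the longest word, or 0 for an empty list.
    static int longestLength(List<String> words) {
        int longest = 0;
        for (String word : words) {
            longest = Math.max(longest, word.length());
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("harker", "count", "harker");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.getOrDefault("harker", 0));        // 2
        System.out.println(counts.getOrDefault("fangoriously", 0));  // 0
        System.out.println(longestLength(words));                    // 6
        System.out.println(words.size());                            // 3
    }
}
```

With this approach no sorting is needed: the longest length comes from a running maximum, and the total word count is simply the size of the list.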