

Question

Programming Project: Word Map


Do writings by individual authors have statistical signatures? They certainly do, and while such signatures say little about an author's art, they can say something about literary styles of an era, and can even help clarify historical controversies about authorship. Statistical studies, for example, have shown that the Iliad and the Odyssey were not written by a single individual.

For this assignment you are to create a program that analyzes text files -- novels perhaps, or newspaper articles -- and produces two statistics about these texts: word size frequency, and average sentence length.

In particular, you should write a two-class application that produces such statistics. Your classes should be called WordMap.java and WordMapDriver.java. The driver class should read in the name of a file that holds a text, and then provide an analysis for that text. Here is a sample:


> java WordMapDriver
enter name of a file
Analyzed text: /Users/moll/CS121/AliceInWonderland.txt
words of length 1: 7.257%
words of length 2: 14.921%
words of length 3: 24.073%
words of length 4: 20.847%
words of length 5: 12.769%
words of length 6: 7.374%
words of length 7: 6.082%
words of length 8: 3.012%
words of length 9: 1.812%
words of length 10: 0.820%
words of length 11: 0.501%
words of length 12: 0.236%
words of length 13: 0.134%
words of length 14: 0.083%
words of length 15 or larger: 0.001%
average sentence length: 16.917


Your job, then, is to code a solution to this problem, and provide these two statistics - word size percentage, and average sentence length (thus in the example given, 7.257 percent of the words are of length 1, 14.921 percent of the words are of length 2, and so forth, and the average sentence length is 16.917 words).
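
As a rough illustration of the word size statistic, the sketch below tallies word lengths into 15 slots (the last slot also collects every longer word) and converts the tallies to percentages. The class and method names are hypothetical and are not part of the assignment.

```java
// Sketch only: LengthFrequencySketch and percentages are hypothetical names.
class LengthFrequencySketch {

    // Returns 15 percentages: slot i holds words of length i+1,
    // and the last slot holds words of length 15 or larger.
    static double[] percentages(String[] words) {
        int[] counts = new int[15];
        int total = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // an empty token is not a word
            }
            int slot = Math.min(word.length(), 15) - 1;
            counts[slot]++;
            total++;
        }
        double[] result = new double[15];
        for (int i = 0; i < 15; i++) {
            // Guard against an empty text to avoid dividing by zero.
            result[i] = total == 0 ? 0.0 : 100.0 * counts[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] p = percentages(new String[] {"a", "to", "to", "cat"});
        for (int i = 0; i < 3; i++) {
            System.out.println("length " + (i + 1) + ": " + p[i]);
        }
    }
}
```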

You can obtain interesting sample texts by, for example, visiting the Gutenberg foundation website (Gutenberg.org), and downloading books from there.

Tips

You should read external files by extending the Echo class from Chapter 10.

An easy and acceptable way to calculate the average length of sentences: count the number of words, count the number of end-of-sentence markers (!, ., and ?), then divide the first by the second. Thus if a text has 21 words, 2 periods, and a question mark, then its average sentence length is 21 / 3 = 7.
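
The calculation in this tip might be sketched as follows. This is only an illustration under simple assumptions: the class and method names are hypothetical, and returning 0 for a text with no markers is a choice made here, not something the assignment specifies.

```java
// Sketch only: SentenceLengthSketch and its methods are hypothetical names.
class SentenceLengthSketch {

    // Counts the end-of-sentence markers (! . ?) in the text.
    static int countMarkers(String text) {
        int markers = 0;
        for (char c : text.toCharArray()) {
            if (c == '!' || c == '.' || c == '?') {
                markers++;
            }
        }
        return markers;
    }

    // Divides the word count by the marker count.
    // Returns 0 when there are no markers (an assumption of this sketch).
    static double averageSentenceLength(int wordCount, int markerCount) {
        return markerCount == 0 ? 0.0 : (double) wordCount / markerCount;
    }

    public static void main(String[] args) {
        // 21 words, 2 periods and a question mark, as in the tip.
        System.out.println(averageSentenceLength(21, 3));
    }
}
```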

Show percentages using printf, as described in Chapter 5 of the text. Precision: as in the example above, three places to the right of the decimal point. To include a % symbol in a format string, write two percent symbols (%%). The control character \n generates a carriage return. This statement:


prints


and then advances to the next line.
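
As an illustration of these formatting rules (not the statement from the course text), the sketch below formats one report line with three decimal places, a literal percent sign, and a trailing newline. The class name, the FORMAT constant, and the formatLine helper are hypothetical.

```java
import java.util.Locale;

// Sketch only: PercentFormatSketch, FORMAT and formatLine are hypothetical names.
class PercentFormatSketch {

    // %d is the word length, %.3f gives three decimal places,
    // %% is a literal percent sign, \n ends the line.
    static final String FORMAT = "words of length %d: %.3f%%\n";

    // Builds the line as a String so it can be inspected.
    // Locale.ROOT keeps the decimal separator a dot on every machine.
    static String formatLine(int length, double percent) {
        return String.format(Locale.ROOT, FORMAT, length, percent);
    }

    public static void main(String[] args) {
        // printf accepts the same format string directly.
        System.out.printf(Locale.ROOT, FORMAT, 1, 7.257);
    }
}
```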


Additional Requirements

You must use a try/catch harness for your WordMapDriver code.
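
One possible shape for such a harness is sketched below. It uses Scanner and File rather than the Echo class from the course text, which is not shown here. The class name, the countLines helper, and the choice to return -1 on failure are all assumptions of the sketch.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Sketch only: TryCatchSketch and countLines are hypothetical names.
class TryCatchSketch {

    // Returns the number of lines in the file, or -1 if it cannot be opened.
    static int countLines(String filename) {
        // try-with-resources closes the Scanner even if reading fails.
        try (Scanner in = new Scanner(new File(filename))) {
            int lines = 0;
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
            return lines;
        } catch (FileNotFoundException e) {
            System.out.println("Could not open file: " + filename);
            return -1;
        }
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("enter name of a file");
        System.out.println("lines read: " + countLines(keyboard.nextLine().trim()));
    }
}
```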

You must use the String method split to extract the individual words on each line of the text you are examining. In addition to the space symbol, use these characters as delimiters: ,.!?;:
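
A minimal sketch of this use of split follows, assuming that a run of consecutive delimiters counts as one separator and that empty tokens are discarded. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: SplitSketch and wordsOf are hypothetical names.
class SplitSketch {

    // Splits a line on runs of spaces and the characters , . ! ? ; :
    static List<String> wordsOf(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[ ,.!?;:]+")) {
            // split yields an empty first token when the line starts with a delimiter.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsOf("Hello, world! How are you?"));
    }
}
```

Putting the delimiters in a character class followed by + means that ", " or "..." is treated as a single separator, so no empty strings appear between words.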

You must comment your classes: add a one line comment for every method and for every instance variable. This comment should clearly state the role of that Java constituent.

NOTE: Do NOT put import statements in your classes. They are already included in the file.

Place your WordMapDriver class in the box below.

import java.util.*;

Place your WordMap class here.

import java.io.*;

Explanation / Answer

WordMapDriver.java==============================================

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

/*
* WordMapDriver has the responsibility of reading text from a file
* and supplying valid sentences to the WordMap class for further calculations
*/

public class WordMapDriver {
  
   //WordMap class instance
   private WordMap wordMap;
  
   //An ArrayList to accumulate all sentences extracted from the text file
   private ArrayList<String> sentenceList;
  
   //WordMapDriver is only meaningful once it has been supplied a file,
   //hence this constructor, which also builds the sentence list
   public WordMapDriver(String filename){
       File doc = new File(filename);
       sentenceList = new ArrayList<>();
      
       //The "must" try catch block
       try {
           Scanner in = new Scanner(doc);
          
           //Separate sentences on a run of . ? ! followed by whitespace.
           //Matching a run treats patterns like ... and !! as one sentence end,
           //and \\s also accepts a line break after the marker
           in.useDelimiter("[.?!]+\\s+");
           while(in.hasNext()){
               sentenceList.add(in.next());
           }
           in.close();
          
       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }
      
   }
  
   //Once all set, this calls for the core functionality
   public void drive(){
       wordMap = new WordMap(sentenceList);
       wordMap.calculateStats();
   }
  
   public static void main(String[] args) {
       Scanner in = new Scanner(System.in);
       System.out.println("Enter File Name:");
       WordMapDriver driver = new WordMapDriver(in.nextLine().trim());
       driver.drive();
       in.close();
   }

}

WordMap.java==========================================================

import java.util.ArrayList;

/*
* WordMap receives a list of sentences from the driver.
* While processing each sentence, it counts the total number of words
* and each word's length, incrementing the respective counter.
* In the end the required stats are generated from this data and displayed.
*/

public class WordMap {
  
   //Received from WordMapDriver
   private ArrayList<String> sentences;
  
   //Derived from ArrayList of Sentences
   private int numberOfSentences;
  
  
   //Keeping count for every word size 1-15 and greater
   private int wordCount[];
  
   //To store the stats for each wordCount
   private float stats[];
  
   //Total number of words encountered
   private int totalWords;
  
   //Average Sentence Length
   private float avgSentence;
  
  
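   // A method to display figures on screen
   private void displayFigures() {
       for(int i=1;i<=14;i++){
           System.out.printf("Words of length %d: %4.3f%%%n",i,stats[i-1]);
       }
       //The last slot also holds every word longer than 15 characters
       System.out.printf("Words of length 15 or larger: %4.3f%%%n",stats[14]);
      
       System.out.printf("Average sentence length: %4.3f%n",avgSentence);
      
   }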
  
  
   /*
   *   For each word encountered, increase the total words count by 1
   *   and for its length, increase the respective counter in wordCount
   *   for example, a word "hello" is of length 5, increase wordCount for slot 5(array index 4)
   *   by 1.
   */
   private void countAllWords(){
      
       for(String sentence:sentences){
           //"must" split to extract words: whitespace and ,.!?;: all act as delimiters
           String[] words = sentence.split("[\\s,.!?;:]+");
          
           for(String word:words){
               int length=word.length();
              
               //split can yield an empty leading token, which is not a word
               if(length==0){
                   continue;
               }
              
               ++totalWords;
              
               if(length<=15){
                   ++wordCount[length-1];
               }else{
                   ++wordCount[14];
               }
           }
       }
   }
  

   /*
   *   A wordMap without a list of sentences is meaningless
   *   Hence, this constructor makes it compulsory for the user/program to pass a list
   */
   public WordMap(ArrayList<String> sentences){
       this.sentences=sentences;
       numberOfSentences=sentences.size();   //Will be required to get the average sentence length
       wordCount=new int[15];       //Keeping word counts for lengths 1 to 14, plus one slot for 15 or larger, as in the example
       stats=new float[15];
       totalWords=0;
       avgSentence=0;
   }
  
  
   //Core functionality employing simple stats formulas to derive and store the results
   public void calculateStats(){
       countAllWords();
       for(int i=0;i<15;i++){
           stats[i]=((float)wordCount[i]/totalWords)*100;
       }
       avgSentence=(float)totalWords/numberOfSentences;
      
       displayFigures();
   }
  
  
}