Do writings by individual authors have statistical signatures? They certainly do
Question
Do writings by individual authors have statistical signatures? They certainly do, and while such signatures say little about an author's art, they can say something about the literary styles of an era, and can even help clarify historical controversies about authorship. Statistical studies, for example, have shown that the Iliad and the Odyssey were not written by a single individual.
For this assignment you are to create a program that analyzes text files -- novels perhaps, or newspaper articles -- and produces two statistics about these texts: word size frequency, and average sentence length.
In particular, you should write a two-class application that produces such statistics. Your classes should be called WordMap and WordMapDriver (in WordMap.java and WordMapDriver.java). The driver class should read in the name of a file that holds a text, and then provide an analysis of that text. Here is a sample:
> java WordMapDriver
enter name of a file
Analyzed text: /Users/moll/CS121/AliceInWonderland.txt
words of length 1: 7.257%
words of length 2: 14.921%
words of length 3: 24.073%
words of length 4: 20.847%
words of length 5: 12.769%
words of length 6: 7.374%
words of length 7: 6.082%
words of length 8: 3.012%
words of length 9: 1.812%
words of length 10: 0.820%
words of length 11: 0.501%
words of length 12: 0.236%
words of length 13: 0.134%
words of length 14: 0.083%
words of length 15 or larger: 0.001%
average sentence length: 16.917
Your job, then, is to code a solution to this problem, and provide these two statistics - word size percentage, and average sentence length (thus in the example given, 7.257 percent of the words are of length 1, 14.921 percent of the words are of length 2, and so forth, and the average sentence length is 16.917 words).
You can obtain interesting sample texts by, for example, visiting the Project Gutenberg website (gutenberg.org) and downloading books from there.
Tips
You should read external files by extending the Echo class from Chapter 10.
An easy and acceptable way to calculate the average length of sentences: count the number of words, count the number of end-of-sentence markers (!, ., and ?), then divide the first by the second. Thus if a text has 21 words, 2 periods, and a question mark, then its average sentence length is 21 / 3 = 7.
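The counting rule in this tip can be sketched as follows. This is a minimal illustration, not the required WordMap class: the class name SentenceStats and its method names are hypothetical, and splitting on any whitespace (rather than only the space character) is an assumption made so that a multi-line string can be tested directly.

```java
// Hypothetical helper illustrating the tip; not one of the assignment's required classes.
final class SentenceStats {
    // Counts words by splitting on whitespace and the delimiters ,.!?;:
    static int countWords(String text) {
        int count = 0;
        for (String token : text.split("[\\s,.!?;:]+")) {
            // split can yield an empty first token when the text starts with a delimiter
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Counts the end-of-sentence markers '.', '!' and '?'.
    static int countSentenceMarkers(String text) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (c == '.' || c == '!' || c == '?') {
                count++;
            }
        }
        return count;
    }

    // Words divided by markers; returns 0.0 when the text has no marker at all.
    static double averageSentenceLength(String text) {
        int markers = countSentenceMarkers(text);
        return markers == 0 ? 0.0 : (double) countWords(text) / markers;
    }

    public static void main(String[] args) {
        // 21 words, 2 periods and a question mark, as in the tip
        String text = "one two three four five six seven. "
                + "one two three four five six seven. "
                + "one two three four five six seven?";
        System.out.println(averageSentenceLength(text));
    }
}
```

Note that this simple rule also counts the periods in abbreviations and ellipses as sentence ends, which the assignment explicitly accepts.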
Show percentages using printf, as described in Chapter 5 of the text. Precision: as in the example above, 3 places to the right of the decimal point. To include a % symbol in a format string, write two percent symbols (%%). The control character \n generates a newline. This statement:
prints
and then advances to the next line.
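The statement this tip refers to did not survive on this page. As a stand-in, here is a minimal sketch of a printf call that follows the tip: three decimal places, a doubled percent sign, and a line break at the end. The class name PercentFormatDemo and the method formatLine are hypothetical, and Locale.US is passed explicitly only so that the decimal separator is a period on every machine.

```java
import java.util.Locale;

// Hypothetical demo of the formatting tip; not part of the assignment's required classes.
final class PercentFormatDemo {
    // Builds one report line: %d for the length, %.3f for three decimals, %% for a literal %.
    static String formatLine(int length, double percent) {
        return String.format(Locale.US, "words of length %d: %.3f%%", length, percent);
    }

    public static void main(String[] args) {
        // %n emits the platform line separator; "\n" in the format string also ends the line
        System.out.printf(Locale.US, "words of length %d: %.3f%%%n", 1, 7.257);
        System.out.println(formatLine(2, 14.921));
    }
}
```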
Additional Requirements
You must use a try/catch harness for your WordMapDriver code.
You must use the String method split to extract the individual words on each line of the text you are examining. In addition to the space symbol, use these characters as delimiters: ,.!?;:
You must comment your classes: add a one-line comment for every method and for every instance variable. This comment should clearly state the role of that Java constituent.
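The split requirement above can be sketched like this. It is an illustration under stated assumptions rather than the expected solution: the class name WordLengthTally, the constant DELIMITERS, and the method tally are hypothetical, and the 16-slot array (indices 1 to 14 for exact lengths, index 15 for "15 or larger") simply mirrors the sample output.

```java
// Hypothetical sketch of splitting a line and tallying word lengths.
final class WordLengthTally {
    // One or more of: space , . ! ? ; :
    static final String DELIMITERS = "[ ,.!?;:]+";

    // Returns counts[k] = number of words of length k; index 15 collects lengths of 15 or more.
    static int[] tally(String line) {
        int[] counts = new int[16];
        for (String word : line.split(DELIMITERS)) {
            // split yields an empty first token when the line starts with a delimiter
            if (word.isEmpty()) {
                continue;
            }
            counts[Math.min(word.length(), 15)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = tally("Hi, you! Why not?");
        for (int len = 1; len < counts.length; len++) {
            if (counts[len] > 0) {
                System.out.println("length " + len + ": " + counts[len]);
            }
        }
    }
}
```

Using a character class followed by + treats a run such as ", " as a single delimiter, so no empty words appear in the middle of a line.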
Explanation / Answer
Class FileAccessor.java
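package wordpercentagesdriver;

import java.io.*;
import java.util.Scanner;

// Base class that opens a text file and feeds it, line by line, to processLine.
public abstract class FileAccessor
{
    // name of the file being read
    String fileName;
    // scanner positioned on the file's contents
    Scanner scan;

    // Opens the named file for reading.
    public FileAccessor(String f) throws IOException
    {
        fileName = f;
        scan = new Scanner(new FileReader(fileName));
    }

    // Reads the file one line at a time and hands each line to processLine.
    public void processFile()
    {
        // hasNextLine pairs with nextLine, so blank lines are not skipped unexpectedly
        while (scan.hasNextLine())
        {
            processLine(scan.nextLine());
        }
        scan.close();
    }

    // Handles a single line of text; subclasses supply the analysis.
    protected abstract void processLine(String line);

    // Writes data to the named file, replacing any existing contents.
    public void writeToFile(String data, String fileName) throws IOException
    {
        try (PrintWriter pw = new PrintWriter(fileName))
        {
            pw.print(data);
        }
    }
}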
Class myWordPercentages.java
package wordpercentagesdriver;

import java.io.IOException;

// Counts word lengths in a text file; extends FileAccessor, which reads it line by line.
public class myWordPercentages extends FileAccessor
{
    // mylength[k] is the number of words of length k; index 15 collects lengths of 15 or more
    int[] mylength = new int[16];
    // mypercentages[k] is the percentage of words of length k
    double[] mypercentages = new double[16];
    // total number of words seen so far
    int mytotalWords = 0;

    // Opens the named file for reading.
    public myWordPercentages(String myS) throws IOException
    {
        super(myS);
    }

    // Splits one line into words and updates the length counts.
    public void processLine(String line)
    {
        // delimiters required by the assignment: space and ,.!?;:
        for (String myS : line.split("[ ,.!?;:]+"))
        {
            // split yields an empty token for blank lines or a leading delimiter
            if (myS.isEmpty())
            {
                continue;
            }
            mytotalWords += 1;
            // words of length 15 or more share the last bucket
            mylength[Math.min(myS.length(), 15)] += 1;
        }
    }

    // Returns the percentage of words of each length (indices 1 to 15).
    public double[] getWordPercentages()
    {
        for (int j = 1; j < mypercentages.length; j++)
        {
            mypercentages[j] = (mytotalWords == 0) ? 0.0 : 100.0 * mylength[j] / mytotalWords;
        }
        return mypercentages;
    }

    // Returns the average word length, counting every word of 15 or more letters as 15.
    public double getAvgWordLength()
    {
        if (mytotalWords == 0)
        {
            return 0.0;
        }
        double sum = 0.0;
        for (int j = 1; j < mylength.length; j++)
        {
            sum += j * mylength[j];
        }
        return sum / mytotalWords;
    }
}
Class WordPercentagesDriver.java
package wordpercentagesdriver;

import java.util.Scanner;

// Prompts for a file name, runs the analysis, and prints the report.
public class WordPercentagesDriver
{
    // Entry point; any failure (for example a missing file) is caught and reported.
    public static void main(String[] args)
    {
        try
        {
            String fileName;
            Scanner scan = new Scanner(System.in);
            System.out.println("Enter a text file name to analyze:");
            fileName = scan.nextLine();
            System.out.println("Analyzed text: " + fileName);
            myWordPercentages wp = new myWordPercentages(fileName);
            wp.processFile();
            double[] results = wp.getWordPercentages();
            printWordSizePercentages(results);
            System.out.printf("average word length: %.3f%n", wp.getAvgWordLength());
        }
        catch (Exception e)
        {
            System.out.println(e);
        }
    }

    // Prints one line per word length; the last entry covers that length and above.
    public static void printWordSizePercentages(double[] data)
    {
        for (int i = 1; i < data.length; i++)
        {
            if (i == data.length - 1)
                System.out.printf("words of length %d or larger: %.3f%%%n", i, data[i]);
            else
                System.out.printf("words of length %d: %.3f%%%n", i, data[i]);
        }
    }
}