[JAVA] The input file and output file will both be .txt Any help is appreciated.

Question

[JAVA] The input file and output file will both be .txt files.

Any help is appreciated. Please comment the code so I can understand what is happening. Thanks in advance; I will rate any help.

Objective: Collections

Create a concordance (or an index) on a large file by implementing the following steps:

1. The user will specify 2 file names as command-line arguments. If the wrong number of arguments is passed, a message telling the user the correct usage is displayed and the program ends. The first argument is the file to be read in from, and the second argument is the output file.

2. Create a concordance of the first file and store that concordance into the second file. Your concordance should be created in the following way:

a) Read in all "words" - To keep this relatively simple, let a word be any consecutive sequence of letters, digits, hyphens or apostrophes. Everything that is not a letter, digit, hyphen or apostrophe is a delimiter. For example, the word don't is one word; whereas my e-mail address thomasg@uwosh.edu is considered 3 words - namely thomasg, uwosh and edu. You must store the words, in all lower case, into an appropriate class in the Collections Framework. Note that you must be case insensitive when considering words (already and Already are the same word). You may need to remove empty strings when tokenizing with regular expressions.

b) Along with reading in all of the words, remember which "page" the word was on. A word that is one of the first 500 words to be read in is considered to be on page 1. A word that is one of the second 500 words to be read in is considered to be on page 2. And so on. Pages are always numbered from the beginning of the book.

c) After you have read in all of the words, you should write out your concordance to the second file. You should write out each word that appears in the first file and for each word that you write out, you must also write what page(s) that word appears on. You must write out the words in alphabetical order and the page numbers for each word in ascending order (use an appropriate collection to "order" the elements).

Note that if a word appears multiple times on the same page, it should only be listed once for that page number. A sample output (note that words and page numbers are not "correct" in this snippet; this is simply to demonstrate the format) should follow this format exactly:

already [2200, 2284, 2960, 2961]
alsatian [1156, 1158]
also [539]
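As a rough illustration of step 2a, the sketch below splits text on runs of non-word characters and drops the empty string that the split can produce when the text starts with a delimiter. The class name `TokenizeSketch` and the method `tokenize` are hypothetical names chosen for this example; they are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only
class TokenizeSketch {

    // Splits text into lower-case words made of letters, digits, hyphens or apostrophes
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        // Any run of other characters is treated as a single delimiter
        for (String token : text.toLowerCase(Locale.ROOT).split("[^0-9a-z'-]+")) {
            // A leading delimiter makes split() return an empty first token
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [don't, e-mail, thomasg, uwosh, edu, already]
        System.out.println(tokenize("\"Don't\" e-mail thomasg@uwosh.edu, Already!"));
    }
}
```

Placing the hyphen last inside the character class keeps it literal, so no backslash escapes are needed in the Java string.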

Explanation / Answer

import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class Concordance {

    // Number of words that make up one "page"
    private static final int WORDS_PER_PAGE = 500;

    public static void main(String[] args) throws IOException {
        // Exactly two arguments are required: the input file and the output file
        if (args.length != 2) {
            System.out.println("Usage: java Concordance <inputFile.txt> <outputFile.txt>");
            return;
        }

        // Stop early if the input file does not exist
        File input = new File(args[0]);
        if (!input.exists()) {
            System.out.println("Input file not found: " + args[0]);
            return;
        }

        // Read the whole file and lower-case it so that matching is case insensitive
        String content = new String(Files.readAllBytes(input.toPath()), StandardCharsets.UTF_8)
                .toLowerCase(Locale.ROOT);

        // A delimiter is any run of characters that are not letters, digits, hyphens or apostrophes
        Pattern delimiters = Pattern.compile("[^0-9a-z'-]+");
        String[] words = delimiters.split(content);

        // TreeMap keeps the words in alphabetical order; each TreeSet keeps its
        // page numbers in ascending order and silently ignores duplicates
        Map<String, Set<Integer>> index = new TreeMap<>();

        int wordCount = 0; // number of real words seen so far
        for (String word : words) {
            // split() yields an empty first token when the text starts with a delimiter
            if (word.isEmpty()) {
                continue;
            }
            // Words 1-500 are on page 1, words 501-1000 on page 2, and so on
            int page = wordCount / WORDS_PER_PAGE + 1;
            wordCount++;
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(page);
        }

        // Write one line per word, e.g. "already [2200, 2284]".
        // try-with-resources closes the writer only after everything has been written.
        try (BufferedWriter bw = Files.newBufferedWriter(new File(args[1]).toPath(), StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Set<Integer>> entry : index.entrySet()) {
                // A TreeSet prints as "[1, 2, 3]", which matches the required format
                bw.write(entry.getKey() + " " + entry.getValue());
                bw.newLine();
            }
        }
    }
}
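One way to check the page arithmetic without a 500-word input is to pull the indexing step into a method that takes the page size as a parameter. The sketch below does that. `ConcordanceSketch`, `buildIndex` and the `wordsPerPage` parameter are hypothetical names for this illustration, and the words are assumed to be already tokenized and lower-cased.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only
class ConcordanceSketch {

    // Maps each word to the sorted set of pages it appears on
    static Map<String, Set<Integer>> buildIndex(List<String> words, int wordsPerPage) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int i = 0; i < words.size(); i++) {
            // The word at zero-based position i falls on page i / wordsPerPage + 1
            int page = i / wordsPerPage + 1;
            index.computeIfAbsent(words.get(i), k -> new TreeSet<>()).add(page);
        }
        return index;
    }

    public static void main(String[] args) {
        // With 2 words per page: "the cat" is page 1, "sat the" is page 2, "end" is page 3
        List<String> words = Arrays.asList("the", "cat", "sat", "the", "end");
        for (Map.Entry<String, Set<Integer>> e : buildIndex(words, 2).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // cat [1]
        // end [3]
        // sat [2]
        // the [1, 2]
    }
}
```

Calling `buildIndex(words, 500)` would give the page size the assignment asks for; the small value here only makes the page boundaries easy to see by eye.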
