

Question

Fastest possible duplicate detection

WARNING: On the big file you (might, depending on your machine) have to use the following execution command to ensure enough JVM heap space:

Background

This lab is about pure speed/efficiency. Not only must the output be correct, it must use the correct container and use that container most efficiently. Your program must take the name of a huge file of strings on the command line. The output of the program will be either "UNIQUE" or "NOT UNIQUE". No other output of any kind is expected or allowed except a newline after the word.

You can get the output perfect and still get a ZERO on this lab! Just use the wrong container or use it sub-optimally.

You must test your program on both the small file and the big one.

The small.txt file produces UNIQUE (and a newline) to the screen because it contains no duplicate words.

The big file produces NOT UNIQUE (and a newline) to the screen because it contains a duplicate word.

You must:

50% - choose the optimal container ArrayList or TreeSet or HashSet

50% - call only the absolute minimal number of optimal method(s) on the container to solve the problem.

You will not just be graded on the correct answer. You must use the right container AND use it most efficiently.

Your program must use the command line and args[0] to read the input file.

There are a lot of words in the list, so I'll give you just a few. Please use a container such as ArrayList, TreeSet, or HashSet, and use a minimal number of methods on the container. Thank you.

small.txt

big.txt

Explanation / Answer

HashSet is our choice of container in this case, since it offers O(1) average time complexity for its add() method.

HashSet's add() returns true if the word was not already in the set, and false if it was.

In other words, a false return value means the set already contains that element, so the current word is a duplicate.
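
As a quick illustration (separate from the lab solution; the class name AddDemo and the sample words are made up for this sketch), add()'s return value is all we need to spot a duplicate:

import java.util.HashSet;

public class AddDemo {
    public static void main(String[] args) {
        HashSet<String> words = new HashSet<>();
        System.out.println(words.add("cat")); // true  - "cat" was not in the set yet
        System.out.println(words.add("dog")); // true  - "dog" is new as well
        System.out.println(words.add("cat")); // false - "cat" is already present, i.e. a duplicate
    }
}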

* ArrayList has O(n) complexity for searching an element, so checking each new word against the ones already seen makes duplicate detection O(n^2) overall.

* TreeSet has O(log(n)) time complexity for add() and remove(), which is slower than HashSet's O(1) average.

The code below tests whether the words in a file are unique:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class Lab10 {
    public static void main(String[] args) {
        String file = args[0]; // input file name comes from the command line
        // try-with-resources closes the reader automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            HashSet<String> set = new HashSet<>();
            boolean unique = true;
            String line;
            while ((line = br.readLine()) != null) {
                // add() is the only container method we call; it returns
                // false when the word is already in the set
                if (!set.add(line)) {
                    unique = false;
                    break; // first duplicate found, so stop reading
                }
            }
            if (unique) {
                System.out.println("UNIQUE");
            } else {
                System.out.println("NOT UNIQUE");
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
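
To test it (assuming the file is saved as Lab10.java), compile it and pass each input file as args[0]. The heap-space warning above most likely refers to the JVM's -Xmx option; the 2g value below is only an example and may need adjusting for your machine:

javac Lab10.java
java Lab10 small.txt
java Lab10 big.txt
java -Xmx2g Lab10 big.txt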