Write a spell checking program which uses a dictionary of words (input by the us

ID: 3535292 • Letter: W

Question

Write a spell checking program which uses a dictionary of words (input by the user as a string) to find misspelled words in a second string, the test string. Your program should prompt the user for the input string and the dictionary string. A valid dictionary string contains an alphabetized list of words.

Functional requirements:

Explanation / Answer

CIS 120 HW 9: SpellChecker All modern word processors, along with increasingly many other programs, have a built in spell check function that allows even the worst typist some level of efficiency. To complete this homework you will be coding a rudimentary spell check program based on a dictionary and list of corrections for misspelled words. A few notes before we begin: In order to do well on this assignment, you must read the Javadocs for each class to be written! When completing each task below, first read the associated Javadoc so you know exactly what behavior the class should provide. The online tester will verify that you read and wrote your code to match the specification listed in the Javadocs. For this project, you have only two free submissions. Subsequent submissions will cost you 5 points. This means you won't be able to use the online testers to find your bugs! Writing your own tests will be critical to doing well. File, class, and method names must be exactly as described in the Javadocs. (You may, of course, add your own helper variables and methods.) You will want to have the Javadocs open while you're coding. The files for this assignment are at: hw09.zip. Setup instructions: In Eclipse, go to File > New > Project and choose Java Project. Hit Next. Use "SpellChecker" as the project name. Under Project layout, check the option for "Create separater folders for sources and class files." Hit Next. Click on the "Libraries" tab, and then "Add Library". Choose JUnit and then JUnit 4. Press Finish. Press Finish again to close the wizard. Right click on the new project SpellChecker in the Package Explorer, and select "Import". Choose File System, and in the "From directory" field, navigate to the folder where you unzipped the files. Check off all of the unzipped files (.java and .txt), and then hit Finish. The files should now be added! You can view them by expanding the "(default package)" entry in the Package Explorer. There is a challenge problem (worth no credit): see the Levenshtein class in the Javadocs. Recall: I/O and Collections This homework is designed for you to gain practice with Java I/O (see java.io) and the Java Collections Framework (see java.util). The I/O package is very large with lots of different classes to choose from -- we recommend taking a look at the following interfaces and classes: Input InputStream Reader (abstract parent class) InputStreamReader FileReader BufferedReader - very helpful!!! Scanner* Output OutputStream PrintStream Writer (abstract parent class) FileWriter Collections Iterable (and Iterator) Collection List Set Map *Scanner is technically in java.util (not java.io) because it implements the Iterable interface. Step 0: A High Level Look At a high level, a user (you) will run the program using SpellCheckerRunner. This creates the dictionary, correctors, opens the relevant files, and initializes a spell checker object which then does the actual spell checking process. In addition to the provided files described above, we also provide several test classes (DictionaryTest, TokenScannerTest, etc.) and sample text files to run your code with. Step 1: Building a Dictionary File to submit: Dictionary.java Before we can actually program the spellchecker we need a class that reads in words from the dictionary (in this case, a dictionary is just a list of 'legal' words, and we do not store any definitions), stores them in memory, and provides efficient access to them. Begin by reading the Javadoc for the Dictionary class, to understand what it does. 1.1 Add Test Cases Once you understand what the Dictionary class does, open DictionaryTest.java. We only provide a few test cases. Passing the given tests doesn't mean your code works for all input! Using the given test cases as a template, add test cases for the following situations. By a test case, we mean that you should make the certain scenario happen, and check that the response matches what should happen, as outlined in the Javadocs. Use methods like assertEquals, assertTrue, assertFalse to verify the correct behavior (Recall that documentation for the assert methods can be found here). Each bullet below should be its own test method: You check for a word that is in the dictionary. You check for a word that is NOT in the dictionary. You ask for the number of words in the dictionary. (Does it return the right thing? Hardcode in the answer you expect based on your test dictionary file.) You check for the empty word "". You check for a word that is all uppercase, but the word appears in the dictionary in lowercase. The dictionary file contains a blank line in the middle. (Remember: You can always make new .txt files to help write test cases.) The dictionary file has words in mixed case (e.g. eLePHant) and you check if one of those words is in the dictionary. The dictionary file has words with whitespace around them, and you check if one of those words is in the dictionary. The same word appears twice in the dictionary file, and you ask for the number of words in the dictionary. Are there any other test situations or corner cases which may occur? We've left out some scenarios from the list above. You should write additional test cases for any other situations you think of. TIP: Don't forget about null! 1.2 Dictionary.java Once you feel confident that your tests cover all aspects of the expected behavior, open up Dictionary.java. We have provided template methods, and you will have to fill in the implementation. You should run your tests, find what fails, and correct the code until your test cases pass. Using the Right Data Structure Choosing the right data structure to store the dictionary is crucial. If the dictionary contains 1000000 words, you don't want to be re-reading and searching through all of those words each time isWord() is called. The implementation of the data structure is also important - different ways of storing data make different operations efficient. Ask yourself: Is a List the right data structure to store the dictionary? What benefits does a list have over a set or map? Are these necessary for a Dictionary? The ArrayList class implements lists using resizeable arrays. The LinkedList class implements lists using linked lists of nodes. Is a Set the right data structure to store the dictionary? How does the set differ from a list or a map? Do the properties of a Set fit the description of a Dictionary? The HashSet class implements sets by computing a key for each entry and keeping an efficient map from keys to possible values. The TreeSet class implements sets by storing elements in a search tree. Is a Map the right data structure to store the dictionary? If so, what kind of map is it? (e.g. String to String, String to int, etc.) The HashMap class implements maps by grouping "buckets" of values together based on a hash of their keys. The TreeMap class implements maps by storing keys in a sorted tree. If your code takes more than two or three seconds to run all your Dictionary tests, it will take even longer to pass our tests, and this risks running into our timeout (note that if such a timeout happens, you will get this output: "Cannot run code because of infinite loop or raised exception."). In other words, you are using the wrong structure! IMPORTANT: You should use BufferedReader and FileReader directly to read in the file. Do not use a Scanner. Some versions of the Java library have bugs that will cause the Scanner to fail when processing long files like our dictionaries. The Java String library has functions like trim to help you format the strings. Tip: If you followed the set up instructions, your files should compile and you should be able to run your JUnit tests. If this isn't working, make sure that your text input files are in the root project folder, not in the src/ folder. Also: Always remember to close your input streams! If you find a test is not running, you may need to delete the file or exit the Java process in order to manually close the stream. It is critical that your Dictionary works properly before moving on! Make sure that the method signatures match those in the Javadoc exactly. Step 2: Adding Correctors Files to submit: FileCorrector.java and SwapCorrector.java Suppose a word doesn't occur in a dictionary. How could we fix it? Corrector is an abstract class where the getCorrections method is abstract. An abstract class has some similarities to an interface; the biggest difference is that abstract classes can have some implemented methods. Here matchCase is implemented, but getCorrections is not. Any subclass of Corrector must implement getCorrections, and thus is able to provide corrections for a given misspelled word. We will implement two different kinds of Correctors: FileCorrector - uses a reference text file containing common misspellings to provide suggestions for a misspelled word. SwapCorrector - uses a Dictionary, and suggests any words from it that are a single "swap" from the given incorrect word. For example, it might suggest "this" if the incorrect word was "thsi". A swap is two adjacent letters that are reversed in their positions. 2.1 Adding Tests, Implementing FileCorrector.java Read the documentation for FileCorrector, and then add the following test cases (plus your own) to FileCorrectorTest.java: The corrections file has extra whitespace around a line, or around the comma. You can (and should) create your own new .txt files to help write these test cases! We ask for a word that has no corrections. We ask for a word that has multiple corrections. The input file has some lines with mixed case (e.g. word,CoRRecTioN or WORD,correction) and you ask for that word's corrections. We ask for a fully uppercase word that has corrections. What case should the results have? We ask for that same word, but now lowercase, that has corrections. What case should the results have? The list above isn't comprehensive, so make sure to add your own additional test cases for any other behavior described in the Javadocs. When you are finished adding test cases, write the FileCorrector class and ensure it passes your tests. Think carefully about which data structure is the best suited for storing the data read from the file. TIP: You may want to look at the indexOf and substring methods of String to help you extract the incorrect and correct words from any given line. TIP: This tip applies to writing both FileCorrector and SwapCorrector. The getCorrections method requires that the suggested corrections have the same case as the given incorrect word. Thus, if "pease" was the incorrect word, we should suggest the set {"peas", "peace"}, but if either "PEASE" or "Pease" or "PeAsE" is the incorrect word, we should suggest back the set {"Peas", "Peace"}. (So, only first letter capitalization is relevant.) To help with the capitalization correction, the abstract class Corrector has a helper method matchCase that FileCorrector and SwapCorrector can both use. Notice how putting this in the parent class allows us to avoid duplicating code in the two child classes. 2.2 Adding Tests, Implementing SwapCorrector.java Repeat the above procedure for implementing SwapCorrector: look at the Javadocs and add necessary test cases, and then implement the class itself. Here are some suggested test cases, but we've left out quite a few: The provided dictionary is null. You ask for corrections for a word that itself is already in the dictionary. You ask for corrections for a word that is mixed case. TIP: Remember that you can create your own dictionary files to help with testing! For example, if you make a test case that asks for corrections for the incorrect word "abcd", you may want to create your own dictionary file that contains "bacd", "acbd", etc and ensure your method doesn't miss any of the swaps! All IOExceptions that may result from the execution of the Correctors do not need to be caught. Read the Javadocs closely for the Correctors and make sure to test all of the possible edge cases! Step 3: Reading Document Tokens File to submit: TokenScanner.java Now that we have the dictionary and two ways to correct words, we can turn our attention to the document itself. The purpose of this class is to read the input file and split it up into tokens as an iterator. There are two sorts of tokens: words and non-words. Word tokens only contain word characters. Nonwords do not contain any word characters. We will make a simplifying assumption and state that a word character is either a letter (see Character.isLetter) or an apostrophe. File Contents Expected Tokens They aren't brown, are they? "They" " " "aren't" " " "brown" ", " "are" " " "they" "?" It's time 2 e-mail! "It's" " " "time" " 2 " "e" "-" "mail" "!" A few things worth pointing out: Green tokens are word tokens, yellow tokens are non-word tokens. Notice how we always alternate between the two types of tokens. The newline character (" ") should not be skipped. The assumption we made about word characters (being only letters and apostrophes) isn't perfect: notice how we split apart the word "e-mail". However, it's a reasonable simplification and our SpellChecker will still work for the majority a document's text. We have provided you with a class called Token that stores (a) a true/false variable indicating whether it is a word token or not and (b) the token string itself. The class TokenScanner, which implements the interface Iterator, must provide the following: hasNext() - Tells the client whether any more tokens remain in the file next() - Returns the next token as a Token object remove() - We won't implement this functionality, so your remove method can just contain: throw new UnsupportedOperationException(); 3.1 TokenScanner.java Read the TokenScanner javadocs, and then open up TokenScannerTest.java and examine the provided test case. As before, you should write some more unit tests for this class to exercise all of the intended behavior, and then write TokenScanner as a means to pass those tests. The constructor is allowed to throw IOExceptions, but the next method should catch them and return null so that it conforms to the Iterator signature. TIP: You may want to look at the Histogram scanner written in class for an example of how to approach this problem. Step 4: Creating the SpellChecker File to submit: SpellChecker.java The end is near! We need to put all of our classes together. A SpellChecker object takes in a Corrector and a Dictionary when constructed. When checkDocument is called, it should read tokens from the input document, spell-check any incorrect words, and write the output to a file using a Writer object. Most of SpellChecker has already been written, except for the method that processes the document and does the checking. Sample Output/Interaction Click here to see a sample of the output and interaction functionality we expect to see when running the finished program. You should closely emulate this output in your SpellChecker class. 4.1 checkDocument in SpellChecker.java Complete the checkDocument method in SpellChecker.java so it matches the functionality above. Bear in mind the following: Read the Javadocs for the SpellChecker class first! :) Non-word tokens should be copied directly from the input document into the newly corrected file. Only word tokens should be checked for spelling and (possibly) replaced. If the user chooses option 1 ("replace with another word"), they can only enter single word replacements. This is taken care of for you by the provided getNextString helper method. You should use the submitted word verbatim; no need to correct the word case. If a word isn't in the dictionary, then we want to check if our Corrector has any suggestions for it. If more than one correction is available, then the possible corrections should be presented to the user in alphabetical order. TIP: To alphabetically sort a string set, use the following: // Suppose setOfCorrections is the Set you got from a Corrector. List sortedOptions = new LinkedList(setOfCorrections); Collections.sort(sortedOptions); // sortedOptions is a sorted list, e.g. {"alpha", "beta"} // You can now use the List as you wish; see the List Javadocs. String first = sortedOptions.get(0); // "alpha" String second = sortedOptions.get(1); // "beta" IOExceptions don't need to be caught, but you should handle invalid user input appropriately. For example, if you give a user four options and they enter an invalid selection (such as 12 or "oaijeoijwef"), you should notify them and prompt for a selection again. TIP: We provide code to help you with this: see getNextInt and getNextString in the partial SpellChecker implementation provided. Your user display and options should match the sample output above. For example, do NOT number your options 1,2,3 instead of 0,1,2. We will test your code automatically with a file of user inputs. Testing: Running in Eclipse The sample test case in SpellCheckerTest uses a file as the "user" input so it can run automatically. However, you can also run the SpellChecker manually in Eclipse. Here's how: Go to Run > Run Configurations. Right click on Java Application and choose "New". Under "Main class", you should put SpellCheckerRunner. Switch from the "Main" tab to the "Arguments" tab. Under "Program arguments:", you can put the inputs to the runner. For example: Note that you must specify the "src/..." part of the path to the files -- the SpellCheckerRunner application is launched from the project folder, not the project's src folder. You can give this configuration a name in the "Name:" bar at the top, so you can identify this from other runs. Hit "Apply" and "Run". The program will run in the "Console" tab, normally at the bottom below your code editors. (Your layout may be different.) You can return to this configuration and change the arguments to run it with different input files, or to switch between different types of Correctors. To edit a configuration, in the Run Configurations panel, select the named configuration under "Java Application" on the left side. To re-run a configuration (after it has been set up), you can choose the configuration (as if you were editing it) and hit "Run". Alternatively, if you have recently run it, you can click the little arrow next to the green button at the top of Eclipse: TIP: When you run the SpellChecker through Eclipse, you will need to refresh the Eclipse package in order to see the newly created output file. Challenge Problem If the drive to create the world's best spell checker has consumed you, you can implement a third type of corrector, Levenshtein.java. This kind of Corrector proposes words that are a Levenshtein distance of 1 away from the incorrect word. This is kudos-only and worth no points.

Navigate

Write a spell checker, that is, a program that reads the words in a piece of tex

Write a spell checking program which uses a dictionary of words (input by the us

Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.

Write a spell checking program which uses a dictionary of words (input by the us

Question

Explanation / Answer

Related Questions

Navigate