Your assignment is to develop an interactive Java-based system that gets queries

Question

Your assignment is to develop an interactive Java-based system that gets queries with k terms, where each term is marked by either a plus sign or a minus sign. Additionally, the system has access to a CSV file that contains the results of the last indexing and crawling performed by the search-engine crawler. Ignoring computational complexity, design and implement (in Java, Java Collections, Java GUI, and Javadoc) the above system under the following spec.

The user can insert up to three terms via an input widget. Along with each term, the user has to denote whether it is a plus-term or a minus-term (the plus and minus signs are entered via a widget such as a radio button and not as prefixes). The user hits a "Search" button (you do not have to implement this button), and then the system checks if the terms entered are in the crawler list of inverted indexes:

a. If any of the terms is not in the list, the system responds with "Invalid Terms."
b. If the terms are in the list, the system responds with the first up to 5 matching page-ids.
c. If the query has valid terms but does not match any page that complies with the constraints, the system responds with "No Match."

You can use any collection[s] in your solution. There is no requirement to use a hash table, and it is not necessarily the best data structure for storing the list of inverted indexes. Carefully document your system using Javadoc. Note that the documentation will help me understand your system; hence, it will be an essential part of your grade.

Explanation / Answer
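
The query rules in the question (cases a, b, and c) can be illustrated with a small sketch. It is only one possible reading of the spec, not a complete solution: the GUI is left out, and several details are assumptions that the question does not state. The class name `InvertedIndexQuery`, the CSV row layout `term,pageId,pageId,...`, numeric page-ids, case-insensitive terms, "first" meaning lowest page-id, and the handling of a query with only minus-terms are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Sketch of plus/minus term queries over an inverted index (assumed CSV layout). */
class InvertedIndexQuery {

    /** Maps each term to the sorted set of page-ids that contain it. */
    private final Map<String, TreeSet<Integer>> index = new LinkedHashMap<>();

    /** Builds the index from rows assumed to look like "term,pageId,pageId,...". */
    InvertedIndexQuery(List<String> csvLines) {
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank rows
            }
            String[] cells = line.split(",");
            String term = normalize(cells[0]);
            TreeSet<Integer> pages = index.get(term);
            if (pages == null) {
                pages = new TreeSet<>();
                index.put(term, pages);
            }
            for (int i = 1; i < cells.length; i++) {
                pages.add(Integer.parseInt(cells[i].trim()));
            }
        }
    }

    private static String normalize(String term) {
        return term.trim().toLowerCase();
    }

    /**
     * Answers a query.
     *
     * @param terms each term mapped to true (plus-term) or false (minus-term)
     * @return "Invalid Terms.", "No Match.", or up to five page-ids such as "[2, 6]"
     */
    String search(Map<String, Boolean> terms) {
        // Case a: every term, plus or minus, must appear in the index.
        for (String term : terms.keySet()) {
            if (!index.containsKey(normalize(term))) {
                return "Invalid Terms.";
            }
        }
        // Intersect the page sets of all plus-terms.
        TreeSet<Integer> result = null;
        for (Map.Entry<String, Boolean> e : terms.entrySet()) {
            if (e.getValue()) {
                TreeSet<Integer> pages = index.get(normalize(e.getKey()));
                if (result == null) {
                    result = new TreeSet<>(pages);
                } else {
                    result.retainAll(pages);
                }
            }
        }
        // Assumption: with no plus-terms, start from every indexed page.
        if (result == null) {
            result = new TreeSet<>();
            for (TreeSet<Integer> pages : index.values()) {
                result.addAll(pages);
            }
        }
        // Remove pages that contain any minus-term.
        for (Map.Entry<String, Boolean> e : terms.entrySet()) {
            if (!e.getValue()) {
                result.removeAll(index.get(normalize(e.getKey())));
            }
        }
        // Case c: valid terms, but no page satisfies the constraints.
        if (result.isEmpty()) {
            return "No Match.";
        }
        // Case b: the first up to 5 page-ids, in ascending order.
        List<Integer> firstFive = new ArrayList<>();
        for (Integer id : result) {
            if (firstFive.size() == 5) {
                break;
            }
            firstFive.add(id);
        }
        return firstFive.toString();
    }
}
```

A `TreeSet` keeps page-ids sorted, so "first up to 5" has a well-defined meaning here. In a GUI, the map passed to `search` would be filled from the three term fields and their plus/minus radio buttons.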

A simple Java (1.6) crawler that crawls web pages within a single domain. If a page redirects to another domain, that page is not picked up, except when it is the first URL tested. Basically, it supports the following:

A simple crawl prints the crawled URLs to standard output. Note that, by default, only URLs that return HTTP status 200 are printed.

You can write the crawled list to two plain-text files, one with the working URLs and one with the non-working ones.

You can write the result to a CSV file, with the URLs separated into working and non-working.

You can crawl and output only the URLs whose HTML contains a specific keyword.
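
To make the last two options concrete, here is a rough sketch in plain Java of how crawl results might be split into working and non-working URLs for a CSV file, and how a keyword check on the HTML might look. It is not the crawler's own code or command line. The class `CrawlReport`, its method names, and the `url,status,working` column layout are hypothetical, and "working" is taken to mean status 200, as in the default described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch of post-processing crawl results (hypothetical helper, not the crawler's API). */
class CrawlReport {

    /** Case-insensitive check that the page HTML contains the keyword. */
    static boolean containsKeyword(String html, String keyword) {
        return html.toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    /**
     * Builds CSV rows with an assumed "url,status,working" layout.
     * Working URLs (status 200) come first, non-working URLs after them.
     */
    static List<String> toCsv(Map<String, Integer> statusByUrl) {
        List<String> working = new ArrayList<>();
        List<String> nonWorking = new ArrayList<>();
        for (Map.Entry<String, Integer> e : statusByUrl.entrySet()) {
            boolean ok = e.getValue() == 200;
            String row = quote(e.getKey()) + "," + e.getValue() + "," + ok;
            if (ok) {
                working.add(row);
            } else {
                nonWorking.add(row);
            }
        }
        List<String> rows = new ArrayList<>();
        rows.add("url,status,working");
        rows.addAll(working);
        rows.addAll(nonWorking);
        return rows;
    }

    /** Quotes a field and doubles embedded quotes, since URLs may contain commas. */
    private static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

The rows returned by `toCsv` can be written to disk with any writer, for example `java.nio.file.Files.write`.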
