
Question

In Java

Description:

You will build an application that provides users with a guided web browsing capability that searches files for user-specified keywords.

A typical search engine reads many files off the web and saves information about them in a database that is used to answer the search queries posed by users. However, your application will not do any prefetching of data. Instead, it will search files in response to user requests, as described in the specification given below.

Specification:

The user enters a specific URL, such as http://www.bbc.com/, and a word to search for on the command line.

The program opens a URLConnection for the given URL. Your program should parse the file in order to display the following information:

The number of occurrences of the user-specified word in the HTML file

The URLs of all the links to other HTML files that appear in the user-selected file (things of the form href="xxxxx"), along with the number of occurrences of the keyword in each. To do this, open a URL connection for each of the HTML links and parse the file, counting the number of times the keyword occurs. Display all the URLs that were parsed, sorted by the number of occurrences of the keyword in decreasing order, omitting files that don't contain the keyword at all. For each URL, display the URL of the file, followed by the number of occurrences of the keyword in parentheses.

After each search, use a FileOutputStream to save the result of the application to a file called "searchdata.www", overwriting the data from the previous search.
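The overwrite behavior comes for free: the single-argument FileOutputStream constructor truncates an existing file. A minimal sketch (the class name SaveResults and the sample result line are made up for illustration):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class SaveResults {
    public static void main(String[] args) throws IOException {
        // new FileOutputStream(name) truncates the file, so each run
        // replaces the previous search's data rather than appending.
        PrintStream out = new PrintStream(new FileOutputStream("searchdata.www"));
        out.println("http://www.bbc.com/index.html (12)"); // sample result line
        out.close();
    }
}
```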

Assumptions:

All of the actual URL files will end with the “.html” suffix. However, the link names may not show the suffix explicitly. If the link ends with a “/”, append the string “index.html” before processing. If a link does not end with a “/” and also does not end with “.html”, append the string “/index.html” before processing.

If you are currently looking at a page whose URL is http://www.example.com/abc/nonsense.html, then the path http://www.example.com/abc/ is considered to be the current directory URL.

If a link does not begin with “http://” then it is a relative link, meaning that you should prefix it with the current directory URL before processing. For more information regarding absolute/relative links, please refer to http://www.scriptingok.com/tutorial/HTML-links-2
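These three normalization rules can be captured in one small helper. A sketch (the name normalizeLink and the demo values are hypothetical, not part of the assignment):

```java
public class NormalizeDemo {
    // Applies the assumptions above: relative links get the current
    // directory URL prefixed, and every link is normalized so that it
    // ends in ".html" before being processed.
    static String normalizeLink(String currentDirUrl, String link) {
        if (!link.startsWith("http://")) {
            link = currentDirUrl + link;      // relative link
        }
        if (link.endsWith("/")) {
            link = link + "index.html";       // directory-style link
        } else if (!link.endsWith(".html")) {
            link = link + "/index.html";      // bare name, no suffix
        }
        return link;
    }

    public static void main(String[] args) {
        String dir = "http://www.example.com/abc/";
        System.out.println(normalizeLink(dir, "news/"));
        // http://www.example.com/abc/news/index.html
        System.out.println(normalizeLink(dir, "http://www.bbc.com/sport"));
        // http://www.bbc.com/sport/index.html
        System.out.println(normalizeLink(dir, "page.html"));
        // http://www.example.com/abc/page.html
    }
}
```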

Explanation / Answer

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuidedBrowsing {

    // A URL paired with the number of keyword occurrences in it;
    // sorts in decreasing order of count.
    static class WebpageWordCount implements Comparable<WebpageWordCount> {
        String url;
        int count;

        WebpageWordCount(String url, int count) {
            this.url = url;
            this.count = count;
        }

        @Override
        public String toString() {
            return url + " (" + count + ")";
        }

        @Override
        public int compareTo(WebpageWordCount o) {
            return Integer.compare(o.count, this.count); // decreasing order
        }
    }

    public static void main(String[] args) throws IOException {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter specific URL:");
        String stringUrl = sc.nextLine();
        System.out.println("Enter keyword to search:");
        String keyword = sc.next();
        sc.close();

        URL url = new URL(stringUrl);
        StringBuffer mainPageResponse = getResponse(url);

        // Count occurrences in the main page, then in each linked page,
        // omitting linked files that do not contain the keyword at all.
        StringBuilder output = new StringBuilder();
        output.append(url).append(" (").append(countOccurrence(mainPageResponse, keyword)).append(")\n");

        List<WebpageWordCount> result = new ArrayList<WebpageWordCount>();
        for (String surl : getAllLinks(url.toString(), mainPageResponse.toString())) {
            StringBuffer resp = getResponse(new URL(surl));
            int frequency = countOccurrence(resp, keyword);
            if (frequency > 0) {
                result.add(new WebpageWordCount(surl, frequency));
            }
        }
        Collections.sort(result);
        for (WebpageWordCount w : result) {
            output.append(w).append('\n');
        }
        System.out.print(output);

        // Save the result to "searchdata.www"; FileOutputStream truncates
        // the file, overwriting the data from the previous search.
        FileOutputStream fos = new FileOutputStream("searchdata.www");
        fos.write(output.toString().getBytes());
        fos.close();
    }

    // Extract the targets of href="..." attributes; relative links are
    // prefixed with the given base URL before use.
    static List<String> getAllLinks(String baseUrl, String pageResponse) {
        List<String> urlsFound = new ArrayList<String>();
        Matcher matcher = Pattern.compile("href\\s*=\\s*\"([^\"]+)\"").matcher(pageResponse);
        while (matcher.find()) {
            String link = matcher.group(1);
            urlsFound.add(link.startsWith("http://") ? link : baseUrl + link);
        }
        return urlsFound;
    }

    // Read the body of the page at the given URL into a StringBuffer.
    private static StringBuffer getResponse(URL url) throws IOException {
        StringBuffer response = new StringBuffer();
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine).append('\n');
            }
            in.close();
        } else {
            System.out.println("GET request failed for URL: " + url);
        }
        return response;
    }

    // Count how many times the keyword occurs in the buffer.
    private static int countOccurrence(StringBuffer buffer, String keyword) {
        int count = 0;
        Matcher m = Pattern.compile(Pattern.quote(keyword)).matcher(buffer.toString());
        while (m.find()) {
            count++;
        }
        return count;
    }
}
