
Create a function scrape that uses a single parameter which contains the website

ID: 3578089 • Letter: C

Question

Create a function scrape that uses a single parameter which contains the website that should be scraped and the string representation of the regular expression. The function should use the regular expression to return a list of all the matching subsections of the website’s contents. Use this function on the website http://www.lipsum.com to find all words that start with an h but are not included inside an HTML

Explanation / Answer

PROGRAM CODE:

package sample;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class HtmlReader {

    public static void main(String[] args) {
        // Four-character prefixes that indicate HTML markup rather than ordinary words
        List<String> htmlCode = new ArrayList<>();
        htmlCode.add("html");
        htmlCode.add("heig");
        htmlCode.add("http");
        htmlCode.add("href");

        // The URL to read is given on the command line
        if (args.length < 1) {
            System.err.println("Usage: java sample.HtmlReader <url>");
            return;
        }
        String urlName = args[0];

        // try-with-resources closes the scanner even if reading fails
        try (Scanner scan = new Scanner(new URL(urlName).openStream())) {
            while (scan.hasNext()) {
                // next() returns one whitespace-delimited token, never an empty string
                String token = scan.next();
                if (token.charAt(0) == 'h' && !token.contains("</div>")) {
                    // Compare the first four characters against the markup prefixes
                    String prefix = token.length() >= 4 ? token.substring(0, 4) : token;
                    if (!htmlCode.contains(prefix))
                        System.out.println(token + " ");
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

OUTPUT:

have
has
has
has
here,
here',
have
humour
has
have
humour,
hidden
handful
humour,
help
help
help
hosting
how
human
happiness.
how
him
has
has
harum
hic
hand,
hour,
have
holds
he
he
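
The program above filters whitespace-separated tokens by prefix rather than applying a regular expression, and it does not define the scrape function the question asks for. A minimal sketch of a regex-based version might look like the following. Everything named here is an assumption for illustration, not part of the original answer: the class RegexScraper, the helper findMatches, the constant H_WORD_OUTSIDE_TAGS, the choice to pass the website and the pattern as two separate String arguments, and the UTF-8 decoding of the page. The pattern treats a word as being inside an HTML tag when a '>' follows it before any '<'.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexScraper {

    // Assumed pattern: a word starting with lowercase 'h' that is not followed
    // by a '>' before the next '<', i.e. not inside an HTML tag.
    static final String H_WORD_OUTSIDE_TAGS = "\\bh\\w*\\b(?![^<>]*>)";

    // Returns every substring of content that matches regex, in document order.
    static List<String> findMatches(String content, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Downloads the page at website and applies findMatches to its text.
    // Assumes the page is UTF-8 encoded.
    static List<String> scrape(String website, String regex) throws IOException {
        try (InputStream in = URI.create(website).toURL().openStream()) {
            String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return findMatches(content, regex);
        }
    }
}
```

Under these assumptions, the call for the question would be RegexScraper.scrape("http://www.lipsum.com", RegexScraper.H_WORD_OUTSIDE_TAGS). The lookahead is only a heuristic: words inside script blocks or HTML comments lie outside any tag and would still match, so a real HTML parser would be the more robust choice if exact results matter.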
