ID: 3794861 • Letter: Q

Question

Question: Write a method called stripHtmlTags that accepts a Scanner representing an input file containing an HTML web page as its parameter, then reads that file and prints the file's text with all HTML tags removed. A tag is any text between the characters < and >. For example, consider the following text:

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>


Here's my cat now:<img src="cat.jpg">
</body>
</html>

If the file contained these lines, your program should output the following text:

My web page


There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.


Here's my cat now:

You may assume that the file is a well-formed HTML document and that there are no < or > characters inside tags.

Instructions:

This problem requires you to process lines from the file, continuing until every line has been read. Use the “Scanner” class methods hasNextLine() and nextLine() to extract the lines as String objects. You can then use String class methods to search each line for the HTML tag delimiters < and >. I suggest you consider the methods length(), indexOf(char), and substring(int,int) in a while loop to process each line extracted from the file. You will need to accommodate lines with multiple tags, including lines that contain only tags. Output a line for each line processed; if a line contains only tag(s), output a “blank” line. Refer to the attached sample files with the converted results as a guide. The input files must be located in the directory C:CS210DataFiles. Be aware that, for String objects, the value returned by the non-static String method length() is the number of characters in the String, which is one more than the index of the last character. Finally, you are not to use the String class method replaceAll() in your solution.

You must define a method to process the input file.
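
The String methods suggested in the instructions can be tried out on a single line before writing the full method. The following is only a minimal sketch: the class name StringMethodsDemo and the helper removeFirstTag are hypothetical names chosen for illustration, and the helper removes just the first tag it finds. Note that length() returns the character count, so the last valid index is length() - 1.

```java
public class StringMethodsDemo {

    // Hypothetical helper (not part of the assignment): removes only the
    // first tag found in the line and returns the remaining text.
    static String removeFirstTag(String line) {
        int open = line.indexOf('<');          // -1 when the line has no '<'
        if (open == -1) {
            return line;
        }
        int close = line.indexOf('>', open);   // search starts at the '<'
        if (close == -1) {
            return line;                       // no closing '>' on this line
        }
        // substring(begin, end) excludes index end;
        // substring(begin) runs to the end of the string.
        return line.substring(0, open) + line.substring(close + 1);
    }

    public static void main(String[] args) {
        String line = "as well as my <b>very cool</b> blog page,";
        System.out.println("length:       " + line.length());
        System.out.println("last index:   " + (line.length() - 1));
        System.out.println("first '<' at: " + line.indexOf('<'));
        System.out.println(removeFirstTag(line));
    }
}
```

Calling the helper repeatedly until indexOf('<') returns -1 would strip every tag on the line, which is the loop the instructions describe.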

Explanation / Answer

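public void stripHtmlTags(Scanner sc)
   {
       while(sc.hasNextLine())
       {
           String line = sc.nextLine();
           StringBuilder text = new StringBuilder();
           int pos = 0;                                  // start of the text not yet copied
           int open = line.indexOf('<');
           while (open != -1)
           {
               text.append(line.substring(pos, open));   // keep the text before the tag
               int close = line.indexOf('>', open);
               if (close == -1)
                   close = line.length() - 1;            // unclosed tag: drop the rest of the line
               pos = close + 1;                          // continue after the tag
               open = line.indexOf('<', pos);
           }
           text.append(line.substring(pos));             // text after the last tag
           System.out.println(text);                     // tag-only lines print as blank lines
       }
   }

// Here is a program to test this. It uses indexOf() and substring() rather than
// replaceAll(), and it assumes each tag opens and closes on the same line.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;


public class HTMLProcessor {

   public void stripHtmlTags(Scanner sc)
   {
       while(sc.hasNextLine())
       {
           String line = sc.nextLine();
           StringBuilder text = new StringBuilder();
           int pos = 0;                                  // start of the text not yet copied
           int open = line.indexOf('<');
           while (open != -1)
           {
               text.append(line.substring(pos, open));   // keep the text before the tag
               int close = line.indexOf('>', open);
               if (close == -1)
                   close = line.length() - 1;            // unclosed tag: drop the rest of the line
               pos = close + 1;                          // continue after the tag
               open = line.indexOf('<', pos);
           }
           text.append(line.substring(pos));             // text after the last tag
           System.out.println(text);                     // tag-only lines print as blank lines
       }
   }

   public static void main(String[] args) throws FileNotFoundException
   {
       HTMLProcessor h = new HTMLProcessor();
       // try-with-resources closes the Scanner (and the file) when done
       try (Scanner sc = new Scanner(new FileReader("test.html")))
       {
           h.stripHtmlTags(sc);
       }
   }

}

/*

Sample run with the content provided. Each tag-only or empty input line
prints as a blank line, including two at the start and two at the end.



My web page


There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.


Here's my cat now:



*/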
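
The answer above handles each line separately, so a tag that starts on one line and ends on a later line would not be removed. The sample page has no such tags, but if they could occur, one option is to carry an "inside a tag" flag from line to line. This is only a sketch and not part of the original answer; the class name MultiLineTagStripper and the helper stripLine are hypothetical.

```java
import java.util.Scanner;

public class MultiLineTagStripper {

    // True while the scan is between a '<' and its matching '>';
    // the value carries over from one line to the next.
    private boolean insideTag = false;

    // Hypothetical helper: returns the text of one line with tag characters dropped.
    String stripLine(String line) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (insideTag) {
                if (c == '>') {
                    insideTag = false;   // tag ends here
                }
            } else if (c == '<') {
                insideTag = true;        // tag starts here
            } else {
                text.append(c);          // ordinary text outside any tag
            }
        }
        return text.toString();
    }

    public void stripHtmlTags(Scanner sc) {
        while (sc.hasNextLine()) {
            // One output line per input line, blank if only tag text remains.
            System.out.println(stripLine(sc.nextLine()));
        }
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for the input file here.
        Scanner sc = new Scanner("Here's my cat now:<img\n src=\"cat.jpg\">Done");
        new MultiLineTagStripper().stripHtmlTags(sc);
    }
}
```

Scanning character by character trades the indexOf()/substring() loop for a single pass with one boolean of state, which is what lets the tag boundary cross a line break.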
