Write a program in Java that checks whether a sequence of HTML tags is properly
Question
Write a program in Java that checks whether a sequence of HTML tags is properly nested. For each opening tag, such as <p>, there must be a closing tag </p>. A tag such as <p> may have other tags inside, for example <p> <ul> <li> </li> </ul> <a> </a> </p>. The inner tags must be closed before the outer ones. The program should process a file containing tags. For simplicity, assume that the tags are separated by spaces, and that there is no text inside the tags.
Explanation / Answer
HtmlParser.java without using Stack or regular expressions (Pattern, Matcher), i.e., done using a LinkedList
import java.util.LinkedList;

public class HtmlParser {

    // Returns true if every opening tag is closed in the reverse order it was opened.
    public static boolean isBalancedHtml(String html) {
        // The LinkedList is used like a stack: its last element is the innermost open tag.
        LinkedList<String> tagList = new LinkedList<>();
        String rest = html;
        int startIndex;
        while ((startIndex = rest.indexOf('<')) != -1) {
            // Search for '>' only after the '<' that opened this tag.
            int endIndex = rest.indexOf('>', startIndex);
            if (endIndex == -1) {
                return false; // a '<' with no matching '>' is malformed
            }
            // Text between '<' and '>', e.g. "p", "/p" or "img src=..."
            String tag = rest.substring(startIndex + 1, endIndex).trim();
            boolean isClosingTag = tag.startsWith("/");
            if (isClosingTag) {
                String tagName = tag.substring(1).trim();
                // A closing tag must match the most recently opened tag.
                if (tagList.isEmpty() || !tagList.getLast().equals(tagName)) {
                    return false;
                }
                tagList.removeLast();
            } else {
                // The tag name is the text before the first whitespace; attributes are ignored.
                tagList.add(tag.split("\\s+")[0]);
            }
            rest = rest.substring(endIndex + 1);
        }
        return tagList.isEmpty();
    }

    public static void main(String[] args) {
        String htmlContent = "<img title=\"displays\" src=\"big.gif\"></img><p> <ul> <li> </li> </ul> <a> </a> </p>";
        System.out.println("Html content: " + htmlContent);
        if (isBalancedHtml(htmlContent)) {
            System.out.println("HTML content is balanced");
        } else {
            System.out.println("Html content is not balanced");
        }
    }
}
-------------------------------------------------------------------------------------
HtmlParser.java using Stack and regular expressions (Pattern, Matcher)
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlParser
{
    // Matches one complete tag; group 1 captures the tag name, the rest allows optional attributes.
    static final Pattern pattern = Pattern.compile(
        "</?(\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>");

    public static boolean isBalancedHtml(String html)
    {
        Stack<String> tagStack = new Stack<>();
        String rest = html;
        int startIndex;
        while ((startIndex = rest.indexOf('<')) != -1)
        {
            // Search for '>' only after the '<' that opened this tag.
            int endIndex = rest.indexOf('>', startIndex);
            if (endIndex == -1)
            {
                return false; // a '<' with no matching '>' is malformed
            }
            String tag = rest.substring(startIndex, endIndex + 1);
            Matcher matcher = pattern.matcher(tag);
            if (!matcher.matches())
            {
                return false; // not a well-formed tag
            }
            String tagName = matcher.group(1);
            boolean isClosingTag = tag.startsWith("</");
            if (isClosingTag)
            {
                // A closing tag must match the most recently opened tag.
                if (tagStack.empty() || !tagStack.pop().equals(tagName))
                {
                    return false;
                }
            }
            else
            {
                tagStack.push(tagName);
            }
            rest = rest.substring(endIndex + 1);
        }
        return tagStack.empty();
    }

    public static void main(String[] args)
    {
        String htmlContent = "<img title=\"displays\" src=\"big.gif\"></img><p> <ul> <li> </li> </ul> <a> </a> </p>";
        System.out.println("Html content: " + htmlContent);
        if (isBalancedHtml(htmlContent))
        {
            System.out.println("HTML content is balanced");
        }
        else
        {
            System.out.println("Html content is not balanced");
        }
    }
}
Sample output
Html content: <img title="displays" src="big.gif"></img><p> <ul> <li> </li> </ul> <a> </a> </p>
HTML content is balanced
Html content: <img title="displays" src="big.gif"></img><p> <ul> <li> </ul> <a> </a> </p>
Html content is not balanced
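The question asks for the tags to be read from a file, while main above hard-codes the input string. A minimal sketch of the file-based variant follows, assuming the question's simplification that tags are separated by whitespace and carry no attributes. The class name TagFileChecker, the method names, and the default file name tags.txt are placeholders chosen for illustration; they are not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name; a sketch, not the original answer's code.
class TagFileChecker {

    // Checks nesting for whitespace-separated tags such as "<p> <ul> </ul> </p>".
    static boolean isProperlyNested(String content) {
        Deque<String> open = new ArrayDeque<>(); // innermost open tag is on top
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input produces a single empty token
            }
            if (!token.startsWith("<") || !token.endsWith(">")) {
                return false; // not a tag; the question rules out free text
            }
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                open.push(token.substring(1, token.length() - 1));
            }
        }
        return open.isEmpty();
    }

    // Reads the whole file as UTF-8 text and checks it.
    static boolean checkFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return isProperlyNested(content);
    }

    public static void main(String[] args) throws IOException {
        // "tags.txt" is a placeholder default; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "tags.txt");
        System.out.println(checkFile(file)
                ? "Tags are properly nested"
                : "Tags are not properly nested");
    }
}
```

Splitting on whitespace is only safe under the stated assumption; input with attributes, such as the img tag in the sample string, needs the character-scanning approach used in isBalancedHtml.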
Please note that this handles simple cases only. Self-closing tags such as <br /> are not handled, and neither are attribute values that contain a '<' or '>' character.
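One possible way to cover the self-closing case is to skip any tag whose text ends with '/' before touching the stack. The sketch below illustrates that idea under stated assumptions: the class name SelfClosingAwareChecker and the method isBalanced are hypothetical, it treats a trailing '/' as the only marker of a self-closing tag, and it still does not handle '<' or '>' inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name; a variant of the scanning approach, not the original answer's code.
class SelfClosingAwareChecker {

    static boolean isBalanced(String html) {
        Deque<String> open = new ArrayDeque<>(); // innermost open tag is on top
        int pos = 0;
        while (true) {
            int start = html.indexOf('<', pos);
            if (start == -1) {
                break; // no more tags
            }
            int end = html.indexOf('>', start);
            if (end == -1) {
                return false; // a '<' with no matching '>' is malformed
            }
            String tag = html.substring(start + 1, end).trim();
            pos = end + 1;
            if (tag.endsWith("/") && !tag.startsWith("/")) {
                continue; // self-closing, e.g. <br />: nothing to push or pop
            }
            if (tag.startsWith("/")) {
                String name = tag.substring(1).trim();
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                // The tag name is the text before the first whitespace.
                open.push(tag.split("\\s+")[0]);
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("<p> <br /> </p>"));
    }
}
```

Under this rule a bare <br> without the slash is still treated as an ordinary opening tag, so it must be closed or the input is reported as unbalanced.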
Please rate positively if this answered your query.