
This question should be written in Python. Write a function getWebInfo( ) that t

ID: 3605770 • Letter: T

Question

This question should be written in Python.

Write a function getWebInfo( ) that takes as input a URL and calls three functions, to print the following information:

1. The set of all absolute links already in the page, that is, links that start with 'http://'. Must use HTML parser class methods. Do not copy the code from the book, which uses urljoin to *make* every link absolute.

2. A set that contains all e-mail addresses appearing in the page. Must use regular expressions to detect e-mail addresses on the web page. Must remove duplicates. E-mails should be matching general e-mails, not just depaul.edu emails; do not use cdm.depaul.edu in your pattern. Your program should work for any e-mail address on any web page.

3. A list of tuples (derived from a dictionary) that contains the 20 most frequent words and their frequencies, in order of frequency. Words must contain 5 or more characters. Discard any words of 4 characters or less. (6 points) There are several steps to follow on this part.

a) You need to construct a dictionary first, containing words and their frequencies.

b) Then the dictionary has to be reversed.

c) The reversed dictionary has to be sorted. Please note that the sorting method returns a list of tuples.

d) Print the first 20 tuples of the list of tuples.
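Steps a) through d) can be sketched roughly as follows. This is a minimal illustration, not the assignment's official solution: the function name top_words and its parameters are hypothetical, a "word" is assumed to be a run of ASCII letters, and the input is assumed to be plain text with any HTML tags already removed. "Reversing" the dictionary is interpreted here as building (frequency, word) tuples, so that two words with the same count do not overwrite each other.

```python
import re

def top_words(text, n=20, min_len=5):
    # a) Build a dictionary mapping each word to its frequency.
    #    Words shorter than min_len characters are discarded.
    freq = {}
    for word in re.findall(r'[a-z]+', text.lower()):
        if len(word) >= min_len:
            freq[word] = freq.get(word, 0) + 1

    # b) "Reverse" the dictionary into (frequency, word) tuples.
    pairs = [(count, word) for word, count in freq.items()]

    # c) Sort the tuples, highest frequency first.
    pairs.sort(reverse=True)

    # d) Keep only the first n tuples.
    return pairs[:n]
```

Calling top_words("apple apple apple banana banana cherry tiny") returns [(3, 'apple'), (2, 'banana'), (1, 'cherry')]; "tiny" is dropped because it has only 4 characters.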

Write one function for each of the three pieces of information, a total of three functions. Then assemble the three function calls (and headings) inside your main function getWebInfo( ). Include the call to getWebInfo at the bottom of your module (file).

Explanation / Answer

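An outline of a program in which getWebInfo( ) calls three functions, each displaying a different piece of information. Only the link extraction is implemented; EmailAdd and Wordcount are left as placeholders: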

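from html.parser import HTMLParser
from urllib.request import urlopen

class AbsolutLink(HTMLParser):
    '''Collects the href values of anchor tags that are already absolute.'''

    def __init__(self):
        super().__init__()
        self.https = set()    # a set, so each link is stored only once

    def handle_starttag(self, tag, attrs):
        if tag.lower() == 'a':
            for attr in attrs:
                # Keep the href only if it already starts with 'http://'.
                # urljoin is deliberately not used, as the question requires.
                if attr[0] == 'href' and attr[1] and attr[1][:7] == 'http://':
                    self.https.add(attr[1])

def EmailAdd(content):
    # Placeholder: this function is not implemented in the answer.
    return set()

def Wordcount(content):
    # Placeholder: this function is not implemented in the answer.
    return []

def getWebInfo(url):
    # Download the page once and pass its text to each function.
    infile = urlopen(url)
    content = infile.read().decode()
    infile.close()

    parser = AbsolutLink()
    parser.feed(content)
    print('ALL ABSOLUTE LINKS ON THE WEB PAGE')
    for link in parser.https:
        print(link)

    print('ALL E-MAIL ADDRESSES ON THE WEB PAGE')
    print(EmailAdd(content))

    print('20 MOST FREQUENT WORDS ON THE WEB PAGE')
    print(Wordcount(content))

# 'https://website/page' is a placeholder; replace it with a real URL.
getWebInfo('https://website/page')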
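The e-mail step from the question is not implemented in the answer, so here is one possible sketch of it. The name find_emails and the pattern are assumptions made for illustration, not part of the original answer; the pattern is a common approximation of an e-mail address (it does not cover every address the mail standards allow) and is not tied to any particular domain. Returning a set removes duplicates.

```python
import re

# Approximate pattern: local part, '@', then a domain with at least one dot.
EMAIL_PATTERN = re.compile(
    r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+'
)

def find_emails(content):
    # findall returns every full match; the set drops repeated addresses.
    return set(EMAIL_PATTERN.findall(content))
```

For example, find_emails('Write to a.b@example.com or a.b@example.com.') yields a set with the single address 'a.b@example.com'. A function like this could take the place of the EmailAdd step, receiving the decoded page text.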
