For this assignment, you are to write a Python program that will search the web
ID: 3706915 • Letter: F
Question
For this assignment, you are to write a Python program that will search the web for email addresses. You should write a function called _crawl(start, limit) that will return a list of all the unique email addresses found on all the webpages reachable from the webpage whose URL is start , except that you should stop if you have visited limit distinct web pages and return the list from that many pages. Types: start is a string. limit is an integer. You will return a list of strings (one string for each email address you find). Y
Remember to make your function work, and to use the email address finder that you need to add the following two lines to the start of your file:
import urllib.request as ur
import re
so it is find all url first, and go to contents, and find all unique address. there will be a 3 functions.
Explanation / Answer
import urllib, re
def _crawl(start, limit):
uniquemails = set()
#connect to url
website = urllib.urlopen(start)
#read html code
html = website.read()
#use re.findall to get all the links on this webpage
links = re.findall('"((http|ftp)s?://.*?)"', html)
for link in links:
w = urllib.urlopen(link)
str = f.read()
#use re.findall to get the emails on this webpage
emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,4}",str)
for email in emails:
uniquemails.add(email)
print uniquemails
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.