Using Python 2.7 Write a program that gets the root document (\"/\") from www.ch
ID: 3850400 • Letter: U
Question
Using Python 2.7 Write a program that gets the root document ("/") from www.champlain.edu (Links to an external site.)Links to an external site. and extracts every line that has an anchor tag, indicating that it's a hyperlink to another page. The anchor tag will start with <a so you can search for that. Ideally, include a space after the a so you only get the anchor tags.
Create a Sqlite database sending in the following query at the beginning of your program if the database doesn't exist (use the os module to check to see if a file exists).
CREATE TABLE storage(curtime TEXT, line TEXT)
You can use the following query to insert values into the database:
INSERT INTO storage (curtime, line) values ('', line);
With appropriate values for your values. You will need to create a string by concatenating your values before you pass the INSERT statement into the execute method. There are a number of ways to do this, so be sure to look up examples online.. You should be getting the current time each time you enter a value into the database and pass that in for the value of curtime. The following will get you the current time with high precision.
import datetime
current_time = datetime.datetime.now().time()
Explanation / Answer
import urllib, re, datetime, sqlite3
def connect_sqlite(dbname, tablename):
# SQLITE3 settings
conn = sqlite3.connect(dbname)
conn.text_factory = str
# Get cursor
c = conn.cursor()
# Cretae table if not present
try:
c.execute("CREATE TABLE " + tablename + "(curtime TEXT, line TEXT)")
except sqlite3.OperationalError:
pass
return conn, c
def close_sqlite(connection):
connection.close()
def read_hrefs():
url = "http://www.champlain.edu"
link = ""
hrefs_arr = []
# Open url.
url_data = urllib.urlopen(url)
for line in url_data.readlines():
# Search for anchor tags.
m = re.search(r'.*?(<a .*?>.*?</a>).*', line)
try:
link = m.group(1)
current_time = datetime.datetime.now().time()
hrefs_arr.append({ "link" : link, "time" : current_time })
except AttributeError:
# If no match found.
pass
url_data.close()
return hrefs_arr
def insert_into_db(cursor, hrefs_arr, tablename):
for href in hrefs_arr:
cursor.execute("""INSERT INTO %s (curtime, line) values (?, ?)"""%tablename, (href["link"], str(href["time"])))
# ==== MAIN ====
DB = 'example.db'
TABLE = 'storage'
all_anchors = read_hrefs()
connection, cur = connect_sqlite(DB, TABLE)
insert_into_db(cur, all_anchors, TABLE)
close_sqlite(connection)
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.