RSS feeds are a popular way to keep track of news items, blog postings and so on
Question
RSS feeds are a popular way to keep track of news items, blog postings and so on. For this problem we’ll be working with the news feed from the following URL: http://feeds.nytimes.com/nyt/rss/World
The root element of the RSS feed is called RSS. It has a child element CHANNEL, which in turn has a number of children, including the ITEM elements. Each ITEM element has a number of child elements, such as TITLE, LINK, SOURCE, CATEGORY and so on, for each news item in the feed. This file is an example of a namespaced file. You can view the page source to see what the XML looks like.
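The rss > channel > item structure, including the namespaced children, can also be walked with the standard library's ElementTree. The following is a minimal sketch. It assumes the `dc:` prefix maps to the Dublin Core URI `http://purl.org/dc/elements/1.1/`, which you should confirm in the page source. The sample XML and the helper `list_items` are hypothetical illustrations, not real feed data.

```python
import xml.etree.ElementTree as ET

# Prefix -> URI map for ElementTree lookups.
# Assumption: the feed binds "dc" to the standard Dublin Core namespace.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical hand-written sample that mimics the feed's shape
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Sample World feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <dc:creator>By JANE DOE</dc:creator>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def list_items(xml_bytes):
    """Return (title, link, creator) tuples for every channel/item element."""
    root = ET.fromstring(xml_bytes)  # the <rss> root element
    items = []
    for item in root.findall("channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Namespaced child: "dc:creator" is expanded through NAMESPACES
        creator = item.findtext("dc:creator", default="", namespaces=NAMESPACES)
        items.append((title, link, creator))
    return items


for entry in list_items(SAMPLE):
    print(entry)
```

Items without a `dc:creator` child simply yield an empty string instead of raising an error.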
Write a program that reads the content from the URL above using XML methods and prints out the following information:
b. (10 points) Modify your output by adding the geographical regions (usually, but not always, countries) associated with each news item, in the following format:
Israel, Egypt, Gaza Strip, West Bank : Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
China : Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Ukraine : Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Kabul (Afghanistan), Afghanistan : Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.
Entries from the ‘Sinosphere Blog’ may not have properly formatted tags for region, so just put down “China” if they don’t list a proper region tag.
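The required line format and the Sinosphere fallback reduce to a small string-building rule. The following is a minimal sketch that assumes the title, creator and region list have already been extracted from an item. The helper `format_entry` is hypothetical. The byline handling assumes the creator text may or may not already begin with "By ".

```python
def format_entry(title, creator, regions):
    """Return 'Region, Region : Title ( By NAME )' for one news item.

    regions is a list of region strings in feed order (possibly empty).
    """
    regions = list(regions)
    # Fallback from the assignment: untagged Sinosphere Blog posts get "China"
    if not regions and title.startswith("Sinosphere Blog"):
        regions = ["China"]
    # Assumption: add the "By " prefix only when the creator text lacks it
    if creator and not creator.startswith("By "):
        creator = "By " + creator
    line = title
    if regions:
        line = ", ".join(regions) + " : " + line
    if creator:
        line += " ( " + creator + " )"
    return line


print(format_entry("Malaysia Airlines Plane Leaves Trail of Debris",
                   "By SABRINA TAVERNISE", ["Ukraine"]))
```

This prints `Ukraine : Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )`.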
Here is my original code:
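#!/usr/bin/env python3
import xml.dom.minidom

# Namespace used by the <dc:creator> elements in the feed
DC_NS = "http://purl.org/dc/elements/1.1/"

def text_of(element):
    """Return the concatenated text content of an element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))

# Open the saved XML document using the minidom parser
DOMTree = xml.dom.minidom.parse("news.xml")
collection = DOMTree.documentElement
# The root is an <rss> element, not an attribute
print("Root element : %s" % collection.tagName)

# Get all the news items in the feed
news = collection.getElementsByTagName("item")

# Print the details of each news item
for item in news:
    print("*****News*****")
    # <title> is a child element, not an attribute of <item>
    titles = item.getElementsByTagName("title")
    if titles:
        print(text_of(titles[0]))
    # <dc:creator> is namespaced, so look it up by namespace URI + local name
    creators = item.getElementsByTagNameNS(DC_NS, "creator")
    if creators:
        print(" : %s" % text_of(creators[0]))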
And the original output:
Op-Ed Contributor: Gaza and Israel: The Road to War, Paved by the West ( By NATHAN THRALL )
Sinosphere Blog: Q. and A.: Bill Porter on Journeys, Poets and Best-Sellerdom in China ( By IAN JOHNSON )
Malaysia Airlines Plane Leaves Trail of Debris ( By SABRINA TAVERNISE )
Afghanistan Begins Audit of Presidential Election ( By MATTHEW ROSENBERG )
Etc.
The solution should be in Python.
Explanation / Answer
1)
import requests
from bs4 import BeautifulSoup

# Download the feed (raise an error on a bad HTTP status)
response = requests.get('http://feeds.nytimes.com/nyt/rss/World')
response.raise_for_status()
# Parse as XML ('xml' requires lxml); an HTML parser treats <link> as an empty tag
soup = BeautifulSoup(response.content, 'xml')
for dataitem in soup.find_all('item'):
    # Not every item has <comments>, so skip missing elements instead of crashing
    tags = [dataitem.find(name) for name in ('title', 'link', 'comments')]
    fields = [tag.get_text(strip=True) for tag in tags if tag is not None]
    # Print title, link and comments URL on one line
    print(' - '.join(fields))
2)
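#!/usr/bin/env python3
import urllib.request
import xml.dom.minidom

FEED_URL = "http://feeds.nytimes.com/nyt/rss/World"
# Namespace used by the <dc:creator> elements in the feed
DC_NS = "http://purl.org/dc/elements/1.1/"

def text_of(element):
    """Return the concatenated text content of an element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))

# XML parser: read the feed straight from the URL, as the assignment asks
with urllib.request.urlopen(FEED_URL) as response:
    DOMTree = xml.dom.minidom.parse(response)
collection = DOMTree.documentElement
print("Root element : %s" % collection.tagName)

# Get all the news items in the feed
news = collection.getElementsByTagName("item")

# Print the details of each news item
for item in news:
    print("*****News*****")
    titles = item.getElementsByTagName("title")
    if titles:
        print(text_of(titles[0]))
    creators = item.getElementsByTagNameNS(DC_NS, "creator")
    if creators:
        print(" : %s" % text_of(creators[0]))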
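Part (b) also needs the regions, which the code above does not print. The following is a minimal minidom sketch of that step. It assumes region tags are `<category>` elements whose `domain` attribute ends in `nyt_geo`, which you should verify against the page source. It also assumes `dc:` maps to the Dublin Core URI. The helpers `text_of`, `format_item`, `format_feed` and `print_feed`, and the sample XML, are hypothetical.

```python
import urllib.request
import xml.dom.minidom

FEED_URL = "http://feeds.nytimes.com/nyt/rss/World"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: geographic tags look like <category domain="...nyt_geo">
GEO_DOMAIN_SUFFIX = "nyt_geo"


def text_of(element):
    """Concatenate the text and CDATA children of a DOM element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)).strip()


def format_item(item):
    """Return 'Region, Region : Title ( By NAME )' for one <item> element."""
    titles = item.getElementsByTagName("title")
    title = text_of(titles[0]) if titles else ""
    creators = item.getElementsByTagNameNS(DC_NS, "creator")
    creator = text_of(creators[0]) if creators else ""
    # Keep only the <category> children whose domain marks them as regions
    regions = [text_of(cat) for cat in item.getElementsByTagName("category")
               if cat.getAttribute("domain").endswith(GEO_DOMAIN_SUFFIX)]
    if not regions and title.startswith("Sinosphere Blog"):
        regions = ["China"]  # fallback required by the assignment
    # Assumption: the creator text may or may not already start with "By "
    if creator and not creator.startswith("By "):
        creator = "By " + creator
    line = title
    if regions:
        line = ", ".join(regions) + " : " + line
    if creator:
        line += " ( %s )" % creator
    return line


def format_feed(dom):
    """Format every <item> in a parsed feed document."""
    return [format_item(item) for item in dom.getElementsByTagName("item")]


def print_feed(url=FEED_URL):
    """Download the feed and print one formatted line per item."""
    with urllib.request.urlopen(url) as response:
        dom = xml.dom.minidom.parse(response)
    for line in format_feed(dom):
        print(line)


# Offline demo on a hand-written sample (not real feed data)
SAMPLE = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
<item><title>Afghanistan Begins Audit of Presidential Election</title>
<category domain="http://www.nytimes.com/namespaces/keywords/nyt_geo">Kabul (Afghanistan)</category>
<category domain="http://www.nytimes.com/namespaces/keywords/nyt_geo">Afghanistan</category>
<dc:creator>By MATTHEW ROSENBERG</dc:creator></item>
</channel></rss>"""

for line in format_feed(xml.dom.minidom.parseString(SAMPLE)):
    print(line)
```

To run it against the live feed, call `print_feed()`, which downloads `FEED_URL` with `urllib.request.urlopen` and passes the response object straight to `xml.dom.minidom.parse`.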