

Question

I have the .xml file below. I want to parse it in PySpark (Spark using Python) so that I can count the number of Id values in the file. For example, for the file below the output would be number of id = 3. I also need the parser to write an output file that contains every Id value. For example, the output file would be:

7

8

9

Can someone help me please?

<?xml version="1.0" encoding="utf-8"?>
<posthistory>
<row Id="7" PostHistoryTypeId="2" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple &quot;Hello World&quot; example - how can I avoid hard-coding behavior?&#xD;&#xA;&#xD;&#xA;For example, if I wanted to &quot;teach&quot; a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.&#xD;&#xA;&#xD;&#xA;Obviously, randomly generating code would be impractical, so how could I do this?" />
<row Id="8" PostHistoryTypeId="1" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="How can I do simple machine learning without hard-coding behavior?" />
<row Id="9" PostHistoryTypeId="3" PostId="5" RevisionGUID="009bca93-fce2-44ed-a277-a8452650a627" CreationDate="2014-05-13T23:58:30.457" UserId="5" Text="&lt;machine-learning&gt;" />
</posthistory>

Explanation / Answer

Method One

---------------

We can import this data by reading from a file:

print(id)  # finally, print the id
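A minimal sketch of how the file could be read and the Id values pulled out in PySpark, assuming each <row ... /> element sits on a single line as in the sample above. The helper names extract_id and count_ids and both path arguments are hypothetical. sc is assumed to be an already created SparkContext. Each line is parsed on its own with ElementTree, so the XML declaration and the <posthistory> wrapper lines are simply skipped.

```python
import xml.etree.ElementTree as ET


def extract_id(line):
    """Return the Id attribute of one <row ... /> line, or None for any other line."""
    line = line.strip()
    if not line.startswith("<row"):
        # XML declaration, <posthistory> wrapper, or a blank line
        return None
    try:
        # A self-closing <row ... /> is a complete XML element by itself
        return ET.fromstring(line).get("Id")
    except ET.ParseError:
        # Malformed row, or a row that spans several lines (not handled here)
        return None


def count_ids(sc, in_path, out_path):
    """Write one Id per line under out_path and return how many Ids were found.

    sc is an existing SparkContext; in_path and out_path are placeholders.
    """
    ids = sc.textFile(in_path).map(extract_id).filter(lambda x: x is not None)
    ids.cache()                    # reused by both the save and the count
    ids.saveAsTextFile(out_path)   # creates a directory of part-* files
    return ids.count()
```

With a running SparkContext, something like print("number of id =", count_ids(sc, "posthistory.xml", "ids_out")) would print the count. Note that saveAsTextFile writes a directory of part files rather than a single file, so the Ids may be spread across several part-* files.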

Method Two

--------------

# Python code to illustrate parsing the XML file with ElementTree

# importing the required module
import xml.etree.ElementTree as ET

def parseXML(xmlfile):
    # create element tree object
    tree = ET.parse(xmlfile)
    # get root element (<posthistory>)
    root = tree.getroot()
    # create empty list for the Id values
    ids = []
    # iterate over the <row> elements
    for row in root.findall('row'):
        # Id is an attribute of <row>, not a child element
        row_id = row.get('Id')
        if row_id is not None:
            ids.append(row_id)
    # return the list of Ids
    return ids

def main():
    # parse xml file (both file names here are placeholders)
    ids = parseXML('posthistory.xml')
    # print the count, then write one Id per line to the output file
    print('number of id =', len(ids))
    with open('ids.txt', 'w') as out:
        for row_id in ids:
            out.write(row_id + '\n')

if __name__ == "__main__":
    # calling main function
    main()