


Question

class NationalSite:
    def __init__(self, type, name, desc, url=None):
        self.type = type
        self.name = name
        self.description = desc
        self.url = url
        # each object holds the information for one site;
        # the site url will be used later to scrape and save the address data
        # placeholder values below; these need to be replaced with scraped data
        self.address_street = '123 Main St.'
        self.address_city = 'Smallville'
        self.address_state = 'KS'
        self.address_zip = '11111'

This question is about scraping the nps.gov website by crawling, starting at https://www.nps.gov/index.htm. I need help writing the function get_sites_for_state(state_abbr), which takes a state abbreviation and returns a list of NationalSites that are in that state. I have already defined a dictionary of the state abbreviations, called states{}.

def get_sites_for_state(state_abbr):
    # soup is the BeautifulSoup object for the index page, https://www.nps.gov/index.htm
    menu = soup.find('ul', class_='dropdown-menu SearchBar-keywordSearch')
    # this grabs the first state link in the dropdown; it still needs to pick
    # the link that matches state_abbr
    find_it = menu.find('a')['href']
    # fetch the state landing page (parse it with BeautifulSoup here if
    # make_request_using_cache returns raw HTML rather than a soup object)
    page_link = make_request_using_cache(baseurl + find_it)
    listofstate = page_link.find('div', id='parkListResultsArea')
    states = listofstate.find('ul', id='list_parks')

At the basic level, each NationalSite (instance) should be created with a name, type (e.g., 'National Park', 'National Monument', 'National Historic Site'), and description. All of these can be found on the landing page for a particular state. Implement basic searching by state and creation of NationalSites with name and type.
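For that step, here is a minimal sketch of what parsing a state landing page might look like, assuming requests-style fetching and BeautifulSoup. It reuses the list_parks id from the snippet above; the h2/h3/p tags for type, name, and description are guesses, and parse_state_page is just an illustrative helper name, so verify the selectors against the real markup before relying on them.

from bs4 import BeautifulSoup

def parse_state_page(state_page_html):
    # state_page_html is the HTML of a state landing page reached from the dropdown
    soup = BeautifulSoup(state_page_html, 'html.parser')
    park_list = soup.find('ul', id='list_parks')
    sites = []
    for li in park_list.find_all('li', recursive=False):
        type_tag = li.find('h2')
        name_tag = li.find('h3')
        desc_tag = li.find('p')
        if name_tag is None:
            continue  # skip list items that are not park entries
        link = li.find('a')
        # hrefs on the state page are assumed to be site-relative, e.g. /isro/index.htm
        url = 'https://www.nps.gov' + link['href'] if link else None
        sites.append(NationalSite(
            type_tag.text.strip() if type_tag else '',
            name_tag.text.strip(),
            desc_tag.text.strip() if desc_tag else '',
            url=url))
    return sites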

Implement adding address information to NationalSites by crawling. In addition, you should visit the detail page for each site to extract additional information, in particular the physical address of the site. To do this, you will have to crawl one level deeper into the site and extract information from the site-specific pages (e.g., https://www.nps.gov/isro/index.htm).
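A hedged sketch of that deeper crawl is below. It assumes the park detail pages carry schema.org address markup (itemprop values such as streetAddress, addressLocality, addressRegion, postalCode); if the live markup differs, swap the selectors accordingly. fill_address is an illustrative helper name, not something the assignment defines.

import requests
from bs4 import BeautifulSoup

def fill_address(site):
    # site.url should point at the park's detail page, e.g. https://www.nps.gov/isro/index.htm
    detail_soup = BeautifulSoup(requests.get(site.url).text, 'html.parser')
    street = detail_soup.find(itemprop='streetAddress')
    city = detail_soup.find(itemprop='addressLocality')
    state = detail_soup.find(itemprop='addressRegion')
    zip_code = detail_soup.find(itemprop='postalCode')
    # only overwrite the placeholder values when the markup was actually found
    if street:
        site.address_street = street.text.strip()
    if city:
        site.address_city = city.text.strip()
    if state:
        site.address_state = state.text.strip()
    if zip_code:
        site.address_zip = zip_code.text.strip()
    return site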

How do I get from one page link to the next by crawling, and how do I create the instances along the way? Any help would be appreciated.

Explanation / Answer

Let's say you start with one page, and that is the index page.

require('simple_html_dom.php'); (download the simple_html_dom library first and include it in your project).

Now let's scrape the URL below and build a clean array of entries; we need to grab this page first.

Create a DOM from a URL or file:

$html = file_get_html('https://www.nps.gov/isro/index.htm');

This creates a PHP object holding the NPS page structure.

Look at the NPS page markup to find the repeating structure used for each list element:

// create an array to hold the scraped entries
$addresses = [];
$i = 1;

foreach ($html->find('<list CSS class name>') as $item) {

        if ($i > 10) {
                break;
        }

        // find the item link element inside this list entry
        // (replace the placeholder selector with the real CSS class name)
        $detail = $item->find('<CSS class name>', 0);

        // get the pieces you need: the title text and the link attribute
        $title = $detail->plaintext;
        $url   = $detail->href;

        // push onto the list of addresses
        $addresses[] = [
                'title' => $title,
                'url'   => $url,
        ];

        $i++;
}

var_dump($addresses);

From the array you get this way, you can pull multiple attributes into each entry and then create your instances from it.
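Since your NationalSite class is in Python, the same pattern maps over directly: collect one record per site into a list (like the $addresses array above), then build an instance from each record. The field names below are purely illustrative.

# each record would come from scraping one list item on the state landing page
records = [
    {'type': 'National Park', 'name': 'Isle Royale', 'desc': '...'},
    # ... one dict per site found on the page ...
]

sites = []
for r in records:
    sites.append(NationalSite(r['type'], r['name'], r['desc']))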