Question

1. Work with Census Data: Download the following file:

http://www.census.gov/population/www/censusdata/files/urpop0090.txt

The census file is a text file with data for the 10-year census from 1900 to 1990 (e.g., 1900, 1910, 1920, ...). It has population data for each state as well as regional and overall data. Each state is on its own line, but the data are grouped so that only three decades of data are on each line, which complicates the task of extracting the data. In addition, the data are further broken down into total, urban, rural, and percentages. Write a program that, for any census year input (e.g., 1970), prints the state and total population for both the minimum and the maximum. For example:

Enter census year 1900 to 1990: 1970

Minimum: (302853, 'Alaska')

Maximum: (19971069, 'California')

The output is displayed as a tuple as a hint to assist with solving the problem rather than illustrating readable output. Some points to consider:

(a) Begin by generating clean data: there are footnotes that need to be eliminated, numbers contain commas, some rows (lines) have data you are not interested in (e.g., region data), you are not interested in all columns (e.g., percentages), and so on. Simply printing the lines with extraneous data removed is a good start.

(b) You will likely want to combine multiple state name strings into one, e.g., "New" "York" becomes "New York".

(c) A tuple (population, state) provides a way to tag population data with a state in a way that allows a list of tuples to be sorted (remember that by default, sorting uses the first value).
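
Points (a) to (c) can be tried on a few sample lines before touching the real file. The following is only a sketch: `clean_row` is a hypothetical helper, and the sample lines are a simplified stand-in, not the actual layout of urpop0090.txt. Only the Alaska and California totals come from the example output above; every other value is a placeholder.

```python
def clean_row(line):
    """Split a raw line into (state name, list of integer counts)."""
    name_parts = []
    numbers = []
    for tok in line.split():
        stripped = tok.replace(",", "")      # (a) drop thousands separators
        if stripped.isdigit():
            numbers.append(int(stripped))
        elif not numbers:
            name_parts.append(tok)           # (b) words before the first number
        # anything else (e.g. a percentage such as "50.0") is ignored
    return " ".join(name_parts), numbers


# Placeholder lines: the "New York" total and all percentages are made up.
SAMPLE = [
    "Alaska         302,853    50.0",
    "New   York   1,000,000    50.0",
    "California  19,971,069    50.0",
]

rows = [clean_row(line) for line in SAMPLE]
# (c) tuples compare on their first element, so sorting orders by population
pairs = sorted((nums[0], state) for state, nums in rows)
print("Minimum:", pairs[0])
print("Maximum:", pairs[-1])
```

Because the population comes first in each tuple, `min(pairs)` and `max(pairs)` would give the same two results without sorting the whole list.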

Explanation / Answer

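# Python 3 rewrite. Layout assumptions are carried over from the original
# answer and are not verified against the file:
#   - data rows come in four blocks, each starting at a "Northeast" row and
#     covering 1990-1970, 1960-1940, 1930-1910, and 1900, in that order;
#   - each decade occupies 6 whitespace-separated columns, total first.
COLS_PER_DECADE = 6
# Rows that are not states. Assumed names; extend if the file has others.
NON_STATES = {"United States", "Northeast", "Midwest", "North Central",
              "South", "West"}

def parse_row(line):
    """Return (name, fields) split at the first digit, or None if no digit."""
    text = " ".join(line.split())          # collapse runs of whitespace
    for pos, ch in enumerate(text):
        if ch.isdigit():
            return text[:pos].strip(), text[pos:].split()
    return None

def load_blocks(path):
    """Read the file into a list of blocks, each a list of (state, fields)."""
    blocks = []
    with open(path) as f:
        for line in f:
            if line.strip().startswith("Northeast"):
                blocks.append([])          # a new group of decades starts here
            row = parse_row(line)
            if blocks and row and row[0] and row[0] not in NON_STATES:
                blocks[-1].append(row)
    return blocks

def min_max(path, year):
    """Return the (population, state) tuples with the smallest and largest total."""
    block, slot = divmod((1990 - year) // 10, 3)
    col = slot * COLS_PER_DECADE           # index of the total for this decade
    pops = []
    for state, fields in load_blocks(path)[block]:
        try:
            # remove every comma, not just the first one
            pops.append((int(fields[col].replace(",", "")), state))
        except (IndexError, ValueError):
            continue                       # header, footnote, or short row
    return min(pops), max(pops)

def main():
    while True:
        year = int(input("Enter census year 1900 to 1990: "))
        if 1900 <= year <= 1990 and year % 10 == 0:
            break
        print("Please enter a multiple of 10 between 1900 and 1990.")
    smallest, largest = min_max("urpop0090.txt", year)
    print("Minimum:", smallest)
    print("Maximum:", largest)

main()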
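
The year-to-column arithmetic in the answer is easier to check in isolation. The sketch below assumes the layout the answer relies on: four blocks in the order 1990-1970, 1960-1940, 1930-1910, 1900, with six columns per decade and the total first. `locate` is a hypothetical helper name, and the column index is counted from the first number after the state name.

```python
COLS_PER_DECADE = 6  # assumed, taken from the answer's step of 6 columns


def locate(year):
    """Return (block index, column index of the total) for a census year."""
    if year % 10 != 0 or not 1900 <= year <= 1990:
        raise ValueError("census year must be 1900, 1910, ..., 1990")
    # Decades are counted backwards from 1990, three per block.
    block, slot = divmod((1990 - year) // 10, 3)
    return block, slot * COLS_PER_DECADE


for year in (1990, 1980, 1970, 1960, 1900):
    print(year, locate(year))
```

Under these assumptions 1970 maps to block 0, column 12, and 1900 maps to block 3, column 0. If the real file uses a different number of columns per decade, only `COLS_PER_DECADE` should need to change.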