
STATISTICS PROBABILITY AND CODE IN PYTHON TO PLOT THE GRAPH


Question


Consider the following Gross Domestic Product (GDP) data for the US in trillions of US dollars (real US GDP, i.e., not nominal GDP):

Year   1930    1940    1950    1960    1970    1980    1990    2000     2010
GDP    1.015   1.33    2.29    3.26    4.951   6.759   9.366   13.131   15.599

1. Plot the GDP as a function of the year in Python, as shown below.

[Figure 1: Real US GDP as a function of the year. x-axis: Year; y-axis: Real US GDP (in trillions)]

2. Based on the obtained plot, what is a possible mathematical relationship between GDP and Year? Carefully argue this point and see if you can learn to discuss this point within a markdown cell of Jupyter using mathematical notation.

3. Plot another curve that follows the mathematical relationship you have derived/argued between GDP and Year. This plot should be overlaid on top of the previous plot obtained using the actual GDP data.

Explanation / Answer

First, create the data frame with pandas using the code below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Real US GDP in trillions of US dollars, one value per decade
year = [1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599]

# Build the data frame directly from the two lists
df = pd.DataFrame({"year": year, "gdp": gdp})

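The dataframe has nine rows, one per decade, with the columns year and gdp (its printed contents are listed at the end of this answer).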

1.) Plotting the above graph:

plt.plot( 'year', 'gdp', data=df, linestyle='-', marker='o')

plt.xlabel("Year")

plt.ylabel("GDP")

plt.title("Real US GDP(in trillions)")

plt.show()

2.)

Finding the mathematical relationship between year and GDP

Let's check the correlation between year and GDP first, to see whether a linear relationship is plausible.

df["year"].corr(df["gdp"])

Output: 0.96

The correlation is very high, which suggests that a linear relationship between these two variables is a reasonable model to fit and plot.
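As a cross-check on the correlation step, here is a minimal sketch that computes the Pearson coefficient directly from its definition with NumPy and compares it with np.corrcoef. The variable names (dx, dy, r_manual, r_numpy) are illustrative and not part of the original answer; the data are copied from the question.

```python
import numpy as np

# Data copied from the question (GDP in trillions of US dollars).
year = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
gdp = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = year - year.mean()
dy = gdp - gdp.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Library value for comparison (off-diagonal entry of the 2x2 matrix).
r_numpy = np.corrcoef(year, gdp)[0, 1]

print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Both values should agree with the result of df["year"].corr(df["gdp"]), since pandas uses the Pearson coefficient by default.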

Let's fit Linear regression model on above data in python.

from sklearn import linear_model

# Ordinary least squares; the normalize argument was removed in scikit-learn 1.2 and is not needed here
model = linear_model.LinearRegression()

model.fit(df[["year"]], df["gdp"])

The model is fit on the raw year and GDP values, without any scaling.

Checking coefficient and intercept to build the equation

model.intercept_

model.coef_

So the linear equation becomes:

GDP = 0.18565 * Year - 359.319
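The slope and intercept can be cross-checked without scikit-learn. The sketch below applies the closed-form formulas for simple linear regression; it is only a verification aid under the assumption that ordinary least squares is the intended fit, and the variable names are illustrative.

```python
import numpy as np

# Data copied from the question (GDP in trillions of US dollars).
year = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
gdp = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Closed-form simple linear regression:
#   slope     = S_xy / S_xx
#   intercept = mean(y) - slope * mean(x)
dx = year - year.mean()
slope = np.sum(dx * (gdp - gdp.mean())) / np.sum(dx ** 2)
intercept = gdp.mean() - slope * year.mean()

print(f"GDP = {slope:.5f} * Year {intercept:+.3f}")
```

These two numbers should match model.coef_[0] and model.intercept_ from the fitted LinearRegression model.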

Now predict the GDP values with the fitted model:

df["gdp_predicted"] = model.predict(df[["year"]])

Now check the coefficient of determination:

from sklearn.metrics import r2_score
r2_score(df["gdp"], df["gdp_predicted"])

The coefficient of determination is about 0.93 (the square of the 0.96 correlation), which indicates that the simple linear model captures most of the variation in GDP.
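For the markdown cell requested in part 2, the coefficient of determination can be written in mathematical notation. The symbols below are a conventional choice rather than anything fixed by the original answer: y_i is the observed GDP, \hat{y}_i the predicted GDP, \bar{y} the mean observed GDP, and r the Pearson correlation between year and GDP.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2 = r^2 \quad \text{for a simple linear regression with an intercept.}
```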

Checking for Root Mean Squared Error:

from sklearn.metrics import mean_squared_error
rms = np.sqrt(mean_squared_error(df["gdp"], df["gdp_predicted"]))
print(rms)

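1.32

The RMSE is about 1.3 trillion dollars, which is small compared with the 1 to 15.6 trillion range of the data, so the line follows the overall trend, although individual years can be off by 1 to 2 trillion.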
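RMSE compresses the error into a single number, so it can help to look at the per-year residuals as well. The sketch below is an illustration only: it swaps in np.polyfit for scikit-learn's LinearRegression (both compute the same least-squares line for this data), and the variable names are hypothetical.

```python
import numpy as np

# Data copied from the question (GDP in trillions of US dollars).
year = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
gdp = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# A degree-1 polyfit returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(year, gdp, 1)
predicted = slope * year + intercept

# Residual = observed minus predicted, one value per year.
residuals = gdp - predicted
rmse = np.sqrt(np.mean(residuals ** 2))

print(f"RMSE: {rmse:.2f}")
for y, res in zip(year, residuals):
    print(f"{int(y)}: {res:+.2f}")
```

If the residuals keep the same sign over several consecutive years instead of scattering around zero, that is a hint of curvature that a straight line does not capture.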

3.)

Let's draw our linear graph on top of our original graph

plt.plot( 'year', 'gdp', data=df, linestyle='-', marker='o', label = "original")
plt.plot( 'year', 'gdp_predicted', data=df, linestyle='-', marker='o', label="predicted")
plt.xlabel("Year")
plt.ylabel("GDP")
plt.title("Real US GDP(in trillions)")
plt.legend()
plt.show()

The plot shows that the fitted line does not pass through every data point. Forcing a model to reproduce all nine original values exactly would risk overfitting, so some error is expected. With a coefficient of determination of about 0.93, the linear model captures the overall upward trend well.
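Part 2 of the question asks for a possible relationship, so it may be worth comparing the line with one alternative. The sketch below is not part of the original answer: it assumes an exponential model GDP = a * exp(b * t), fitted by least squares on log(GDP), with t measured in years from 1970 (an arbitrary centering choice). All names are illustrative.

```python
import numpy as np

# Data copied from the question (GDP in trillions of US dollars).
year = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
gdp = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Center the years so the fit works with small numbers (assumption: 1970 as origin).
t = year - 1970.0

# Linear baseline: gdp ~ slope * t + intercept.
lin_coef = np.polyfit(t, gdp, 1)
gdp_linear = np.polyval(lin_coef, t)
rmse_linear = np.sqrt(np.mean((gdp - gdp_linear) ** 2))

# Exponential candidate: log(gdp) ~ log(a) + b * t, fitted in log space.
b, log_a = np.polyfit(t, np.log(gdp), 1)
gdp_exp = np.exp(log_a + b * t)
rmse_exp = np.sqrt(np.mean((gdp - gdp_exp) ** 2))

print(f"linear RMSE:      {rmse_linear:.2f}")
print(f"exponential RMSE: {rmse_exp:.2f}")
print(f"fitted growth rate b: {b:.4f} per year")
```

Because the exponential curve is fitted on the log scale, it minimizes relative rather than absolute errors, so the two RMSE values are only a rough comparison. To overlay the curve on the existing figure, one could add plt.plot(year, gdp_exp, label="exponential") before plt.legend().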

Dataframe contents (output of print(df)):

   year     gdp
0  1930   1.015
1  1940   1.330
2  1950   2.290
3  1960   3.260
4  1970   4.951
5  1980   6.759
6  1990   9.366
7  2000  13.131
8  2010  15.599