Question
The data file can be downloaded from:
https://drive.google.com/file/d/0B6i_JzkQ2f6IeGRlMjhhTVlRbFU/view?usp=sharing
It is well known that the concentration of cholesterol in blood serum increases with age, but it is less clear whether cholesterol level is also associated with body weight. The dataset (see the data file linked above) records serum cholesterol (millimoles per litre), age (years) and body mass index (weight divided by height squared, where weight was measured in kilograms and height in meters).
Using R software:
(g) Find a 98% confidence interval for the mean cholesterol level for 49 year olds with a body mass index of 31.7. Explain the meaning of a confidence interval. How do your prediction and confidence intervals compare?
(h) Compute the correlation between the two regressors. Comment on the strength and direction of the correlation.
(i) Calculate the partial correlation of each regressor with the dependent variable. Explain what these partial correlations mean.
(j) Based on the results from (h) and (i), which regressors are likely to be important in predicting Cholesterol?
Explanation / Answer
All R code is shown in bold and the output of the R code is shown in italics.
(g)
Run the regression on the given data. The data is stored in a data frame called "dataset" (note that the cholesterol column is spelled "Cholestrol" in the data).
model <- lm(Cholestrol ~ Age + Body.Mass, data = dataset)
Load the new data for prediction.
newdata = data.frame(Age=49,Body.Mass=31.7)
Calculate the 98% prediction interval.
predict(model, newdata, interval = "prediction", level = 0.98)
fit lwr upr
1 7.392746 5.201232 9.584261
The 98% prediction interval for the cholesterol level of an individual with the given age and body mass index is between 5.201 and 9.584.
Calculate the 98% confidence interval
predict.lm(model, newdata, interval="confidence",level = 0.98)
fit lwr upr
1 7.392746 6.463524 8.321969
The 98% confidence interval for the mean cholesterol level at the given age and body mass index is between 6.464 and 8.322.
Confidence interval - If you sampled data many times and calculated a 98% confidence interval for the mean from each sample, you would expect about 98% of those intervals to include the true value of the population mean. The key point is that the confidence interval tells you about the likely location of the true population parameter.
Prediction interval - If you sampled data many times and calculated a 98% prediction interval from each sample, you would expect the next observed value to lie within that prediction interval in about 98% of the samples. The key point is that the prediction interval tells you about the distribution of individual values, not the uncertainty in determining the population mean.
Prediction intervals must account for both the uncertainty in the estimate of the population mean and the scatter of individual observations around that mean. So a prediction interval is always wider than the confidence interval at the same level. Here both intervals are centered on the fitted value 7.393, but the prediction interval (5.201 to 9.584) is more than twice as wide as the confidence interval (6.464 to 8.322).
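The difference between the two intervals can be checked by computing them by hand. The following is a minimal sketch in R that uses a synthetic data frame called toy (hypothetical values, not the cholesterol dataset from the question) with the same column names, and compares the hand-computed intervals with the output of predict(). The only difference between the two formulas is the extra "1 +" term under the square root, which accounts for the scatter of a single new observation.

```r
# Synthetic stand-in data (hypothetical; NOT the dataset from the question)
set.seed(1)
n <- 30
toy <- data.frame(Age = runif(n, 20, 70), Body.Mass = runif(n, 18, 35))
toy$Cholestrol <- 1 + 0.05 * toy$Age + 0.12 * toy$Body.Mass + rnorm(n, sd = 0.8)

fit <- lm(Cholestrol ~ Age + Body.Mass, data = toy)

# New point in the order of the coefficients: intercept, Age, Body.Mass
x0 <- c(1, 49, 31.7)

X     <- model.matrix(fit)
s2    <- sum(residuals(fit)^2) / df.residual(fit)    # residual variance estimate
h0    <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)  # x0' (X'X)^-1 x0
tcrit <- qt(1 - 0.02 / 2, df.residual(fit))          # two-sided 98% critical value
yhat  <- sum(x0 * coef(fit))                         # fitted value at x0

# Interval for the MEAN response at x0
ci_manual <- yhat + c(-1, 1) * tcrit * sqrt(s2 * h0)
# Interval for a SINGLE new observation at x0 (note the extra "1 +")
pi_manual <- yhat + c(-1, 1) * tcrit * sqrt(s2 * (1 + h0))

# The same intervals from predict()
new_pt     <- data.frame(Age = 49, Body.Mass = 31.7)
ci_builtin <- predict(fit, new_pt, interval = "confidence", level = 0.98)
pi_builtin <- predict(fit, new_pt, interval = "prediction", level = 0.98)

print(rbind(confidence = ci_manual, prediction = pi_manual))
```

Because s2 * (1 + h0) is always larger than s2 * h0, the prediction interval is wider than the confidence interval for any data set and any new point.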
(h)
The correlation between Age and Body Mass is calculated as,
with(dataset, cor(Age,Body.Mass))
[1] 0.3195348
The correlation is positive, so body mass index tends to increase with age, but at about 0.32 it is only weak to moderate in strength.
(i)
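We will use the pcor() function from the ggm library to calculate the partial correlations. The first two names passed to pcor() are the variables being correlated, and any remaining names are the variables being controlled for.
Download and install the "ggm" package.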
install.packages("ggm")
library(ggm)
The partial correlation between Cholestrol and Age (controlling for Body.Mass) is 0.4087.
pcor(c("Cholestrol","Age","Body.Mass"),var(dataset))
[1] 0.4086674
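The partial correlation between Cholestrol and Body.Mass (controlling for Age) is 0.6893. Note that Body.Mass must come second in the vector and Age last, so that Age is the variable being controlled for.
pcor(c("Cholestrol","Body.Mass","Age"),var(dataset))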
[1] 0.68926
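Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). Here, 0.4087 is the correlation between cholesterol and age after the linear effect of body mass index has been removed from both, and 0.6893 is the correlation between cholesterol and body mass index after the linear effect of age has been removed from both.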
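The idea of "removing the linear effect" of a control variable can be made concrete in base R, without any extra package. The sketch below uses synthetic vectors x, y and z (hypothetical values, not the cholesterol data) and computes the partial correlation of x and y given z in two ways: as the correlation between the residuals of x and y after each is regressed on z, and from the standard formula based on the three pairwise correlations. With a single control variable the two approaches give the same value.

```r
# Synthetic data (hypothetical; NOT the dataset from the question)
set.seed(2)
n <- 200
z <- rnorm(n)                        # control variable
x <- 0.5 * z + rnorm(n)              # regressor of interest, related to z
y <- 0.7 * x + 0.4 * z + rnorm(n)    # response, related to both x and z

# Method 1: correlate what is left of x and y after regressing each on z
rx <- residuals(lm(x ~ z))
ry <- residuals(lm(y ~ z))
pc_resid <- cor(rx, ry)

# Method 2: formula based on the three pairwise correlations
rxy <- cor(x, y)
rxz <- cor(x, z)
ryz <- cor(y, z)
pc_formula <- (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))

print(c(residual_method = pc_resid, formula = pc_formula))
```

For the data in the question, the same check could be run by replacing y, x and z with the Cholestrol, Age and Body.Mass columns of dataset.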
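(j)
Because the partial correlation between cholesterol and Body.Mass is high (0.689), Body.Mass is likely to be important in predicting cholesterol. Age has a moderate partial correlation with cholesterol (0.409), so it is likely to contribute as well, though less strongly. The correlation between the two regressors is only about 0.32, so they carry largely separate information.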