


Question

9. Calculate R² from the definition, using the sums of squares, and interpret it. Show that the R² value is equal to the square of the Pearson correlation coefficient.

DATA FILE HERE: https://expirebox.com/download/12b419270d0365657d8559cc5c29c7dc.html

MORE INFO:

Fat deposits in the trunk of the body may be more closely linked with bad health outcomes than fat in general. It can be hard to measure deep abdominal adipose tissue directly.

Despres et al. proposed that waist circumference might be a good predictor variable for deep abdominal adipose tissue. (We’ll see, over the course of our work with these data, that there is an association between these variables, but that we cannot predict precisely enough to use waist circumference as a good measure of an individual’s deep abdominal fat.)

In this dataset, the individuals received CT scans and the area of deep abdominal adipose tissue was measured. Even this isn’t perfect: the measurement is an area (in cm²), not a volume!

The outcome variable is the area of deep abdominal adipose tissue from the CT scan; the explanatory variable is the individual’s waist circumference in cm.

DATA:

summ waist_circ deep_ab_adipose

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  waist_circ |        109    91.90184    13.55912       63.5        121
deep_ab_ad~e |        109     101.894    57.29476      11.44        253

. corr deep_ab_adipose waist_circ

(obs=109)

             | deep_a~e waist_~c
-------------+------------------
deep_ab_ad~e |   1.0000
  waist_circ |   0.8186   1.0000

. regr deep_ab_adipose waist_circ

      Source |       SS           df       MS      Number of obs   =       109
-------------+----------------------------------   F(1, 107)       =    217.28
       Model | 237548.516         1 237548.516   Prob > F        =    0.0000
    Residual | 116981.988       107 1093.28961   R-squared       =    0.6700
-------------+----------------------------------   Adj R-squared   =    0.6670
       Total | 354530.504       108 3282.68985   Root MSE        =    33.065

------------------------------------------------------------------------------
deep_ab_ad~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  waist_circ |   3.458859   .2346521    14.74   0.000     2.993689     3.92403
       _cons |  -215.9815   21.79627    -9.91   0.000    -259.1901   -172.7729
------------------------------------------------------------------------------

twoway (scatter deep_ab_adipose waist_circ) (line ab_adi_fit waist_circ), title(abdominal adipose tissue and waist circumference) subtitle(n = 109)

graph box ab_adi_res, title(standardized residuals) subtitle(regression of abdominal adipose tissue and waist circumference)

summ waist_circ, detail

                  waist circumference in cm
-------------------------------------------------------------
      Percentiles      Smallest
 1%        68.85           63.5
 5%         73.1          68.85
10%        74.75          71.85       Obs                 109
25%           80           71.9       Sum of Wgt.         109

50%         90.8                      Mean           91.90184
                        Largest       Std. Dev.      13.55912
75%          104            115
90%          109          119.6       Variance       183.8496
95%          111          119.9       Skewness       .1322041
99%        119.9            121       Kurtosis       1.892724

summ deep_ab_adipose, detail

       deep abdominal adipose tissue in cm sq from CT scan
-------------------------------------------------------------
      Percentiles      Smallest
 1%        21.68          11.44
 5%        28.32          21.68
10%        32.22          25.72       Obs                 109
25%        50.88          25.89       Sum of Wgt.         109

50%        96.54                      Mean            101.894
                        Largest       Std. Dev.      57.29476
75%          137            229
90%          184            241       Variance        3282.69
95%          208            245       Skewness       .5767897
99%          245            253       Kurtosis       2.672811

qnorm ab_adi_res, title(normal quantile plot for standardized residuals) subtitle(abdominal adipose tissue and waist circumference   n = 109)

swilk ab_adi_res

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
  ab_adi_res |        109    0.96492      3.113     2.531    0.00568

list waist_circ deep_ab_adipose ab_adi_fit se_ab_adi_ind if waist_circ > 120

     +--------------------------------+
 89. | waist_~c | deep_a~e | ab_adi~t |
     |      121 |      245 | 202.5405 |
     |--------------------------------|
     |            se_ab_~d            |
     |            33.91077            |
     +--------------------------------+

Explanation / Answer

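9) R² is defined as R² = SSR/SST,

where SSR is the regression (model) sum of squares and SST is the total sum of squares.

From the regression output, SSR = 237548.516 and SSE (residual sum of squares) = 116981.988, so SST = SSR + SSE = 237548.516 + 116981.988 = 354530.504.

R² = 237548.516/354530.504 = 0.6700 = 67%

Interpretation: 67% of the variation in deep abdominal adipose tissue area (the dependent variable) is explained by its linear relationship with waist circumference (the independent variable).

Pearson correlation: from the corr output, r = 0.8186, so r² = 0.8186² ≈ 0.6701. This agrees with R² = 0.6700 up to the rounding of r to four decimal places, as expected: in simple linear regression with a single predictor, R² equals the square of the Pearson correlation coefficient.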
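
The arithmetic above, and the comparison with the squared Pearson correlation that the question asks for, can be checked with a short Python sketch. The numbers are copied from the Stata regr and corr output; the variable names are illustrative and not part of the original analysis.

```python
# Values copied from the Stata output above (regr and corr).
ss_model = 237548.516      # regression (model) sum of squares, SSR
ss_residual = 116981.988   # residual sum of squares, SSE
r = 0.8186                 # Pearson correlation, as rounded by Stata

# Total sum of squares is the sum of the model and residual parts.
ss_total = ss_model + ss_residual

# R^2 from the definition, written two equivalent ways.
r_squared = ss_model / ss_total
r_squared_alt = 1 - ss_residual / ss_total

print(f"SST               = {ss_total:.3f}")
print(f"R^2 = SSR/SST     = {r_squared:.4f}")
print(f"R^2 = 1 - SSE/SST = {r_squared_alt:.4f}")
print(f"r^2               = {r ** 2:.4f}")
```

The two values differ only in the fourth decimal place, which is what one would expect when r has been rounded to four decimals before squaring.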