Borough Case Processing Summary Borough Cases Valid Missing Total N Percent N Pe
ID: 3074649 • Letter: B
Question
Case Processing Summary

             |           | Valid         | Missing       | Total
             | Borough   | N   | Percent | N   | Percent | N   | Percent
CommuteTime  | Manhattan | 9   | 100.0%  | 0   | 0.0%    | 9   | 100.0%
             | Other     | 11  | 100.0%  | 0   | 0.0%    | 11  | 100.0%
Descriptives (CommuteTime, by Borough)

Borough   | Statistic                                     | Value   | Std. Error
Manhattan | Mean                                          | 23.33   | 2.656
          | 95% Confidence Interval for Mean, Lower Bound | 17.21   |
          | 95% Confidence Interval for Mean, Upper Bound | 29.46   |
          | 5% Trimmed Mean                               | 23.31   |
          | Median                                        | 23.00   |
          | Variance                                      | 63.500  |
          | Std. Deviation                                | 7.969   |
          | Minimum                                       | 10      |
          | Maximum                                       | 37      |
          | Range                                         | 27      |
          | Interquartile Range                           | 11      |
          | Skewness                                      | .129    | .717
          | Kurtosis                                      | .421    | 1.400
Other     | Mean                                          | 64.91   | 7.698
          | 95% Confidence Interval for Mean, Lower Bound | 47.76   |
          | 95% Confidence Interval for Mean, Upper Bound | 82.06   |
          | 5% Trimmed Mean                               | 65.18   |
          | Median                                        | 68.00   |
          | Variance                                      | 651.891 |
          | Std. Deviation                                | 25.532  |
          | Minimum                                       | 21      |
          | Maximum                                       | 104     |
          | Range                                         | 83      |
          | Interquartile Range                           | 43      |
          | Skewness                                      | -.320   | .661
          | Kurtosis                                      | -.634   | 1.279
Compare students from Manhattan vs. other boroughs to see if they differ in their commute times. Decide which statistics are most appropriate to describe the boxplots, and use these values to describe how the two groups (Manhattan vs. other boroughs) compare with respect to center, spread and shape. Include the relevant statistics for each one in your explanation. What differences in the distributions of commute times do you notice once the sample is split, and what do you think is the real-world situation behind them?
[Figure: boxplots of CommuteTime by Borough (Manhattan, Other)]
Explanation / Answer
The mean and median commute times of Manhattan students differ substantially from those of students in other boroughs (means 23.33 vs. 64.91 minutes; medians 23.00 vs. 68.00 minutes), so the difference between the groups is obvious.
A box plot shows the three quartiles (the first quartile, the second quartile or median, and the third quartile), the interquartile range, and the range.
The position of the median within the box indicates the shape of the distribution. If the median does not divide the box into two equal parts, the data are interpreted as skewed, that is, not symmetric.
If the median lies nearer to the third quartile, the data are interpreted as left skewed (negatively skewed): most of the scores are found on the higher side of the scale and very few on the lower side. In this case mean <= median.
If the median lies nearer to the first quartile, the shape is right skewed (positively skewed): most of the scores are found on the lower side of the scale and very few on the higher side. In this case mean >= median.
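The median-position rule can be turned into a number with the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1). The sketch below is only an illustration: the quartile values are hypothetical (the SPSS output does not report Q1 and Q3), and the 0.1 tolerance for calling a box "roughly symmetric" is an arbitrary assumption.

```python
def quartile_skewness(q1, median, q3):
    """Bowley (quartile) skewness: positive means right skew, negative means left skew."""
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("q3 must be greater than q1")
    return (q3 + q1 - 2 * median) / iqr


def describe_shape(q1, median, q3, tolerance=0.1):
    """Label the shape from where the median sits inside the box.

    The tolerance is an arbitrary cutoff chosen for this illustration.
    """
    s = quartile_skewness(q1, median, q3)
    if s > tolerance:
        return "right skewed"
    if s < -tolerance:
        return "left skewed"
    return "roughly symmetric"


# Hypothetical quartiles, not taken from the table above.
print(describe_shape(10, 12, 20))  # median close to Q1
print(describe_shape(10, 18, 20))  # median close to Q3
print(describe_shape(10, 15, 20))  # median in the middle of the box
```

A median close to Q1 leaves the longer part of the box above it, which is why the coefficient comes out positive in the first call.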
The Manhattan students' mean commute time (23.33 minutes) is slightly greater than the median (23.00 minutes), but that difference is negligible relative to the standard deviation (7.969 minutes). The skewness (.129) is also close to zero, so the commute time distribution of Manhattan students can be considered almost symmetric.
The commute time distribution of the other students is left skewed, as the sign of the skewness value in the table (-.320) shows, although its magnitude is modest relative to its standard error (.661). Also, mean < median (64.91 vs. 68.00). The 5% trimmed mean (the mean recalculated after dropping the lowest 5% and highest 5% of observations) is 65.18, which is greater than the mean and indicates that the observations at the lower end of the scale are more spread out. This might be the reason for the high standard deviation (25.532).
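One way to judge the magnitude of a skewness value is to divide it by its standard error, both of which appear in the Descriptives table. The sketch below does that for each group. The cutoff of about 2 in absolute value is a common rule of thumb, assumed here for illustration; it is not part of the SPSS output.

```python
# Skewness and its standard error, copied from the Descriptives table.
groups = {
    "Manhattan": {"skewness": 0.129, "se": 0.717},
    "Other": {"skewness": -0.320, "se": 0.661},
}

RULE_OF_THUMB = 2.0  # assumed cutoff, a common convention rather than a fixed rule


def skewness_ratio(skewness, se):
    """Skewness divided by its standard error."""
    return skewness / se


for name, g in groups.items():
    ratio = skewness_ratio(g["skewness"], g["se"])
    if abs(ratio) < RULE_OF_THUMB:
        verdict = "no strong evidence of skew"
    else:
        verdict = "noticeably skewed"
    print(f"{name}: ratio = {ratio:.2f} ({verdict})")
```

Both ratios are well under 2 in absolute value, so with samples this small the sign of the skewness is more informative than its size.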
The interquartile range (IQR) and the range are the measures of spread that can be compared from the box plots. The IQRs of the two groups differ greatly (11 vs. 43 minutes), as do the ranges (27 vs. 83 minutes). The IQR is the width of the interval that contains the middle 50% of the observations. In other words, about 50% of Manhattan students' commute times fall in the range of roughly 17 to 29 minutes. (The first and third quartiles are not given, so I approximated them from the median and the IQR, assuming the distribution is symmetric.)
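The rough quartile calculation for Manhattan can be written out as follows. This is a sketch that assumes a symmetric distribution, so that the quartiles sit half an IQR on either side of the median; the function name is made up for illustration. It also compares the two spread measures between groups.

```python
def approx_quartiles(median, iqr):
    """Approximate (Q1, Q3) as median -/+ IQR/2, assuming a symmetric distribution."""
    half = iqr / 2
    return median - half, median + half


# Median and IQR for Manhattan, from the Descriptives table.
q1, q3 = approx_quartiles(23.00, 11)
print(f"Manhattan middle 50%: about {q1} to {q3} minutes")

# How much wider the spread is in the other boroughs than in Manhattan.
iqr_ratio = 43 / 11
range_ratio = 83 / 27
print(f"IQR ratio (Other / Manhattan): {iqr_ratio:.1f}")
print(f"Range ratio (Other / Manhattan): {range_ratio:.1f}")
```

The same shortcut is not applied to the other boroughs, because their distribution is not treated as symmetric.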
Since the commute time distribution for the other boroughs is skewed, the median (68 minutes) describes it better than the mean, while the Manhattan distribution is best described by its mean (23.33 minutes). There is a huge difference in commute times between the groups, with a large standard deviation for the other boroughs. However, this cannot be confidently generalized to a real-world situation, because the conclusions so far are based on small samples (9 and 11 students) and, for the other boroughs, a large standard deviation, which means that a few observations are widely spread. Even the conclusions about shape may change with a larger sample. The Manhattan result can be generalized to the Manhattan student population with somewhat more confidence, on account of its comparatively smaller standard deviation.
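The effect of sample size and spread on precision can be seen by recomputing the standard errors and 95% confidence intervals from the summary statistics in the Descriptives table. The sketch below uses SE = SD / sqrt(n) and mean ± t × SE. The critical t values are hardcoded from a standard t table for 8 and 10 degrees of freedom, and the function names are illustrative.

```python
import math

# Two-sided 95% critical t values from a standard t table (df = n - 1).
T_CRIT = {8: 2.306, 10: 2.228}


def standard_error(sd, n):
    """Standard error of the mean."""
    return sd / math.sqrt(n)


def confidence_interval(mean, sd, n):
    """95% confidence interval for the mean, using the hardcoded t table above."""
    margin = T_CRIT[n - 1] * standard_error(sd, n)
    return mean - margin, mean + margin


# Mean, standard deviation and n for each group, from the SPSS output.
samples = {
    "Manhattan": (23.33, 7.969, 9),
    "Other": (64.91, 25.532, 11),
}

for name, (mean, sd, n) in samples.items():
    se = standard_error(sd, n)
    low, high = confidence_interval(mean, sd, n)
    print(f"{name}: SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

The results agree with the table up to rounding of the inputs. The interval for the other boroughs is several times wider than the one for Manhattan, which is the numerical form of the caution about generalizing from that group.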
feel free to comment in case of doubts...