


Question

Data file here: https://expirebox.com/download/e17be4959ea9f5fa963ad623e679f855.html

Use the Stata generate command to calculate the values for the Vitamin C loss.

A. What are the null and alternative hypotheses to use for this test?

B. From the summary command for the loss variable, find the standard error of the loss values.

C. Calculate the test statistic, using the average loss and the standard error.

D. Find the degrees of freedom for the test, and find the P value.  (Use SurfStat or Stata’s ttail command for the P value.)

Use the Stata ttest command to carry out the appropriate test using the loss values.

Hint: ttest loss=0

E. Find the test statistic and P value for the investigators’ alternative hypothesis in the output.

Note: When Stata prints a P value as .000 or .0000, or SurfStat says the P value is 0, the P value is not literally 0, just very small!

F. Write a summary sentence for a disaster relief agency, answering this question:

Using P < .05 as the criterion, is there evidence that vitamin C is being lost?   

In the Stata ttest command output, find the 95% confidence interval for the mean loss.

G. Using the point estimate and the confidence interval,

write a sentence for a disaster relief agency answering this question: How much vitamin C is being lost, on average?

USEFUL INFO:

These data come from earlier editions of Introduction to the Practice of Statistics, by David Moore and George McCabe. The studies were conducted for the US Agency for International Development (USAID). Wheat-soy blend (WSB) is prepared for emergency food relief. Vitamin C is added to the WSB when it is prepared in the US, and the blend is shipped overseas so that it is readily available for disaster relief. There is a concern that vitamin C is lost in shipment and storage.

   
The researchers selected a simple random sample of 27 bags of freshly-prepared wheat-soy blend at the US preparation site, before shipment to Haiti. They took samples of the WSB, and assessed the Vitamin C content. The bags were specially marked so that they could be resampled 5 months later in Haiti.
The data consist of 27 pairs of measurements. The key variable for analysis is the difference between the factory measurement and the Haiti measurement. We'll create the difference in Stata as the first step in our data analysis. Theoretically, the value of Vitamin C in the Haiti sample can't be any larger than the US value. But because the vitamin C isn't completely uniformly mixed in, and there's some measurement error, some of the differences might be negative.

   
The units on the vitamin C measurements are milligrams of vitamin C per 100 grams of WSB.    
The dataset is wsbhaiti.dta. The two measurements are the variables called factory and haiti. Create the differences

gen loss = factory - haiti
and get the summary statistics
summ loss, detail  
(You'll see that there are some negative differences; look at the 4 smallest values of loss.) Since the sample size is only 27, we'll check the distribution assumptions carefully:

graph box loss
qnorm loss
Use Stata to get a 95% confidence interval for the mean vitamin C loss:

ci mean loss
and then repeat these calculations from first principles.

Explanation / Answer

Please note that we can provide solutions using open-source tools only, so the solution below uses R. The concepts remain the same.

Hypothesis testing

H0: There is no difference between the mean vitamin C content at the factory and in Haiti (mean loss = 0)

H1: There is a difference between the mean vitamin C content at the factory and in Haiti (mean loss ≠ 0)

Please note that this is a paired t test, as the same bags are resampled in Haiti.
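Because the test is paired, it is equivalent to a one-sample t test on the differences, which is what the Stata hint ttest loss=0 does. The following is a minimal sketch in R of that equivalence. The two vectors are small made-up values chosen for illustration (they are not the wsbhaiti data), and the variable names are assumptions.

```r
# Hypothetical measurements for 6 bags (illustrative only, NOT the wsbhaiti data)
factory <- c(44, 50, 48, 44, 42, 47)
haiti   <- c(40, 37, 39, 45, 38, 41)

# Same construction as Stata's: gen loss = factory - haiti
loss <- factory - haiti

# Paired test on the two columns ...
paired_fit <- t.test(factory, haiti, paired = TRUE)

# ... and one-sample test of H0: mean loss = 0 on the differences
one_sample_fit <- t.test(loss, mu = 0)

# Manual version: t = mean(loss) / (sd(loss) / sqrt(n)), with df = n - 1
n <- length(loss)
se_loss <- sd(loss) / sqrt(n)
t_manual <- mean(loss) / se_loss

# The three t statistics should agree
print(c(paired     = unname(paired_fit$statistic),
        one_sample = unname(one_sample_fit$statistic),
        manual     = t_manual))
```

The manual line is the calculation that parts B and C ask for: the standard error of the loss values is their standard deviation divided by the square root of the sample size, and the test statistic is the average loss divided by that standard error.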

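The complete R snippet is as follows:

library(foreign)

# Read the Stata dataset. Use forward slashes in the path:
# a backslash such as "\U" inside an R string is an invalid escape.
data.df <- read.dta("C:/Users/586645/Downloads/Chegg/wsbhaiti.dta")

# Inspect the structure and the summary statistics
str(data.df)
summary(data.df)

# Perform the paired t test
t.test(data.df$factory, data.df$haiti, paired = TRUE)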

The results are as follows

> summary(data.df)
    factory          haiti
 Min.   :32.00   Min.   :34.00
 1st Qu.:39.00   1st Qu.:35.00
 Median :43.00   Median :38.00
 Mean   :42.85   Mean   :37.52
 3rd Qu.:46.00   3rd Qu.:39.50
 Max.   :52.00   Max.   :43.00

> t.test(data.df$factory,data.df$haiti,paired = TRUE)

   Paired t-test

data: data.df$factory and data.df$haiti
t = 4.9589, df = 26, p-value = 3.745e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.122616 7.544050
sample estimates:
mean of the differences
5.333333
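The quantities asked for in parts B through D can also be recovered from first principles using only the numbers printed above. This is a sketch under the assumption that the reported mean difference, t statistic, and sample size (n = 27) are correct. The standard error is back-calculated from them rather than computed from the raw data, so small rounding differences are expected. The variable names are illustrative.

```r
# Values copied from the t.test output above
mean_loss <- 5.333333   # mean of the differences (factory - haiti)
t_stat    <- 4.9589     # reported t statistic
n         <- 27         # number of paired bags

# Degrees of freedom for a one-sample / paired t test
df <- n - 1

# Since t = mean / SE, the standard error follows from the reported values
# (back-calculated here; with the raw data it would be sd(loss) / sqrt(n))
se_loss <- mean_loss / t_stat

# Two-sided P value, as reported by t.test
p_two_sided <- 2 * pt(t_stat, df, lower.tail = FALSE)

# One-sided P value for the alternative "mean loss > 0"
# (the upper-tail probability, the counterpart of Stata's ttail(df, t))
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# 95% confidence interval: mean +/- critical value * SE
t_crit <- qt(0.975, df)
ci <- mean_loss + c(-1, 1) * t_crit * se_loss

print(c(df = df, se = se_loss, p_two = p_two_sided, p_one = p_one_sided))
print(ci)
```

The two-sided P value and the interval should match the t.test output to within rounding. Because the concern in the question is specifically that vitamin C is lost, the one-sided P value, which is half the two-sided one when t is positive, is the one that corresponds to that alternative.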

The t statistic (t = 4.9589), degrees of freedom (df = 26), and p-value (3.745e-05) are shown in the output above. As the p-value is less than α = 0.05, we reject the null hypothesis in favor of the alternative hypothesis and conclude that there is a significant difference between the mean vitamin C content at the factory and in Haiti.

Please note that, per the answering guidelines, we can answer only four subparts of a question at a time, so answers are provided through part D.