I attached the R-code handout under the questions. (a) For the example in the ha
ID: 3315982 • Letter: I
Question
I attached the R-code handout under the questions.
(a) For the example in the handout with
We see that both the ratio estimator and HT estimator do much better than the usual estimator. Explain.
(b) For the example in the handout with
We see that the usual estimator and the HT estimator behave about the same. Explain.
(c) For the example in the handout and other choices of the design, I found situations where the average absolute error of the HT estimator was 4 times as large as the average absolute error of the ratio estimator. Find a design for which this happens.
(d) The correlation between popy and popx for the example in the handout is 0.92. Construct two new populations, one with correlation around 0.60 and the other with correlation around 0.30, where the ratio estimator and the HT estimator behave similarly when the design is popx but the ratio estimator outperforms the HT estimator for another design.
Handout:
getans<-function(est,stnderr,truetot)
{
err<-abs(est-truetot)
lwbd<-est -1.96*stnderr
upbd<-est + 1.96*stnderr
if(lwbd <= truetot & truetot <= upbd) { cov<-1}
else {cov<-0}
ans<-c(est,err,lwbd,upbd - lwbd,cov)
return(ans)
}
srstot<-function(smp,popy)
{
n<-length(smp)
N<-length(popy)
fpc<-1-n/N
ysmp<-popy[smp]
est<-N*mean(ysmp)
stnderr<-N*sqrt((fpc/n)*var(ysmp))
ans<-getans(est,stnderr,sum(popy))
return(ans)
}
ratiotot<-function(smp,popy,popx)
{
n <- length(smp)
N<-length(popx)
ff<-n/N
ysamp<-popy[smp]
xsamp<-popx[smp]
tx<-sum(popx)
trtot<-sum(popy)
rhat <- sum(ysamp)/sum(xsamp)
est <- rhat * tx
dum1<-(N*N*(1-ff))/(n*(n-1))
vartot <- dum1*sum((ysamp-rhat*xsamp)^2)
stnderr<-sqrt(vartot)
ans<-getans(est,stnderr,sum(popy))
return(ans)
}
httot<-function(smp,popy,designwts)
{
wts<-designwts[smp]
est<-sum(wts*popy[smp])
n<-length(smp)
dum<-sum((n*wts*popy[smp] - est)^2)
stnderr<-sqrt((1/(n*(n-1))*dum))
ans<-getans(est,stnderr,sum(popy))
return(ans)
}
compar3est<-function(popy,popx,design,n)
{
designwts<-sum(design)/(n*design)
smp<-sample(1:length(popy),n,replace=FALSE,prob=design)
anssrs<-srstot(smp,popy)
ansHT<-httot(smp,popy,designwts)
ansratio<-ratiotot(smp,popy,popx)
ans<-rbind(anssrs,ansHT,ansratio)
return(ans)
}
compar3estlp<-function(popy,popx,design,n,R)
{
ans<-matrix(0,3,5)
for(i in 1:R){
ans<-ans + compar3est(popy,popx,design,n)
}
ans<-round(ans/R,digits=3)
return(ans)
}
set.seed(2004)
popx<-rlnorm(500,10,.4)
hist(popx)
popy<-rnorm(500,0.1*popx,3*sqrt(popx))
plot(popx,popy)
sum(popy)
cor(popx,popy)
n<-50
R<-500
design<-popx
compar3estlp(popy,popx,design,n,R)
design<-rep(1,length(popy))
compar3estlp(popy,popx,design,n,R)
[,1] [,2] [,3] [,4] [,5]
anssrs 1439359 207313.61 1268065 342586.8 0.328
ansHT 1232135 24660.52 1168937 126397.3 0.966
ansratio 1239790 22754.39 1176851 125878.2 0.970
[,1] [,2] [,3] [,4] [,5]
anssrs 1234516 63158.28 1083825 301382.6 0.942
ansHT 1234516 63158.28 1075674 317685.1 0.952
ansratio 1232362 24560.78 1172819 119085.1 0.946
Explanation / Answer
S The population total for y is 200680.5. The true variance of the sample mean under SRS
for both y and y + 1000 is 83210579. Under SRS and y and x the true variance of ratio est is
7227162. The simulation results given below support the following given in class.
• Under SRS the sample mean is unbiased but will preform poorly for the other designs.
• Under SRS the sample mean and the HT estimator are the same.
• The HT estimator should work well for y under pps(x) but will preform poorly under pps(1/x).
The ratio estimator works well when y x and its performance does not seem to depend so
strongly on the design although its poorest performance is under pps(1/x).
• Both the ratio estimator and the HT estimator preform poorly in the y + 1000 case even
though the HT is always unbiased.
Results for y and SRS
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 200556.3 7414.21 35266.93 0.92 82925934
ansratio 200624.7 2116.73 10301.72 0.94 7099095
ansht 200556.3 7414.21 35994.16 0.93 86381181
Results for y and pps(x)
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 208595.3 10014.78 35767.95 0.85 85155899
ansratio 200762.0 2211.55 10739.43 0.93 7719967
ansht 200608.3 2187.78 10519.53 0.92 7391970
Results for y and pps(1/x)
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 193157.2 9669.56 34869.41 0.82 81275586
ansratio 200584.1 2197.22 10065.47 0.90 6782293
ansht 201298.6 14247.35 71076.94 0.92 340889715
Results for y+1000 and SRS
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 701056.0 7229.13 35542.44 0.94 84262331
ansratio 700657.8 17192.85 82968.05 0.94 457356514
ansht 701056.0 7229.13 36275.35 0.94 87773261
Results for y+1000 and pps(x)
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 707994.6 9629.06 35373.60 0.86 83326439
ansratio 683811.7 21835.73 80019.48 0.86 425625851
ansht 700975.7 17622.00 85981.65 0.93 499091614
Results for y+1000 and pps(1/x)
est Ave val Ave |err| len of CI F of Cov Ave var
ansybar 692901.1 9803.03 34708.05 0.81 80533717
ansratio 721333.6 25035.49 84920.58 0.83 480048344
ansht 700997.9 32070.23 156574.11 0.93 1643293665
b) From the theory we would expected the model based estimate of variance to be smaller under
pps(x) than under pps(1/x). This is true but the difference is quite small. Note that the average
of the model based estimate of variance is much less sensitive to the design than the average of the
usual estimate of variance. This suggests that the usual estimate of variance is biased downwards
for samples with ¯xsmp < x¯ and biased upwards for samples with ¯xsmp < x¯.
Results for popy1 and pps(x) followed by pps(1/x)
est Ave |err| Std ave var Model ave var
200637.58 2143.27 7750240.87 7058512.36
200528.39 2065.72 6721953.59 7070468.28
Results for popy2 and pps(x) followed by pps(1/x)
est Ave |err| Std ave var Model ave var
74207.82 4602.82 35582840.70 32305331.91
73461.98 4664.33 30756811.65 32435700.03
Here is my R code for solving part b.
ratiototboth<-function(smp,popy,popx)
{
n <- length(smp)
N<-length(popx)
ff<-n/N
ysamp<-popy[smp]
xsamp<-popx[smp]
xnsamp<-popx[-smp]
tx<-sum(popx)
trtot<-sum(popy)
rhat <- sum(ysamp)/sum(xsamp)
esttot <- rhat * tx
err<-abs(esttot - trtot)
dum1<-(N*N*(1-ff))/(n*(n-1))
usualvartot <- dum1*sum((ysamp-rhat*xsamp)^2)
usualans<-c(esttot,err,usualvartot)
dum2<-sum(((ysamp -rhat*xsamp)^2/xsamp))*((mean(xnsamp)
*mean(popx))/mean(xsamp))
modelvartot<-dum1*dum2
ans<-c(esttot,err,usualvartot,modelvartot)
return(ans)
}
compare1lp<-function(popy,popx,n,design,R)
{
ans<-rep(0,4)
N<-length(popy)
for(i in 1:R){
smp<-sample(1:N,n,prob=design)
ans<-ans+ ratiototboth(smp,popy,popx)
}
ans<-round(ans/R,digits=2)
return(ans)
}
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.