
Hypergeometric Scientists are trying to figure out whether men or women are more


Question

Hypergeometric. Scientists are trying to figure out whether men or women are more likely to have a certain disease, or if they are equally likely. The scientists gather a random sample of m women and a random sample of n men. Assume that the number of women from the sample who have the disease is X, where X ~ Bin(m, p_1), and that the number of men from the sample who have the disease is Y, where Y ~ Bin(n, p_2). Suppose it is observed that X + Y = r. (a) Assuming p_1 = p_2 = p, calculate P(X = x | X + Y = r). (b) Why does the p disappear from the solution to part (a)? Think of an intuitive explanation.

Explanation / Answer

X denotes the number of women in the sample of size m who have the disease.

Y denotes the number of men in the sample of size n who have the disease.

X and Y are independent.

So X ~ Bin(m, p_1) and Y ~ Bin(n, p_2).

Hence the pmf of X is $P[X=x] = \binom{m}{x} p_1^{x} (1-p_1)^{m-x}$,    x = 0, 1, 2, ..., m

and the pmf of Y is $P[Y=y] = \binom{n}{y} p_2^{y} (1-p_2)^{n-y}$,    y = 0, 1, 2, ..., n

a) Assuming p1=p2=p

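Then X + Y ~ Bin(m+n, p), because X and Y are independent binomials with the same success probability p.

The pmf of Z = X + Y is therefore $P[Z=z] = \binom{m+n}{z} p^{z} (1-p)^{m+n-z}$,    z = 0, 1, 2, ..., m+n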

hence

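$P[X=x \mid X+Y=r]$

$= \dfrac{P[X=x,\ Y=r-x]}{P[X+Y=r]} = \dfrac{P[X=x]\,P[Y=r-x]}{P[X+Y=r]}$   [since X and Y are independent]

$= \dfrac{\binom{m}{x} p^{x} (1-p)^{m-x} \cdot \binom{n}{r-x} p^{r-x} (1-p)^{n-r+x}}{\binom{m+n}{r} p^{r} (1-p)^{m+n-r}}$

$= \dfrac{\binom{m}{x} \binom{n}{r-x}}{\binom{m+n}{r}} \cdot p^{x+(r-x)-r} \cdot (1-p)^{(m-x)+(n-r+x)-(m+n-r)}$

$= \dfrac{\binom{m}{x} \binom{n}{r-x}}{\binom{m+n}{r}} \cdot 1 \cdot 1$

$= \dfrac{\binom{m}{x} \binom{n}{r-x}}{\binom{m+n}{r}}$   [answer]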

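Hence, conditional on X + Y = r, X follows a hypergeometric distribution with population size m+n, number of draws r, and proportion m/(m+n) of women in the population. The pmf is nonzero for max(0, r-n) ≤ x ≤ min(m, r).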
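The result of part (a) can be checked numerically. The sketch below computes the conditional probability directly from the binomial pmfs and compares it with the hypergeometric formula for several values of p. The function names and the values m = 6, n = 9, r = 5 are illustrative assumptions, not part of the original problem.

```python
from math import comb


def binom_pmf(k, size, p):
    """pmf of Bin(size, p) at k; zero outside 0..size."""
    if k < 0 or k > size:
        return 0.0
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def conditional_pmf(x, m, n, r, p):
    """P[X = x | X + Y = r] from the definition, with p_1 = p_2 = p."""
    numerator = binom_pmf(x, m, p) * binom_pmf(r - x, n, p)
    denominator = binom_pmf(r, m + n, p)
    return numerator / denominator


def hypergeom_pmf(x, m, n, r):
    """C(m, x) * C(n, r - x) / C(m + n, r); zero outside the support."""
    if x < 0 or x > m or r - x < 0 or r - x > n:
        return 0.0
    return comb(m, x) * comb(n, r - x) / comb(m + n, r)


# Hypothetical sample sizes and observed total, chosen only for illustration.
m, n, r = 6, 9, 5

for p in (0.1, 0.5, 0.9):
    largest_gap = max(
        abs(conditional_pmf(x, m, n, r, p) - hypergeom_pmf(x, m, n, r))
        for x in range(r + 1)
    )
    print(f"p = {p}: largest difference from hypergeometric pmf = {largest_gap:.2e}")
```

For every p tried, the two computations should agree up to floating-point rounding, which is the numerical counterpart of p cancelling in the derivation.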

b) As seen in part a), p disappears from the solution.

The main reason is that the total X + Y = r is fixed by the conditioning.

Once X = x is given, there is no randomness left in Y, because Y is automatically r - x.

So the only remaining question is how the r diseased people are split between women and men.

That is, X = x | X + Y = r asks: out of the r diseased people, what is the probability that exactly x are women?

Here we are no longer asking how many people have the disease; that number is already known to be r.

When p_1 = p_2 = p, each of the m + n people is equally likely to have the disease, independently of the others.

So, given that exactly r people are diseased, every set of r people out of the m + n is equally likely to be the diseased set.

The value of p only affects how many people are diseased in total, and that total is fixed at r.

Hence the number of women among the r diseased people behaves like drawing r people without replacement from a group of m women and n men, which is hypergeometric.

Hence p disappears from the solution.
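The intuitive argument can also be illustrated by simulation. The sketch below draws X and Y for each of two different values of p, keeps only the trials where X + Y = r, and reports the average of X among those trials. If the argument is right, both averages should be close to the hypergeometric mean r·m/(m+n) regardless of p. The function name, the seed, and the values m = 6, n = 9, r = 5 are assumptions made for illustration.

```python
import random


def simulate_conditional_mean(m, n, r, p, trials=20000, seed=0):
    """Estimate E[X | X + Y = r] by keeping only trials with X + Y = r.

    Assumes the event X + Y = r occurs at least once in the given
    number of trials; otherwise the final division fails.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        # Each person has the disease independently with probability p.
        x = sum(rng.random() < p for _ in range(m))  # diseased women
        y = sum(rng.random() < p for _ in range(n))  # diseased men
        if x + y == r:
            kept.append(x)
    return sum(kept) / len(kept)


# Hypothetical sample sizes and observed total, chosen only for illustration.
m, n, r = 6, 9, 5
hypergeometric_mean = r * m / (m + n)

for p in (0.2, 0.5):
    estimate = simulate_conditional_mean(m, n, r, p)
    print(f"p = {p}: simulated mean of X = {estimate:.3f}, "
          f"hypergeometric mean = {hypergeometric_mean:.3f}")
```

Rejection sampling is used here because it mirrors the conditioning in the problem directly; it becomes inefficient when the event X + Y = r is rare for the chosen p.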