Question
Linkage disequilibrium (LD) occurs when alleles at different loci are nonrandomly associated and are more likely to be inherited together. But how do scientists identify LD? For this question, you will calculate D, the coefficient of linkage disequilibrium, to test whether a set of SNPs is in linkage equilibrium or disequilibrium. Suppose there are two variable sites (SNPs) at two different loci, and we collect sequence data from a sample of 2000 people. At SNP1, the nucleotide is either a G or a C, while at SNP2, the nucleotide is either an A or a T. All combinations of SNP1 and SNP2 nucleotides are possible, and from the data we see that there are 474 individuals with a GA haplotype, 611 with a GT haplotype, 142 with a CA haplotype, and 773 with a CT haplotype.
A) What are the haplotype frequencies?
B) Now, let’s calculate D. D = (g11 x g22) - (g12 x g21). Show work below. Are these loci in LD?
g11 = frequency of GA
g12 = frequency of GT
g21 = frequency of CA
g22 = frequency of CT
Explanation / Answer
Haplotype counts
GA = 474
GT = 611
CA = 142
CT = 773
Total = 2000
A) Calculation of haplotype and allele frequencies
Haplotype Frequencies
GA = 474 / 2000 = 0.2370
GT = 611 / 2000 = 0.3055
CA = 142 / 2000 = 0.0710
CT = 773 / 2000 = 0.3865
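The division above can be sketched in a few lines of Python. This is only an illustration: the variable names (counts, total, freqs) are made up for the example, and the counts are the ones given in the question.

```python
# Haplotype counts from the question (sample of 2000).
counts = {"GA": 474, "GT": 611, "CA": 142, "CT": 773}

# Total number of haplotypes observed.
total = sum(counts.values())

# Frequency of each haplotype = its count divided by the total.
freqs = {hap: n / total for hap, n in counts.items()}

for hap, f in freqs.items():
    print(hap, round(f, 4))
```

The four frequencies should add up to 1, which is a quick way to catch a data-entry mistake.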
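Allele frequencies (sum of the haplotype frequencies that carry each allele)
G = 0.2370 + 0.3055 = 0.5425
C = 0.0710 + 0.3865 = 0.4575
A = 0.2370 + 0.0710 = 0.308
T = 0.3055 + 0.3865 = 0.692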
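As a rough sketch, each allele frequency is the sum of the two haplotype frequencies that contain that allele. The names below (hap, p_G, p_C, q_A, q_T) are chosen for this example only.

```python
# Haplotype frequencies from part A.
hap = {"GA": 0.2370, "GT": 0.3055, "CA": 0.0710, "CT": 0.3865}

# SNP1 alleles: add the haplotypes that start with G or with C.
p_G = hap["GA"] + hap["GT"]
p_C = hap["CA"] + hap["CT"]

# SNP2 alleles: add the haplotypes that end with A or with T.
q_A = hap["GA"] + hap["CA"]
q_T = hap["GT"] + hap["CT"]

print(round(p_G, 4), round(p_C, 4), round(q_A, 4), round(q_T, 4))
```

At each SNP the two allele frequencies should sum to 1 (G + C = 1 and A + T = 1).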
B) Now, we put the haplotype frequencies into the equation for D given in the question:
D = (g11 x g22) - (g12 x g21)
D = (0.2370 x 0.3865) - (0.3055 x 0.0710) = 0.0916 - 0.0217 = 0.0699
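A minimal sketch of this step, assuming the haplotype frequencies from part A. It also cross-checks D with the equivalent form "observed GA frequency minus the GA frequency expected under independence"; the names D_check, p_G and p_A are illustrative only.

```python
# Haplotype frequencies from part A (g11 = GA, g12 = GT, g21 = CA, g22 = CT).
g11, g12, g21, g22 = 0.2370, 0.3055, 0.0710, 0.3865

# Definition used in the question.
D = g11 * g22 - g12 * g21

# Cross-check: observed GA frequency minus the frequency expected
# if G and A were inherited independently (freq(G) x freq(A)).
p_G = g11 + g12
p_A = g11 + g21
D_check = g11 - p_G * p_A

print(round(D, 4), round(D_check, 4))
```

Both forms give the same number whenever the four haplotype frequencies sum to 1, so a mismatch points to an arithmetic slip.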
Next, we calculate Dmax, the largest value D could take given the allele frequencies. Let p1 and p2 be the frequencies of G and C, and q1 and q2 the frequencies of A and T. Because D is positive:
Dmax = min(p1 x q2, p2 x q1)
p1 x q2 = 0.5425 x 0.692 = 0.375
p2 x q1 = 0.4575 x 0.308 = 0.141
Dmax = 0.141 (the smaller of the two)
To calculate D’, we divide D by the Dmax obtained in the previous step:
D’ = D / Dmax
D’ = 0.0699 / 0.141 = 0.496 ≈ 0.5
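The Dmax and D’ steps can be sketched as one small function. The function name d_prime is hypothetical. The negative-D branch uses the usual convention Dmax = min(p1 x q1, p2 x q2), which the worked answer does not need because D is positive here, and the function returns the magnitude of D’.

```python
def d_prime(D, p1, q1):
    """Return |D'| given D, p1 = freq(G) and q1 = freq(A)."""
    p2, q2 = 1 - p1, 1 - q1
    if D >= 0:
        # Positive D: limited by the rarer of the "mixed" combinations.
        d_max = min(p1 * q2, p2 * q1)
    else:
        # Negative D: assumed convention, not used in this worked example.
        d_max = min(p1 * q1, p2 * q2)
    # Guard against a monomorphic site, where D' is undefined.
    return abs(D) / d_max if d_max > 0 else 0.0


result = d_prime(0.0699, 0.5425, 0.308)
print(round(result, 3))
```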
Now we calculate the correlation coefficient (r) between the two loci, using D and the allele frequencies from the previous steps:
r = D / sqrt(p1 x p2 x q1 x q2)
r = 0.0699 / sqrt(0.5425 x 0.4575 x 0.308 x 0.692)
r = 0.0699 / 0.23 = 0.304
r² = (0.304)² = 0.0924
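A short sketch of the r and r² calculation, assuming the rounded D and the allele frequencies used above. The variable names are illustrative.

```python
import math

D = 0.0699                 # coefficient of linkage disequilibrium
p1, p2 = 0.5425, 0.4575    # frequencies of G and C at SNP1
q1, q2 = 0.308, 0.692      # frequencies of A and T at SNP2

# Correlation between the two loci: D scaled by the geometric
# mean of the allele-frequency variances.
r = D / math.sqrt(p1 * p2 * q1 * q2)
r_squared = r ** 2

print(round(r, 3), round(r_squared, 4))
```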
To test whether the LD between the loci is statistically significant, we use the following equation, where N is the sample size:
χ² = r² x N
χ² = 0.0924 x 2000 = 184.8 (1 df)
With χ² = 184.8 and 1 degree of freedom, the P-value is far below 0.0001.
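The significance test can be sketched with the Python standard library alone. For 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(χ² / 2)), because a chi-square variable with 1 df is the square of a standard normal variable. The variable names below are illustrative.

```python
import math

r_squared = 0.0924   # squared correlation from the previous step
N = 2000             # sample size

# Test statistic for LD.
chi2 = r_squared * N

# Upper-tail P-value for a chi-square distribution with 1 df.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 1), p_value < 0.0001)
```

As a sanity check on the formula, the familiar 5% critical value for 1 df (3.841) gives a P-value of about 0.05 when passed through the same expression.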
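So, based on our calculations, the two loci are in significant linkage disequilibrium. D is positive, meaning the GA and CT haplotypes are more common than expected if the alleles were inherited independently, and D’ shows that the association is about 50% of its theoretical maximum.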