

Question

You have shotgun sequenced four related termite gut samples and obtained a total of 40 Gbp of sequence data (10 Gbp per sample). You perform a de novo assembly of the data and wish to extract (bin) individual population genomes from the metagenome.

(a) Describe TWO (2) binning approaches that you could apply to the data to obtain population genomes including any known limitations of the approaches. (6 MARKS)

(b) A member of candidate phylum ZB3 is present in the termite gut community at an estimated relative abundance of 0.5%. Assuming that its genome size is 2 Mbp, calculate its expected genome coverage in the metagenomic assembly (show your working). Is this coverage likely to yield a good or poor assembly of the ZB3 population genome (assuming no strain heterogeneity)?

Explanation / Answer

(a)

1. MetaCluster.

MetaCluster 5.0 is a two-round binning method that aims to identify both low-abundance and high-abundance species in the presence of a large amount of noise from many extremely low-abundance species. It first uses a filtering strategy to remove the noise from the extremely low-abundance species, and then separates reads of high-abundance species from those of low-abundance species in two rounds. Reads are grouped when they share w-mers (substrings of length w). Reads from high-abundance species are grouped first, with high confidence, using a large w; binning then extends to low-abundance species using a relaxed (shorter) w, which compensates for their low coverage.

Limitations:

The value of w has to be chosen. Intuitively, a larger w decreases the number of false positives (reads from different genomes grouped together because they happen to share a short w-mer), but reads from low-coverage genomes are less likely to share a long w-mer and so may be left ungrouped.

Extremely low-abundance species may not have enough reads to be binned at all. Low-abundance species usually do have enough reads, but only if the noise from extremely low-abundance species can be eliminated and their reads separated from those of high-abundance species.
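The effect of w on grouping can be illustrated with a toy sketch. This is not MetaCluster's actual implementation: it only groups reads that share at least one w-mer, using a union-find structure. The reads, the function names `wmers` and `group_reads`, and the assignment of reads 0-1 to one genome and reads 2-3 to another are all hypothetical, chosen to show the trade-off.

```python
def wmers(read, w):
    """Return the set of all substrings of length w in a read."""
    return {read[i:i + w] for i in range(len(read) - w + 1)}


def group_reads(reads, w):
    """Group reads that share at least one w-mer (toy single-linkage grouping)."""
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # w-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for kmer in wmers(read, w):
            if kmer in first_seen:
                # Shared w-mer: merge this read's group with the earlier read's group.
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    groups = {}
    for idx in range(len(reads)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical reads: 0 and 1 overlap by 6 bases ("genome A"),
# 2 and 3 overlap by only 5 bases ("genome B", lower coverage).
reads = ["AAACCCGGGT", "CCGGGTTTAC", "GATTACAGAT", "CAGATCCGGA"]

for w in (4, 5, 6):
    print(w, group_reads(reads, w))
```

With these toy reads, w = 4 merges everything into one group (a false positive caused by short shared w-mers), w = 5 recovers the two intended groups, and w = 6 leaves the two weakly overlapping reads ungrouped.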

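2. Expectation-maximization (EM) binning, applied to the contigs after assembly of the metagenomic sequencing reads.

Limitations: the result depends on the quality of the input data and of the upstream analysis. Errors in data collection (for example contamination or uneven sequencing) or in the analysis (for example misassembled contigs) carry through into the bins.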

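(b) Total sequence data = 40 Gbp = 40,000 Mbp

Sequence from ZB3 = 0.5% of 40,000 Mbp = 200 Mbp

Expected coverage = 200 Mbp / 2 Mbp = 100×

Therefore the ZB3 genome is expected to be covered about 100× in the combined assembly of all four samples (about 25× per sample, from 0.5% of 10 Gbp = 50 Mbp). This assumes that the fraction of sequence data from ZB3 equals its relative abundance.

This coverage is likely to yield a good assembly: with no strain heterogeneity, 100× is well above the depth typically needed to assemble a population genome.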
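The arithmetic in part (b) can be checked with a short script. This is a minimal sketch that assumes the share of sequenced bases from a population equals its relative abundance; the function name `expected_coverage` and the constant names are illustrative, while the values come from the question.

```python
def expected_coverage(total_mbp, abundance_percent, genome_size_mbp):
    """Expected fold coverage = Mbp sequenced from the population / genome size in Mbp."""
    population_mbp = total_mbp * abundance_percent / 100
    return population_mbp / genome_size_mbp


TOTAL_MBP = 40_000        # 40 Gbp across all four samples
PER_SAMPLE_MBP = 10_000   # 10 Gbp per sample
ABUNDANCE_PERCENT = 0.5   # estimated relative abundance of ZB3
GENOME_SIZE_MBP = 2       # assumed ZB3 genome size

combined = expected_coverage(TOTAL_MBP, ABUNDANCE_PERCENT, GENOME_SIZE_MBP)
per_sample = expected_coverage(PER_SAMPLE_MBP, ABUNDANCE_PERCENT, GENOME_SIZE_MBP)

print(f"Combined assembly: {combined:.0f}x")
print(f"Single sample:     {per_sample:.0f}x")
```

The combined figure (100x) applies when all four samples are assembled together; the per-sample figure (25x) is the relevant one if each sample is assembled separately.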
