1. The genome of a newly discovered bacterial species (Bacillus sanfranciscus)
ID: 165954 • Letter: 1
Question
1. The genome of a newly discovered bacterial species (Bacillus sanfranciscus) was sequenced and found to have a circular genome of 4 x 10^6 base pairs (bp). Open reading frame (ORF) analysis indicated the presence of 3,190 ORFs that encode proteins with a median length of 270 amino acids (aa) and an average length of 360 aa.
a) What is the information content of this genome (i.e., how much information can be encoded in this length of DNA)? Since the genetic code is digital in nature, let’s convert the information content of base pairs into bytes and compare this value with the information content of a digital device many of us routinely use. The iPhone operating system, iOS8, requires approximately 5 GB of information to perform its functions. To compare the information content of B. sanfranciscus and the iPhone OS, the following assumptions about the digital content of DNA-encoded information might be helpful. The double helix can potentially encode information in both strands, but this is not usually the case; most stretches of DNA encode information in only one strand, although for any given gene it can be either of the two strands. It is therefore reasonable to assume that each base pair of DNA encodes 2 bits of information (since there are 4 possible nucleotides). Keeping in mind that 1 byte = 8 bits and 1 GB = 10^9 bytes, the calculation is straightforward from there. Express your answer as iPhone iOS units.
b) Now calculate the percentage of the bacterial genome that encodes the cell’s complete proteome. Assume that: (i) all of the predicted ORFs actually encode proteins, (ii) each gene is encoded by only one of the two strands of DNA, and (iii) there are no overlapping genes (i.e., no region of DNA encodes more than a single gene).
Explanation / Answer
A. Each base pair encodes 2 bits of information.
Therefore, a genome of 4 x 10^6 base pairs will have (2 x 4 x 10^6) = 8 x 10^6 bits = 1 x 10^6 bytes of information.
In iPhone iOS units, the DNA encodes (1 x 10^6 / 5 x 10^9) = 2 x 10^-4 iOS8 units of information.
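The arithmetic in part A can be checked with a minimal Python sketch. The constant and function names here are illustrative, not from the question; the only assumptions are those the question states (2 bits per base pair, 1 byte = 8 bits, 1 GB = 10^9 bytes, iOS8 at about 5 GB).

```python
# Sketch of the part A calculation under the question's stated assumptions.
GENOME_BP = 4 * 10**6      # genome size in base pairs
BITS_PER_BP = 2            # 4 possible nucleotides -> log2(4) = 2 bits
BITS_PER_BYTE = 8
IOS8_BYTES = 5 * 10**9     # ~5 GB, taking 1 GB = 10^9 bytes


def genome_bytes(bp, bits_per_bp=BITS_PER_BP):
    """Return the information content of `bp` base pairs, in bytes."""
    return bp * bits_per_bp / BITS_PER_BYTE


genome_in_bytes = genome_bytes(GENOME_BP)
ios_units = genome_in_bytes / IOS8_BYTES

print(f"{genome_in_bytes:.0f} bytes = {ios_units:.1e} iOS8 units")
```

Put the other way around, one copy of iOS8 holds about as much information as 5,000 such genomes.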
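B. The average length of an encoded protein is 360 aa. The average, not the median, is the right figure here, because the total coding length equals the number of ORFs times the average ORF length.
Number of nucleotides required to code for a 360 aa protein (including the stop codon, which is 3 nucleotides long) = (360 + 1) x 3 = 1,083 nucleotides
Number of nucleotides coding for proteins = 1,083 x 3,190 = 3,454,770 nucleotides
Percentage of the genome's nucleotides that code for proteins, counting both strands (2 x 4 x 10^6 nucleotides) = 3,454,770/(2 x 4 x 10^6) x 100 = 43.18%
Measured in base pairs instead, since the genes do not overlap and each coding nucleotide occupies one base pair of the duplex, the coding regions span 3,454,770/(4 x 10^6) x 100 = 86.37% of the genome.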
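The part B arithmetic can be sketched in Python as follows. This assumes one three-nucleotide stop codon per ORF and uses the average (not median) protein length; the names are illustrative. Because "percentage of the genome" can be read two ways, the sketch reports the coding fraction against both the 4 x 10^6 base pairs of the duplex and the 8 x 10^6 nucleotides of the two strands combined.

```python
# Sketch of the part B calculation; assumes one 3-nucleotide stop codon per ORF.
N_ORFS = 3190
MEAN_PROTEIN_AA = 360      # average length; total length = count x average
GENOME_BP = 4 * 10**6
NT_PER_CODON = 3


def coding_nucleotides(n_orfs, mean_aa, stop_codons=1):
    """Total coding nucleotides: each ORF has mean_aa codons plus stop codon(s)."""
    return n_orfs * (mean_aa + stop_codons) * NT_PER_CODON


coding_nt = coding_nucleotides(N_ORFS, MEAN_PROTEIN_AA)

# Fraction of the duplex (base pairs) covered by non-overlapping genes.
pct_of_bp = 100 * coding_nt / GENOME_BP
# Fraction of all nucleotides when both strands are counted separately.
pct_of_both_strands = 100 * coding_nt / (2 * GENOME_BP)

print(f"coding nucleotides: {coding_nt}")
print(f"{pct_of_bp:.2f}% of base pairs, {pct_of_both_strands:.2f}% of both strands")
```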