The genome of a newly discovered bacterial species (Bacillus sanfranciscus) was
Question
The genome of a newly discovered bacterial species (Bacillus sanfranciscus) was sequenced and found to have a circular genome of 4 × 10^6 base pairs (bp). Open reading frame (ORF) analysis indicated the presence of 3,190 ORFs that encode proteins with a median length of 270 amino acids (aa) and an average length of 360 aa.
a. What is the information content of this genome (i.e., how much information can be encoded in this length of DNA)? Since the genetic code is digital in nature, let's convert the information content of base pairs into bytes and compare this value with the information content of a digital device many of us routinely use. The iPhone operating system, iOS 8, requires approximately 5 GB of information to perform its functions. To compare the information content of B. sanfranciscus and the iPhone OS, the following assumptions about the digital content of DNA-encoded information might be helpful. The double helix can potentially encode information in both strands, but this is not usually the case: most stretches of DNA encode information in only one strand, although for any given gene it can be either of the two strands. It is therefore reasonable to assume that each base pair of DNA encodes 2 bits of information (since there are 4 possible nucleotides). Keeping in mind that 1 byte = 8 bits and 1 GB = 10^9 bytes, the calculation is straightforward from there. Express your answer in iPhone iOS units.
b. Now calculate the percentage of the bacterial genome that encodes the cell's complete proteome. Assume that: (1) all of the predicted ORFs actually encode proteins, (2) each gene is encoded by only one of the two strands of DNA, and (3) there are no overlapping genes (i.e., no region of DNA encodes more than a single gene).
Explanation / Answer
(a) Comparing the genome to computer data storage
To represent a DNA sequence on a computer, we need to be able to represent all 4 possible bases in binary form (0 and 1). These 0 and 1 bits are usually grouped into larger units, most commonly 8-bit units called bytes. We can denote each base pair using a minimum of 2 bits, which yields 4 different bit combinations (00, 01, 10, and 11). Each 2-bit combination represents one DNA base pair, so a single byte (8 bits) can represent 4 DNA base pairs. To express the entire diploid human genome in bytes, we can perform the following calculation:
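A minimal Python sketch of the 2-bit encoding described above. The particular code assigned to each base (A=00, C=01, G=10, T=11) is an arbitrary, hypothetical choice; any one-to-one mapping would work. The helper names `pack_dna` and `unpack_dna` are illustrative and not taken from any library.

```python
# Hypothetical mapping: only the use of 2 bits per base comes from the text;
# which bit pattern goes with which base is an arbitrary choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack_dna(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases (4 x 2 bits) per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so the first base always sits in the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    # Padding bits decode as "A", so trim to the true sequence length.
    return "".join(bases[:n_bases])


seq = "GATTACA"
packed = pack_dna(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 7 bases -> 2 bytes
print(unpack_dna(packed, len(seq)))                # GATTACA
```

Because the zero padding in a short final byte also decodes as A, the original sequence length has to be stored alongside the packed bytes.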
6 × 10^9 base pairs/diploid genome × 1 byte/4 base pairs = 1.5 × 10^9 bytes, or 1.5 gigabytes: roughly 2 CDs' worth of space (at about 700 MB per CD), or small enough to fit 3 separate genomes on a standard 4.7 GB DVD.
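The same conversion applies to the genome in the question. A short sketch, assuming the question's figures (4 × 10^6 bp, 2 bits per base pair, 1 GB = 10^9 bytes, and roughly 5 GB for iOS 8); `genome_bytes` is a hypothetical helper name.

```python
# Assumptions taken from the question: 2 bits per base pair, 8 bits per byte,
# 1 GB = 10^9 bytes, and iOS 8 needing roughly 5 GB.
BITS_PER_BP = 2
BITS_PER_BYTE = 8
BYTES_PER_GB = 10**9
IOS8_BYTES = 5 * BYTES_PER_GB


def genome_bytes(base_pairs: int) -> float:
    """Bytes needed to store a sequence at 2 bits per base pair (hypothetical helper)."""
    return base_pairs * BITS_PER_BP / BITS_PER_BYTE


human_diploid = genome_bytes(6 * 10**9)    # diploid human genome, as in the text
b_sanfranciscus = genome_bytes(4 * 10**6)  # circular genome from the question
ios_units = b_sanfranciscus / IOS8_BYTES

print(f"human diploid: {human_diploid / BYTES_PER_GB} GB")       # 1.5 GB
print(f"B. sanfranciscus: {b_sanfranciscus:.0f} bytes")          # 1000000 bytes
print(f"iOS units: {ios_units:.1e}")                             # 2.0e-04
print(f"genomes per iOS 8: {IOS8_BYTES / b_sanfranciscus:.0f}")  # 5000
```

Under these assumptions the bacterial genome holds about 10^6 bytes (1 MB), or roughly 2 × 10^-4 iOS units: about 5,000 such genomes would fit in the space iOS 8 needs.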
Data storage across the whole organism
For simplicity's sake, let's ignore the microbiome (all non-human cells that live in the body) and focus only on the cells that make up our body. Estimates for the number of cells in the human body range between 10 trillion and 100 trillion; let us take 100 trillion cells as a working estimate. Given that each diploid cell contains about 1.5 GB of data (this is very approximate, as I am only accounting for the diploid cells and ignoring the haploid sperm and egg cells in our body), the approximate amount of data stored in the human body is
1.5 GB × 100 trillion cells = 150 trillion GB, or 150 × 10^12 × 10^9 bytes = 1.5 × 10^23 bytes = 150 zettabytes (1 ZB = 10^21 bytes).
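The whole-body estimate is a single multiplication; a quick check in Python, assuming the figures used above (about 1.5 GB per diploid cell and 100 trillion cells):

```python
# Figures from the text: ~1.5 GB per diploid cell, 100 trillion cells.
bytes_per_cell = 1.5e9
n_cells = 100e12
ZETTABYTE = 1e21  # 1 ZB = 10^21 bytes

total_bytes = bytes_per_cell * n_cells
print(f"{total_bytes:.2e} bytes = {total_bytes / ZETTABYTE:.0f} ZB")  # 1.50e+23 bytes = 150 ZB
```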
(b)
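Under the question's three assumptions (every ORF is a real gene, each gene sits on one strand only, and no genes overlap), the coding fraction follows from the ORF count and the average protein length. The average (360 aa) is used rather than the median (270 aa) because total length = number of ORFs × mean length; the median cannot give a total. A minimal sketch with those figures, where counting one stop codon per ORF is an added refinement, not part of the question:

```python
# Figures from the question.
GENOME_BP = 4 * 10**6  # genome length; each gene occupies only one strand
N_ORFS = 3190
MEAN_AA = 360          # mean, not median: total length = count x mean
BP_PER_CODON = 3

# No overlapping genes, so ORF lengths simply add up.
coding_bp = N_ORFS * MEAN_AA * BP_PER_CODON
# Optional refinement: each ORF also ends in a 3-bp stop codon that encodes no amino acid.
coding_bp_with_stop = N_ORFS * (MEAN_AA + 1) * BP_PER_CODON

print(coding_bp, f"{100 * coding_bp / GENOME_BP:.1f}%")                      # 3445200 86.1%
print(coding_bp_with_stop, f"{100 * coding_bp_with_stop / GENOME_BP:.1f}%")  # 3454770 86.4%
```

Either way, roughly 86% of the genome would encode the proteome under these assumptions.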
As the number of sequenced organisms expands, no one person can have a working acquaintance with every sequenced genus. As a result, the quality and richness of the metadata take on greater importance. Many sequenced samples were never characterized phenotypically, physiologically, or metabolically, and the sampling details may be buried in the literature.
Even in the absence of antigen stimulation, a human can probably make more than 10^12 different antibody molecules in its preimmune antibody repertoire. Moreover, the antigen-binding sites of many antibodies can cross-react with a variety of related but different antigenic determinants, making the antibody defense force even more formidable. The preimmune repertoire is apparently large enough to ensure that there will be an antigen-binding site to fit almost any potential antigenic determinant, albeit with low affinity. After repeated stimulation by antigen, B cells can make antibodies that bind their antigen with much higher affinity, a process called affinity maturation. Thus, antigen stimulation greatly increases the antibody arsenal.
Antibodies are proteins, and proteins are encoded by genes. Antibody diversity therefore poses a special genetic problem: how can an animal make more antibodies than there are genes in its genome? (The human genome, for example, contains fewer than 50,000 genes.) This problem is not quite as formidable as it might first appear. Recall that the variable regions of both the light and heavy chains of antibodies usually form the antigen-binding site. Thus, an animal with 1000 genes encoding light chains and 1000 genes encoding heavy chains could, in principle, combine their products in 1000 × 1000 different ways to make 10^6 different antigen-binding sites (although, in reality, not every light chain can combine with every heavy chain to make an antigen-binding site). Nonetheless, the mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way, by joining separate gene segments together before they are transcribed. Birds and fish use very different strategies for diversifying antibodies, and even sheep and rabbits use somewhat different strategies from mice and humans. We shall confine our discussion to the mechanisms used by mice and humans.
We begin this section by discussing the mechanisms that B cells use to produce antibodies with an enormous diversity of antigen-binding sites. We then consider how a B cell can alter the tail region of the antibody it makes, while keeping the antigen-binding site unchanged. This ability allows the B cell to switch from making membrane-bound antibody to making secreted antibody, or from making one class of antibody to making another, all without changing the antigen-specificity of the antibody.