
Looking at some more detailed codon usage tables, genes may be further clustered


Question

Looking at more detailed codon usage tables, genes may be further clustered into three classes: metabolic genes, genes highly expressed during exponential growth, and horizontally transferred genes. In the original paper, Médigue et al. clustered the genes based on the CAI and then used a variant of k-means to determine the 3 classes. Note that this is different from a class II gene, which is defined by the type of RNA polymerase used.

How did they end up determining what the three classes are? It seems as if they made this generalization without any proteome data. Would the same genes be classified the same way using protein expression data from exponential growth and stationary growth?

Explanation / Answer

The authors start by stating that, at the time of writing, two different classes of codon usage profiles were known (or at least putatively so). All 782 unique CDS sequences used were subjected to a two-step classification method. In step one, each CDS was represented as a 61-dimensional vector, with one component for each of the 61 sense codons (the 64 triplets minus the 3 stop codons). A factorial correspondence analysis (the categorical, multivariate counterpart of principal component analysis) was run on these vectors, condensing the 61 dimensions down to 2. In step two, with the data reduced to 2D, a k-means algorithm could partition it more easily. In the end, the genes were clustered into 3 non-overlapping groups (classes I, II and III, with 502, 191 and 89 CDS, respectively).
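To make the two steps concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the function names (codon_vector, kmeans) are hypothetical, it uses plain Lloyd's k-means rather than the variant used in the paper, and it skips the correspondence analysis step entirely, clustering directly on the 61-dimensional relative codon frequencies.

```python
from itertools import product
import random

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons, in a fixed order, define the dimensions of each vector.
SENSE_CODONS = [
    "".join(c) for c in product(BASES, repeat=3)
    if "".join(c) not in STOP_CODONS
]

def codon_vector(cds):
    """Return the relative frequency of each of the 61 sense codons in one CDS."""
    cds = cds.upper()
    counts = dict.fromkeys(SENSE_CODONS, 0)
    # Read the sequence in frame, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skips stop codons and ambiguous bases
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in SENSE_CODONS]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means (an assumption, not the paper's variant).

    Returns one cluster label in range(k) per input point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy usage with made-up sequences (not data from the paper).
toy_cds = ["ATGAAAAAAAAATAA", "ATGAAAAAAAAATAA", "ATGCTGCTGCTGTGA"]
vectors = [codon_vector(s) for s in toy_cds]
labels = kmeans(vectors, k=2)
```

On real data one would first project the vectors onto the leading correspondence analysis axes and then run the clustering on those coordinates, as described above; the random initialization also means k-means results should be checked across several seeds.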

Only after the authors had clustered the gene set were they able to go back and look at the known function of each gene. It so happened, fortuitously, that each class of genes had a strong bias toward particular subsets of cellular function (e.g., metabolism, protein biosynthesis, transport). They did not use proteome data, but they were able to assign a role to a large number of these genes based on the body of literature at the time.
