
Question

(Bioinformatics: find genes) Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA. Write a program (in C++) that prompts the user to enter a genome and displays all genes in the genome. If no gene is found in the input sequence, the program should display a message saying that no gene was found.
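The definition of a gene translates directly into a check on a candidate string. The sketch below assumes that the forbidden triplets are read in frame (three letters at a time from the gene's first letter) and that an empty string does not count as a gene. The class and method names `GeneCheck` and `isValidGene` are hypothetical, and `Set.of` requires Java 9 or later.

```java
import java.util.Set;

public class GeneCheck {
    // Start and stop codons that may not appear inside a gene (assumed: read in frame)
    private static final Set<String> FORBIDDEN = Set.of("ATG", "TAG", "TAA", "TGA");

    // Hypothetical helper: true if `gene` is non-empty, its length is a multiple of 3,
    // and none of its codons (read three letters at a time) is a start or stop codon
    public static boolean isValidGene(String gene) {
        if (gene.isEmpty() || gene.length() % 3 != 0) {
            return false;
        }
        for (int i = 0; i < gene.length(); i += 3) {
            if (FORBIDDEN.contains(gene.substring(i, i + 3))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidGene("TTT"));    // true
        System.out.println(isValidGene("GGGCGT")); // true
        System.out.println(isValidGene("TTTT"));   // false: length is not a multiple of 3
        System.out.println(isValidGene("TTTTAG")); // false: second codon is the stop codon TAG
    }
}
```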

Here is a sample run:

Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT

TTT

GGGCGT
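To see where these two outputs come from, it helps to list every position at which a start or stop triplet begins in the sample genome. The sketch below (the class name `SampleTrace` is hypothetical) checks every offset and records each match as `TRIPLET@index`, then cuts out the text between the end of each ATG and the following stop triplet.

```java
import java.util.ArrayList;
import java.util.List;

public class SampleTrace {
    // Returns "TRIPLET@index" for every start or stop triplet found at any offset
    public static List<String> markers(String genome) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= genome.length(); i++) {
            String triplet = genome.substring(i, i + 3);
            if (triplet.equals("ATG") || triplet.equals("TAG")
                    || triplet.equals("TAA") || triplet.equals("TGA")) {
                result.add(triplet + "@" + i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String genome = "TTATGTTTTAAGGATGGGGCGTTAGTT";
        System.out.println(markers(genome)); // [ATG@2, TAA@8, ATG@13, TAG@22]
        // Each gene runs from just after an ATG up to the start of the next stop triplet
        System.out.println(genome.substring(2 + 3, 8));   // TTT
        System.out.println(genome.substring(13 + 3, 22)); // GGGCGT
    }
}
```

Both genes have lengths divisible by 3 (3 and 6), which is why they are reported.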

Explanation / Answer

import java.util.Scanner;

public class Exercise9_35 {

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Enter a genome string: ");
        String genome = input.nextLine();

        boolean found = false;
        int start = -1; // index just past the most recent ATG, or -1 if none is open

        for (int i = 0; i < genome.length() - 2; i++) {
            String triplet = genome.substring(i, i + 3);
            if (triplet.equals("ATG")) {
                // A start codon (re)opens a candidate gene right after it
                start = i + 3;
            } else if ((triplet.equals("TAG") || triplet.equals("TAA") || triplet.equals("TGA"))
                    && start != -1 && i > start && (i - start) % 3 == 0) {
                // i > start rules out a stop triplet that overlaps the ATG (e.g. "ATGA",
                // which made substring(start, i) throw) and an empty gene (e.g. "ATGTAG")
                System.out.println(genome.substring(start, i));
                found = true;
                start = -1;
            }
        }

        if (!found) {
            System.out.println("no gene is found");
        }
    }
}
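The loop above tests a triplet at every offset, so it relies on the `% 3` check to skip out-of-frame stop triplets, and an out-of-frame ATG still resets `start`. A common alternative, sketched below under the assumption that codons should be read in frame, jumps to each ATG and then reads three letters at a time until it meets a stop codon (gene found) or another ATG (candidate abandoned). Like the answer above, it is written in Java rather than the C++ the question asks for. The names `GeneFinder`, `findGenes`, and `isStop` are hypothetical; returning a `List` instead of printing makes the search easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class GeneFinder {
    private static boolean isStop(String codon) {
        return codon.equals("TAG") || codon.equals("TAA") || codon.equals("TGA");
    }

    // Hypothetical in-frame variant: from each ATG, read codons three letters at a time;
    // a stop codon ends a gene, while an in-frame ATG abandons the candidate
    public static List<String> findGenes(String genome) {
        List<String> genes = new ArrayList<>();
        int i = 0;
        while (i + 3 <= genome.length()) {
            if (!genome.startsWith("ATG", i)) {
                i++; // not a start codon: try the next offset
                continue;
            }
            int start = i + 3;  // first letter of the candidate gene
            int resume = i + 1; // default: keep searching right after this ATG
            for (int j = start; j + 3 <= genome.length(); j += 3) {
                String codon = genome.substring(j, j + 3);
                if (codon.equals("ATG")) {
                    break; // in-frame start codon: drop this candidate
                }
                if (isStop(codon)) {
                    if (j > start) {
                        genes.add(genome.substring(start, j)); // non-empty gene
                    }
                    resume = j + 3; // continue after the stop codon
                    break;
                }
            }
            i = resume;
        }
        return genes;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Enter a genome string: ");
        List<String> genes = findGenes(input.nextLine().trim());
        if (genes.isEmpty()) {
            System.out.println("no gene is found");
        } else {
            genes.forEach(System.out::println);
        }
    }
}
```

For the sample input `TTATGTTTTAAGGATGGGGCGTTAGTT`, `findGenes` returns `[TTT, GGGCGT]`, matching the sample run.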