


Question

Program Set 4 (10 points extra credit)

Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA. Write a program that prompts the user to enter a genome and displays all genes in the genome. If no gene is found in the input sequence, the program displays "no gene is found".

Here are the sample runs:

RESTART: E:/HW3/HW3_4_genes.py
Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
GGGCGT

Enter a genome string: TGTGTGTATAT
no gene is found

Test with 4 more genome strings:

TGATGCTCTAAGGATGCGCCGTTGATT
TGATGCTCTAGAGATGCGCCGTTGAATAT

Explanation / Answer

import re

def findGene(genome):
    # ATG                  matches the start codon literally
    # ((?:...)+?)          capturing group: one or more codons that form the gene
    # (?!ATG|TAG|TAA|TGA)  lookahead: a codon inside the gene may not be a start or stop codon
    # (?:TAG|TAA|TGA)      non-capturing group: the stop codon that ends the gene
    pattern = re.compile(r'ATG((?:(?!ATG|TAG|TAA|TGA)[ACTG]{3})+?)(?:TAG|TAA|TGA)')
    genes = pattern.findall(genome)  # run the search once and reuse the result
    if not genes:
        print("no gene is found")
    for gene in genes:
        print(gene)

def main():
    genome = input("Enter a genome string: ")
    findGene(genome)

main()
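
The regular expression above inspects triplets only at codon boundaries, so a gene such as CTAGCC (which contains TAG at an offset of one letter) would still be reported. If the requirement "the gene does not contain any of the triplets" is read as a plain substring check, a scan without regular expressions is one way to enforce it. The following is a minimal sketch under that assumption; the names find_genes, START, and STOPS are illustrative and do not come from the assignment.

```python
START = "ATG"
STOPS = ("TAG", "TAA", "TGA")

def find_genes(genome):
    """Return the genes in genome as a list of strings (possibly empty)."""
    genes = []
    pos = genome.find(START)
    while pos != -1:
        begin = pos + 3  # the gene starts right after the ATG triplet
        # Step through the genome one codon at a time, so the candidate
        # gene always has a length that is a positive multiple of 3.
        for end in range(begin + 3, len(genome) - 2, 3):
            if genome[end:end + 3] in STOPS:
                gene = genome[begin:end]
                # Assumed strict reading: reject the gene if any start or
                # stop triplet occurs anywhere inside it, at any offset.
                if not any(t in gene for t in (START,) + STOPS):
                    genes.append(gene)
                break  # the first in-frame stop codon ends this candidate
        pos = genome.find(START, pos + 1)
    return genes
```

Because the function returns a list instead of printing, the caller decides what to display: print each element, or print "no gene is found" when the list is empty. For the first sample genome it returns ['TTT', 'GGGCGT']. TTT sits between ATG and TAA and meets the stated definition, although the sample run quoted in the question shows only GGGCGT.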