Question
Please help me with this Python coding task on Unix. Thank you. I have included the contents of all files in the task.
TASK: Build a virtual ribosome program (translate.py) that will translate the three positive reading frames of a given DNA sequence. Find the true reading frame and identify the corresponding protein using BLAST:
1.) Part A: load the DNA sequence. In order to use a script to translate the DNA sequence into an amino acid sequence, we first need to load the sequence. The DNA sequence is in FASTA format, in the file dna.fasta. There is more than one way to load the sequence; here, parse the FASTA file to load the DNA sequence into a string.
2.) Part B: load and store the genetic code. In order to translate from DNA to protein, we must know which codons code for which amino acids, and this is best accomplished by saving the information as a dictionary. The genetic code is in the file universal_genetic_code.tab. Each of the 64 lines in the file looks like AAA\tK\n, i.e. the three-letter codon, a tab (\t), the single-letter amino acid designation, and a newline character (\n). 1) Read this file line by line. 2) Split each line into a codon string and a one-letter amino acid string. 3) Store this pair in a dictionary, with the codon being the key and the amino acid being the value. A "*" is used to represent a translated STOP codon.
3.) Part C: translating from DNA to protein in 3 frames. We have loaded our DNA sequence and saved all of the genetic code in an accessible dictionary. Now we need to split the DNA into codons and use our dictionary to translate these into amino acids. Use a for loop and the functions range and len to split the DNA into codons.
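The codon-splitting loop described in Part C can be sketched roughly as follows. This is a minimal illustration, not the assignment's reference solution: the function names are made up for this example, the standard genetic code is built in-line (instead of being read from universal_genetic_code.tab) so that the block runs on its own, and writing "X" for a codon that is missing from the dictionary is an assumption.

```python
def build_standard_table():
    """Return the standard genetic code as a codon -> amino acid dictionary."""
    bases = "TCAG"
    # Amino acids in the order TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
    aminos = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    codons = [a + b + c for a in bases for b in bases for c in bases]
    return dict(zip(codons, aminos))


def translate_frame(dna, table, offset):
    """Translate one reading frame; offset is 0, 1 or 2."""
    protein = []
    # Step through the sequence three bases at a time.
    # Stopping at len(dna) - 2 skips a trailing partial codon.
    for i in range(offset, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        protein.append(table.get(codon, "X"))  # "X" for unknown codons (assumption)
    return "".join(protein)


def translate_three_frames(dna, table):
    """Return the translations of the three positive reading frames."""
    dna = dna.upper()
    return [translate_frame(dna, table, offset) for offset in range(3)]


if __name__ == "__main__":
    table = build_standard_table()
    # First bases of the posted sequence, used only as a short demo input.
    demo = "CTAGGCTAATGCAAATTTTTGTCAAGACTTTGACTGG"
    for number, protein in enumerate(translate_three_frames(demo, table), start=1):
        print("Frame", number, protein)
```

Frame 1 starts at the first base, frame 2 at the second and frame 3 at the third; trailing bases that do not fill a whole codon are ignored.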
SEQUENCES (DNA.FASTA):
CTAGGCTAATGCAAATTTTTGTCAAGACTTTGACTGGTAAGACCATCACT
TTGGAAGTTGAATCTTCTGACACTATTGACAATGTCAAGTCAAAGATTCA
AGACAAGGAAGGTATCCCACCTGACCAACAAAGATTGATCTTTGCTGGTA
AGCAATTGGAAGACGGTAGAACCTTGTCTGACTACAACATTCAAAAAGAA
TCCACTTTGCACTTAGTCTTGAGATTGAGAGGTGGTATCATTGAACCATC
TTTGAAAGCTTTGGCTTCCAAGTACAACTGTGACAAATCTGTTTGCCGTA
AGTGTTATGCTAGATTGCCACCAAGAGCTACCAACTGTAGAAAGAGAAAG
TGTGGTCACACCAACCAATTGCGTCCAAAGAAGAAGTTAAAATGACGGAT
TCCGGATCTCGCGCTAG
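One way to load a file like the one above into a single string (Part A) is sketched here. The function names are hypothetical, and the sketch assumes a single-record FASTA file in which an optional header line starts with ">" and the sequence may be wrapped over several lines.

```python
def parse_fasta_lines(lines):
    """Join the sequence lines of a single-record FASTA file into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the ">" header line, if there is one.
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)


def read_fasta_sequence(path):
    """Open a FASTA file and return its DNA sequence as a string."""
    with open(path) as handle:
        return parse_fasta_lines(handle)


if __name__ == "__main__":
    # The header text here is invented for the demo.
    demo = [">demo header\n", "CTAGGCTAATGC\n", "AAATTTTTGTCA\n"]
    print(parse_fasta_lines(demo))
```

Calling `read_fasta_sequence("dna.fasta")` would then return the whole sequence without line breaks, ready to be split into codons.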
(universal_genetic_code.tab) FILE CONTAINS:
AAA K
AAC N
AAG K
AAT N
ACA T
ACC T
ACG T
ACT T
AGA R
AGC S
AGG R
AGT S
ATA I
ATC I
ATG M
ATT I
CAA Q
CAC H
CAG Q
CAT H
CCA P
CCC P
CCG P
CCT P
CGA R
CGC R
CGG R
CGT R
CTA L
CTC L
CTG L
CTT L
GAA E
GAC D
GAG E
GAT D
GCA A
GCC A
GCG A
GCT A
GGA G
GGC G
GGG G
GGT G
GTA V
GTC V
GTG V
GTT V
TAA *
TAC Y
TAG *
TAT Y
TCA S
TCC S
TCG S
TCT S
TGA *
TGC C
TGG W
TGT C
TTA L
TTC F
TTG L
TTT F
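Loading this table into a dictionary (Part B) might look like the sketch below. The function names are hypothetical. Splitting on any whitespace is a deliberate assumption, so that the code works whether the two columns are separated by a tab or, as in the listing above, by a space.

```python
def parse_genetic_code(lines):
    """Build a codon -> amino acid dictionary from two-column lines."""
    code = {}
    for line in lines:
        fields = line.split()  # splits on tabs or spaces and drops the newline
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        codon, amino_acid = fields
        code[codon.upper()] = amino_acid
    return code


def load_genetic_code(path):
    """Read the genetic code file line by line into a dictionary."""
    with open(path) as handle:
        return parse_genetic_code(handle)


if __name__ == "__main__":
    demo = ["AAA\tK\n", "ATG\tM\n", "TAA\t*\n"]
    print(parse_genetic_code(demo))
```

With the full file, `load_genetic_code("universal_genetic_code.tab")` should give a dictionary with 64 entries, three of which map to "*".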
Explanation / Answer
<syntax type=python>
# Count the A, C, G and T bases in a sequence file and report unexpected letters.
import re
import sys

# Letters reported as errors (pattern kept from the original script;
# note that W and Y are not in it, so they are not flagged).
other_letter = re.compile('[BDEFHIJKLMNOPQRSUVXZ]')

while True:
    filename = input('Please enter a file to check: ')
    if not filename:
        # An empty answer ends the program.
        sys.exit()
    try:
        with open(filename, 'r') as handle:
            seqlist = handle.readlines()
    except OSError:
        print('File not found. Please try again.')
        continue
    # Join the lines into one string and drop the newline characters.
    sequence = ''.join(seqlist).replace('\n', '')
    extra = other_letter.findall(sequence)
    # Write the report next to the input file, as <filename>.count.
    with open(filename + '.count', 'w') as output:
        output.write('Count report for file ' + filename + '\n')
        for base in 'ACGT':
            output.write(base + ' = ' + str(sequence.count(base)) + '\n')
        if extra:
            output.write('Also were found ' + str(len(extra)) + ' errors\n')
            for letter in extra:
                output.write(letter + '\n')
        else:
            output.write('No error found\n')
    print('Result file saved on ' + filename + '.count')
</syntax>
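The task also asks for the true reading frame. One common heuristic, used here as an assumption rather than taken from the task, is to pick the frame that contains the longest stretch starting with M and running up to the next stop ("*"). That stretch is then the candidate protein to paste into BLAST. The function names are hypothetical, and the input is assumed to be the list of three translated frames.

```python
import re


def longest_orf(protein):
    """Return the longest stretch that starts at an M and ends before a '*'."""
    # Each match begins at an M and extends up to (not including) the next stop.
    candidates = re.findall(r"M[^*]*", protein)
    return max(candidates, key=len, default="")


def pick_reading_frame(frames):
    """Return (frame_number, orf) for the frame with the longest candidate.

    frames is a list of translated protein strings, frame 1 first.
    Returns (0, "") if no frame contains an M.
    """
    best_number, best_orf = 0, ""
    for number, protein in enumerate(frames, start=1):
        orf = longest_orf(protein)
        if len(orf) > len(best_orf):
            best_number, best_orf = number, orf
    return best_number, best_orf


if __name__ == "__main__":
    # Invented toy translations, only to show the call.
    number, orf = pick_reading_frame(["LG*CK", "AMQIFVK*", "MK*L"])
    print("Frame", number, orf)
```

The heuristic only suggests a frame; the BLAST hit for the candidate protein is what confirms it.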
If in doubt, please refer to this website:
http://openwetware.org/wiki/Open_writing_projects/Beginning_Python_for_Bioinformatics