Write a Perl script to identify the P-loop motif in the list of Fasta protein se
Question
Write a Perl script to identify the P-loop motif in the list of FASTA protein sequences linked below. The P-loop motif consists of [A or G]XXXXGK[S or T], where X is any amino acid.
If a match is found, print the ID of the protein sequence (e.g. >P12121|Pxxx_HUMAN) and the actual sequence fragment that was matched (i.e. $&).
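As a minimal sketch of the matching step, the motif [A or G]XXXXGK[S or T] can be written as the regular expression [AG].{4}GK[ST]. The helper name find_p_loop and the two records below are hypothetical and made up for illustration; they are not taken from the linked sequence list. The question suggests $&, while this sketch uses a capture group, which yields the same matched fragment.

```perl
use strict;
use warnings;

# P-loop pattern from the question: A or G, any four residues, G, K, then S or T.
my $P_LOOP = qr/[AG].{4}GK[ST]/;

# Return the first matched fragment in a sequence, or undef if there is none.
# (find_p_loop is a hypothetical helper name.)
sub find_p_loop {
    my ($seq) = @_;
    return $seq =~ /($P_LOOP)/ ? $1 : undef;
}

# Made-up records for illustration only; IDs and sequences are not real data.
my %records = (
    '>P00001|DEMO1_HUMAN' => 'MKTLAGSPGSGKSTLLNA',
    '>P00002|DEMO2_HUMAN' => 'MKTLLVVDDEPNAREW',
);

for my $id (sort keys %records) {
    my $hit = find_p_loop($records{$id});
    print "$id\t$hit\n" if defined $hit;   # print only records that match
}
```

Only the first record contains the motif, so only its ID and fragment are printed.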
Helpful functions:
join
join (glue, list)
$string = join('glue', @list);
The glue joins the elements of the list together into a single string.
my @list = (1, 2, 3, 4);
$string = join(':', @list);
$string now contains 1:2:3:4
split
split (reg exp, $string)
my @f = split /:/, 'a:b:c:d:e';
@f now equals ('a', 'b', 'c', 'd', 'e')
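The two functions above can be tried out directly. The sketch below repeats the join and split examples and then, as one possible use in this exercise, splits the example header from the question on the | character. The variable names $accession and $name are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# join: glue list elements together with ':'
my @list   = (1, 2, 3, 4);
my $string = join(':', @list);
print "$string\n";

# split: break a string apart on ':'
my @f = split /:/, 'a:b:c:d:e';
print scalar(@f), " fields\n";

# The example header from the question; '|' must be escaped in the pattern
# because it otherwise means alternation in a regular expression.
my $header = '>P12121|Pxxx_HUMAN';
(my $id = $header) =~ s/^>//;              # drop the leading '>'
my ($accession, $name) = split /\|/, $id;
print "$accession $name\n";
```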
Explanation / Answer
#!/usr/bin/perl
# Download protein sequences from NCBI in FASTA format and save them to seqs.fa.
use strict;
use warnings;
use Bio::DB::EUtilities;

# set optional history queue
my $factory = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                       -email      => 'mymail@foo.bar',
                                       -db         => 'protein',
                                       -term       => 'Butyrivibrio proteoclasticus[ORGN] AND alaS[Gene/Protein Name]',
                                       -usehistory => 'y');

my $count = $factory->get_count;

# get history from queue
my $hist = $factory->next_History || die 'No history data returned';
print "History returned\n";

# note db carries over from above
$factory->set_parameters(-eutil   => 'efetch',
                         -rettype => 'fasta',
                         -history => $hist);

my $retry = 0;
my ($retmax, $retstart) = (500, 0);

open(my $out, '>', 'seqs.fa') || die "Can't open file: $!\n";

RETRIEVE_SEQS:
while ($retstart < $count) {
    # request the next batch of up to $retmax records
    $factory->set_parameters(-retmax   => $retmax,
                             -retstart => $retstart);
    eval {
        $factory->get_Response(-cb => sub { my ($data) = @_; print $out $data });
    };
    if ($@) {
        die "Server error: $@. Try again later\n" if $retry == 5;
        print STDERR "Server error, redo #$retry\n";
        # increment, then redo; "$retry++ && redo" would skip the redo while $retry is 0
        $retry++;
        redo RETRIEVE_SEQS;
    }
    print "Retrieved $retstart\n";
    $retstart += $retmax;
}

close $out;
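Once the sequences are in seqs.fa, the file still has to be scanned for the motif. The following is a sketch of one way to do that, not part of the answer above: scan_fasta is a hypothetical helper that reads FASTA text from a filehandle, joins wrapped sequence lines, and tests each record against [AG].{4}GK[ST]. It reports the whole header line as the ID and only the first match per record; both choices are assumptions.

```perl
use strict;
use warnings;

# Scan FASTA text from a filehandle and return [header, fragment] pairs
# for every record that contains the P-loop pattern [A or G]XXXXGK[S or T].
sub scan_fasta {
    my ($fh) = @_;
    my @hits;
    my ($header, $seq) = (undef, '');

    # Test the record collected so far, then reset for the next one.
    my $flush = sub {
        if (defined $header && $seq =~ /([AG].{4}GK[ST])/) {
            push @hits, [$header, $1];
        }
        $seq = '';
    };

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;          # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>/) {
            $flush->();              # finish the previous record
            $header = $line;
        } else {
            $seq .= $line;           # sequences may be wrapped over several lines
        }
    }
    $flush->();                      # do not forget the last record
    return @hits;
}

# Usage, assuming seqs.fa was written by the download script.
if (-e 'seqs.fa') {
    open(my $in, '<', 'seqs.fa') or die "Can't open seqs.fa: $!\n";
    print join("\t", @$_), "\n" for scan_fasta($in);
    close $in;
}
```

Taking a filehandle rather than a file name keeps the helper usable with any input source, including an in-memory string opened for reading.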