
Write a PERL program that generates a random DNA sequence (DNAR) of 400 nucleoti


Question

Write a PERL program that generates a random DNA sequence (DNAR) of 400 nucleotides.

A.) Using BLAST, explore how well DNAR aligns with DNA sequences of known organisms. As DNAR is random, it would come as no surprise if none of the alignments were particularly good and all of them yielded low similarity scores.

B.) Next, choose three (3) different DNA sequences (DNA-1, DNA-2, and DNA-3), each 400 nucleotides long, from different existing organisms and use BLAST to explore their alignments with known sequences. No surprise, their alignments yield much higher similarity scores. We would expect their alignments to yield higher similarity scores even when compared with sequences from other existing organisms such as human, cow, and cat.

WHY??

Your task is to explore how DNAR is structurally different from the three sample human, cat, and cow DNAs. For example, you may conduct a statistical analysis on the occurrence of the four bases A-T-C-G, or the likelihood of certain patterns. Use your knowledge of PERL to develop methods that will facilitate your analysis.

Explanation / Answer

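Working Perl code that generates random DNA sequences with a user-specified base composition and writes them to a file in FASTA format: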

#!/usr/bin/perl

use warnings;
use strict;

srand(time|$$);

print "\n1) Please type the number of iterations (how many random sequences do you want):\n\n",
      "EXAMPLE: \"10\"\n\n";

my $iterations = <STDIN>;
chomp $iterations;
print "\n2) Please type the length of the random DNA strings (how many nucleotides long):\n\n",
      "EXAMPLE: \"50\"\n\n";

my $length = <STDIN>;
chomp $length;
print "\n3) Please type the probability of A content:\n\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n\n",
      "EXAMPLE: \"0.25\"\n\n";

my $A_content = <STDIN>;
chomp $A_content;

print "\n",
      "########################################################################\n",
      "# Of the total probability \"1.00\", ", (1-($A_content)), " is still available\n",
      "########################################################################\n";

print "\n4) Please type the probability of T content:\n\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n\n",
      "EXAMPLE: \"0.25\"\n\n";

my $T_content = <STDIN>;
chomp $T_content;

print "\n",
      "########################################################################\n",
      "# Of the total probability \"1.00\", ", (1-($A_content+$T_content)), " is still available\n",
      "########################################################################\n";

print "\n5) Please type the probability of G content:\n\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n\n",
      "EXAMPLE: \"0.25\"\n\n";

my $G_content = <STDIN>;
chomp $G_content;

# C receives whatever probability is left over after A, T, and G.
my $C_content = 1-($A_content+$T_content+$G_content);

print "\n",
      "########################################################################\n",
      "# Of the total probability \"1.00\", ", $C_content, " is still available\n",
      "########################################################################\n";

print "\n6) Setting the probability of C content to the remainder: ", $C_content, "\n";

#### Ask the user for the name of the FASTA header

print "\n7) Please type the name of the FASTA header for each sequence (there is no need to type the \">\"):\n\n",
      "EXAMPLE: \"random_seq\"\n\n";

my $fasta_header_name = <STDIN>;

print "\n8) Please type the name of the output file:\n\n",
      "EXAMPLE: \"random_sequences_set.fa\"\n\n";

my $output_file_name = <STDIN>;
chomp ($A_content,$T_content,$G_content,$C_content,$fasta_header_name,$output_file_name);
my @distribution = ($A_content,$T_content,$G_content,$C_content);

print "
------------------------------ RESULTS SUMMARY ------------------------------

SUCCESS: Here are $iterations DNA strings of $length nucleotides each,
in FASTA format, with probabilities of:

A = $A_content
T = $T_content
G = $G_content
C = $C_content

EXPORTED TO THE FILE: \"$output_file_name\"

-----------------------------------------------------------------------------
";


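# Name of the output file

my $output_file = $output_file_name;

# Open the output file for writing and stop with a message if that fails.

open(my $out_fh, '>', $output_file)
    or die "Cannot open \"$output_file\" for writing: $!\n";

for(my $k=0;$k<$iterations;$k++){

   # FASTA header line, e.g. ">random_seq_1"
   print {$out_fh} ">",$fasta_header_name,"_",($k+1),"\n";

   # Draw the sequence one nucleotide at a time
   for(my $i=0;$i<$length;$i++){
       print {$out_fh} distribution(@distribution);
   }

   # End the sequence line so the next header starts on its own line
   print {$out_fh} "\n";
}

close($out_fh) or die "Cannot close \"$output_file\": $!\n";

exit;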

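sub distribution{

   my @probability = @_;

   # The four values must be non-negative and sum to 1. The sum is compared
   # with a small tolerance because a floating-point sum may not be exactly 1.
   my $sum = $probability[0] + $probability[1] + $probability[2] + $probability[3];
   if (abs($sum - 1) > 1e-9 or grep { $_ < 0 } @probability){
       die "Probabilities must be non-negative and their sum must be equal to \"1.0\"!\n";
   }
   my $randnum = rand(1);

   # Walk the cumulative distribution in the order A, T, G, C,
   # which is the order in which the probabilities are passed in.
   if($randnum < $probability[0]) {
       return 'A';
   }elsif($randnum < $probability[0] + $probability[1]) {
       return 'T';
   }elsif($randnum < $probability[0] + $probability[1] + $probability[2]) {
       return 'G';
   }else{
       return 'C';
   }

}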
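
The program above only generates sequences. For the analysis part of the question (base occurrence and the likelihood of certain patterns), one possible starting point is to count overlapping k-mers and compare what is observed with what independent bases would predict. The following is a minimal sketch, not a definitive method: the sub names random_dna, kmer_freq, and obs_exp are hypothetical helpers introduced here, the random sequence is assumed to be uniform (0.25 per base), and the // operator assumes Perl 5.10 or later. CG is used as the example pattern because CpG dinucleotides are typically under-represented in vertebrate genomes, so real human, cat, or cow DNA may show a ratio well below 1, while a random sequence should hover around 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Generate a uniform random DNA sequence of the given length
# (assumption: each base has probability 0.25).
sub random_dna {
    my ($length) = @_;
    my @bases = qw(A C G T);
    return join '', map { $bases[ int rand @bases ] } 1 .. $length;
}

# Relative frequency of every overlapping k-mer in a sequence.
# Returns a reference to a hash: k-mer => frequency.
sub kmer_freq {
    my ($seq, $k) = @_;
    my $total = length($seq) - $k + 1;
    my %freq;
    return \%freq if $total < 1;
    $freq{ substr($seq, $_, $k) }++ for 0 .. $total - 1;
    $_ /= $total for values %freq;
    return \%freq;
}

# Observed/expected ratio for a dinucleotide XY, where "expected"
# assumes the two bases occur independently: f(XY) / (f(X) * f(Y)).
# Returns undef if either base is absent from the sequence.
sub obs_exp {
    my ($seq, $dinuc) = @_;
    my $single = kmer_freq($seq, 1);
    my $pair   = kmer_freq($seq, 2);
    my ($x, $y) = split //, $dinuc;
    my $expected = ($single->{$x} // 0) * ($single->{$y} // 0);
    return undef unless $expected > 0;
    return ($pair->{$dinuc} // 0) / $expected;
}

# Report base composition, GC content, and the CpG ratio for DNAR.
my $dnar      = random_dna(400);
my $base_freq = kmer_freq($dnar, 1);
printf "%s: %.3f\n", $_, $base_freq->{$_} // 0 for qw(A C G T);
printf "GC content: %.3f\n", ($base_freq->{G} // 0) + ($base_freq->{C} // 0);
my $cpg = obs_exp($dnar, 'CG');
printf "CpG observed/expected: %s\n", defined $cpg ? sprintf('%.2f', $cpg) : 'n/a';
```

To compare against DNA-1, DNA-2, and DNA-3, the same subs can be called on each sequence as a plain string of A, C, G, and T (for example, pasted from the FASTA record with the header line and line breaks removed). With only 400 nucleotides per sequence the estimates are noisy, so differences between sequences should be read with caution.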
