Question
Problem 3. Deoxyribonucleic acid (DNA) is a molecule that contains the genetic instructions required for the development and functioning of all known living organisms. The basic double-helix structure of DNA was co-discovered by Prof. Francis Crick, a long-time faculty member at UCSD. The DNA molecule consists of a long sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Since this molecule contains all the genetic information of a living organism, geneticists are interested in understanding the roles of the various DNA sequence patterns that are continuously being discovered worldwide. One of the most common methods to identify the role of a DNA sequence is to compare it with other DNA sequences whose functionality is already known. The more similar such DNA sequences are, the more likely it is that they will function similarly.

Your task is to write a C program, called dna.c, that reads three DNA sequences from a file called dna_input.dat and prints the results of a comparison between each pair of sequences to the file dna_output.dat.

The input file dna_input.dat consists of three lines. Each line is a single sequence of characters from the set {A, C, G, T} that appear without spaces in some order, terminated by the end-of-line character \n. You can assume that the three lines contain the same number of characters

Explanation / Answer
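#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
#define MAX_IN_LENGTH 241 /* buffer size per line, including '\n' and '\0' */
#define OUT_LENGTH 60     /* bases printed per output row */
#define MAX_SEQS 10       /* rows in the sequence array declared in main */
int n_dna = 0; /* number of sequences actually read */
FILE *ofile;   /* output file, opened in main */

/* Reads one sequence per line from dna_input.dat into sequence[].
   Each stored line keeps the trailing '\n' that fgets leaves in place.
   Returns the length of the first line (newline included), or 0 if nothing was read. */
int read_DNA(char sequence[][MAX_IN_LENGTH])
{
    FILE *file = fopen("dna_input.dat", "r");
    if (file == NULL)
    {
        printf("Input file does not exist\n");
        exit(1);
    }
    /* Never read more lines than the array can hold. */
    while (n_dna < MAX_SEQS && fgets(sequence[n_dna], MAX_IN_LENGTH, file) != NULL)
    {
        n_dna++;
    }
    fclose(file);
    return n_dna > 0 ? (int)strlen(sequence[0]) : 0;
}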
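/* Builds the match line seq3 (the shared base where seq1 and seq2 agree,
   a space where they differ) and returns the fraction of matching positions.
   n is an upper bound on the number of characters examined. */
double compare_DNA(char seq1[], char seq2[], char seq3[], int n)
{
    int i = 0, c = 0;
    /* Stop at the newline kept by fgets or at the end of the string. */
    while (i < n && seq1[i] != '\n' && seq1[i] != '\0')
    {
        if (seq1[i] == seq2[i])
        {
            seq3[i] = seq1[i];
            c++;
        }
        else
            seq3[i] = ' ';
        i++;
    }
    seq3[i] = '\0';
    /* i is the number of bases compared; guard against an empty sequence. */
    return i > 0 ? (double)c / i : 0.0;
}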
/* Writes the comparison in blocks of up to OUT_LENGTH bases.
   Each block has three rows: seq1, the match line seq3, then seq2. */
void print_DNA(char seq1[], char seq2[], char seq3[], int n)
{
    /* Number of bases: stop at the newline fgets may have kept, never past n. */
    int len = (int)strcspn(seq1, "\n");
    int start, k;
    if (len > n)
        len = n;
    for (start = 0; start < len; start += OUT_LENGTH)
    {
        int end = start + OUT_LENGTH < len ? start + OUT_LENGTH : len;
        for (k = start; k < end; k++)
            fputc(seq1[k], ofile);
        fputc('\n', ofile);
        for (k = start; k < end; k++)
            fputc(seq3[k], ofile);
        fputc('\n', ofile);
        for (k = start; k < end; k++)
            fputc(seq2[k], ofile);
        fputc('\n', ofile);
    }
}
int main()
{
    char sequence[10][MAX_IN_LENGTH];
    int i, j;
    ofile = fopen("dna_output.dat", "w");
    if (ofile == NULL)
    {
        printf("Could not open output file\n");
        return 1;
    }
    read_DNA(sequence);
    /* Compare every pair of sequences exactly once. */
    for (i = 0; i < n_dna; i++)
    {
        for (j = i + 1; j < n_dna; j++)
        {
            char seq[MAX_IN_LENGTH]; /* match line for this pair */
            int seq_len = (int)strlen(sequence[i]);
            double overlap = compare_DNA(sequence[i], sequence[j], seq, seq_len);
            fprintf(ofile, "Comparison between sequence #%d and sequence #%d:\n", i + 1, j + 1);
            print_DNA(sequence[i], sequence[j], seq, seq_len);
            fprintf(ofile, "The overlap percentage is: %.2f%%\n", overlap * 100);
        }
    }
    fclose(ofile);
    printf("Successfully wrote to output file\n");
    return 0;
}