Question

PERL SCRIPT PROGRAMMING

A researcher has a file containing information about the number of times particular k-mers (peptide sequences of length k, derived from actual protein sequences) occur in the human proteome. The information for each k-mer is on one line in the file, divided into columns. The first column is the position of the start of the k-mer in its source protein. The second column is the k-mer itself. These are followed by two counts: the number of times the k-mer occurs in the human proteome, and the number of proteins in the human proteome that contain the k-mer. The columns are delimited by tab characters. For example, a portion of the data file might look like:

The researcher is interested in those k-mers for which the counts in the last two columns are both 0, that is, k-mers that do not occur in the human proteome. For instance, given the data above, the researcher would want to be informed of the k-mer IDTLQ.

Write a Perl script that reads from standard input and prints, on standard output, the k-mers that do not occur in the human proteome, assuming input in the format described above. Each k-mer is to be on a separate line. Assume that the input file contains nothing other than lines of k-mer information.

Hint: Use the pattern-extraction facilities of Perl.

Your scripts should be independent of the value of k (provided, of course, that k ≥ 1). That is, your scripts should work for data files of k-mers of any size. Further, k should not be a parameter in or to your scripts.

Note: this must be a Perl script, not a shell script.

Explanation / Answer

To run:

perl dna.pl

perl dna.pl inputfile.txt

If no filename is passed, the script reads inputfile.txt by default.

CODE
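One possible script is sketched below. It is an illustrative sketch, not necessarily the code from the original answer. It assumes every line has exactly the four tab-separated columns described in the question and that a zero count is written as the single digit 0. The subroutine name unseen_kmer is a hypothetical helper introduced for this sketch. The k-mer column is captured with (\S+), so the pattern matches a k-mer of any length and k never appears in the script. Input is read with <>, which takes lines from standard input, or from any files named on the command line. Note that with no arguments this sketch reads standard input (as the question requires) rather than defaulting to inputfile.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unseen_kmer is a hypothetical helper name chosen for this sketch.
# Given one tab-delimited line (position, k-mer, occurrence count,
# protein count), return the k-mer if both counts are 0, else undef.
# (\S+) captures a k-mer of any length, so k is never a parameter.
sub unseen_kmer {
    my ($line) = @_;
    if ($line =~ /^\s*\d+\t(\S+)\t0\t0\s*$/) {
        return $1;    # the extracted k-mer
    }
    return;           # undef in scalar context: counts were not both 0
}

# Read from standard input (or from files named on the command line)
# and print each k-mer that does not occur in the proteome, one per line.
while (my $line = <>) {
    my $kmer = unseen_kmer($line);
    print "$kmer\n" if defined $kmer;
}
```

For example, with a small hypothetical input piped in:

printf '1\tMKTAY\t3\t2\n2\tIDTLQ\t0\t0\n' | perl dna.pl

the script prints only IDTLQ.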

--------

OUTPUT