Question

Hi, just a quick Java programming question. Please read the spec at http://courses.cs.washington.edu/courses/cse142/16wi/homework/7/spec.pdf first. I don't know how to convert the input file that we're given (for example, dna.txt) into a new output file (using a PrintStream?). Can someone take me through the steps of coding the "Region Name:", "Nucleotides:", and "Codons List:" parts? I just don't get how to read a string from the input file and write it into a new output file. The rest I understand. No overcomplicated programming things please; I only have a basic understanding of computer science (using jGRASP).

Explanation / Answer


import java.io.*;
import java.util.*;   // includes Scanner and Random

public class ConvertDna {
    // % of characters to randomly make lowercase
    public static final int PERCENT_LOWERCASE = 10;

    // % of lines to have as proteins (the rest are random DNA)
    public static final int PERCENT_PROTEINS = 66;


    public static void main(String[] args) throws IOException {
        Scanner console = new Scanner(System.in);

        System.out.print("Genome (.fna) file? ");
        Scanner fnaInput = new Scanner(new File(console.nextLine()));
        System.out.println("Reading .fna file...");
        StringBuilder sb = new StringBuilder(4000000);
        while (fnaInput.hasNextLine()) {
            sb.append(fnaInput.nextLine());
        }

        System.out.print("Protein (.ptt) file? ");
        Scanner pttInput = new Scanner(new File(console.nextLine()));

        System.out.print("Output file? ");
        PrintStream out = new PrintStream(new File(console.nextLine()));

        System.out.print("How many proteins (-1 for all)? ");
        int proteins = console.nextInt();
        System.out.println("Producing protein output...");

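        readProtein(pttInput, proteins, sb, out);

        // release the file handles once all output has been written
        fnaInput.close();
        pttInput.close();
        out.close();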
    }

    public static void readProtein(Scanner pttInput, int proteins,
            StringBuilder sb, PrintStream out) {
        pttInput.nextLine();   // skip header lines
        pttInput.nextLine();
        pttInput.nextLine();

        Random rand = new Random(42);
        while (proteins != 0 && pttInput.hasNextLine()) {
            String line = pttInput.nextLine();
            Scanner lineScan = new Scanner(line);
            lineScan.useDelimiter("[ :.]+");
            int start = lineScan.nextInt();
            int end = lineScan.nextInt();
            String strand = lineScan.next(); // "+" or "-"

            if (strand.equals("+")) {
                lineScan.next(); // skip length token
                lineScan.next(); // skip PID token
                lineScan.next(); // skip gene token
                lineScan.next(); // skip synonym token
                lineScan.next(); // skip code token
                lineScan.next(); // skip COG token
                String name = lineScan.next();
                while (lineScan.hasNext()) {
                    name += " " + lineScan.next();
                }

                if (rand.nextInt(100) > PERCENT_PROTEINS) {
                    // grab some random dna
                    start = rand.nextInt(sb.length() - 3) + 1;
                    end = Math.min(start + 3 + 3 * rand.nextInt(100), sb.length()) - 1;
                    name = "Non-protein region";
                }
                int length = end - start + 1;

                out.println(name);

                StringBuilder range = new StringBuilder(sb.substring(start - 1, end));

                // pseudo-randomly change casing of 10% of nucleotides
                for (int i = 0; i < length * PERCENT_LOWERCASE / 100; i++) {
                    int index = rand.nextInt(length);
                    range.setCharAt(index, Character.toLowerCase(range.charAt(index)));
                }
                out.println(range);
                proteins--;
            }
        }
    }
}
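
The program above appears to generate input files such as ecoli.txt from genome data, so it does not directly show the step the question asks about: reading each string from the input file and writing it, labeled, to a new file. Here is a minimal sketch of that step, assuming the input alternates a name line and a nucleotide line as in dna.txt. The class and method names (DnaCopySketch, writeRegions, convertFile) are hypothetical and not taken from the spec.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;

// Hypothetical class name, for illustration only.
class DnaCopySketch {
    // Reads the input two lines at a time (name, then nucleotides)
    // and writes labeled lines to the given PrintStream.
    public static void writeRegions(Scanner input, PrintStream out) {
        while (input.hasNextLine()) {
            String name = input.nextLine();
            if (!input.hasNextLine()) {
                break; // a name with no nucleotide line after it: stop
            }
            String nucleotides = input.nextLine().toUpperCase();
            out.println("Region Name: " + name);
            out.println("Nucleotides: " + nucleotides);
            out.println();
        }
    }

    // File version: the same loop, with a Scanner on the input file
    // and a PrintStream on the output file.
    public static void convertFile(String inputName, String outputName)
            throws FileNotFoundException {
        Scanner input = new Scanner(new File(inputName));
        PrintStream out = new PrintStream(new File(outputName));
        writeRegions(input, out);
        input.close();
        out.close();
    }

    public static void main(String[] args) {
        // Demo without files: System.out is also a PrintStream, and a
        // Scanner can read from a String instead of a File.
        Scanner demo = new Scanner("bogus protein\nCCATt-AATgATCa-CAGTt\n");
        writeRegions(demo, System.out);
    }
}
```

Calling convertFile("dna.txt", "output_dna.txt") would run the same loop against real files. Because writeRegions takes a PrintStream parameter, the printing code is identical whether the output goes to the console or to a file.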

dna.txt

cure for cancer protein
ATGCCACTATGGTAG
captain picard hair growth protein
ATgCCAACATGgATGCCcGATAtGGATTgA
bogus protein
CCATt-AATgATCa-CAGTt

ecoli.txt
thr operon leader peptide
ATGAAACGCATTAGCaCCAcCATtACCACCaCCATCaCcATTACCACAGGTAACGGTGCGGGCTGA
aspartokinase I/homoserine dehydrogenase I
ATGCGAGtGTTGAAGTTcgGCGGTaCATCAgTGGCAAATGCAGAACGTtTTCTGCGGgTTGCCGATAttCTGGAAAGcAATGCCAGGCAGGGGCAGgTGGcCACCGTCCTCtCTGcCCCCGCCAAAATCACCAACCATCtGGTaGCGATGATtGaaAAaACCATtAGCGGTCAGGAtGCtTTaCcCaATATCAGCGATGCCGAACGTATTTTTGCCGAACTtCTGACgGGACTCGCCGCcGCCCAGcCGGGATTTCCGCTGGCACAAtTgAAAAcTTTCGTCGACCAgGAATTTGCCCAAATAAAACATGTcCtGCATGGCatCAGTTTGTTGGGGCAGTGCCCGGaTAGCATcAACGCTGCGCTGATTTGcCGTGgCGAGAAAaTGTcGaTcgCCattaTGGCCGGCGTGTTAGAAGCGCGTGGTCACAACGTTACCGTTATCGATCCGgTCGAAaAAcTGCTgGCAGTGGGTCATTAcCtCgAaTCTACCGTTGATaTtGCTGAATCCACCCGCCGTATTGCGGCAAGCCGCATTCCgGCTGACCACATgGtGCTGATGGCTGGTTTCACTGcCggTAATGAAAAAGgCGaGCTGGtGGTtCTGGGAcGCAACGGTTCCGACTaCTCCGCTGCGGTgCTGGCGGCcTGTTTaCGCGCCGATTGTTGcGAgaTCTGGACGGATGTTGAcGGTGTTTATACCTGCGATCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTATCAGgAaGCGATGGAGCTTTCTTACTTCGGCGCTAAAgTTCTTCaCCCcCGCACCATTACCCCCATcGCCCAGtTCCAGATcCCTtgCCtGATTAAAAATAcCGgAAAtCCCCAAGCACCAGgTACGCtCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATcTGAATaACATGGCAATgTTCAGcGTTTCCGgCCCGGGGAtGAAAGGgATggTTgGCATGGCGGCGCGcgTCTTTGCAGcGaTGTCACGCGCCCGTaTTtCCGTGGTgCtGATTACGCAATCATCTTCCGAATACAGTATCAGTTTCTGCGTTCCGCaAAGCGACTGTGTGCGAGCTgAaCGGGCAaTGcAGGAAGAGtTCTACCTGGAaCTGaAAGAAGGCTTACTGGAGCcGTTGGCgGtGACGGAACGGCTGGCCATTATCTcGGTGgTAGGTGATGGTATGCGcACCTtaCGTGGGAtCTCGgCGAAATtCTtTGCCGCGCTgGCcCGCGCCAATATCAACATTGTCgCCATTGCtCaGGGaTCTTcTGAaCGCTCAAtCTCTGTcGTGGTcAaTAACGATgATGCGACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGgCGTCGGTGGCGTTGgcGGTGCGCTGCTGgAGCAACTGAAGCGTCAgCAAAGCTGGTTGAAGAATAAaCATATCGaCTTACGTGTCTGCGGTGTTGCTAACTCGAAGgCACtgCTCACCAATGTACATGGCCTTAATCTGGAAAACTGGCAGgAAGAACTGGCGCAAGCcAAAGAGCCGTTTAATCTCGgGCGcTtAATTCGCCTCGTGAAAGAATATCATCTGCtGAaCCCGGTCATTgTTGACTgTACTTCCAgCCAGGCTGTgGCAGaTCAATATgCCGACTtCCTgCGCGAAGGTTTCCAcGTTGTtACGCCGAaCAAAaAGGCCaACACCTCGTcgATGGaTTACTaCCATCAGTtGCGTTATGCGGCGGAAAAATCGCGGCGTAaATTCCTCtATGACACcaACGTtGGGGCTGGATTACCGGTTATTgAGAACCTGCAAAATCTGCTCAATGCtGGTGATGAATTGATGAAGTTCTCCGGCATTCTTTCAGGTTCGCTTTCTTAtATCTTCGGCAAGTTAGACGAAGGCaTGAGTtTCTCCGAGgCGACCaCACTGGCGCGGGAAATGgGTTATACCGA
ACCGGAcCcGCGAGATGATCTTtCtGGTATGgAtGTGGCGCgTAagCTAtTGATtCTCGCTCGTGAAACGGGACGTGAACTGGAGCtGGCGGATATTGAAATTGAACCTgTGCTGCCCGCaGaGTTTAACGCCGAGGGTGATGTCGCcGCTTTTATGGCGAATCTGTCACAGCTCGACGaTCtCTTTGCCGCGCGTGTgGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATAttGATGAAGATgGCgTCTGCCGCGTGAAGaTTGCCGAAGTGGATGgTAATGaTCCGCTGTTCAAAGTGAaAaATGGCGaAAACGCCCTGGCCTTCTATAGCCACTATtATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGCaATgACGTTaCAGCTGCCGGTgTCTTTGCTGATCTGCTACGtACCCTcTCAtGGaAGTTAGGAGTCTGA
homoserine kinase
ATgGTTAAAgTTTAtGCCCCGGCtTCCAGTGCCaATATGaGcGTCGgGTTTGATGTGCTCGGGgCGGCGGTGACACCTGTTGATGGTGCATTGCTCGgAGaTGTagTcaCGGTTGAGGCGGCAGAGACaTTCAgTCTCAACAACCTCGGACGCTTTGCCGAtAAGCTGCCGTCAGAGCCACGgGaaAATAtCGTTtATcAGTGcTGGGAGCGTtTTTGcCaGGAGCTTGGCAAGCAAATTCCAGTGGCGATGaCTCTGGAAAAGAATatGCCGAtCgGTTCGGGcTTAGGCTcCAGCGCCtGTTCAGTGGTCGCGGCgCTgAtGGCGATgAATGAAcACTGCGGCaAGCCGCTTAATGACACTCGTTTGCTGGCTTtGATGGgCGAgTTGGAAGGGcGTATCTCCGGCAGCAtTCATTACGACAACGtGGCACCGTGtTtTCtTGGTGGTAtGCAGTtgATGATCGAAGAaAACGACATCATCAGCCAGCAaGTGCCAGGGTTTGATGAGtGGCTGTGGGTGCTGGCGTATcCGGgGAtTAAAGTCtCGaCGGcAGAAGCCAGGGCTaTTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCAcGGGCgACATCTgGCAGGCTTCATTCACGCCTGCTATTCCCGTCAGCTTGAGCTTGCCGCGAAGCTGATgAAAGaTGTTATCGCTGAACCCTACcGTGaACgGTTaCTGCCAGGCTTCCGGCAGGCGCGGcAGgCGGTTGCGGAAATCGGCGCGGTAgCGAGCGGTATCTCCGGCTCCGGCCCGAcTtTGTTCGCTCTGTGtGAcAAGCCGGATACCGCCCAGCGCGTTGCCGACTGgTTGGGTAAGAACtAcCTGCAAAATCAGgAAGGTTTTGTTcATATTTGCCGGCTGGATACGGCGGGcGCACGAgTACTGGAAAACTAA
threonine synthase
ATGAAACTCtacaATCTGAAAGATCACAATGAGCAGgTCaGCTTTGCGCAAGCCGTAACCCAGgGgTTAGGCAAAAATCAGGGgCtGTtTTTTCcgCACgaCCTGCCGGaaTTCAGCcTgACTGAAaTTGATGAGATgCTGAAGCtGGATTTTGTCACcCGCAGTGCGAAGATCCTcTCgGCGTTTATTGGTGATGAAATCCCGCAGGAAaTCCTGGAAGAGCGCGTACGTGCGGCGTTTGCCTTCCCGGCTCCGGTCGCCAATGTTGAAaGCGATGTCGGTtGTCTGGAaTTGTTCcACGGGCcAACGCTGGCaTTTAAAGATTTCGGcGGTcGCTTTATGGCACAAATGCTgACCcATATTGCGGGCGATAAGCCAGTGAcCATTCTGACCGCGACATCCGGTgATACTGGaGCGGCAGTGGcTCATGcTTTCtACGGTtTACCGAATGTGAAAGTGGTTATCCTCTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTgTTCTGTACATTGgGCggCAATATCGaAACTGTTGCCATCGAcggCGaTTTCGATGCCTGTCAGGCGCTGGTgAAGCAGGCgTTTGATGATGAAGAACTGAAAGTGgCgCtGGGGCtGAATTCTGCTAAcTCCATCAACaTCAGTCGCTTGCTGGCGcAGATTTGTTaTTAcTTTGaGGCTGTCGCACAGTtGCCGCAAGAAGCACGTAACCAGTTGgTTGTCTCGGTaCCGAGTGgAAACtTcGGCGATtTGACGGcGGGTCTGCTGGCGAaGTcACTCGGTCtGCCGGTAAAACGTtTTATTGCtgCGACCAACGTGAACGAtACCGTACCACGTTTCCTGCaCGaCGGTCAGTGGTCAcCCAAaGCGACTCAGgCGAcgTtaTCCAATGCGATGGATGTTAGCCAGCcAAaCAACTGGCCGCGTGTGGAAGAGTTGtTCcGCCGCAAAATCTGGCAACTGAAAGAGCTGGgTTATGCAGCCGTGgATGATGAAACCACGCAACAGACAATGcGTGAGtTAAaAGAACTGGGCTATACCTCGgAGCCGCACgCTGCCGTAGCTTATCGTGCGCTGCGTGACCAgTTGAAtCCAGGCGAATATGGCTTGTtCCTCGGcACcGCGCATCcGGcGAAatTtAAAgAGAGCGTGGAAGCGATTCTCGGTGAAAcGTTGGatCTGCCAAAAGAGCTGGCAGAACGTGCTgATTTACCCTTGCTTTCGCATAACCTGCCCGCCGATTTTGCTGCGTTGCGTAAatTgaTGATGAaTCATCAGTAA
hypothetical protein
AtGCAGCCcGGCTtTTTTTATGAAGAAAATaTGGAGaAaAACGACagGGAAAAAGGAGAAATTCtCAATAAATGCGGtAACTTAGAgATTaGGATTGCGGAGAATaACAACTGCcGTTCTCaTCGCGTAATCTCCGGATATCGACCCaTAACGGgCAATGATAAAAGgAGTAACCTGTGA
Non-protein region
aAAAACTgCTGGAAACAATGAAAGAcGTACCGGACGACCAAcGTCAGgCGC
transaldolase B
ATGACGGACAAATTGaCCTCcCTTCGTCAGTACACCACCGTAgTGGCCGACACTGGGGACATCGCGGCAATGAAGcTGTaTCAACcGCAGGATGCCACAACCAAcCCTtCTCTCATTCTTAACGCAGCGCAGATTCcGGAATACCGTAAgTTgATTGaTGATGCTGTCGCCTGGGcGAaACaGCAGAGCAAcGATcGCgCgCAGCAgATCGtGGACGCGACCGAcAAACTGGCAGTAaATATTgGTCTgGAAaTCCTGAAACTGgTTCCGgGCCgTATCTCAActGAAGTtGATGCGCGTCTTTCCTATGACaCCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGcAGGTaTTAGCAACGATCgTaTTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGcAGAACAGCTGGAAAAAGAaGGTATTAACTGTAAcCTGACCCTGCTgtTCTCctTCGCtCAGGcTCGTGCTTGTGCGGaAGCGGgCGTgTTCCTGaTCTCGcCGTTTgTTGGCcGTATTCTTGACTGGTAcAAaGCGAATACCGaTAAGAAAGAGtACGCTCcGGCAGAAGATcCGGGCGTGGTTTCTGTatCtGAAATCtACCAGtACTACaAAGAGCATGGTTaTgAAACCGTGGTTATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAAcTGGCAGGCTGCGACCGTCTGACCatCGCACCGgcACTGCTGAAAGAGCTGgCGGAGAGCGAAGGGGCTATCgAACGTAAACTgTCTTACAcTGgTGAAGTgAAAGCgCGTCCGGcGCGTATCACtGAGtCCGAGTTCCTgTGgCAgCACAACCAGGATCCAATGGCAGTaGATAAACTgGcGGaAGgTATCCGTAAGTTTGCTGTTGACCAGGAAAAACTGGAAAAAATGATCGGCGATCTGCtGTAA
molybdopterin biosynthesis mog protein
ATGAATACTTTACGTATTGGCTTaGTtTcCaTCTCTGATCGCGCATCCAGCGGCGTTTAtCAGgaTAAAgGCATCCCTGCGCTGGAagAATGGCTGACAtcGGCGCTAACCACGcCGTTTGAaCTGGAAAcCCgCTTaATCCCCGATGAGCAGGCGATCATCGAGCAaACgTTgTGTGAGCTGGTGGATGAAaTGAGtTGCCaTCTGGTGCTCACCACGGGCGGAAcTGGCCCTGCGCGTCGTGAcgTAACGCcCGATGcGACGCTGGCAGTAGCGGACCGCGAGATgCcAGGCTTTGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACcaaCTGCGATCCTTTCGCGTCAGGTggGGGTgATTCGCAAACAGGCGCTGATCCTTAACTTaCcCGGTCAACCGAAGtCTATTAAAGAGACGCtGgAAGGTGtGAAGGACGCTGAGgGTAAcGTTGTGGTGCACGgTATTTTTGCCaGCGTaCcGTaCTGCATTCAGTTGCTGGAAGGGCCATACGTTGAaACGGCaCCgGaAGTGGTTGCAGCATTCAGaCCGAAGAGTGCAaGACGCGAAGtTAGCGAATAA

output_dna.txt

Region Name: cure for cancer protein
Nucleotides: ATGCCACTATGGTAG
Nuc. Counts: [4, 3, 4, 4]
Total Mass%: [27.3, 16.8, 30.6, 25.3] of 1978.8
Codons List: [ATG, CCA, CTA, TGG, TAG]
Is Protein?: YES

Region Name: captain picard hair growth protein
Nucleotides: ATGCCAACATGGATGCCCGATATGGATTGA
Nuc. Counts: [9, 6, 8, 7]
Total Mass%: [30.7, 16.8, 30.5, 22.1] of 3967.5
Codons List: [ATG, CCA, ACA, TGG, ATG, CCC, GAT, ATG, GAT, TGA]
Is Protein?: YES

Region Name: bogus protein
Nucleotides: CCATT-AATGATCA-CAGTT
Nuc. Counts: [6, 4, 2, 6]
Total Mass%: [32.3, 17.7, 12.1, 29.9] of 2508.1
Codons List: [CCA, TTA, ATG, ATC, ACA, GTT]
Is Protein?: NO

Region Name: michael jordan mad hops protein
Nucleotides: ATGAG-ATC-CGTGATGTGGG-AT-CCTA-CT-CATTAA
Nuc. Counts: [9, 6, 8, 10]
Total Mass%: [24.6, 13.5, 24.5, 25.3] of 4942.9
Codons List: [ATG, AGA, TCC, GTG, ATG, TGG, GAT, CCT, ACT, CAT, TAA]
Is Protein?: YES

Region Name: paris hilton phony protein
Nucleotides: ATGC-CAACATGGATGCCCTAAG-ATATGGATTAGTGA
Nuc. Counts: [12, 6, 9, 9]
Total Mass%: [32.6, 13.4, 27.3, 22.6] of 4974.3
Codons List: [ATG, CCA, ACA, TGG, ATG, CCC, TAA, GAT, ATG, GAT, TAG, TGA]
Is Protein?: YES

Region Name: jimi hendrix guitar talent protein
Nucleotides: ATGATAATTAGTTTTAATATCAGA-CTGTAA
Nuc. Counts: [12, 2, 4, 12]
Total Mass%: [40.0, 5.5, 14.9, 37.1] of 4049.5
Codons List: [ATG, ATA, ATT, AGT, TTT, AAT, ATC, AGA, CTG, TAA]
Is Protein?: NO

Region Name: admiral grace murray hopper protein
Nucleotides: ATGC-AATT--GC-----TCGA--------TTAG
Nuc. Counts: [5, 3, 4, 6]
Total Mass%: [17.0, 8.4, 15.2, 18.9] of 3964.1
Codons List: [ATG, CAA, TTG, CTC, GAT, TAG]
Is Protein?: NO

Region Name: tyler durden's brain protein
Nucleotides: ATGATACCTATGAGTAATGTGGACCATATCCAAACTATAGGCATTGTCGGACCAACGATCGATTGGTTATACTGA
Nuc. Counts: [24, 14, 16, 21]
Total Mass%: [32.9, 15.8, 24.6, 26.7] of 9843.8
Codons List: [ATG, ATA, CCT, ATG, AGT, AAT, GTG, GAC, CAT, ATC, CAA, ACT, ATA, GGC, ATT, GTC, GGA, CCA, ACG, ATC, GAT, TGG, TTA, TAC, TGA]
Is Protein?: YES

Region Name: mini me growth hormone
Nucleotides: ATGGGACGCTGA
Nuc. Counts: [3, 2, 5, 2]
Total Mass%: [24.8, 13.6, 46.3, 15.3] of 1633.4
Codons List: [ATG, GGA, CGC, TGA]
Is Protein?: NO

Region Name: Nyan Cat protein
Nucleotides: CAT-CAT-CAT-CAT-CAT-CAT-CAT-CAT-CAT-CAT
Nuc. Counts: [10, 10, 0, 10]
Total Mass%: [29.3, 24.1, 0.0, 27.1] of 4613.4
Codons List: [CAT, CAT, CAT, CAT, CAT, CAT, CAT, CAT, CAT, CAT]
Is Protein?: NO
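
The "Codons List:" lines above look like the nucleotide string with dashes removed, cut into groups of three letters. Here is a minimal sketch of that step under two assumptions: dashes are simply skipped, and any leftover one or two letters at the end are dropped. The names CodonSketch and toCodons are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name, for illustration only.
class CodonSketch {
    // Upper-cases the string, removes dashes, then cuts it into 3-letter codons.
    // Assumption: trailing letters that do not fill a whole codon are ignored.
    public static String[] toCodons(String nucleotides) {
        String clean = nucleotides.toUpperCase().replace("-", "");
        String[] codons = new String[clean.length() / 3];
        for (int i = 0; i < codons.length; i++) {
            codons[i] = clean.substring(3 * i, 3 * i + 3);
        }
        return codons;
    }

    public static void main(String[] args) {
        String[] codons = toCodons("CCATt-AATgATCa-CAGTt");
        // Arrays.toString produces the bracketed, comma-separated format
        System.out.println("Codons List: " + Arrays.toString(codons));
        // prints: Codons List: [CCA, TTA, ATG, ATC, ACA, GTT]
    }
}
```

To write this line to a file instead of the console, the same println call can be made on a PrintStream that was opened on the output file.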

output_ecoli.txt

Region Name: thr operon leader peptide
Nucleotides: ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
Nuc. Counts: [21, 22, 12, 11]
Total Mass%: [33.5, 28.9, 21.4, 16.2] of 8471.7
Codons List: [ATG, AAA, CGC, ATT, AGC, ACC, ACC, ATT, ACC, ACC, ACC, ATC, ACC, ATT, ACC, ACA, GGT, AAC, GGT, GCG, GGC, TGA]
Is Protein?: YES
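
The "Nuc. Counts:" lines in these listings are consistent with counting A, C, G, and T in that order while ignoring dashes. Here is a minimal sketch under that assumption; the names NucCountSketch and countNucleotides are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name, for illustration only.
class NucCountSketch {
    // Returns counts in the order A, C, G, T; any other character
    // (such as a dash) is skipped.
    public static int[] countNucleotides(String nucleotides) {
        int[] counts = new int[4];
        String upper = nucleotides.toUpperCase();
        for (int i = 0; i < upper.length(); i++) {
            int index = "ACGT".indexOf(upper.charAt(i));
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countNucleotides("CCATt-AATgATCa-CAGTt");
        System.out.println("Nuc. Counts: " + Arrays.toString(counts));
        // prints: Nuc. Counts: [6, 4, 2, 6]
    }
}
```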

output

Genome (.fna) file? dna.txt
Reading .fna file...
Protein (.ptt) file? ecoli.txt
Output file? output_dna.txt
How many proteins (-1 for all)? -1
