Question

Write a bash script that takes at least two arguments on the command line. The first argument is the pathname of a file containing a valid DNA string with no newline characters or white space characters of any kind within it. (It will be terminated with a newline character.) The file contains nothing but a sequence of the letters a, c, g, and t. The remaining arguments are strings containing only the bases a, c, g, and t in any order. For each valid argument string, the script will search the DNA string in the file and count how many non-overlapping occurrences of that argument string are in the DNA string. To make sure you understand what non-overlapping means, the string ata occurs just once in the string atata, not twice, because the two occurrences overlap. If your script is called correctly, it will output for each argument a line containing the argument string followed by how many times it occurs in the DNA string. If it finds no occurrences, it should output 0 as the count.

Explanation / Answer


# Sample values for demonstration; the expected output is 2.
dnaString="aaabbaa"
subString="aa"

# Function to find the index of the first occurrence of a substring ($2)
# in a string ($1). Prints -1 if the substring does not occur.
strindex() {
    local x="${1%%"$2"*}"   # everything before the first occurrence of $2
    [[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}

# Function to count the non-overlapping occurrences of a substring ($2)
# in a string ($1).
occurrences() {
    local dnaStr=$1             # e.g. aaabbaa
    local subStr=$2             # e.g. aa
    local lenOfSub=${#subStr}   # 2
    local count=0
    local offset=0
    local index

    # An empty substring would never advance the offset, so report 0.
    if [ -z "$subStr" ]
    then
        echo 0
        return
    fi

    while true
    do
        # Search only the part of the string after the previous match.
        # 1st pass: offset = 0, searched text = aaabbaa, index = 0
        # 2nd pass: offset = 2, searched text = abbaa,   index = 3
        # 3rd pass: offset = 7, searched text is empty,  index = -1
        index=$(strindex "${dnaStr:offset}" "$subStr")
        if [ "$index" -eq -1 ]
        then
            break
        fi
        count=$((count + 1))
        # index is relative to the searched text, so add it to the offset,
        # then skip past the match so the next search cannot overlap it.
        offset=$((offset + index + lenOfSub))
    done
    echo "$count"
}
occurrences "$dnaString" "$subString"
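
The code above counts occurrences in a hard-coded string, whereas the question asks for the DNA string to be read from a file named by the first argument and for each remaining argument to be counted. The following is a minimal sketch of that wiring, not a definitive solution. The function names count_nonoverlapping and count_in_file are hypothetical, and two choices are assumptions rather than anything stated in the question: invalid arguments are skipped with a warning on standard error, and the count is obtained by deleting every match with bash pattern substitution instead of by the index search shown above.

```shell
#!/usr/bin/env bash
# Sketch only: count_nonoverlapping and count_in_file are hypothetical names.

# Print how many non-overlapping occurrences of pattern $2 are in string $1.
count_nonoverlapping() {
    local dna=$1 pat=$2
    # Guard against an empty pattern, which would cause a division by zero.
    if [[ -z $pat ]]; then
        echo 0
        return
    fi
    # Delete every match, scanning left to right. Matches removed this way
    # cannot overlap. The quotes make bash treat $pat as a literal string.
    local stripped=${dna//"$pat"/}
    # Each deleted match shortened the string by the pattern length.
    echo $(( (${#dna} - ${#stripped}) / ${#pat} ))
}

# Usage: count_in_file DNA_FILE PATTERN...
count_in_file() {
    if (( $# < 2 )); then
        echo "usage: count_in_file dna_file pattern..." >&2
        return 1
    fi
    local file=$1 dna pat
    shift
    if [[ ! -r $file ]]; then
        echo "cannot read file: $file" >&2
        return 1
    fi
    # The file holds a single line; read drops the terminating newline.
    IFS= read -r dna < "$file"
    for pat in "$@"; do
        # Assumption: arguments that are empty or contain anything other
        # than a, c, g, t are skipped with a warning.
        if [[ ! $pat =~ ^[acgt]+$ ]]; then
            echo "skipping invalid argument: $pat" >&2
            continue
        fi
        echo "$pat $(count_nonoverlapping "$dna" "$pat")"
    done
}
```

To use this as a standalone script, add count_in_file "$@" as the last line. With a file that contains the single line atata, passing ata as the second argument prints "ata 1".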