Question
Build your own Hidden Markov Model
So far we have discussed the Markov chain and its relation to HMMs. Now it’s time to construct your own HMM to solve a specific problem.
Scenario:
Suppose you are a bioinformatician working in a microbiology research group. The microbiologists in your lab have identified a new bacterium that possibly causes stomach cancer. Your job is to annotate the genome of this new bacterium. The first step of genome annotation is to identify the coding and non-coding regions. You decide to achieve that by constructing an HMM.
Construct an HMM by answering the following questions:
What are the hidden states? What is the observation?
Hidden states: whether a region of the genome is coding or non-coding.
Observations: the nucleotide sequence of the genome itself.
In the Urn and Ball problem, each hidden state (urn) is linked with one observation (a ball). If you applied the same rule to this problem, every neighboring nucleotide could come from a different hidden state, which is obviously a wrong assumption for a genome. Therefore, you need to link one hidden state to a window of nucleotides.
To simplify the problem, let’s assume that you have in total K windows of length L (w1, w2, …, wK), and the position of each window wi is fixed on the genome. Consequently, you will have a sequence of hidden states s1, s2, …, sK.
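For concreteness, here is a minimal sketch of this windowing step in Python; the genome string and L below are placeholders, not real data:

```python
# A minimal sketch of splitting a genome into K fixed windows of length L.
genome = "ATGCGTAACTGATAGGCCATTAGGCTA"   # illustrative string, not real data
L = 4                                    # window length (an assumption here)

# Fixed, non-overlapping windows; a tail shorter than L is dropped here.
windows = [genome[i:i + L] for i in range(0, len(genome) - L + 1, L)]
K = len(windows)

# Each window wi will be emitted by one hidden state si (coding / non-coding),
# so the hidden-state sequence s1 ... sK has the same length as 'windows'.
print(K, windows)
```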
Draw the hidden states and nucleotide windows as in the Urn and Ball example on Wikipedia (https://en.wikipedia.org/wiki/Hidden_Markov_model#A_concrete_example).
In the Urn and Ball example, the hidden states and observations are connected by probabilities, e.g. the probability of a particular ball being drawn from a particular urn. How would you connect the hidden states and observations in this problem? (Hint: as we spent so much time discussing Markov chains, it shouldn’t be hard for you to guess the answer.)
Observe your graph. How many Markov chains are needed in this problem? What are they? (Hint: neighboring nucleotide windows may be from different hidden states. How would you model the relationship of neighboring hidden states? Are they independent of each other?)
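One possible reading of this hint (a sketch, not the only valid answer): give each hidden state its own nucleotide-level Markov chain, so the emission probability of a window is the product of that chain's transition probabilities along the window; a third chain then links neighboring hidden states. The transition numbers below are made up for illustration:

```python
from math import log

# Illustrative first-order nucleotide transition tables for the two hidden
# states; in practice these would be estimated from annotated training data.
coding = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
coding["G"] = {"A": 0.20, "C": 0.40, "G": 0.20, "T": 0.20}   # GC-rich bias
noncoding = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
noncoding["A"] = {"A": 0.20, "C": 0.20, "G": 0.20, "T": 0.40}  # AT-rich bias

def window_log_prob(window, chain):
    """Log emission probability of a window under one state's Markov chain:
    the sum of log transition probabilities between neighboring nucleotides."""
    return sum(log(chain[a][b]) for a, b in zip(window, window[1:]))

w = "GCGC"
print(window_log_prob(w, coding), window_log_prob(w, noncoding))
```

Under this setup, the window above scores higher under the coding chain than the non-coding one, which is exactly how the emission probabilities discriminate between hidden states.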
Assume you will use first-order Markov chains. Describe your strategy for calculating the transition matrix and the initial probabilities. (Hint: obviously, you need data to calculate these parameters. In the previous exercise, you calculated the transition matrix from a given DNA sequence (Q5 on Wednesday). However, that strategy will not work for this problem because you don’t know where the coding regions are. Therefore, what data do you want to use? Where can you find that data?)
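As one possible strategy (a sketch assuming you can obtain annotated genomes of related bacteria, e.g. from a public database such as GenBank, where coding regions are already labeled): convert the annotation into one hidden-state label per window, then count and normalize. The labels below are illustrative:

```python
from collections import Counter

# Hidden-state labels per window for one annotated training genome
# (illustrative; real labels would come from annotated genomes of related
# species with known coding regions).
labels = ["N", "C", "C", "C", "N", "N", "C", "C", "N"]  # C = coding, N = non-coding
states = ["C", "N"]

# Initial probability: frequency of each state at the first window.
# With one training sequence this is a point estimate; pool many genomes in practice.
init = {s: 1.0 if labels[0] == s else 0.0 for s in states}

# Transition matrix: normalized counts of state-to-state steps between windows.
pair_counts = Counter(zip(labels, labels[1:]))
trans = {}
for s in states:
    total = sum(pair_counts[(s, t)] for t in states)
    trans[s] = {t: pair_counts[(s, t)] / total for t in states}
print(init, trans)
```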
Now you have set up your HMM. It is obvious that for each window wi, you will have several options of hidden states. The amount of computation is enormous if you use a brute-force strategy (does this problem sound familiar to you?). However, statisticians developed a really smart method called the Viterbi algorithm. It is based on dynamic programming (our old friend!).
Review what you’ve learnt about dynamic programming in the global alignment problem:
for each cell (i, j), the score is the max of: score(i−1, j) − gap penalty, score(i, j−1) − gap penalty, and score(i−1, j−1) − mismatch penalty (or score(i−1, j−1) + matching reward). The backtracking direction is determined accordingly. The score for the base case (the very top-left cell in the grid) is 0.
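As a refresher, here is a minimal sketch of that score-matrix fill; the gap, mismatch, and match values are arbitrary placeholders:

```python
def global_alignment_scores(a, b, gap=-2, mismatch=-1, match=1):
    """Fill the global alignment score matrix; score[i][j] is the best score
    for aligning a[:i] with b[:j]. The top-left base case is 0."""
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):          # first column: all gaps in b
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, len(b) + 1):          # first row: all gaps in a
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap,       # gap in a
                              score[i - 1][j - 1] + diag)  # match/mismatch
    return score

print(global_alignment_scores("GAT", "GCT")[-1][-1])
```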
Fill in the following table to adapt the HMM into a dynamic programming problem:
What are the counterparts?
Global alignment | HMM
You are interested in probabilities rather than scores. How would you apply the same rule to the HMM? Let’s work on the probability for the base case (i.e. the beginning window w1). What parameters do you need? How do you connect the next case (w2) with the base case (w1)?
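To make the analogy concrete, here is a minimal Viterbi sketch under the setup above. The function and parameter names are illustrative, and emit_log_prob stands in for whatever emission model you chose (e.g. the per-state nucleotide chains sketched earlier):

```python
from math import log, inf

def viterbi(windows, states, init, trans, emit_log_prob):
    """Most probable hidden-state path for a sequence of windows.
    init[s] and trans[p][s] are probabilities (transitions assumed nonzero;
    use pseudocounts when estimating them if zeros can occur).
    emit_log_prob(window, s) returns the log emission probability."""
    # Base case (window w1): probability of starting in state s and
    # emitting the first window.
    V = [{s: (log(init[s]) if init[s] > 0 else -inf) + emit_log_prob(windows[0], s)
          for s in states}]
    back = [{}]
    # Recursion: for each later window, keep only the best predecessor per state,
    # exactly as each alignment cell keeps only its best incoming direction.
    for w in windows[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + log(trans[p][s]))
            ptr[s] = prev
            col[s] = V[-1][prev] + log(trans[prev][s]) + emit_log_prob(w, s)
        V.append(col)
        back.append(ptr)
    # Backtracking, as in global alignment.
    state = max(states, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Given per-window emission scores like window_log_prob from the earlier sketch, this returns one coding/non-coding label per window, which is the annotation you set out to produce.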
Now you have an HMM. Remember that gene structure is complicated, as we discussed in Unit 1. Do you think this model is good at identifying genes? If not, what changes do you want to make? (Hint: the setting of an HMM is adjustable, e.g. the number of hidden states, the length of the nucleotide windows, the positions of the windows, the Markov chains, etc.)
Explanation / Answer
The hidden Markov model is one of the most important machine learning models in speech and language processing. To define it properly, we need to first introduce the Markov chain, sometimes called the observed Markov model. Markov chains and hidden Markov models are both extensions of the finite automata of Chapter 3. Recall that a weighted finite automaton is defined by a set of states and a set of transitions between states, with each arc associated with a weight. A Markov chain is a special case of a weighted automaton in which weights are probabilities (the probabilities on all arcs leaving a node must sum to 1) and in which the input sequence uniquely determines which states the automaton will go through. Because it can’t represent inherently ambiguous problems, a Markov chain is only useful for assigning probabilities to unambiguous sequences.

A Markov chain is useful when we need to compute a probability for a sequence of events that we can observe in the world. In many cases, however, the events we are interested in may not be directly observable in the world. For example, in Chapter 10 we’ll introduce the task of part-of-speech tagging, assigning tags like Noun and Verb to words. We didn’t observe part-of-speech tags in the world; we saw words and had to infer the correct tags from the word sequence. We call the part-of-speech tags hidden because they are not observed.