Question

I am trying to calculate a log-odds matrix for MAST input, from a position-specific probability matrix for the motif in which I am interested.

I would like to know how MEME estimates the background frequency of nucleotides, as it does the conversion from position-specific probability matrices to log-odds matrices when you choose to run MAST on MEME output. Is it simply counting frequencies in the sequences supplied, or is there some sort of modeling going on to correct for sample size and whatnot?

Explanation / Answer

On the MEME server page, there's a link to upload a custom background Markov model (on the command line, this is the -bfile option). From there, there's a link to the MEME man page. Under "Objective Function", it specifies:

The background model is an n-order Markov model. By default, it is a 0-order model consisting of the frequencies of the letters in the training set.

So yes, by default it is simply counting letter frequencies in the sequences you supply, which is about the simplest background model possible: no accounting for pairwise frequencies, complements, motif width, etc. I expect this is because MEME can be applied to essentially any dataset, such as phage display bindings from a "truly" random set of short oligos, in which case higher-order assumptions about dependence between neighboring positions would be detrimental.
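To make the default concrete, here is a minimal sketch of what a 0-order background and the resulting log-odds conversion could look like. The function names, the optional pseudocount, the probability floor used to avoid log(0), and the choice of base-2 logarithms are all my assumptions for illustration, not MEME's actual code; check the scaling and formatting your MAST version expects before feeding it the result.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet


def background_frequencies(sequences, pseudocount=0.0):
    """0-order background: relative frequency of each letter in the sequences.

    The pseudocount is a hypothetical knob for small samples; 0.0 means
    plain counting, as described in the man page excerpt.
    """
    counts = Counter()
    for seq in sequences:
        for ch in seq.upper():
            if ch in ALPHABET:  # skip ambiguity codes such as N
                counts[ch] += 1
    total = sum(counts[ch] for ch in ALPHABET) + pseudocount * len(ALPHABET)
    if total == 0:
        raise ValueError("no countable letters in the input sequences")
    return {ch: (counts[ch] + pseudocount) / total for ch in ALPHABET}


def log_odds_matrix(pspm, background, floor=1e-6):
    """Convert a position-specific probability matrix to log-odds scores.

    pspm is a list of dicts (one per motif position) mapping letter to
    probability. Probabilities below `floor` are raised to `floor` so the
    logarithm is defined (an assumption made for this sketch).
    """
    rows = []
    for column in pspm:
        row = []
        for ch in ALPHABET:
            if background[ch] <= 0:
                raise ValueError(f"background frequency for {ch} must be > 0")
            row.append(math.log2(max(column[ch], floor) / background[ch]))
        rows.append(row)
    return rows


if __name__ == "__main__":
    bg = background_frequencies(["ACGTACGT", "AAAATTTT"])
    motif = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    for row in log_odds_matrix(motif, bg):
        print(" ".join(f"{score:7.3f}" for score in row))
```

A positive score means the letter is more likely under the motif than under the background at that position, and a negative score means the opposite.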

Below that, I think it answers your question about the total log-odds calculation:

The E-value reported by MEME is actually an approximation of the E-value of the log likelihood ratio. (An approximation is used because it is far more efficient to compute.) The approximation is based on the fact that the log likelihood ratio of a motif is the sum of the log likelihood ratios of each column of the motif. Instead of computing the statistical significance of this sum (its p-value), MEME computes the p-value of each column and then computes the significance of their product. Although not identical to the significance of the log likelihood ratio, this easier to compute objective function works very similarly in practice.
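For intuition about "the significance of their product", here is a small sketch using the standard result for a product of independent p-values that are uniform under the null: if the product of n such p-values is p, the probability of a product at least that small is p times the sum of (-ln p)^i / i! for i from 0 to n-1. Whether MEME uses exactly this formula internally is my assumption, and the function name is hypothetical.

```python
import math


def product_pvalue(pvalues):
    """Significance of the product of independent, uniform p-values.

    Uses P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!, which assumes
    the n p-values are independent (an assumption of this sketch).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    # Work in log space so the product of many small values does not underflow early.
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod  # x >= 0
    term = 1.0     # running value of x^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term
    return math.exp(log_prod) * total


if __name__ == "__main__":
    # Hypothetical per-column p-values for a 4-column motif.
    print(product_pvalue([0.01, 0.2, 0.05, 0.3]))
```

With a single column the function returns that column's p-value unchanged, and the combined value is never smaller than the raw product, since the correction factor is at least 1.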
