Gene Finding With Hidden Markov Models
Seminar by Marina Alexandersson
Notes taken for the Monash CSSE Bioinformatics group by L.A. M.A. didn't give any actual % success rates etc., and did not seem to use any numerical measure of model complexity. The HMM example was based on two dice, where the die in use is the hidden variable. Intro' to genes, exons, introns [here].
Splice Site Prediction (for intron editing out)
Exon Length:
Positions considered independent, i.e. a "profile" or "block". You might use mixtures of geometric distributions to flatten out a distribution, but they just don't do peaked distributions. Pity, because geometric distributions give a linear -log (cost), which has some algorithmic advantages in dynamic programming algorithms (DPAs). :-(
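The point about geometric length distributions can be checked numerically. The following is a minimal sketch, not anything shown in the talk: the parameter values (p = 0.01, and a 50/50 mixture of p = 0.2 and p = 0.01) are arbitrary assumptions chosen for illustration. It shows that the -log cost of a geometric distribution grows by a constant amount per extra position, and that a mixture of geometrics is still non-increasing in length, so it cannot put a peak at some typical exon length.

```python
import math

def geometric_pmf(length, p):
    """P(L = length) under a geometric model, for length = 1, 2, 3, ..."""
    return (1.0 - p) ** (length - 1) * p

def neg_log_cost(length, p):
    """Cost in bits, i.e. -log2 of the probability of this length."""
    return -math.log2(geometric_pmf(length, p))

def mixture_pmf(length, weights, ps):
    """Mixture of geometric distributions (weights assumed to sum to 1)."""
    return sum(w * geometric_pmf(length, q) for w, q in zip(weights, ps))

p = 0.01  # hypothetical per-position stopping probability
costs = [neg_log_cost(n, p) for n in range(1, 6)]
# Differences between successive costs: all equal to -log2(1 - p),
# so the cost is linear in the length.
steps = [b - a for a, b in zip(costs, costs[1:])]

# A mixture can flatten the distribution, but it never rises with length.
mix = [mixture_pmf(n, [0.5, 0.5], [0.2, 0.01]) for n in range(1, 200)]
is_monotone = all(a >= b for a, b in zip(mix, mix[1:]))
```

The constant step is what lets a DPA charge the length cost one position at a time, without remembering how long the current segment already is.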
Generalised Hidden Markov Models (GHMM)
I'm not sure why this was called "generalised". Interesting to compare architecture with Glimmer / GlimmerM etc.
Pair Hidden Markov Models (PHMM) i.e. alignment
[State diagram: states begin, M, X, Y, end. begin ---> M ---> end; X and Y each exchange transitions with M (<---->) and each loop back to themselves. M - match.]
Of course, 3 states for linear gap costs.
It looked better in powerpoint than ascii art,
but was topologically similar
to the 3-state mutation and generation machines.
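The three-state topology corresponds to the familiar dynamic programme with one table per state. Below is a rough sketch, assuming "linear gap costs" means a gap of length k costs gap_open + (k-1)*gap_extend, and assuming X and Y connect only through M as in the diagram. The cost values and the function name are hypothetical; a real PHMM would use -log probabilities from a trained model rather than these ad hoc costs.

```python
def affine_align_cost(a, b, mismatch=1.0, gap_open=3.0, gap_extend=1.0):
    """Minimum alignment cost of strings a and b with a 3-state (M, X, Y) DP.

    M[i][j]: best cost of aligning a[:i] with b[:j], ending in a match/mismatch.
    X[i][j]: same, ending with a[i-1] aligned against a gap.
    Y[i][j]: same, ending with b[j-1] aligned against a gap.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0  # the "begin" state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
                # M can be entered from M, X or Y
                M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            if i > 0:
                # open a gap from M, or extend via the self-loop on X
                X[i][j] = min(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            if j > 0:
                # open a gap from M, or extend via the self-loop on Y
                Y[i][j] = min(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
    # the "end" state can be reached from any of the three states
    return min(M[n][m], X[n][m], Y[n][m])
```

For example, affine_align_cost("ACGT", "AT") opens one gap of length 2, costing 3 + 1 = 4 with the default values.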
Algorithms
Viterbi on Half-Phat
Showed lattice of states and finding the most probable path.
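As an illustration of the lattice idea, here is a small Viterbi sketch on a two-dice HMM of the kind mentioned in the notes, not on the Half-Phat gene model itself. The state names ("fair", "loaded") and all probabilities are assumptions made up for the example. It works in log space and keeps back-pointers, filling one lattice column per observation.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability) for non-empty obs."""
    # First lattice column: start probability times emission.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor of state s in the previous column.
            best = max(states, key=lambda r: V[-1][r] + math.log(trans_p[r][s]))
            col[s] = V[-1][best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, V[-1][last]

# Hypothetical two-dice model: which die is being rolled is the hidden variable.
states = ["fair", "loaded"]
start_p = {"fair": 0.5, "loaded": 0.5}
trans_p = {"fair": {"fair": 0.9, "loaded": 0.1},
           "loaded": {"fair": 0.1, "loaded": 0.9}}
emit_p = {"fair": {face: 1.0 / 6.0 for face in range(1, 7)},
          "loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

path_sixes, _ = viterbi([6, 6, 6, 6, 6, 6], states, start_p, trans_p, emit_p)
path_mixed, _ = viterbi([1, 2, 3, 4, 5, 1, 2, 3], states, start_p, trans_p, emit_p)
```

Each column costs N^2 work for N states, so the whole lattice takes N^2.T time and N.T space for a sequence of length T.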
Generalised Pair Hidden Markov Models (GPHMM)?
~ product of Half_Phat x alignment model, too big to draw.
Double-Phat - two sequence alignment under a gene model.

Model   Time            Space
HMM     N^2.T           N.T
PHMM    N^2.T.U         N.T.U   (2 sequences)
GHMM    D^2.N^2.T       N.T
GPHMM   D^4.N^2.T.U     N.T.U   (2 sequences)

where N = # of states, D = max exon length, T = |seq1|, U = |seq2|.
Speed up for GPHMM - get quick alignment, put window around it, work in that area.
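To get a feel for why the GPHMM needs a speed up, the growth terms in the table can simply be evaluated. This is only an order-of-growth sketch: constants are ignored, and the sample values of N, D, T and U are arbitrary assumptions, not figures from the talk.

```python
def dp_costs(N, D, T, U):
    """Order-of-growth time and space terms from the table (constants ignored)."""
    return {
        "HMM":   {"time": N**2 * T,            "space": N * T},
        "PHMM":  {"time": N**2 * T * U,        "space": N * T * U},
        "GHMM":  {"time": D**2 * N**2 * T,     "space": N * T},
        "GPHMM": {"time": D**4 * N**2 * T * U, "space": N * T * U},
    }

# Hypothetical sizes: 10 states, max exon length 100, two sequences of length 1000.
costs = dp_costs(N=10, D=100, T=1000, U=1000)
ratio = costs["GPHMM"]["time"] // costs["PHMM"]["time"]  # equals D**4
```

With these sample sizes the GPHMM time term is 10^16, a factor of D^4 = 10^8 more than the plain PHMM.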
Seemed to be related to direct product of
the sequence (gene) machine (model) and
the alignment machine (model).
Interesting to cf. with:
Could enrich the upstream model, i.e. do something with promoters.
Acknowledged: Simon Cawley, Lior Pachter, Terry Speed
Nice talk.