filler
[01] >>

(DNA) Sequence Complexity and Alignment.

Bioinformatics Seminar: room B/C, Walter & Eliza Hall Inst. (WEHI), Parkville, 11am Tuesday 13 February 2001

Debts, in alphabetical order, to: Trevor Dix, David Dowe, Tim Edgoose, Dave Powell, Linda Stern, Chris Wallace, Chut N. Yee.

Lloyd Allison, Computer Science and Software Engineering, Monash University, Victoria, Australia 3168

http://www.csse.monash.edu.au/~lloyd/tildeStrings/



This document is online at   users.monash.edu/~lloyd/Seminars/2001-WEHI/index.shtml   and contains hyper-links to more detailed notes on certain topics and to some of the references.
 
Also see Modelling v. Shuffling.

filler
<< [02] >>

The story...


filler
<< [03] >>
Probabilistic Finite State Automaton PFSA c1983, AKA Hidden Markov Model HMM, TR32

A very (!) old problem:

Learn a language.


filler
<< [04] >>
the Rev

Model Complexity:


filler
<< [05] >>
Alignment, [c1990]:

Mutation Machine

Probabilistic Finite State Automaton mutation machine c1990, AKA Pair Hidden Markov Model PHMM

Mutation instructions: copy, change, insert, delete.

filler
<< [06] >>

Equivalent Generation Machine:

Probabilistic Finite State Automaton PFSA generation machine c1990, AKA Pair Hidden Markov Model PHMM

Generation instructions: match, mismatch, insert1, insert2.
(A.K.A. Pair Hidden Markov Models (PHMM).)

filler
<< [07] >>
What matters most is the "control box" (model):
filler
<< [08] >>
A twist: Sum over all Alignments, fwd+back, alignment density:
Sequence Alignment Probability Density Plot, cf Pair Hidden Markov Model PHMM HMM, of chicken hemoglobin alpha and beta chains CHKHBAM and CHKHBBM as in Waterman(1984, p484)

filler
<< [09] >>

Compression   ~   Sequence Complexity:

Compression Machine Probabilistic Finite State Automaton PFSA c2000, AKA Hidden Markov Model HMM
(Also reverse-complementary approximate repeats.)

filler
<< [10] >>

What matters most is the control box, e.g.

Probabilistic Compression Approximate Repeats Control Machine c1998, AKA Hidden Markov Model HMM

filler
<< [11] >>
2-D Plot from Compression Algorithm.
2D Compression Approximate Repeats Probability Density Plot, n.b. not a DOTTER style dot-plot matrix, also see 1-D plot

p.falciparum, chr2, 0.95Mbp

[Zoom 30-70K]

[colour]

[b+w]

[1-D plot] bits/b.p.

NB. Same twist - sum over all explanations. But not enough space for fwd+rev for long Seq'.


filler
<< [12] >>

Compression and Alignment.


filler
<< [13] >>

  Seq_1 . . . S1[i] . . .
S
e
q
2
.
       
.
.
.
 

.      
  (mis)match .    
    .  
      .
 |
 |
 | insert
 |
 |
 v
 
S2[j]  
 
 
 
 
------->
delete  
?  
 
.
.
.
 
       

filler
<< [14] >>
To combine the sources of information, e.g. can
(a) build "direct product" of sub-models, Powell et al (1998), or
(b) say, use this approximation: . . .

Instruction probabilities   (costs = -log2( . )): See [Comp. Jrnl. (1999)]
filler
<< [15]

Results:

Some References


© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.
Created with "vi (Linux & IRIX)",   charset=iso-8859-1