
# Multi-state

Consider a discrete sample space of M unordered values, e.g.

• throw = {head, tail} M = 2

• base = {A, C, G, T} M = 4

• roll = {1, 2, 3, 4, 5, 6} M = 6. NB. treated as unordered here

• amino acid = {Glycine, Alanine, Valine, Isoleucine, Leucine, Phenylalanine, Proline, Methionine, Serine, Threonine, Tyrosine, Tryptophan, Asparagine, Glutamine, Cysteine, Aspartic acid, Glutamic acid, Lysine, Arginine, Histidine} M = 20
and sequences of these.

This document is online at   http://www.csse.monash.edu.au/~lloyd/Archive/2005-04-Fin-state/index.shtml   and contains hyper-links to other resources.


The distribution has M-1 parameters $T_1, T_2, \ldots, T_{M-1}$, i.e. M-1 degrees of freedom.

Also define $T_M = 1 - T_1 - T_2 - \cdots - T_{M-1}$
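As a small illustration of the M-1 degrees of freedom, the sketch below completes a parameter vector from its free values. The helper name `complete_distribution` and the example numbers are assumptions made for illustration; they are not part of the original slides.

```python
def complete_distribution(free_params):
    """Given T_1..T_{M-1}, return the full list T_1..T_M (hypothetical helper)."""
    last = 1.0 - sum(free_params)
    # Allow a tiny negative value caused by floating-point rounding.
    if any(t < 0 for t in free_params) or last < -1e-12:
        raise ValueError("parameters must be non-negative and sum to at most 1")
    return list(free_params) + [max(last, 0.0)]


# Example: DNA bases, M = 4, so only 3 parameters are free (illustrative values).
probs = complete_distribution([0.3, 0.2, 0.2])
print(probs)
```

The last probability is never stored as a free parameter; it is fixed once the other M-1 values are chosen.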


# Estimators

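From data, the observed frequencies are $n_1, \ldots, n_M$; let $N = \sum_{i=1}^{M} n_i$.

Maximum likelihood: $T_{i,\mathrm{ML}} = n_i / N$. What if $n_i = 0$? Then the estimate is zero, so a value that merely went unobserved is treated as impossible.

Minimum Message Length: $T_{i,\mathrm{MML}} = (n_i + 1/2) / (N + M/2)$

MinEKL estimator (minimum expected Kullback-Leibler divergence): $T_{i,\mathrm{MinEKL}} = (n_i + 1) / (N + M)$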
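The three estimators can be compared side by side with a minimal Python sketch. The function names `ml`, `mml` and `min_ekl` and the example counts are hypothetical, chosen only for illustration.

```python
def ml(counts):
    """Maximum likelihood: n_i / N (undefined when N = 0)."""
    n_total = sum(counts)
    return [n / n_total for n in counts]


def mml(counts):
    """Minimum Message Length: (n_i + 1/2) / (N + M/2)."""
    m, n_total = len(counts), sum(counts)
    return [(n + 0.5) / (n_total + m / 2) for n in counts]


def min_ekl(counts):
    """MinEKL: (n_i + 1) / (N + M)."""
    m, n_total = len(counts), sum(counts)
    return [(n + 1) / (n_total + m) for n in counts]


# Made-up counts for the bases A, C, G, T; note that G was never observed.
counts = [5, 3, 0, 2]
for name, estimator in [("ML", ml), ("MML", mml), ("MinEKL", min_ekl)]:
    print(name, estimator(counts))
```

With these counts, only the maximum likelihood estimate assigns probability zero to the unobserved value; the other two reserve a small positive probability for it.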


# Some uses:

• discrete sample spaces (as seen) and also

• model of the "class" attribute in supervised classification

• sub-model of a 1st-order Markov model (one multi-state distribution per preceding symbol)

• proportions of the classes in a mixture model (unsupervised classification)

• frequency of transitions out of a state in a Probabilistic Finite State Automaton (PFSA, hidden Markov model, HMM) ...


Note the finite number of transitions out of each state of the automaton: the transitions out of one state can be modelled by a multi-state distribution.
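To make the per-state idea concrete, here is a minimal sketch that estimates the transition probabilities out of each state from an observed sequence, applying the (n + 1/2)/(N + M/2) formula from the Estimators section to each state separately. It assumes the simplest case, a 1st-order Markov chain in which each symbol is its own state, rather than a general PFSA. The function name `transition_estimates` and the sequence are made up for illustration.

```python
def transition_estimates(sequence, alphabet):
    """Estimate P(next | current) for each state, using (n + 1/2)/(N + M/2)."""
    m = len(alphabet)
    # counts[s][t] = number of observed transitions from state s to state t
    counts = {s: {t: 0 for t in alphabet} for s in alphabet}
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for s in alphabet:
        n_total = sum(counts[s].values())  # transitions observed out of s
        probs[s] = {t: (counts[s][t] + 0.5) / (n_total + m / 2)
                    for t in alphabet}
    return probs


seq = "ACGTACGGTAACCA"  # hypothetical DNA sequence
probs = transition_estimates(seq, "ACGT")
for state, row in probs.items():
    print(state, row)
```

Each state gets its own distribution over M next states, so every row sums to 1, and transitions that were never observed still receive a non-zero probability.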

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.