
This document is online at http://www.csse.monash.edu.au/~lloyd/Archive/2005-06-Mixture/index.shtml and contains hyper-links to other resources.

Unsupervised classification **=**
clustering **=** mixture modelling **=** . . .

Given multivariate data, i.e. S things, each having D attributes,
*infer*:

- The number of classes (clusters),

- the parameters of each class, and

- for each thing, the class that it (probably) belongs to, but see later! [more]
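As an illustrative sketch only (not Snob itself), the last two inferences, class parameters and fractional memberships, can be carried out by expectation-maximisation for a fixed number of classes; the data, starting values and constants below are invented for the example:

```python
import math
import random

# Hypothetical sketch: infer the parameters and (soft) memberships of a
# two-class, one-dimensional Normal mixture by expectation-maximisation.
# The number of classes is fixed at two here; Snob also searches over it.

random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(500)] +
        [random.gauss(5.0, 1.0) for _ in range(500)])

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# arbitrary starting guesses
w, mu, sigma = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]

for _ in range(50):
    # E-step: fractional membership of each thing in each class
    r = [[w[k] * pdf(x, mu[k], sigma[k]) for k in range(2)] for x in data]
    r = [[rk / sum(row) for rk in row] for row in r]
    # M-step: re-estimate class parameters from the fractional memberships
    for k in range(2):
        nk = sum(row[k] for row in r)
        w[k] = nk / len(data)
        mu[k] = sum(row[k] * x for row, x in zip(r, data)) / nk
        sigma[k] = math.sqrt(sum(row[k] * (x - mu[k]) ** 2
                                 for row, x in zip(r, data)) / nk)

print(sorted(mu))   # close to the true means, 0 and 5
```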



Use the form above to experiment. Try

- separation greater than, equal to, and less than, 2 standard deviations,

- equal & unequal proportions of the two Normals,

- equal means but differing variances,

- etc.
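The separation experiment can also be checked analytically: for two equal-weight Normals with a common sigma and means d standard deviations apart, the optimal (Bayes) misclassification rate is Phi(-d/2), where Phi is the standard Normal CDF. A quick sketch (the separations chosen are arbitrary):

```python
import math

# Bayes error for two equal-weight Normals, common sigma, means d*sigma apart.
# Below d = 2 the classes blur together rapidly.

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for d in (1.0, 2.0, 3.0):
    print(f"separation {d} sigma: Bayes error {Phi(-d / 2):.3f}")
```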


Can have mixtures of

- any *appropriate* distribution: [Normal], [Poisson], [Multi-state], etc.,

- any number of classes, i.e. component distributions,

- heterogeneous distributions (over the same data space),
e.g. Poisson + Geometric
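A heterogeneous mixture is still a single, proper distribution over the shared data space. A sketch, with arbitrarily chosen weights and parameters, of a Poisson + Geometric mixture over the non-negative integers (the geometric here counts failures before the first success, support 0, 1, 2, ...):

```python
import math

# 0.6 * Poisson(lambda=3) + 0.4 * Geometric(p=0.3); the weights and
# parameters are invented for illustration.

def poisson_pmf(k, lam):
    # computed in log space to avoid overflow for large k
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def geometric_pmf(k, p):
    return (1 - p) ** k * p

def mixture_pmf(k):
    return 0.6 * poisson_pmf(k, 3.0) + 0.4 * geometric_pmf(k, 0.3)

total = sum(mixture_pmf(k) for k in range(100))
print(total)   # ~1.0: the mixture is itself a proper distribution
```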


Bi-variate example

The message must encode:

- the number of classes, using some suitable [Integer] distribution,

- the relative abundances of each class; this is a [Multi-state] distribution, [1 . . #classes],

- the class parameters for each class, e.g. mu and sigma for a [Normal] or whatever,

- and then the data . . .
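A much-simplified, hypothetical accounting of that message, charging a flat 16 bits per real parameter (which glosses over MML's careful precision analysis) and -log2(density * eps) bits per datum, already shows why the right number of classes wins on clearly bimodal data:

```python
import math
import random

# Invented example: compare the total message length of a one-class and a
# two-class hypothesis on data drawn from two well-separated Normals.

random.seed(2)
data = ([random.gauss(0.0, 1.0) for _ in range(300)] +
        [random.gauss(6.0, 1.0) for _ in range(300)])
EPS = 0.01             # measurement accuracy of each datum
BITS_PER_PARAM = 16.0  # crude stand-in for MML's optimal parameter precision

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def msg_len(classes):
    # classes: list of (weight, mu, sigma)
    hypothesis = math.log2(len(classes) + 1) + BITS_PER_PARAM * 3 * len(classes)
    density = lambda x: sum(w * pdf(x, m, s) for w, m, s in classes)
    data_part = sum(-math.log2(density(x) * EPS) for x in data)
    return hypothesis + data_part

one = msg_len([(1.0, 3.0, math.sqrt(10.0))])       # single broad class
two = msg_len([(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)])  # the true two classes
print(one, two)   # the two-class message is much shorter
```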


NB. If interval is small, probability density can be approximated as a constant over it, and simply "passes through" the Maths.
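Concretely: the datum's probability is approximately density * eps, so its code length is -log2(density) - log2(eps), and the eps term is identical under every hypothesis, cancelling out of any message-length comparison. A small check (the datum and the two candidate Normals are arbitrary):

```python
import math

# The -log2(eps) term "passes through": the *difference* in code length
# between two hypotheses is the same whatever eps is.

def normal_density(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

x = 1.3   # one datum, measured to accuracy eps

def code_len(mu, sigma, eps):
    return -math.log2(normal_density(x, mu, sigma) * eps)

diffs = [code_len(0.0, 1.0, eps) - code_len(2.0, 1.0, eps)
         for eps in (0.1, 0.001)]
print(diffs)   # the same difference for both values of eps
```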



Membership of the left-hand class is less likely but conceivable.


Fractional assignment, to *both* classes, loses no probability:

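"Loses no probability" in code-length terms: coding a datum with the combined (mixture) distribution costs -log2(w1*p1 + w2*p2) bits, which is never more than the -log2(wk*pk) cost of committing totally to either single class. A sketch with an invented datum midway between two classes:

```python
import math

# Fractional assignment vs total assignment for one datum at x = 1,
# between classes at means 0 and 2 (equal weights, unit sigma).

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

x = 1.0
w = (0.5, 0.5)
p = (pdf(x, 0.0, 1.0), pdf(x, 2.0, 1.0))

mixture_cost = -math.log2(w[0] * p[0] + w[1] * p[1])
hard_costs = [-math.log2(wk * pk) for wk, pk in zip(w, p)]
print(mixture_cost, min(hard_costs))   # mixture_cost is the smaller
```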

The probability contributions of two classes, green and shaded:


Total assignment leads to
*biased* estimates of parameters (means, variances):

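The bias is easy to reproduce in a simulation (the classes, N(0,1) and N(2,1) in equal proportions, are invented for the example): total assignment gives each point wholly to the nearer class, truncating each class at the boundary and so underestimating sigma, while fractional assignment does not:

```python
import math
import random

random.seed(3)
data = ([random.gauss(0.0, 1.0) for _ in range(20000)] +
        [random.gauss(2.0, 1.0) for _ in range(20000)])

def pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

# class-0 statistics under total (hard) assignment: x < 1 -> class 0
hard = [x for x in data if x < 1.0]
hm = sum(hard) / len(hard)
hard_sigma = math.sqrt(sum((x - hm) ** 2 for x in hard) / len(hard))

# class-0 statistics under fractional assignment (true memberships)
r = [pdf(x, 0.0) / (pdf(x, 0.0) + pdf(x, 2.0)) for x in data]
n0 = sum(r)
fm = sum(ri * x for ri, x in zip(r, data)) / n0
frac_sigma = math.sqrt(sum(ri * (x - fm) ** 2 for ri, x in zip(r, data)) / n0)

print(hard_sigma, frac_sigma)   # hard_sigma falls well below the true 1.0
```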

There are two ways of viewing/implementing fractional assignment:

- Borrowed bits, Wallace (1986). E.g. consider a case of 50:50 membership:

  - work out the code for the *rest* of the data,
  - "borrow" the 1st bit from the rest to decide which membership choice to make,
  - remove the borrowed bit from the rest and don't transmit it; the receiver can deduce it!

  Generalizes to unequal ratios and >2 classes.

- Directly, from a code-book based on the combined, i.e. mixed, distribution.
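The two views cost the same on average. With memberships r_k = w_k p_k / sum_j w_j p_j, total assignment costs sum_k r_k (-log2(w_k p_k)) expected bits, and the borrowed (untransmitted) choice bits refund the entropy H(r), leaving exactly the code-book view's -log2(sum_k w_k p_k). A check with invented numbers:

```python
import math

# Bits-back identity: average hard-assignment cost minus the entropy refund
# equals the mixture (code-book) cost.

w_p = [0.7 * 0.05, 0.3 * 0.20]   # w_k * p_k for two classes (invented)
mix = sum(w_p)
r = [wp / mix for wp in w_p]     # fractional memberships

avg_hard = sum(rk * -math.log2(wp) for rk, wp in zip(r, w_p))
H = -sum(rk * math.log2(rk) for rk in r)
print(avg_hard - H, -math.log2(mix))   # equal
```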


The methods described allow two hypotheses to be compared and
the better one chosen, but there remains the **search problem** -
to find the best mixture model. `Snob' does the following:

- Have a current working hypothesis,

- iterate, re-estimating memberships & class parameters.

- Periodically, consider merging two classes, or splitting a class;
  accept if the *total* message length is reduced.

- Repeat until MsgLen(Hypothesis) + MsgLen(Data|Hypothesis) stops reducing.
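The stopping rule alone (no merge/split search) can be sketched as below; the 16-bits-per-parameter charge is a crude, invented stand-in for MML's parameter cost, and the data and starting values are likewise invented:

```python
import math
import random

# Iterate re-estimation of memberships and class parameters, stopping when
# MsgLen(Hypothesis) + MsgLen(Data|Hypothesis) stops reducing.

random.seed(4)
data = ([random.gauss(0.0, 1.0) for _ in range(300)] +
        [random.gauss(4.0, 1.0) for _ in range(300)])
EPS = 0.01   # measurement accuracy of each datum

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def total_msg_len(w, mu, sigma):
    hypothesis = 16.0 * 6   # 2 classes x 3 parameters, flat charge
    data_part = sum(-math.log2(EPS * sum(w[k] * pdf(x, mu[k], sigma[k])
                                         for k in range(2))) for x in data)
    return hypothesis + data_part

w, mu, sigma = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
prev = float("inf")
while True:
    # re-estimate memberships, then class parameters
    r = [[w[k] * pdf(x, mu[k], sigma[k]) for k in range(2)] for x in data]
    r = [[rk / sum(row) for rk in row] for row in r]
    for k in range(2):
        nk = sum(row[k] for row in r)
        w[k] = nk / len(data)
        mu[k] = sum(row[k] * x for row, x in zip(r, data)) / nk
        sigma[k] = math.sqrt(sum(row[k] * (x - mu[k]) ** 2
                                 for row, x in zip(r, data)) / nk)
    cur = total_msg_len(w, mu, sigma)
    if prev - cur < 0.01:   # message length has stopped reducing
        break
    prev = cur

print(cur, sorted(mu))   # means close to the true 0 and 4
```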


- Minimum Message Length (MML)
  shows how to trade off model complexity (# of classes) against
  fit to the data.

- Fractional assignment gives *unbiased* estimates of the class parameters and of the # of classes (clusters) and,

- given enough data, can find the true model regardless of the closeness of the classes, even equal means (& unequal variances).

- C. S. Wallace & D. M. Boulton. *An Information Measure for Classification.* Computer Journal **11**(2), pp.185-194, August 1968.

- C. S. Wallace. *An improved program for classification.* Proc. Australian Comp. Sci. Conf. ACSC9, **8**(1), pp.357-366, February 1986.
  (DLD: First description of the bit-borrowing coding technique; also see ICCI '90. Later called bits-back by G. E. Hinton & D. van Camp in COLT '93.)

- C. S. Wallace & P. R. Freeman. *Estimation and Inference by Compact Encoding.* J. R. Stat. Soc. B **49**, pp.240-265, 1987. [paper]

- C. S. Wallace. *Classification by minimum-message-length encoding.* Advances in Computing and Information - ICCI '90, Springer-Verlag, LNCS **468**, pp.72-81, May 1990.

- On the protein example:

  - D. L. Dowe, L. Allison, T. I. Dix, L. Hunter, C. S. Wallace & T. Edgoose. *Circular clustering of protein dihedral angles by minimum message length.* Pacific Symposium on Biocomputing '96 (PSB96), World Scientific, pp.242-255, January 1996.

  - T. Edgoose, L. Allison & D. L. Dowe. *An MML classification of protein structure that knows about angles and sequences.* Pacific Symposium on Biocomputing '98, pp.585-596, January 1998.

- Also see [Markov mixtures].

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.