The HTML FORM below allows the probability density functions for the two normal distributions N(μ1,σ1) & N(μ2,σ2), scaled by p & 1-p respectively, to be plotted with their mixture in the ratio p:(1-p).
You can vary μ1, σ1, μ2, σ2, & p
(and also the bounds on the axes of the graph),
and press the `` button to replot.
The above is a very simple example. It is possible to have mixtures of more than two component classes, mixtures of multivariate distributions, mixtures of different kinds of distribution, and so on.
Given S things, each thing having D attributes (measurements), a mixture model attempts to describe the things as coming from a mixture of T classes (clusters):
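The mixture density itself is easy to state in code. The sketch below (function names and the restriction to one-dimensional normal components are illustrative assumptions, not part of any particular program) computes the density of a thing's single attribute under a T-class mixture:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, params):
    """Density at x of a T-class mixture of 1-D normals.
    weights -- mixing proportions (relative abundances), summing to 1
    params  -- list of (mu, sigma) pairs, one per class
    """
    return sum(w * normal_pdf(x, mu, sigma)
               for w, (mu, sigma) in zip(weights, params))

# e.g. the 50:50 mixture of N(-1,1) and N(1,1) used later in these notes,
# evaluated at x = 0.0, mid-way between the two class means:
d = mixture_pdf(0.0, [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)])
```

With D attributes (assumed independent within a class) each `normal_pdf` term would become a product of D per-attribute densities.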
The following are assumed to be common knowledge: the number of things, S; the number of attributes, D; and the nature of each attribute.
Number of Classes
The number of classes can be coded in any suitable code for (smallish) integers.
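One concrete possibility, given purely as an illustration (the notes do not commit to a particular integer code), is the Elias gamma code, which spends roughly 2·log2(T) bits and so favours smallish values of T:

```python
def elias_gamma_len(t):
    """Length in bits of the Elias gamma code for a positive integer t:
    floor(log2 t) leading zeros, then the binary representation of t."""
    assert t >= 1
    return 2 * (t.bit_length() - 1) + 1

# e.g. T = 1, 2, 3, 4 classes cost 1, 3, 3, 5 bits respectively.
```

Any prefix code over the positive integers would do; the point is only that the chosen code's lengths enter the total message length, so extra classes must pay for themselves.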
This part of the message specifies a code word for each class; the choice of a thing's class is described by a multistate distribution.
Each class distribution is defined by the distribution parameters
for those attributes that are important to it.
The discussion above assumed that each thing
was assigned wholly to one class or another.
If two or more classes overlap strongly,
such definite assignment becomes wasteful and can even prevent the separate classes from being detected.
Nuisance Parameters in Definite Assignment
E.g., consider a 50:50 mixture of C0=N(-1,1) and C1=N(1,1), and a thing ti=0.0. Now ti could be in either class, and we do not care which. However, with definite assignment, as above, we are forced to specify that ti is in class C0, or that it is in class C1, at a cost of one bit. Because of the position of ti, the subsequent cost of stating its one attribute is the same in either case.
There are actually two alternative (sub-)hypotheses here: H0, that ti is in C0, and H1, that ti is in C1. Since we do not care about H0 v. H1, we should add their probabilities together. This shortens the message length, and similar considerations apply to every thing, tj, that could be in more than one class.
Class memberships have typical characteristics
of nuisance parameters:
Their number increases in proportion to the amount of data.
If classes are close enough, then regardless of the amount of data,
an inference method which uses definite assignment
(such as the 1968 Snob)
will not be able to detect the separate classes.
There are two ways to look at a method of coding the things efficiently.
The first view is to "borrow" bits from later in the message. The transmitter considers the code for things ti+1,... . If this starts with a `0', ti is coded as being in class C0, otherwise in C1. Either way, the receiver decodes ti, then considers the fact that the transmitter had placed it in Ci, where i=0 or 1, and therefore understands i to be the first bit (which need not therefore be transmitted explicitly) of the rest of the message. Thus a bit is saved.
The second view of the matter is to consider
the distributions for C0, C1, and their mixture.
Thing ti has some probability, p, under class C0.
Because of the form of this example, ti also has
probability p under C1.
It therefore has probability p+p=2p under the mixture. Considering code lengths, coding ti from the mixture costs -log2(2p) = -log2(p)-1 bits, one bit less than the -log2(p) bits of coding it as a definite member of C0 (or of C1).
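The arithmetic of this one-bit saving can be checked directly. In the sketch below (variable names are illustrative; the numbers come from the 50:50 N(-1,1), N(1,1) example), p is the joint probability of choosing C0 and then stating ti=0.0 under it:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

x = 0.0
# p = P(class C0) * density of x under C0; by symmetry x has the
# same probability p via C1.
p = 0.5 * normal_pdf(x, -1.0, 1.0)

definite_bits = -math.log2(p)       # definite assignment: class label + attribute
mixture_bits = -math.log2(2.0 * p)  # code x straight from the mixture density
saving = definite_bits - mixture_bits
# -log2(2p) = -log2(p) - 1, so the saving is exactly one bit.
```

The saving per thing shrinks as ti moves away from the mid-point between the class means, because the two joint probabilities are then no longer equal.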
We have been using an example where thing ti has equal probability of coming from C0 and C1. This was only to keep the arithmetic simple. Similar considerations apply when a thing is not exactly mid-way between classes, and when there are two or more attributes, three or more classes, and so on.
Benefits of Fractional Assignment
Using fractional assignment of things to classes, and given enough data, it is possible to distinguish, i.e. infer, classes that are arbitrarily close together, and even classes that have the same mean but different variances. Inferred class distribution parameters are also unbiased.
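A thing's fractional memberships are just the posterior probabilities of the classes given its attribute value. The sketch below (function names are illustrative, and it is again restricted to 1-D normal classes) computes them; note that ti=0.0, mid-way between N(-1,1) and N(1,1), belongs half to each class rather than wholly to one:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def memberships(x, weights, params):
    """Fractional assignment of one thing x to each class: the
    posterior probability of each class given x, i.e. each class's
    joint probability normalised by the mixture probability."""
    joint = [w * normal_pdf(x, mu, sigma)
             for w, (mu, sigma) in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]

half_and_half = memberships(0.0, [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)])
# A thing nearer one class mean leans towards that class instead:
leaning = memberships(1.0, [0.5, 0.5], [(-1.0, 1.0), (1.0, 1.0)])
```

Class parameters can then be re-estimated with each thing contributing to every class in proportion to these weights, which is what removes the bias introduced by definite assignment.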