Mixtures 

The HTML FORM below allows the probability density functions for the two normal distributions N(μ1,σ1) & N(μ2,σ2), scaled by p & 1p respectively, to be plotted with their mixture in the ratio p:(1p). You can vary μ1, σ1, μ2,
σ2, & p
(also the bounds on the axes of the graph) and
press the `` The above is a very simple example. It is possible to have mixtures of more than two component classes, mixtures of multivariate distributions, and mixtures of different kinds of distribution, etc.. The ModelGiven S things, each thing having D attributes (measurements), a mixture model attempts to describe the things as coming from a mixture of T classes (clusters):
The following are assumed to be common knowledge: The number of things, S, the number of attributes, D, the nature of each attribute. Number of ClassesThe number of classes can be coded in any suitable code for (smallish) integers. Class DictionaryThis specifies a code word for each class. The choice of classes is described by a multistate distribution. Class Distributions.Each class distribution is defined by the distribution parameters
for those attributes that are important to it,
Fractional AssignmentThe discussion above assumed that each thing
was assigned wholly to one class or another.
If two or more closes overlap strongly,
Nuisance Parameters in Definite Assignmente.g. Consider a 50:50 mixture of C_{0}=N(1,1) and C_{1}=N(1,1) and a thing, t_{i}=0.0. Now, t_{i} could be in either class, and we do not care which. However with definite assignment, as above, we are forced to specify that t_{i} is in class C_{0}, or that it is in class C_{1}, at a cost of 1bit. Because of the position of t_{i} the subsequent cost of stating its one attribute is the same in either case. The are actually two alternative (sub)hypotheses here: H_{0} that t_{i} is in C_{0}, and H_{1} that t_{i} is in C_{1}. Since we do not care about H_{0} v. H_{1}, we should add their probabilities together. This shortens the message length, and similar considerations apply to every thing, t_{j}, that could be in more than one class. Class memberships have typical characteristics
of nuisance parameters:
Their number increases in proportion with the amount of data.
If classes are close enough, then regardless of the amount of data,
an inference method which uses definite assignment
(such as the 1968 Snob)
will not be able to detect the separate classes.
NoNuisance CodingThere are two ways to look at a method of coding the things efficiently. The first view is to "borrow" bits from later in the message. The transmitter considers the code for things t_{i+1},... . If this starts with a `0', t_{i} is coded as being in class C_{0} otherwise C_{1}. Either way, the receiver decodes t_{i}, then considers the fact that the transmitter had placed it in C_{i}, where i=0 or 1, and therefore understand i to be the first bit (which need not therefore be transmitted explicitly) of the rest of the message. Thus a bit is saved. The second view of the matter is to consider
the distributions for C_{0}, C_{1}, and their mixture.
Thing t_{i} has some probability, p, under class C_{0}.
Because of the form of this example, t_{i} also has
probability p under C_{1}.
It therefore has probability p+p=2p under the mixture; consider code lengths,
GeneralisationWe have been using an example where thing t_{i} has equal probability of coming from C_{0} and C_{1}. This was only to keep the arithmetic simple. Similar considerations apply when a thing is not exactly midway between classes, and when there are two or more attributes, three or more classes, etc.. Benefits of Fractional Assignment.Using fractional assignment of things to classes, and given enough data, it is possible to distinguish, i.e. infer, classes that are arbitrarily close together, and even classes that have the same mean but different variances. Infered class distribution parameters are also unbiased. Notes


