# Data, Parameters, Models, Families.

Q: What is the difference between a hypothesis and a theory?

A: Think of a hypothesis as a card. A theory is a house made of hypotheses.

(From rec.humor.funny, attributed to Marilyn vos Savant.)

## Parameter Estimation.

Given a model, the parameter-estimation problem is to find the value
of the parameter (of the model) that best describes some data D.
The *maximum-likelihood* approach is to use the value p which maximises
P(D|p).
It thus ignores the cost of stating the value of the parameter
and the accuracy to which it can be inferred;
this is only safe if the cost is the same for all values.
The MML approach is to maximise P(p).P(D|p),
or equivalently to minimise MsgLen(p)+MsgLen(D|p).
The parameter can only be stated to a finite accuracy.
A prior on p must also be considered;
note that any code for p implies a prior.
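A small sketch may help (it is not from the original page): estimating the bias of a coin from k heads in n tosses, with a uniform prior on p discretised into a grid of cells - the grid sizes, the prior and the data are assumptions of the example. Maximum likelihood pays nothing to state p; the two-part message pays log2(cells) bits for it, so a coarsely stated p can win when there is little data.

```python
import math

def msglen_bernoulli(k, n, cells):
    """Best two-part message length, in bits, when the bias p is
    stated as one of `cells` equally likely grid values, so that
    MsgLen(p) = log2(cells), and the n tosses are then coded using p.
    (A full treatment would also charge for the choice of accuracy;
    this sketch simply tries a few accuracies.)"""
    best = (float("inf"), None)
    for j in range(cells):
        p = (j + 0.5) / cells                # midpoint of a grid cell
        len_p = math.log2(cells)             # MsgLen(p)
        len_d = -(k * math.log2(p) + (n - k) * math.log2(1 - p))  # MsgLen(D|p)
        best = min(best, (len_p + len_d, p))
    return best

k, n = 7, 10
print("maximum likelihood:", k / n)          # ignores the cost of stating p
for cells in (2, 4, 8, 16, 32):              # coarser grid: cheaper p, worse fit
    total, p = msglen_bernoulli(k, n, cells)
    print(cells, "cells:", round(p, 4), round(total, 2), "bits")
```

Running it for k=7, n=10, the shortest total message states p coarsely (two cells, p=0.75) rather than finely, even though every grid's best p lies near the maximum-likelihood value 0.7.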

## Model Selection.

Given a family F of models, m[1], m[2], ...,
the model-selection problem is to select the particular model
that best describes some data D.
The MML approach is to select the model that minimises
MsgLen(m[i])+MsgLen(D|m[i]).
It is necessary to consider a prior on i.
It is likely that a model m[i] will have one or more parameters p.
If the aim is to find the best m[i]
(regardless of parameter value),
the MML approach maximises the integral over p of
P(m[i]).P(p|m[i]).P(D|m[i],p),
which is the sum over all ways in which m[i] can explain D.

There is a slightly different problem that is sometimes
confused with model selection:
If the aim is to find the best m[i] *together* with p then
we maximise P(m[i]).P(p|m[i]).P(D|m[i],p) rather than the integral.

It is not that one of these two problems is the "right" one -
they are different problems and have different forms of answer.
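A numerical sketch of the difference (the coin models, the uniform prior and the parameter grid below are invented for the example, not taken from this page): with nine heads in ten tosses, the integral criterion and the joint maximisation pick different models.

```python
import math

def likelihood(p, k, n):
    return p ** k * (1 - p) ** (n - k)

def compare(k, n, cells=100):
    # Two hypothetical candidate models, prior 1/2 each:
    #   m1: a fair coin, no parameter;
    #   m2: a biased coin, bias p uniform over a grid of `cells` values.
    grid = [(j + 0.5) / cells for j in range(cells)]

    # Model selection proper: P(m).sum over p of P(p|m).P(D|m,p),
    # i.e. all the ways in which m2 can explain D.
    sum_m1 = 0.5 * likelihood(0.5, k, n)
    sum_m2 = 0.5 * sum(likelihood(p, k, n) / cells for p in grid)

    # The different problem: the best single (model, parameter) pair.
    max_m1 = 0.5 * likelihood(0.5, k, n)
    max_m2 = 0.5 * max(likelihood(p, k, n) / cells for p in grid)

    print("integral picks: ", "m1" if sum_m1 > sum_m2 else "m2")
    print("joint max picks:", "m1" if max_m1 > max_m2 else "m2")

compare(k=9, n=10)   # the two criteria disagree on these data
```

Here m2 explains the data best in total - many values of p do moderately well - but no single (m2, p) pair beats the parameterless m1 once the probability mass spent on naming p is counted.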

If we fix on a particular model m[i] and wish to estimate its parameter p
then P(m[i]) and MsgLen(m[i]) become constants and can be ignored -
we are in the parameter estimation problem.

## Family Selection.

Given a collection C of families of models, F[1], F[2], ...,
the family-selection problem is to select the particular family
that best describes some data D.
For example, F[1] might be decision trees, F[2] might be logic programs etc.
The MML approach is to minimise
MsgLen(F[i])+MsgLen(D|F[i]).
The reader will notice that the selection of model m[j] within family F[i]
is equivalent to the estimation of a parameter (j) within a model (F[i]).
There is no logical difference although terms such as "model" and "family"
are very useful props for people.
This leads us to consider the series:
*data, parameter, model, family, collection ...*
and whether it has a limit.
There is a good candidate for the limit - a universal Turing machine (UTM).
In fact it makes a lot of sense to start at this point
and consider collections, families, models, parameters and data
to be specializations of a UTM, some no longer universal.
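The equivalence is easy to exhibit in code. In the sketch below (the families, models, priors and grid are all invented for illustration) a complete message names a family, then a model or a parameter value within it, then the data; the F1 rows and the F2 rows are scored by the same formula.

```python
import math

def bits(prob):
    return -math.log2(prob)   # MsgLen of an event of probability prob

def best_explanation(k, n):
    """Family F1 holds two parameterless coin models; family F2 holds
    one model with a bias parameter p on a 20-point grid. Each entry in
    `costs` is MsgLen(family) + MsgLen(model-or-parameter) + MsgLen(D|...)."""
    costs = {}
    for name, p in (("fair", 0.5), ("heads", 0.999)):    # F1: choose a model
        costs["F1/" + name] = (bits(0.5) + bits(0.5)
                               + k * bits(p) + (n - k) * bits(1 - p))
    for j in range(20):                                  # F2: choose a parameter
        p = (j + 0.5) / 20
        costs["F2/p=%.3f" % p] = (bits(0.5) + bits(1 / 20)
                                  + k * bits(p) + (n - k) * bits(1 - p))
    return min(costs.items(), key=lambda kv: kv[1])

name, total = best_explanation(k=9, n=10)
print(name, round(total, 2), "bits")
```

Only the labels "model" and "parameter" distinguish the two loops; the arithmetic is identical, which is the point.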

The choice of a problem, say the family-selection problem,
seems somewhat arbitrary when seen in this light.

Copyright © L. Allison / 1994 - 1996