
# Data, Parameters, Models, Families.

Q: What is the difference between a hypothesis and a theory?
A: Think of a hypothesis as a card. A theory is a house made of hypotheses.
(From rec.humor.funny, attributed to Marilyn vos Savant.)

## Parameter Estimation.

Given a model, the parameter-estimation problem is to find the value of the parameter (of the model) that best describes some data D. The maximum-likelihood approach is to use the value p which maximises P(D|p). It thus ignores the cost of stating the value of the parameter and the accuracy to which it can be inferred; this is only safe if the cost is the same for all values.
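As a concrete sketch (the coin model, counts, and grid search are illustrative assumptions, not from the text): estimating the bias p of a coin from head and tail counts by choosing the p that maximises P(D|p).

```python
import math

def log_likelihood(p, heads, tails):
    # log P(D | p) for a sequence with the given head/tail counts
    return heads * math.log(p) + tails * math.log(1 - p)

def ml_estimate(heads, tails, grid=1000):
    # Grid search over candidate values of p in (0, 1);
    # the maximum-likelihood estimate is the best-scoring candidate.
    candidates = [(i + 0.5) / grid for i in range(grid)]
    return max(candidates, key=lambda p: log_likelihood(p, heads, tails))
```

For 7 heads and 3 tails this lands near the analytic answer heads/(heads+tails) = 0.7. Note that nothing here charges for stating p itself, which is exactly the cost maximum likelihood ignores.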

The MML approach is to maximise P(p).P(D|p), or equivalently to minimise MsgLen(p)+MsgLen(D|p). The parameter can only be stated to a finite accuracy. A prior on p must also be considered; note that any code for p implies a prior.
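A minimal two-part-message sketch for the same hypothetical coin model: p is stated to accuracy 1/grid, and its code is derived from a purely hypothetical triangular prior peaked at p = 0.5, illustrating how any code for p implies a prior.

```python
import math

def mml_estimate(heads, tails, grid=100):
    # Candidate parameter values, each stated to accuracy 1/grid.
    cells = [(i + 0.5) / grid for i in range(grid)]
    # Hypothetical prior favouring a fair coin: triangular, peaked at 0.5.
    weights = [1.0 - abs(p - 0.5) for p in cells]
    z = sum(weights)
    best_len, best_p = float("inf"), None
    for p, w in zip(cells, weights):
        len_p = -math.log2(w / z)  # MsgLen(p): the code for p implies this prior
        len_d = -(heads * math.log2(p) + tails * math.log2(1 - p))  # MsgLen(D|p)
        if len_p + len_d < best_len:
            best_len, best_p = len_p + len_d, p
    return best_p
```

With 7 heads and 3 tails the estimate is pulled slightly away from the maximum-likelihood value 0.7 towards the prior's peak at 0.5.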

## Model Selection.

Given a family F of models, m[1], m[2], ..., the model-selection problem is to select the particular model that best describes some data D. The MML approach is to select the model that minimises MsgLen(m[i])+MsgLen(D|m[i]). It is necessary to consider a prior on i.

It is likely that a model m[i] will have one or more parameters p. If the aim is to find the best m[i] (regardless of parameter value), the MML approach maximises the integral over p of P(m[i]).P(p|m[i]).P(D|m[i],p), which is the sum over all ways in which m[i] can explain D.

There is a slightly different problem that is sometimes confused with model selection: If the aim is to find the best m[i] together with p then we maximise P(m[i]).P(p|m[i]).P(D|m[i],p) rather than the integral.
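Both forms can be sketched for a pair of hypothetical coin models (the models, priors, and grid are assumptions for illustration): m0 is a fair coin with no parameter; m1 is a biased coin whose parameter p has a uniform prior over grid cells. The integral form sums over every p with which m1 could explain D; the best-pair form charges for one stated p.

```python
import math

def nll_bits(p, heads, tails):
    # -log2 P(D | p): message length of the data given a stated p.
    return -(heads * math.log2(p) + tails * math.log2(1 - p))

def select_model(heads, tails, grid=100):
    # A prior of 1/2 on each model costs 1 bit to name it.
    # m0: fair coin, p fixed at 0.5, so no parameter to state.
    len_m0 = 1 + nll_bits(0.5, heads, tails)

    cells = [(i + 0.5) / grid for i in range(grid)]
    # Integral form: sum P(p|m1).P(D|m1,p) over all cells (uniform prior, 1/grid).
    marginal = sum((1 / grid) * 2 ** -nll_bits(p, heads, tails) for p in cells)
    len_m1_integral = 1 - math.log2(marginal)
    # Best (model, parameter) pair: state one p (log2(grid) bits) plus the data.
    len_m1_pair = 1 + min(math.log2(grid) + nll_bits(p, heads, tails) for p in cells)
    return len_m0, len_m1_integral, len_m1_pair
```

For mildly biased data (say 7 heads, 3 tails) the fair coin wins under either form; for strongly biased data both favour m1, the integral form by a larger margin, since the best-pair form must pay for p in full.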

It is not that one of these two problems is the "right" one - they are different problems and have different forms of answer.

If we fix on a particular model m[i] and wish to estimate its parameter p then P(m[i]) and MsgLen(m[i]) become constants and can be ignored - we are in the parameter estimation problem.

## Family Selection.

Given a collection C of families of models, F[1], F[2], ..., the family-selection problem is to select the particular family that best describes some data D. For example, F[1] might be decision trees, F[2] might be logic programs, etc. The MML approach is to minimise MsgLen(F[i])+MsgLen(D|F[i]).

The reader will notice that the selection of model m[j] within family F[i] is equivalent to the estimation of a parameter (j) within a model (F[i]). There is no logical difference although terms such as "model" and "family" are very useful props for people. This leads us to consider the series: data, parameter, model, family, collection ... and whether it has a limit. There is a good candidate for the limit - a universal Turing machine (UTM). In fact it makes a lot of sense to start at this point and consider collections, families, models, parameters and data to be specializations of a UTM, some no longer universal.

The choice of a problem, say the family-selection problem, seems somewhat arbitrary when seen in this light.