
# Data, Parameters, Models, Families.

Q: What is the difference between a hypothesis and a theory?
A: Think of a hypothesis as a card. A theory is a house made of hypotheses.
(From rec.humor.funny, attributed to Marilyn vos Savant.)

## Parameter Estimation.

Given a model, the parameter-estimation problem is to find the value of the parameter (of the model) that best describes some data D. The maximum-likelihood approach is to use the value p which maximises P(D|p). It thus ignores the cost of stating the value of the parameter and the accuracy to which it can be inferred; this is only safe if the cost is the same for all values.
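As a concrete sketch (the coin model, counts, and grid search are illustrative assumptions, not from the text): estimating the bias p of a coin from head and tail counts by choosing the p that maximises P(D|p).

```python
import math

def log_likelihood(p, heads, tails):
    # log P(D | p) for a sequence with the given head/tail counts
    return heads * math.log(p) + tails * math.log(1 - p)

def ml_estimate(heads, tails, grid=1000):
    # Grid search over candidate values of p in (0, 1);
    # the maximum-likelihood estimate is the best-scoring candidate.
    candidates = [(i + 0.5) / grid for i in range(grid)]
    return max(candidates, key=lambda p: log_likelihood(p, heads, tails))
```

For 7 heads and 3 tails this lands near the analytic answer heads/(heads+tails) = 0.7. Note that nothing here charges for stating p itself, which is exactly the cost maximum likelihood ignores.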

The MML approach is to maximise P(p).P(D|p), or equivalently to minimise MsgLen(p)+MsgLen(D|p). The parameter can only be stated to a finite accuracy. A prior on p must also be considered; note that any code for p implies a prior.
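A minimal two-part-message sketch for the same hypothetical coin model: p is stated to accuracy 1/grid, and its code is derived from a purely hypothetical triangular prior peaked at p = 0.5, illustrating how any code for p implies a prior.

```python
import math

def mml_estimate(heads, tails, grid=100):
    # Candidate parameter values, each stated to accuracy 1/grid.
    cells = [(i + 0.5) / grid for i in range(grid)]
    # Hypothetical prior favouring a fair coin: triangular, peaked at 0.5.
    weights = [1.0 - abs(p - 0.5) for p in cells]
    z = sum(weights)
    best_len, best_p = float("inf"), None
    for p, w in zip(cells, weights):
        len_p = -math.log2(w / z)  # MsgLen(p): the code for p implies this prior
        len_d = -(heads * math.log2(p) + tails * math.log2(1 - p))  # MsgLen(D|p)
        if len_p + len_d < best_len:
            best_len, best_p = len_p + len_d, p
    return best_p
```

With 7 heads and 3 tails the estimate is pulled slightly away from the maximum-likelihood value 0.7 towards the prior's peak at 0.5.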

## Model Selection.

Given a family F of models, m[1], m[2], ..., the model-selection problem is to select the particular model that best describes some data D. The MML approach is to select the model that minimises MsgLen(m[i])+MsgLen(D|m[i]). It is necessary to consider a prior on i.

It is likely that a model m[i] will have one or more parameters p. If the aim is to find the best m[i] (regardless of parameter value), the MML approach maximises the integral over p of P(m[i]).P(p|m[i]).P(D|m[i],p), which is the sum over all ways in which m[i] can explain D.

There is a slightly different problem that is sometimes confused with model selection: If the aim is to find the best m[i] together with p then we maximise P(m[i]).P(p|m[i]).P(D|m[i],p) rather than the integral.
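Both forms can be sketched for a pair of hypothetical coin models (the models, priors, and grid are assumptions for illustration): m0 is a fair coin with no parameter; m1 is a biased coin whose parameter p has a uniform prior over grid cells. The integral form sums over every p with which m1 could explain D; the best-pair form charges for one stated p.

```python
import math

def nll_bits(p, heads, tails):
    # -log2 P(D | p): message length of the data given a stated p.
    return -(heads * math.log2(p) + tails * math.log2(1 - p))

def select_model(heads, tails, grid=100):
    # A prior of 1/2 on each model costs 1 bit to name it.
    # m0: fair coin, p fixed at 0.5, so no parameter to state.
    len_m0 = 1 + nll_bits(0.5, heads, tails)

    cells = [(i + 0.5) / grid for i in range(grid)]
    # Integral form: sum P(p|m1).P(D|m1,p) over all cells (uniform prior, 1/grid).
    marginal = sum((1 / grid) * 2 ** -nll_bits(p, heads, tails) for p in cells)
    len_m1_integral = 1 - math.log2(marginal)
    # Best (model, parameter) pair: state one p (log2(grid) bits) plus the data.
    len_m1_pair = 1 + min(math.log2(grid) + nll_bits(p, heads, tails) for p in cells)
    return len_m0, len_m1_integral, len_m1_pair
```

For mildly biased data (say 7 heads, 3 tails) the fair coin wins under either form; for strongly biased data both favour m1, the integral form by a larger margin, since the best-pair form must pay for p in full.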

It is not that one of these two problems is the "right" one - they are different problems and have different forms of answer.

If we fix on a particular model m[i] and wish to estimate its parameter p then P(m[i]) and MsgLen(m[i]) become constants and can be ignored - we are in the parameter estimation problem.

## Family Selection.

Given a collection C of families of models, F[1], F[2], ..., the family-selection problem is to select the particular family that best describes some data D. For example, F[1] might be decision trees, F[2] might be logic programs, etc. The MML approach is to minimise MsgLen(F[i])+MsgLen(D|F[i]).

The reader will notice that the selection of model m[j] within family F[i] is equivalent to the estimation of a parameter (j) within a model (F[i]). There is no logical difference although terms such as "model" and "family" are very useful props for people. This leads us to consider the series: data, parameter, model, family, collection ... and whether it has a limit. There is a good candidate for the limit - a universal Turing machine (UTM). In fact it makes a lot of sense to start at this point and consider collections, families, models, parameters and data to be specializations of a UTM, some no longer universal.

The choice of a problem, say the family-selection problem, seems somewhat arbitrary when seen in this light.