ABSTRACT
We present a first step towards a general methodology for
statistical inference in the Minimum Message Length (MML) framework using
Markov Chain Monte Carlo (MCMC) methods.
The Minimum Message Length concept of a two-part message is used to partition
a sample from an importance sampling density of the model parameters into
coding regions. The regions contain models that are similar and hence
the method can be loosely described as model clustering. For
each region a point estimate is chosen as the representative model and we
use Dowe's recent MMLD approximation as the message length criterion.
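In outline, for a coding region $R$ of the parameter space with prior density
$h(\theta)$ and likelihood $f(x|\theta)$, the MMLD message length of the region
is (in our notation)
\[
  -\log \int_R h(\theta)\,d\theta \;-\;
  \frac{\int_R h(\theta)\,\log f(x|\theta)\,d\theta}{\int_R h(\theta)\,d\theta},
\]
the first term being the length of the assertion that names the region and the
second the prior-expected length of the detail that encodes the data using a
model from the region.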
The method is made practical by the MML Boundary Rule and by importance
sampling. The MML Boundary Rule applies to MML approximations of a common form
and states that the boundary of the optimal coding region is, exactly or
approximately, an iso-likelihood contour. This means that we need only
consider regions bounded by iso-likelihood contours. Importance sampling,
together with Monte Carlo integration, is used to efficiently evaluate the
expectations and integrals over promising iso-likelihood-bounded regions of
the parameter space. We note that the regions need not be restricted to
contiguous (connected) regions of the parameter space,
and therefore efficient codebooks can be approximated.
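As an illustration of the kind of computation involved, the following sketch
(in Python; the function names and arguments are illustrative, not part of the
method's specification) estimates the MMLD message length of a candidate
region from an importance sample, with the region given as an indicator over
the sample points:

import numpy as np

def mmld_length(theta, in_region, log_prior, log_lik, log_proposal):
    # theta      : array of parameter vectors drawn from the importance sampling density
    # in_region  : boolean mask selecting the sample points that fall inside the region
    # remaining arguments are vectorised log-density functions (illustrative names)
    w = np.exp(log_prior(theta) - log_proposal(theta))        # importance weights h/q
    w_r = w[in_region]
    # Monte Carlo estimate of the prior mass of the region, integral_R h(theta) dtheta
    prior_mass = w_r.sum() / len(theta)
    # prior-weighted average of -log f(x|theta) over the region
    expected_neg_log_lik = (w_r * (-log_lik(theta[in_region]))).sum() / w_r.sum()
    return -np.log(prior_mass) + expected_neg_log_lik

A region with an iso-likelihood boundary can then be explored by sweeping a
cutoff c, setting in_region = log_lik(theta) >= c, and keeping the cutoff that
minimises the estimated length.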
Our approach can, in theory, be used to approximate message lengths for any
problem where the prior probability density and likelihood function are known
and a suitable sampler exists. We treat the posterior as a black box
and make no assumptions about its mathematical form or the structure of its
parameter space. The method requires a sample from a suitable importance
sampling density and a means of approximating the Kullback-Leibler distance
between any two models. For general use, we suggest using the posterior as the
importance sampling density. In this case the Message from Monte Carlo
methodology becomes aligned with Bayesian posterior sampling and can be
considered a means of introspection into the posterior sample.
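The Kullback-Leibler distance mentioned above can itself be approximated by
Monte Carlo; a minimal sketch, assuming we can simulate data from one model
and evaluate both log-likelihoods (names are illustrative):

import numpy as np

def kl_distance(simulate_a, log_lik_a, log_lik_b, n=10000, rng=None):
    # KL(a || b) = E_{y ~ a}[ log f(y|a) - log f(y|b) ], estimated from n simulated data sets
    rng = np.random.default_rng(rng)
    y = simulate_a(n, rng)
    return float(np.mean(log_lik_a(y) - log_lik_b(y)))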
We give an example of the method applied to univariate polynomial regression.
Polynomial regression was chosen because it is a useful and intuitive
problem to which the MML87 method has already been applied. We use the
posterior as the importance sampling density and sample the polynomial
parameters using Reversible Jump Markov Chain Monte Carlo.
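For concreteness, a minimal Reversible Jump sampler of this kind might look as
follows; the priors, proposal scale, fixed noise level and move mixture are
illustrative assumptions rather than the settings used in the paper:

import numpy as np

K_MAX, SIGMA, TAU = 6, 1.0, 10.0      # maximum order, known noise sd, coefficient prior sd

def log_lik(beta, x, y):
    # Gaussian likelihood of the data under the polynomial with coefficients beta
    return -0.5 * np.sum((y - np.polyval(beta[::-1], x)) ** 2) / SIGMA ** 2

def log_coef_prior(beta):
    # independent N(0, TAU^2) priors on the coefficients (order prior assumed uniform)
    return -0.5 * np.sum(beta ** 2) / TAU ** 2 - len(beta) * np.log(TAU * np.sqrt(2 * np.pi))

def rjmcmc(x, y, n_iter=20000, rng=None):
    rng = np.random.default_rng(rng)
    beta = np.zeros(1)                # start from the constant (order 0) polynomial
    samples = []
    for _ in range(n_iter):
        move = rng.integers(3)
        if move == 0:                 # within-model random-walk Metropolis update
            prop = beta + 0.1 * rng.standard_normal(len(beta))
            log_a = (log_lik(prop, x, y) + log_coef_prior(prop)
                     - log_lik(beta, x, y) - log_coef_prior(beta))
            if np.log(rng.random()) < log_a:
                beta = prop
        elif move == 1 and len(beta) <= K_MAX:   # birth: increase the order by one
            u = rng.normal(0.0, TAU)  # new coefficient proposed from its prior, so the
            prop = np.append(beta, u) # prior and proposal densities cancel (Jacobian = 1)
            if np.log(rng.random()) < log_lik(prop, x, y) - log_lik(beta, x, y):
                beta = prop
        elif move == 2 and len(beta) > 1:        # death: drop the highest-order coefficient
            prop = beta[:-1]
            if np.log(rng.random()) < log_lik(prop, x, y) - log_lik(beta, x, y):
                beta = prop
        samples.append(beta.copy())
    return samples

The resulting sample over (order, coefficient) pairs can then be partitioned
into coding regions and scored as sketched above.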