The second (data) part of the message has length
$$-\ln f(x|\theta)$$
Define:
$$F(\theta) = E_{x}\left( \frac{d^{2}}{d\theta^{2}} \bigl( -\ln f(x|\theta) \bigr) \right)$$
NB. $f(x|\theta) = P(x|\theta)$, i.e. the probability of the data given $\theta$, for discrete data; $E_{x}$ denotes expectation over $x$.

The first (estimate) part of the message:
$h(\theta)$ is the prior probability density function of $\theta$.
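State $\theta$ to precision $\pm s/2$; assume $s$ is small and $h(\theta)$ does not vary much over $[\theta - s/2, \theta + s/2]$. The prior probability of that interval is then approximately $h(\theta) s$, so the first part costs
$$-\ln( h(\theta) s ) \text{ nits}$$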
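Let $\theta' = \theta + t$, where $-s/2 < t < s/2$, and expand the second part about $\theta$:
$$-\ln f(x|\theta') = -\ln f(x|\theta + t) = -\ln f(x|\theta) + t \frac{d}{d\theta}\bigl(-\ln f(x|\theta)\bigr) + \frac{1}{2} t^{2} \frac{d^{2}}{d\theta^{2}}\bigl(-\ln f(x|\theta)\bigr) + \dots$$
Taking the expectation over $t$, assumed uniform on $(-s/2, s/2)$ so that $E(t) = 0$ and $E(t^{2}) = s^{2}/12$, and dropping higher-order terms, gives
$$-\ln f(x|\theta) + \frac{s^{2}}{24} \frac{d^{2}}{d\theta^{2}}\bigl(-\ln f(x|\theta)\bigr)$$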
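The second-order approximation above can be checked numerically. The sketch below is only an illustration under stated assumptions: it picks a binomial model (k successes in n trials) as the likelihood, which the notes do not specify, and the function names and the values of k, n, theta and s are hypothetical. It compares a direct average of -ln f(x|theta + t) over the interval with the expression -ln f(x|theta) + (s^2/24) F(x,theta).

```python
import math

def neg_log_lik(k, n, theta):
    """-ln f(k | theta) for k successes in n trials (binomial model, an assumption)."""
    log_f = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(theta) + (n - k) * math.log(1 - theta))
    return -log_f

def observed_info(k, n, theta):
    """Second derivative of -ln f(k | theta) with respect to theta."""
    return k / theta**2 + (n - k) / (1 - theta)**2

def mean_over_interval(k, n, theta, s, steps=10_000):
    """Average of -ln f(k | theta + t) for t uniform on [-s/2, s/2] (midpoint rule)."""
    total = 0.0
    for i in range(steps):
        t = -s / 2 + (i + 0.5) * s / steps
        total += neg_log_lik(k, n, theta + t)
    return total / steps

# Hypothetical example values.
k, n, theta, s = 7, 20, 0.35, 0.05
exact = mean_over_interval(k, n, theta, s)
approx = neg_log_lik(k, n, theta) + s**2 / 24 * observed_info(k, n, theta)
print(exact, approx)
```

For a small s the two printed values should agree closely, while both differ noticeably from -ln f(x|theta) alone, which is the effect the s^2/24 term accounts for.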
Add the two parts of the message together:
$$-\ln( h(\theta) s ) - \ln f(x|\theta) + \frac{s^{2}}{24} \frac{d^{2}}{d\theta^{2}}\bigl(-\ln f(x|\theta)\bigr)$$
Differentiate w.r.t. $s$ and set to zero:
$$s^{2} = 12 / F(x,\theta) \quad \text{where} \quad F(x,\theta) = \frac{d^{2}}{d\theta^{2}}\bigl(-\ln f(x|\theta)\bigr)$$
(NB. $F(x,\theta)$ is not $F(\theta)$, but the two are related...)
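The minimisation over $s$ can be written out in full. This is a routine calculus step added for completeness; it assumes $F(x,\theta) > 0$ so that the stationary point is a minimum.

```latex
\begin{align*}
\frac{\partial}{\partial s}\left[ -\ln\bigl(h(\theta)\,s\bigr) - \ln f(x|\theta)
      + \frac{s^{2}}{24}\,F(x,\theta) \right]
  &= -\frac{1}{s} + \frac{s}{12}\,F(x,\theta) = 0 \\
\Longrightarrow\quad s^{2} &= \frac{12}{F(x,\theta)} \\
\frac{\partial^{2}}{\partial s^{2}}[\,\cdots\,]
  &= \frac{1}{s^{2}} + \frac{F(x,\theta)}{12} > 0
  \quad \text{when } F(x,\theta) > 0 .
\end{align*}
```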
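But this depends on $x$, which the receiver does not know. Instead use
$$S^{2} = 12 \Big/ \Bigl( \sum_{x \in X} f(x|\theta) F(x,\theta) \Bigr) = 12 / F(\theta)$$
where the sum, i.e. the expectation $E_{x}$ of $F(x,\theta)$, is taken as $x$ ranges over the data-space $X$. Both transmitter and receiver can evaluate $F(\theta)$.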
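As an illustration of how F(theta) is obtained from F(x,theta), the sketch below sums f(x|theta).F(x,theta) over the whole data-space for a binomial model and compares the result with the closed form n/(theta(1-theta)). The binomial model, the function names and the example values are assumptions made for this example only.

```python
import math

def binom_pmf(k, n, theta):
    """f(k | theta): probability of k successes in n trials."""
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def observed_info(k, n, theta):
    """F(x, theta): second derivative of -ln f(k | theta) w.r.t. theta."""
    return k / theta**2 + (n - k) / (1 - theta)**2

def expected_info(n, theta):
    """F(theta): sum over the data-space of f(x|theta) * F(x, theta)."""
    return sum(binom_pmf(k, n, theta) * observed_info(k, n, theta)
               for k in range(n + 1))

# Hypothetical example values.
n, theta = 20, 0.35
fisher = expected_info(n, theta)
closed_form = n / (theta * (1 - theta))
precision = math.sqrt(12 / fisher)  # S, the width of the stated interval
print(fisher, closed_form, precision)
```

Note that the computation needs only n and theta, not the observed k, which is why both transmitter and receiver can carry it out.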
Substituting $S$ for $s$, the message length becomes
$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2} \ln F(\theta) - \frac{1}{2} \ln 12 + \frac{1}{2} \cdot \frac{F(x,\theta)}{F(\theta)}$$
"what is usually done is to replace the last term [...] by 1/2" (Farr 1999 p41), a reasonable approximation if $F(x,\theta) - F(\theta)$ is small over $[\theta - s/2, \theta + s/2]$:
$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2} \ln F(\theta) - \frac{1}{2} \ln 12 + \frac{1}{2}$$
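The one-parameter formula can be evaluated directly. The following is a minimal sketch, not a definitive implementation: it assumes a binomial likelihood with F(theta) = n/(theta(1-theta)) and a uniform prior h(theta) = 1 on (0,1), neither of which comes from the notes, and all function names are hypothetical.

```python
import math

def mml87_one_param(neg_log_prior, neg_log_lik, fisher):
    """-ln h - ln f + (1/2) ln F - (1/2) ln 12 + 1/2, in nits."""
    return (neg_log_prior + neg_log_lik
            + 0.5 * math.log(fisher) - 0.5 * math.log(12) + 0.5)

def binomial_message_length(k, n, theta):
    """Two-part message length for k successes in n trials, stated estimate theta."""
    neg_log_prior = 0.0  # assumption: uniform prior, h(theta) = 1, so -ln h = 0
    neg_log_lik = -(math.log(math.comb(n, k))
                    + k * math.log(theta) + (n - k) * math.log(1 - theta))
    fisher = n / (theta * (1 - theta))  # expected Fisher information, binomial
    return mml87_one_param(neg_log_prior, neg_log_lik, fisher)

# Hypothetical example: 7 successes in 20 trials.
k, n = 7, 20
theta_ml = k / n                  # maximum-likelihood estimate
theta_mml = (k + 0.5) / (n + 1)   # minimiser of the expression above for this setup
len_ml = binomial_message_length(k, n, theta_ml)
len_mml = binomial_message_length(k, n, theta_mml)
print(len_ml, len_mml, len_mml / math.log(2))  # last value is in bits
```

Under these assumptions the terms depending on theta are -(k + 1/2) ln theta - (n - k + 1/2) ln(1 - theta), so the shortest message is obtained at (k + 1/2)/(n + 1) rather than at k/n.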
For $n$ parameters, $\theta = \langle \theta_{1}, \theta_{2}, \dots, \theta_{n} \rangle$:
$$F(x,\theta)_{ij} = \frac{\partial^{2}}{\partial\theta_{i} \, \partial\theta_{j}} \bigl( -\ln f(x|\theta) \bigr)$$
$$F(\theta) = \sum_{x \in X} f(x|\theta) F(x,\theta)$$
$F(x,\theta)$ and $F(\theta)$ are $n \times n$ matrices.
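The message length is then
$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2} \ln |F(\theta)| + \frac{n}{2} (1 + \ln k_{n}) \text{ nits}$$
where $|F(\theta)|$ is the determinant of $F(\theta)$ and the $k_{n}$ are lattice constants (re partitioning parameter space), $k_{1} = 1/12$ and $k_{n} \to 1/(2 \pi e) = 0.0585498$ as $n \to \infty$ (Farr 1999 p43). For $n = 1$ the last term is $\frac{1}{2} - \frac{1}{2} \ln 12$, in agreement with the one-parameter formula.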
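The multi-parameter expression can be sketched as a small helper. This assumes the constant term is (n/2)(1 + ln k_n) and that "ln F(theta)" denotes the log of the determinant of the n-by-n matrix, which is the reading that reduces to the one-parameter formula when n = 1. The function and constant names are hypothetical, and the caller is expected to supply the determinant and the lattice constant for the dimension in use.

```python
import math

K_1 = 1 / 12                         # lattice constant for n = 1
K_LIMIT = 1 / (2 * math.pi * math.e)  # limit of k_n as n grows

def mml87_message_length(neg_log_prior, neg_log_lik, fisher_det, n_params, k_n):
    """-ln h - ln f + (1/2) ln |F| + (n/2)(1 + ln k_n), in nits."""
    return (neg_log_prior + neg_log_lik
            + 0.5 * math.log(fisher_det)
            + 0.5 * n_params * (1.0 + math.log(k_n)))

def constant_term(n_params, k_n):
    """The part of the message length that depends only on n and k_n."""
    return 0.5 * n_params * (1.0 + math.log(k_n))

# With n = 1 and k_1 = 1/12 the constant is 1/2 - (1/2) ln 12.
one_param_constant = constant_term(1, K_1)
# Per-parameter constant in the large-n limit: (1/2)(1 + ln(1/(2 pi e))).
limit_per_param = constant_term(1, K_LIMIT)
print(K_LIMIT, one_param_constant, limit_per_param)
```

Passing k_n in as an argument avoids hard-coding lattice constants beyond the two values quoted in the text.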
Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.
Some sources: