[01] >>

# Fisher Information

• Model complexity, -log P(H), depends on the precision to which the parameter(s) are stated

• general form based on Fisher information

• shows sensitivity of -log P(D|H) to the estimate

• high sensitivity: [_________________________]

• low sensitivity: [_________________________]
Online at   http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/   including links to other resources.

<< [02] >>

# Fisher (one parameter)

The 2nd (data) part of the message is `-ln f(x|theta)`.

Define:

```
                  d2
F(theta) = Ex( -------- ( -ln f(x|theta) ) )
               d theta2
```
NB. `f(x|theta) = P(x|theta)`, i.e. the probability of the data given theta, for discrete data.
NB. `Ex` denotes expectation over the data.
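
As a concrete check, the sketch below assumes a Bernoulli model, `f(x|theta) = theta^x (1-theta)^(1-x)` with `x` in {0,1} (an illustrative choice, not from the slides), and compares a numeric evaluation of the definition with the known result `F(theta) = 1/(theta(1-theta))`:

```python
import math

# Bernoulli model f(x|theta) = theta^x (1-theta)^(1-x), x in {0,1};
# assumed here purely as an example (not from the slides).
def neg_log_f(x, theta):
    return -(x * math.log(theta) + (1 - x) * math.log(1 - theta))

def fisher_numeric(theta, h=1e-4):
    # F(theta) = Ex[ d^2/dtheta^2 (-ln f(x|theta)) ]: expectation over x,
    # with the second derivative taken by central differences
    total = 0.0
    for x in (0, 1):
        p_x = theta if x == 1 else 1 - theta
        d2 = (neg_log_f(x, theta + h) - 2 * neg_log_f(x, theta)
              + neg_log_f(x, theta - h)) / h ** 2
        total += p_x * d2
    return total

theta = 0.3
print(fisher_numeric(theta), 1 / (theta * (1 - theta)))  # numeric vs analytic
```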

<< [03] >>

# Prior

The first (estimate) part of message:

`h(theta)` is prior probability density function of theta.

State theta to ±s/2; assume s is small and that h(theta) does not vary much over [theta-s/2, theta+s/2].

Cost to state estimate is:
` - ln( h(theta).s ) nits`
NB. Integral of h(theta)=1.
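
A one-line illustration of this cost (the uniform prior `h(theta) = 1` on [0,1] is an assumption made here, not something stated on the slide): halving the precision s makes the estimate part ln 2 nits longer.

```python
import math

# First (estimate) part of the message: cost of stating theta to +/- s/2.
# The uniform prior h(theta) = 1 on [0,1] is an illustrative assumption.
def estimate_cost_nits(h_theta, s):
    return -math.log(h_theta * s)

print(estimate_cost_nits(1.0, 0.02))
print(estimate_cost_nits(1.0, 0.01))  # halving s costs an extra ln 2 nits
```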

<< [04] >>

# Rules of the game

• Transmitter sends an estimate theta from the code book

• the estimate represents [theta-s/2, theta+s/2]

• choose theta to be optimal on average over the interval

• theta' = theta + t, where -s/2 < t < s/2

<< [05] >>

theta' = theta + t, where -s/2<t<s/2

```
  - ln f(x|theta')

= - ln f(x|theta + t)

                       d
= - ln f(x|theta) + t ------- ( -ln f(x|theta) )
                      d theta

    1         d2
  + - t2  -------- ( -ln f(x|theta) ) + ...
    2     d theta2
```
-- Taylor expansion

<< [06] >>
Average over [-s/2, s/2]:
• the linear term vanishes
• the integral of t2 over [-s/2, s/2] is s3/12, so the t2 term contributes (s2/24) times the second derivative

the average is:
```
                  s2     d2
-ln f(x|theta) + --- . -------- ( -ln f(x|theta) )
                  24   d theta2
```
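
This approximation can be checked numerically; the sketch below assumes a Bernoulli likelihood with x = 1, so `-ln f(x|theta) = -ln theta` (an illustrative choice, not from the slides):

```python
import math

# -ln f(x|theta) for a Bernoulli observation x = 1 (illustrative assumption)
def neg_log_f(theta):
    return -math.log(theta)

theta, s, n = 0.3, 0.02, 100001

# direct average of -ln f(x|theta + t) over t in [-s/2, s/2]
ts = [-s / 2 + s * i / (n - 1) for i in range(n)]
direct = sum(neg_log_f(theta + t) for t in ts) / n

# second-order approximation: -ln f + (s^2/24) * d2/dtheta2 (-ln f)
h = 1e-4
d2 = (neg_log_f(theta + h) - 2 * neg_log_f(theta) + neg_log_f(theta - h)) / h ** 2
approx = neg_log_f(theta) + (s * s / 24) * d2
print(direct, approx)  # agree closely for small s
```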

<< [07] >>

# Precision, s

Add two parts of message together:

```
- ln( h(theta).s ) - ln f(x|theta)

    s2     d2
 + --- . -------- ( -ln f(x|theta) )
    24   d theta2
```
differentiate w.r.t. s and set to zero: -1/s + s.F(x,theta)/12 = 0
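
A numeric sketch of this step (the value F = 50 is an arbitrary illustrative choice): only the `-ln(s)` and `(s^2/24).F` terms depend on s, and their sum is minimised at `s^2 = 12/F`.

```python
import math

# s-dependent terms of the total message length:
#   -ln(s) from the estimate part, (s^2/24)*F from the data part.
def length_terms(s, F):
    return -math.log(s) + (s * s / 24) * F

F = 50.0                      # arbitrary illustrative value
s_opt = math.sqrt(12 / F)     # stationary point of -1/s + s*F/12 = 0
for s in (0.9 * s_opt, s_opt, 1.1 * s_opt):
    print(round(s, 4), round(length_terms(s, F), 6))  # minimum at s_opt
```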

<< [08] >>
```
s2 = 12 / F(x,theta)

                       d2
where F(x,theta) = -------- ( -ln f(x|theta) )
                   d theta2
```

(NB. F(x,theta) is not F(theta), but the two are related...)

But this depends on x, which the receiver does not yet know, so the expected quantity must be used.
```
s2 = 12 / ( SUMx:X f(x|theta).F(x,theta) )

   = 12 / F(theta)
```
as x ranges over the data-space X. Both transmitter and receiver can evaluate F(theta).

<< [09] >>

# The Message Length

```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    1           1   F(x,theta)
 - --- ln 12 + --- . ----------
    2           2    F(theta)
```
"what is usually done is to replace the last term [...] by 1/2" (-Farr 1999 p41), a reasonable approximation if F(x,theta)-F(theta) is small over [theta-s/2, theta+s/2].

msgLen ~
```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    1           1
 - --- ln 12 + ---
    2           2
```
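
Putting the pieces together, here is a sketch of this approximate message length for a sample of N Bernoulli trials with k successes; the model, the uniform prior `h(theta) = 1` on (0,1), and `F(theta) = N/(theta(1-theta))` are all illustrative assumptions, not from the slides.

```python
import math

# msgLen ~ -ln h(theta) - ln f(x|theta) + (1/2) ln F(theta) - (1/2) ln 12 + 1/2
# Assumptions: N Bernoulli trials with k successes, uniform prior (so
# -ln h(theta) = 0), F(theta) = N / (theta*(1-theta)) for this model.
def msg_len_nits(k, N, theta):
    neg_log_f = -(k * math.log(theta) + (N - k) * math.log(1 - theta))
    F = N / (theta * (1 - theta))
    return neg_log_f + 0.5 * math.log(F) - 0.5 * math.log(12) + 0.5

k, N = 30, 100
for theta in (0.20, 0.30, 0.45):
    print(theta, round(msg_len_nits(k, N, theta), 3))  # shortest near k/N
```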

<< [10] >>

# Fisher (multiple parameters)

theta = <theta1, theta2, ..., thetan>

```
                      d2
F(x,theta)ij = ----------------- ( -ln f(x|theta) )
               d thetai d thetaj

F(theta) = SUMx:X f(x|theta).F(x,theta)
```
F(x,theta) and F(theta) are n×n matrices.
The Fisher information is the determinant of F(theta).
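
A sketch for n = 2: a Gaussian model with `theta = <mu, sigma>` is assumed here for illustration (not from the slides). Per observation, `-ln f = ln sigma + (x-mu)^2/(2 sigma^2) + const`, the matrix is diag(1/sigma^2, 2/sigma^2), and the determinant is 2/sigma^4; Monte Carlo is used for the expectation over x.

```python
import math
import random

# Second partial derivatives of -ln f for the assumed Gaussian model:
#   d2/dmu2        = 1/sigma^2
#   d2/dmu dsigma  = 2(x - mu)/sigma^3     (expectation 0)
#   d2/dsigma2     = -1/sigma^2 + 3(x - mu)^2/sigma^4
def fisher_matrix(mu, sigma, samples=100000, seed=1):
    random.seed(seed)
    f_mm = f_ms = f_ss = 0.0
    for _ in range(samples):
        x = random.gauss(mu, sigma)
        f_mm += 1 / sigma ** 2
        f_ms += 2 * (x - mu) / sigma ** 3
        f_ss += -1 / sigma ** 2 + 3 * (x - mu) ** 2 / sigma ** 4
    return [[f_mm / samples, f_ms / samples],
            [f_ms / samples, f_ss / samples]]

F = fisher_matrix(0.0, 2.0)
det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
print(det, 2 / 2.0 ** 4)  # Monte Carlo estimate vs analytic 2/sigma^4
```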

<< [11] >>
msgLen ~
```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    n
 + --- ( 1 + ln(kn) )  nits
    2
```
where the `kn` are lattice constants (re partitioning the parameter space): `k1 = 1/12`, and `kn -> 1/(2 pi e) = 0.0585498...` as `n -> infinity` (Farr 1999 p43).

<< [12]

# Summary

A general form for
• message length and parameter accuracy
• based on Fisher information.

NB. This is an approximation; it may break down if
• h(theta) varies rapidly in the region of the estimate
• e.g. for small amounts of data

Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.

Some sources:
• C. S. Wallace & P. R. Freeman. Estimation and Inference by Compact Coding. J. Royal Stat. Soc. 49(3) pp240-265, 1987
• G. Farr & C. S. Wallace. The Complexity of Strict Minimum Message Length Inference, Computer Journal 45(3) pp285-292, 2002, & TR97/321 Department of Computer Science, Monash University, Aug 1997
• G. Farr. Information Theory and MML Inference. School of Computer Science and Software Engineering, 1999
• Also see LA's bibliography.

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.