filler
^up^ [01] >>

Fisher Information

Online at   http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/   including links to other resources.

filler
<< [02] >>

Fisher (one parameter)

The 2nd (data) part of message, -ln f(x|theta).

Define:

                  d2
F(theta) = Ex( --------  -ln f(x|theta) )
               d theta2
NB. f(x|theta) = P(x|theta), i.e. prob' of data given theta, for discrete data.
NB. Ex, expectation.

filler
<< [03] >>

Prior

The first (estimate) part of message:

h(theta) is prior probability density function of theta.

State theta to ±s/2, assume s small, & h(theta) does not vary much over [theta-s/2, theta+s/2]

Cost to state estimate is:
- ln( h(theta).s ) nits
NB. Integral of h(theta)=1.

filler
<< [04] >>

Rules of the game


filler
<< [05] >>

theta' = theta + t, where -s/2<t<s/2

  - ln f(x|theta')

= - ln f(x|theta + t)

                       d
= -ln f(x|theta) +t -------(-ln f(x|theta))
                    d theta

    1      d2
  + - t2 --------(-ln f(x|theta)) + ...
    2    d theta2
-- Taylor expansion

filler
<< [06] >>
Average over [-s/2, s/2],
linear term vanishes
integral of t2 over [-s/2,s/2] is s3/12
 
average is:
                 s2    d2
-ln f(x|theta) + --.--------( -ln f(x|theta))
                 24 d theta2

filler
<< [07] >>

Precision, s

Add two parts of message together:


- ln(h(theta).s) - ln f(x|theta)

  s2    d2
+ --.--------( -ln f(x|theta))
  24 d theta2
differentiate w.r.t. s and set to zero

filler
<< [08] >>
s2 = 12 / F(x,theta)

                      d2
where F(x,theta) = -------- -ln( f(x|theta))
                   d theta2

(NB. F(x,theta) is not F(theta), but the two are related...)

But this depends on x, which the receiver does not know.
Have to use the expected quantity.
S2 = 12/( Ex f(x|theta).F(x,theta) )

   = 12/F(theta)
as x ranges over the data-space X. Both transmitter and receiver can evaluate F(theta).

filler
<< [09] >>

The Message Length

                               1
- ln h(theta) -ln f(x|theta) + -ln F(theta)
                               2
  1        1 F(x,theta)
- -ln 12 + -.----------
  2        2  F(theta)
"what is usually done is to replace the last term [...] by 1/2" (-Farr 1999 p41), a reasonable approximation if F(x,theta)-F(theta) is small over [theta-s/2, theta+s/2].

msgLen ~
                                1
- ln h(theta) - ln f(x|theta) + -ln F(theta)
                                2
                1        1
              - -ln 12 + -
                2        2

filler
<< [10] >>

Fisher (multiple parameters)

theta = <theta1, theta2, ..., thetan,>

F(x,theta)ij

       d2
= -----------------( -ln f(x|theta))
  d thetai d thetaj


F(theta) = SUMx:X f(x|theta).F(x,theta)
F(x,theta) and F(theta) are n×n matrices.
The Fisher information is the determinant of F(theta).

filler
<< [11] >>
msgLen ~
                               1
-ln h(theta) - ln f(x|theta) + -ln F(theta)
                               2

             + -(1 + ln(kn)) nits
where kn are lattice constants (re partitioning parameter space), k1=1/12 and kn->1/(2 pi e) = 0.0585498. as n->infinity (Farr 1999 p43).
filler
<< [12]

Summary

General form for NB. An approximation; it may break down if

Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.

Some sources:

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.