[01] >>

# Fisher Information

• Model complexity, -log P(H), depends on the precision to which the parameter(s) are stated

• general form based on Fisher information

• shows sensitivity of -log P(D|H) to the estimate

• high sensitivity: [_________________________]

• low sensitivity: [_________________________]
Online at   http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/   including links to other resources.

<< [02] >>

# Fisher (one parameter)

The 2nd (data) part of the message is `-ln f(x|theta)`.

Define:

```
                  d2
F(theta) = Ex( -------- ( -ln f(x|theta) ) )
               d theta2
```
NB. `f(x|theta) = P(x|theta)`, i.e. the probability of the data given theta, for discrete data.
NB. `Ex` denotes expectation over the data.
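
As a concrete check, the sketch below assumes a Bernoulli model, `f(x|theta) = theta^x (1-theta)^(1-x)` with `x` in {0,1} (an illustrative choice, not from the slides), and compares a numeric evaluation of the definition with the known result `F(theta) = 1/(theta(1-theta))`:

```python
import math

# Bernoulli model f(x|theta) = theta^x (1-theta)^(1-x), x in {0,1};
# assumed here purely as an example (not from the slides).
def neg_log_f(x, theta):
    return -(x * math.log(theta) + (1 - x) * math.log(1 - theta))

def fisher_numeric(theta, h=1e-4):
    # F(theta) = Ex[ d^2/dtheta^2 (-ln f(x|theta)) ]: expectation over x,
    # with the second derivative taken by central differences
    total = 0.0
    for x in (0, 1):
        p_x = theta if x == 1 else 1 - theta
        d2 = (neg_log_f(x, theta + h) - 2 * neg_log_f(x, theta)
              + neg_log_f(x, theta - h)) / h ** 2
        total += p_x * d2
    return total

theta = 0.3
print(fisher_numeric(theta), 1 / (theta * (1 - theta)))  # numeric vs analytic
```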

<< [03] >>

# Prior

The first (estimate) part of message:

`h(theta)` is prior probability density function of theta.

State theta to ±s/2; assume s is small and that h(theta) does not vary much over [theta-s/2, theta+s/2].

Cost to state estimate is:
` - ln( h(theta).s ) nits`
NB. Integral of h(theta)=1.
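
A one-line illustration of this cost (the uniform prior `h(theta) = 1` on [0,1] is an assumption made here, not something stated on the slide): halving the precision s makes the estimate part ln 2 nits longer.

```python
import math

# First (estimate) part of the message: cost of stating theta to +/- s/2.
# The uniform prior h(theta) = 1 on [0,1] is an illustrative assumption.
def estimate_cost_nits(h_theta, s):
    return -math.log(h_theta * s)

print(estimate_cost_nits(1.0, 0.02))
print(estimate_cost_nits(1.0, 0.01))  # halving s costs an extra ln 2 nits
```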

<< [04] >>

# Rules of the game

• Transmitter sends an estimate theta from the code book

• the estimate represents [theta-s/2, theta+s/2]

• choose theta to be optimal on average over the interval

• theta' = theta + t, where -s/2 < t < s/2

<< [05] >>

theta' = theta + t, where -s/2<t<s/2

```
  - ln f(x|theta')

= - ln f(x|theta + t)

                       d
= - ln f(x|theta) + t ------- ( -ln f(x|theta) )
                      d theta

    1         d2
  + - t2  -------- ( -ln f(x|theta) ) + ...
    2     d theta2
```
-- Taylor expansion

<< [06] >>
Average over [-s/2, s/2]:
• the linear term vanishes
• the integral of t2 over [-s/2, s/2] is s3/12, so the t2 term contributes (s2/24) times the second derivative

the average is:
```
                  s2     d2
-ln f(x|theta) + --- . -------- ( -ln f(x|theta) )
                  24   d theta2
```
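
This approximation can be checked numerically; the sketch below assumes a Bernoulli likelihood with x = 1, so `-ln f(x|theta) = -ln theta` (an illustrative choice, not from the slides):

```python
import math

# -ln f(x|theta) for a Bernoulli observation x = 1 (illustrative assumption)
def neg_log_f(theta):
    return -math.log(theta)

theta, s, n = 0.3, 0.02, 100001

# direct average of -ln f(x|theta + t) over t in [-s/2, s/2]
ts = [-s / 2 + s * i / (n - 1) for i in range(n)]
direct = sum(neg_log_f(theta + t) for t in ts) / n

# second-order approximation: -ln f + (s^2/24) * d2/dtheta2 (-ln f)
h = 1e-4
d2 = (neg_log_f(theta + h) - 2 * neg_log_f(theta) + neg_log_f(theta - h)) / h ** 2
approx = neg_log_f(theta) + (s * s / 24) * d2
print(direct, approx)  # agree closely for small s
```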

<< [07] >>

# Precision, s

Add two parts of message together:

```
- ln( h(theta).s ) - ln f(x|theta)

    s2     d2
 + --- . -------- ( -ln f(x|theta) )
    24   d theta2
```
differentiate w.r.t. s and set to zero: -1/s + s.F(x,theta)/12 = 0
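
A numeric sketch of this step (the value F = 50 is an arbitrary illustrative choice): only the `-ln(s)` and `(s^2/24).F` terms depend on s, and their sum is minimised at `s^2 = 12/F`.

```python
import math

# s-dependent terms of the total message length:
#   -ln(s) from the estimate part, (s^2/24)*F from the data part.
def length_terms(s, F):
    return -math.log(s) + (s * s / 24) * F

F = 50.0                      # arbitrary illustrative value
s_opt = math.sqrt(12 / F)     # stationary point of -1/s + s*F/12 = 0
for s in (0.9 * s_opt, s_opt, 1.1 * s_opt):
    print(round(s, 4), round(length_terms(s, F), 6))  # minimum at s_opt
```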

<< [08] >>
```
s2 = 12 / F(x,theta)

                       d2
where F(x,theta) = -------- ( -ln f(x|theta) )
                   d theta2
```

(NB. F(x,theta) is not F(theta), but the two are related...)

But this depends on x, which the receiver does not yet know, so the expected quantity must be used.
```
s2 = 12 / ( SUMx:X f(x|theta).F(x,theta) )

   = 12 / F(theta)
```
as x ranges over the data-space X. Both transmitter and receiver can evaluate F(theta).

<< [09] >>

# The Message Length

```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    1           1   F(x,theta)
 - --- ln 12 + --- . ----------
    2           2    F(theta)
```
"what is usually done is to replace the last term [...] by 1/2" (-Farr 1999 p41), a reasonable approximation if F(x,theta)-F(theta) is small over [theta-s/2, theta+s/2].

msgLen ~
```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    1           1
 - --- ln 12 + ---
    2           2
```
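
Putting the pieces together, here is a sketch of this approximate message length for a sample of N Bernoulli trials with k successes; the model, the uniform prior `h(theta) = 1` on (0,1), and `F(theta) = N/(theta(1-theta))` are all illustrative assumptions, not from the slides.

```python
import math

# msgLen ~ -ln h(theta) - ln f(x|theta) + (1/2) ln F(theta) - (1/2) ln 12 + 1/2
# Assumptions: N Bernoulli trials with k successes, uniform prior (so
# -ln h(theta) = 0), F(theta) = N / (theta*(1-theta)) for this model.
def msg_len_nits(k, N, theta):
    neg_log_f = -(k * math.log(theta) + (N - k) * math.log(1 - theta))
    F = N / (theta * (1 - theta))
    return neg_log_f + 0.5 * math.log(F) - 0.5 * math.log(12) + 0.5

k, N = 30, 100
for theta in (0.20, 0.30, 0.45):
    print(theta, round(msg_len_nits(k, N, theta), 3))  # shortest near k/N
```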

<< [10] >>

# Fisher (multiple parameters)

theta = <theta1, theta2, ..., thetan>

```
                      d2
F(x,theta)ij = ----------------- ( -ln f(x|theta) )
               d thetai d thetaj

F(theta) = SUMx:X f(x|theta).F(x,theta)
```
F(x,theta) and F(theta) are n×n matrices.
The Fisher information is the determinant of F(theta).
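
A sketch for n = 2: a Gaussian model with `theta = <mu, sigma>` is assumed here for illustration (not from the slides). Per observation, `-ln f = ln sigma + (x-mu)^2/(2 sigma^2) + const`, the matrix is diag(1/sigma^2, 2/sigma^2), and the determinant is 2/sigma^4; Monte Carlo is used for the expectation over x.

```python
import math
import random

# Second partial derivatives of -ln f for the assumed Gaussian model:
#   d2/dmu2        = 1/sigma^2
#   d2/dmu dsigma  = 2(x - mu)/sigma^3     (expectation 0)
#   d2/dsigma2     = -1/sigma^2 + 3(x - mu)^2/sigma^4
def fisher_matrix(mu, sigma, samples=100000, seed=1):
    random.seed(seed)
    f_mm = f_ms = f_ss = 0.0
    for _ in range(samples):
        x = random.gauss(mu, sigma)
        f_mm += 1 / sigma ** 2
        f_ms += 2 * (x - mu) / sigma ** 3
        f_ss += -1 / sigma ** 2 + 3 * (x - mu) ** 2 / sigma ** 4
    return [[f_mm / samples, f_ms / samples],
            [f_ms / samples, f_ss / samples]]

F = fisher_matrix(0.0, 2.0)
det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
print(det, 2 / 2.0 ** 4)  # Monte Carlo estimate vs analytic 2/sigma^4
```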

<< [11] >>
msgLen ~
```
                                  1
- ln h(theta) - ln f(x|theta) + --- ln F(theta)
                                  2

    n
 + --- ( 1 + ln(kn) )  nits
    2
```
where the `kn` are lattice constants (re partitioning the parameter space): `k1 = 1/12`, and `kn -> 1/(2 pi e) = 0.0585498...` as `n -> infinity` (Farr 1999 p43).

<< [12]

# Summary

A general form for
• message length and parameter accuracy
• based on Fisher information.

NB. This is an approximation; it may break down if
• h(theta) varies rapidly in the region of the estimate
• e.g. for small amounts of data

Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.

Some sources:
• C. S. Wallace & P. R. Freeman. Estimation and Inference by Compact Coding. J. Royal Stat. Soc. 49(3) pp240-265, 1987
• G. Farr & C. S. Wallace. The Complexity of Strict Minimum Message Length Inference, Computer Journal 45(3) pp285-292, 2002, & TR97/321 Department of Computer Science, Monash University, Aug 1997
• G. Farr. Information Theory and MML Inference. School of Computer Science and Software Engineering, 1999
• Also see LA's bibliography.

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.