2State and Binomial Distributions 

`A' is an event for which P(A)=p and hence P(not A)=1p. Consider `n' independent trials (also known as Bernoulli trails). The random variable X is defined to be the number of times that A occurred. Then X has a binomial distribution with parameters n and p. The following is taken from Wallace & Boulton (1968) to follow the historical development of MML. The general MML form, explicitly based on the [Fisher] information, is given later. CodingTo transmit a sequence of `n' cointosses containing `h' heads, This is a multistate distribution with m=2. A uniform prior is assumed for P(H). The value of `n' is either common knowledge or is transmitted by standard methods. The receiver does not know h or t. h = #heads, t = #tails, n = h+t r(H) = h/n = frequency of heads, r(T) = ... P(H) = probability of a head, P(T) =... P'(H) = estimate of P(H), P'(T) =... We describe three methods to transmit the sequence of
cointosses, from Wallace & Boulton (1968): Method 1
requires log(n!/(h!(nh)!)) bits. Method 2
eg. H T T H T T T H T T H count: 1 2 2 2 3 3 3 3 4 4 T count: 1 1 2 3 3 4 5 6 6 7 sum : 2 3 4 5 6 7 8 9 10 11 count 1 1 2 2 3 4 5 3 6 7 p =  =           sum 2 3 4 5 6 7 8 9 10 11 p = h!.(nh)!/(n+1)! msg len = log( (n+1)!/(h!(nh)!) ) So method 1 and method 2 require the same message length. Method 3
Total msg len = msglen(P'(H))  h.log(P'(H))  t.log(P'(T))
The values of P(H) range over G=[0,1] uniformly and a particular estimate will be used over some range of r(H). This range will depend on n and on P'(H).
Note that for a given a(H), D is large for P'(H) close to 0 or to 1, and also that D is symmetric for a(H) +ve or ve, so P'(H) should be at the centre of the range over which it is used. Average value of a(H)^{2} over the range [P'(H)x/2, P'(H)+x/2] of length x is
avgD = n . x^{2} / (24 . P'(H).(1P'(H)))The message length to state the interval [P'(H)x/2, P'(H)+x/2] and thus P'(H) is log(x)  'cos of the uniform prior in P(H)Total avg cost to transmit interval and also D due to mismatch of P'(H) to r(H) is log(x) + n . x^{2} /(24 . P'(H).(1P'(H)))to minimise avg increase, differentiate w.r.t. x and find zero 1/x + n . x / (12 . P'(H).(1P'(H))) = 0 x^{2} = 12 . P'(H) . (1P'(H)) / nSo min' increase is
To compare Method 3 with methods 1 and 2: Stirling's approx: log(x!) = x.log(x)  x + 1/2 log(x) + 1/2log(2 pi) + ... method 1 or 2: log( (n+1)!/(h! t!) ) = log(n+1) + log(n!/(h! t!)) = log(n+1) + n log n  n + 1/2 log n + 1/2 log(2 pi)  h log h + h  1/2 log h  1/2 log(2 pi)  t log t + t  1/2 log t  1/2 log(2 pi) = log(n+1)  h log(h/n)  t log(t/n) + 1/2(log n log h log t log(2 pi)) = 1/2(log((n+1)^{2} . n / (h t))  log(2 pi) )  h.log(h/n)  t.log(t/n) = 1/2{log((n+1)^{2}/n)  log(h/n)  log(t/n) log(2 pi)}  h.log ...... = 1/2{log((n+1)^{2}/n)log(2 pi)log(r(H))log(r(T))} h.log(h/n)t.log(t/n) method 3  method 1 = 1/2{log(n/12) + 1  Σ_{[m:HT]}log(r(m))} h.log(h/n)t.log(t/n) {1/2{log((n+1)^{2}/n)log(2 pi)log(r(H))log(r(T))} h.log(h/n)t.log(t/n)} = 1/2{2log(1+1/n) log(12)+1+log(2 pi)} ~ 1/2 { log(pi/6) + 1 }  for large n = 1/2 log(pi.e/6) = 0.176 nits So method 3 is just marginally(!) worse than
DemonstrationUse the HTML FORM below to generate a sequence of coin tosses for a specified pr(H) and length N. The `code' button calculates code length for a fixed code (nominate an estimate p' for P(H)), combinatorial code, adaptive code, and a code which states pr(H) to an optimal level of accuracy. NB. the appoximations used may break down for very small values of N.
Don't worry if JavaScript gives a ridiculous number of decimal places, it's just its way. NotesThis page is directly based on the appendix in the paper Wallace & Boulton (1968); any errors by L.A..


