Note: The tables are always given out, for a reason  become familiar with them. Show working, and perhaps give a little explanation, so that credit can be given even if a small slip makes an estimate wrong. But also do a quick "sanity check" to make sure that a numerical answer is plausible. 

Notes on answers. Not an answer, a note:
Entropy can be interpreted as the expected (average)
information content per item of a sample drawn from
a probability distribution.
In later questions, 

Notes on answers. The alphabet is of size three (remember this for Q3). (c) Adaptive method, 10 data, 5×A, 3×B, 2×C: A B A C B A C B A A #A: 1 2 2 3 3 3 4 4 4 5 6  green #B: 1 1 2 2 2 3 3 3 4 4 4  black #C: 1 1 1 1 2 2 2 3 3 3 3  blue Tot: 3 4 5 6 7 8 9 10 11 12 13 1 1 2 1 2 3 2 3 4 5 5! 3! 2!           =  3 4 5 6 7 8 9 10 11 12 12! / 2 ln(12!)  ln(5!)  ln(3!)  ln(2!)  ln(2) = 20.0  4.8  1.8  0.7  0.7 = 20.0  8.0 = 12.0 nits (Sanity check to get an order of magnitude:
If pr(A)=pr(B)=pr(C)=1/3
(and we don't actually know that!),

 
Notes on answers. In general, Bayes, dependent, P(G&D)=P(G).P(DG)=P(D).P(GD). If G and D are independent, P(GD)=P(G), P(DG)=P(D), so P(G&D)=P(G).P(D). (a) One 3state distribution for G and another 3state distribution for D. All that practice in Q2 should remind you how to deal with multi(3)state distributions. Just follow the same process, twice. We want to see working and numbers. (b) There are two possible ways to do part(b). (c) The difference in the numbers from part(a) and part(b).... (Sanity check: log_{2}(3)=1.58 (ln(3)=1.1). There are sixty values over {M,P,T} or {R,U,S} to transmit in total. You should expect message length total of the same order of magnitude as 90bits (65nits) or thereabouts, maybe less, maybe more. But anything much bigger or much smaller is probably wrong.) 
[(d) 8 marks]
© 6/2003, L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.  
Notes on answers. (a) Topology [how?], tests [how?].
Arity re tests on discrete attributes [how?].
Parameters of leaf distributions.
Data (predicted) attribute(s) according to selected leaf model. (b) Note that SEG's arity (three), doubtless as
described for discrete tests in the answer to part(a), comes into play
for the children of the testnode for @3.
See the [tables]
for log_{2} and/or (c) Like Q2, except it is a simpler 2state {PN} distribution here,
(d) This makes SEG "like" a continuous attribute in some ways.
If tested, it is also necessary to code (1bit for 3state SEG)
what the test condition is,
(e) Here is the "the subtree rooted at the old 3way split" There are two possible trees that "use[s] these new 2way splits" and are "otherwise equivalent to (have[s] the same leaves as)" it. Either one of the following would do:
(f) And here is the subtree if "those two leaves [when SEG=M, SEG=H] of the new subtree are collapsed into one." Under topology, we save one newstyle split node on @3 and one leaf.
The data part must change slightly corresponding to the move from
(Sanity check:
100 data in the ratio 40:60 must cost very roughly 100bits, 69nits,
