- duration 1.5 hours + 10 minutes reading time.
- Attempt all questions.
- This paper counts for 60% of the subject's assessment.
- Closed book.
- No calculators.
- The attached [tables] give sufficient accuracy for any estimates of probabilities and message lengths.

**1.**
(a) Define the following terms as they apply to learning and prediction:

- Prior probability.
- Likelihood.
- Posterior probability.
- Bayes' theorem.

[10 marks]

(b) Three fair coins, Pr(H)=0.5, and one biased coin, Pr(H)=0.75,
are placed in a cloth bag;
all of the coins are identical in appearance and in weight.
The bag is shaken and one coin is drawn out.
The selected coin is thrown four times with
these

- What is the prior probability of a fair coin versus a biased coin?
- What is the likelihood of HHTH for a fair coin, and for the biased coin?
- What is the posterior probability of a fair coin versus a biased coin?

[10 marks]

**2.**
(a) For multi-state probability distributions,
define the `Kullback Leibler' (KL) distance
from the "true" probability distribution P_{1}
to another probability distribution P_{2}.

[8 marks]

(b) Below are two probability distributions, P and Q, over {A,C,G,T}. Calculate the KL distance from P to Q, and also from Q to P.

[12 marks]

- | A | C | G | T |

P: | 1/2 | 1/4 | 1/8 | 1/8 |

Q: | 1/8 | 1/2 | 1/4 | 1/8 |

**3.**
A library has a collection of 50 essays,
all believed to have been written by author X.
Professor Smith disagrees and believes that
some of the essays were written by another person, author Y.
Unfortunately
no work exists that is absolutely certain to have been written by X, and
no work exists that is absolutely certain to have been written by Y.
It is assumed that each essay had a single author.
It is a fact that different people tend to use words,
including common words, with different frequencies.
Smith gathers data from the essays,
specifically the length of each essay and the number of times that
each of the following common words appears in each essay:

(a) How would you apply Snob to this problem and this data? Why?

[10 marks]

(b) What signs would you look for in Snob's results to support the single-author hypothesis, or to support the two-author hypothesis, or other possible explanations of the data? Why?

[10 marks]

**4.**
(a) Consider decision- (classification-) trees over multivariate data
with discrete and/or continuous attributes to be tested
and a discrete (`class') attribute to be predicted.

Describe an efficient scheme to encode

[10 marks]

- (b) A multivariate data set has the following attributes:
- att 1: T, F i.e. a boolean attribute
- att 2: T, F
- att 3: T, F
- att 4: T, F
- att 5: a, b, c, d i.e. 4 possible values
- att 6: +, - i.e. 2 possible values, positive and negative cases
- att 6 is the `class', i.e. the attribute to be predicted.

If tree 1 and tree 2, below, are two possible models of the data, which tree is the better explanation of the data? Why?

[10 marks]

tree 1 | tree 2 |
---|---|

(The leaf nodes show the frequencies of positive and negative cases.) |

(c) More data of the same kind is collected and the following tree structure, including splits, is learned:

[10 marks]

(d) The tree models the new data as follows:

[10 marks]

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800. Created with "vi (Linux & IRIX)", charset=iso-8859-1