<< [01] >>

# (Decision) Classification Trees II

We have covered trees over binary attributes, e.g. T/F, Head/Tail, 0/1, M/F, ...; here we cover

1. multi-state and

2. continuous attributes, and

3. some other generalizations.

This document is online at   http://www.csse.monash.edu.au/~lloyd/Archive/2005-08-Tree02/index.shtml   and contains hyper-links to other resources.

<< [02] >>

`F F L L F L L L`

<< [03] >>
```
F F L L F L L L
```

The old stopping rule that   #L=#F+1   only works for binary trees.

But if the arity of each fork is known before its sub-trees are transmitted then subtrees can be matched to parents and the end of the message can be detected.

A full ternary tree with `n' forks has 2n+1 leaves;
proof by induction (consider replacing a leaf with a fork and its three children).
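The end-of-message detection and the leaf-count formula can both be checked with a small sketch; `count_leaves` is a made-up name, and the ternary example string is mine, not the slide's.

```python
def count_leaves(code, arity):
    """Parse a preorder F/L tree code with known constant fork arity.
    The end of the message is detected when no sub-trees remain pending."""
    pending = 1                  # sub-trees still expected (start with the root)
    forks = leaves = used = 0
    for sym in code:
        used += 1
        pending -= 1             # this symbol accounts for one expected sub-tree
        if sym == 'F':
            forks += 1
            pending += arity     # a fork promises `arity` sub-trees
        else:
            leaves += 1
        if pending == 0:         # end of message detected
            break
    assert pending == 0 and used == len(code), "ill-formed tree code"
    return forks, leaves

# e.g. a full ternary tree: a root fork whose middle child is also a fork
f, l = count_leaves('F L F L L L L'.split(), arity=3)
# here f == 2 and l == 5, agreeing with leaves = (fan_out - 1) * n + 1
```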

A full tree of constant fan-out, with `n' forks has (fan_out-1)×n+1 leaves.

So P(F) and P(L) are no longer equal in general.   e.g. If all attributes are ternary, odds of F:L = 1:2,   i.e. P(F) = 1/3, P(L) = 2/3.

<< [04] >>
When arities of attributes differ, a practical solution is to:

Assume   P(F) = P(L) = 0.5   for the root node.

Use the arity of a node to estimate P(F) & P(L) for its sub-trees. (Necessary to give code-words for F and L.)

P(F) = 1/arity,   P(L) = (arity-1)/arity

Use this rule recursively.
(NB. No problem with n-ary attributes in leaves. Use the [multi-state] distribution.)
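The recursive rule above gives a message length for a tree's structure; a minimal sketch (tree representation and the name `structure_msg_len` are mine):

```python
from math import log2

def structure_msg_len(node, p_fork=0.5):
    """Message length (bits) of a tree's F/L structure.
    node = ('L',) for a leaf, or ('F', arity, [children]) for a fork.
    The root uses P(F) = P(L) = 0.5; each child of a fork of arity a
    uses P(F) = 1/a and P(L) = (a-1)/a, applied recursively."""
    if node[0] == 'L':
        return -log2(1.0 - p_fork)
    _, arity, children = node
    bits = -log2(p_fork)
    for child in children:
        bits += structure_msg_len(child, p_fork=1.0 / arity)
    return bits

# a binary fork with two leaf children: 1 bit for 'F' + 1 bit per 'L' = 3 bits
tree = ('F', 2, [('L',), ('L',)])
```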

<< [05] >>

# Continuous (Split) Attributes

e.g. height, weight, temperatures, ...

The Wallace-Patrick solution is to use the values present in the attribute's data (common knowledge) as potential splitting values:

sort the values,   and allocate code-words thus:

```              quartile    median      quartile      etc.
-----|-----|-----|-----|-----|-----|-----|-----

P:        1/32  1/8   1/32  1/2   1/32  1/8   1/32
code:    00100  011  00101   1   00110  010  00111

msgLen:    5     3     5     1     5     3     5    bits
```

An utterly insignificant amount of lost probability can be saved by renormalising so that the finite series sums to 1.0.

This method allows a split-value to be stated imprecisely and cheaply, e.g. just one bit for the median, or more precisely and expensively if justified.
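The code lengths in the table above (1 bit for the median, 3 for the quartiles, 5 for the next level, ...) follow a binary-search pattern over the sorted values; a sketch, where `split_code_lens` is a made-up name and only the lengths, not the particular code-words, are reproduced:

```python
def split_code_lens(n):
    """Code length (bits) for stating each of n sorted candidate split
    values: the median costs 1 bit, the quartiles 3 bits, and each
    further level of the implicit balanced search tree adds 2 bits,
    i.e. P = (1/2) * (1/4)**depth, as in the slide's table."""
    lens = [0] * n
    def assign(lo, hi, depth):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        lens[mid] = 1 + 2 * depth
        assign(lo, mid - 1, depth + 1)
        assign(mid + 1, hi, depth + 1)
    assign(0, n - 1, 0)
    return lens

# for 7 candidate values this reproduces the msgLen row: 5 3 5 1 5 3 5
```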

<< [06] >>

# Search Problem

MML,   msgLen(T) + msgLen(D|T),   allows two trees to be compared fairly and the better one chosen.

The search problem is to find the best tree of all (hopefully), or at least to find a good tree.

Greedy search with limited lookahead of `L-levels' works well, where L is a small integer L>=0.
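A greedy, zero-lookahead search can be sketched as below. The leaf code used here, log2(n+1) + log2 C(n,k) bits for a binary class sequence, is an assumption for illustration, not the exact Wallace-Patrick leaf code, and `grow` and `leaf_len` are made-up names:

```python
from math import log2, comb

def leaf_len(classes):
    # combinatorial code for a binary class sequence (assumed leaf code):
    # state the number of successes, then which arrangement occurred
    n, k = len(classes), sum(classes)
    return log2(n + 1) + log2(comb(n, k))

def grow(rows, classes, attrs, arities, p_fork=0.5):
    """Greedy, 0-lookahead MML tree: split only if it shortens the total
    two-part message.  rows[i][a] is in range(arities[a])."""
    best_len = -log2(1 - p_fork) + leaf_len(classes)   # cost as a leaf
    best = ('L', classes)
    for a in attrs:
        parts = [[] for _ in range(arities[a])]
        for row, c in zip(rows, classes):
            parts[row[a]].append((row, c))
        if any(not p for p in parts):
            continue
        # fork cost: say 'F', name the attribute, then each child as a leaf
        length = -log2(p_fork) + log2(len(attrs))
        for p in parts:
            length += -log2(1 - 1 / arities[a]) + leaf_len([c for _, c in p])
        if length < best_len:
            best_len, best = length, ('F', a, parts)
    if best[0] == 'L':
        return best
    _, a, parts = best
    rest = [x for x in attrs if x != a]
    return ('F', a, [grow([r for r, _ in p], [c for _, c in p],
                          rest, arities, 1 / arities[a]) for p in parts])
```

With a perfectly predictive attribute the split pays for itself and `grow` forks; with uncorrelated classes the leaf is cheaper and it stops.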

NB. Generalizations of the model space make the search problem harder.

<< [07] >>

# Generalized Leaves

Split-nodes of a tree partition data by hyper-planes perpendicular to the axes of the data measurements.

Each leaf models data in one element of the partition.

• n-state data can be modelled by an n-state distribution,
• continuous data by a continuous distribution, and
• multi-variate data by a multi-variate distribution
- in principle.

<< [08] >>

# Generalized Splits

## n-state discrete

Fragmentation: High-arity discrete splits quickly divide training data into small sets, stopping elaboration of the tree.

In principle, a binary split is possible on an n-state attribute by specifying the set of values that go left; the complement goes right. Message length is n bits per test.
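Such a test can be chosen by trying every subset of values, charging n bits (one left/right bit per value) for stating it; a sketch, again using the assumed combinatorial leaf code, with `best_binary_split` a made-up name:

```python
from math import log2, comb
from itertools import combinations

def leaf_len(classes):
    # combinatorial code for a binary class sequence (assumed leaf code)
    n, k = len(classes), sum(classes)
    return log2(n + 1) + log2(comb(n, k))

def best_binary_split(values, classes, arity):
    """Binary split on an n-state attribute: choose the subset of values
    sent left; stating the subset costs `arity` bits.  (A subset and its
    complement give the same split, so half the trials are redundant.)"""
    best = None
    for r in range(1, arity):
        for left in combinations(range(arity), r):
            left = set(left)
            lc = [c for v, c in zip(values, classes) if v in left]
            rc = [c for v, c in zip(values, classes) if v not in left]
            if not lc or not rc:
                continue
            cost = arity + leaf_len(lc) + leaf_len(rc)
            if best is None or cost < best[0]:
                best = (cost, left)
    return best
```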

## other

In principle, arbitrary functions can be used to send data to left or right (or more) sub-trees, but their parameters must be stated (cost).

NB. Generalizations make the search problem harder.

<< [09] >>

# Search

Typically:
• Greedy search, with

• limited lookahead of  L  levels, L>=0

<< [10]

# Conclusions

Treated
• n-ary (split) attributes
• modified coding of the tree
• continuous (split) attributes
• coding of the split value(s)
• precision
• possible generalizations of trees

### References

• C. S. Wallace & J. D. Patrick, Coding Decision Trees, Machine Learning, 11, pp. 7-22, 1993.
- Introduced the tree code and message-length calculations for decision trees; tests demonstrated good generalization and resistance to over-fitting.

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.