# Inductive Programming

## Classification- (Decision-) Trees

Problem: Given a data set from a data space, 'Ip × Op', with input space 'Ip' and output space 'Op', learn a classification-tree function-model of Op conditional upon Ip.

E.g. from Ip = <Appetite, Fever, Stomach_pain> conditionally model Op = Has_appendicitis.

users.monash.edu/~lloyd/Seminars/2005-II/Tree/index.shtml

## Partitioning on input attributes (input space)

Discrete attribute:
k-way split, where k is the 'arity' of the attribute.

Ordered attribute:
Try the median, left-quartile, right-quartile, octiles, ... .
NB. these depend on the current (part of the) data set.

Continuous attribute:
As for ordered attribute.

Multivariate:
Interleave the splitting options of all component attributes.
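These options can be sketched in Haskell. The helper names below (`discreteSplit`, `cutCandidates`, `cutSplit`) are invented for this sketch and are not part of the library used later:

```haskell
import Data.List (sort)

-- Discrete attribute of arity k: value i simply goes to branch i
-- (the arity k is carried only for documentation here).
discreteSplit :: Int -> (Int -> Int)
discreteSplit _k = id

-- Ordered/continuous attribute: candidate cut-points at the quartiles
-- and median of the *current* part of the data set (assumes non-empty).
cutCandidates :: Ord a => [a] -> [a]
cutCandidates xs = [ s !! (n `div` 4), s !! (n `div` 2), s !! (3 * n `div` 4) ]
  where s = sort xs
        n = length s

-- A binary split at cut-point c: branch 0 if <= c, else branch 1.
cutSplit :: Ord a => a -> (a -> Int)
cutSplit c x = if x <= c then 0 else 1
```

For example, for the part [1..8], `cutCandidates` proposes the cut-points 3, 5 and 7; splitting at 5 sends 3 to branch 0 and 7 to branch 1.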

## Classification Trees:

```
data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork MessageLength
           (Splitter inSpace)
           [CTreeType inSpace opSpace]
  | ...  -- can also consider other non-traditional options
```

A `Splitter` is used to partition the input space.

```
instance SuperModel (CTreeType inSpace opSpace) where

  -- NB. For simplicity only, this costs the
  -- structure at 1 bit per node; this is
  -- only optimal for binary trees.

  msg1 (CTleaf leafModel)
    = log2 + msg1 leafModel

  msg1 (CTfork fnLen f dts)
    = log2 + fnLen + foldl (+) 0 (map msg1 dts)
```

Message length of a tree is that of the node plus those of the sub-trees, if any.
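As a self-contained illustration of this costing, here is a cut-down analogue of `CTreeType` in which each leaf model is replaced by an assumed message length (in nats):

```haskell
-- Leaf: assumed cost of the leaf's model.  Fork: cost of stating the
-- splitter, plus sub-trees.  (Illustrative types only.)
data MiniTree = Leaf Double
              | Fork Double [MiniTree]

-- One bit (log 2 nats) of structure per node, as in msg1 above.
miniMsg :: MiniTree -> Double
miniMsg (Leaf m)      = log 2 + m
miniMsg (Fork fn dts) = log 2 + fn + sum (map miniMsg dts)
```

For example, a 2-way fork with a 1-nat splitter over two leaves costing 2 and 3 nats totals 3·log 2 + 6 nats: one structure bit per node, plus the splitter and leaf-model costs.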

## Make a Classification Tree an instance of a FunctionModel:

```
instance FunctionModel CTreeType where

  condModel (CTleaf leafModel) i = leafModel

  condModel (CTfork fnLen f dts) i
    = condModel (dts !! (applySplitter f i)) i
```
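A minimal, self-contained sketch of this routing, with the splitter represented directly as a function from inputs to branch numbers (a simplification made for this sketch):

```haskell
-- Simplified tree: a splitter is just a function to a branch index.
data Tree ip op = Tip op
                | Split (ip -> Int) [Tree ip op]

-- Route input i down to a leaf's model, exactly as condModel does above.
route :: Tree ip op -> ip -> op
route (Tip m)      _ = m
route (Split f ts) i = route (ts !! f i) i
```

For instance, `route (Split (`mod` 2) [Tip "even", Tip "odd"]) 3` follows branch 1 and selects the "odd" leaf.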

## estCTree : Estimate a Classification Tree

It is convenient to have an inner, local `search` function:

```
estCTree estLeafMdl splits ipSet opSet =
  let
    search ipSet opSet =
      let
        . . .
```

The simplest possible tree is a single `leaf` node:

```
leaf    = CTleaf leafMdl

leafMdl = estLeafMdl opSet

leafMsg = msg (functionModel2model leaf)
              (zip ipSet opSet)
```

A simple recursive search, `alternatives`, finds the best (1- or) 2-level tree.

Base case:

```
alternatives [] bestML bestCTree bestIpParts bestOpParts
  = (bestCTree, bestIpParts, bestOpParts)  -- done
```

...and the general case:

```
alternatives (sp:sps) bestML bestCTree bestIpParts bestOpParts =
  let
    -- NB. the '1' below is acceptable but not optimal
    theTree  = CTfork (log (fromIntegral (length splts)))
                      sp leaves
    leaves   = map CTleaf leafMdls     -- one leaf ...
    leafMdls = map estLeafMdl opParts  -- ... per part
    partNums = map (applySplitter sp) ipSet
    ipParts  = partition (aritySplitter sp) partNums ipSet
    opParts  = partition (aritySplitter sp) partNums opSet
    msgLen   = msg (functionModel2model theTree)
                   (zip ipSet opSet)
  in
    if msgLen < bestML
      then alternatives sps msgLen theTree    ipParts     opParts
      else alternatives sps bestML  bestCTree bestIpParts bestOpParts
```

## let the search begin:

```
    splts = splits ipSet  -- partitions of input space

  in
    case alternatives splts leafMsg leaf [ipSet] [opSet] of

      -- sub-trees?  the search continues...
      (CTfork msgLen pf leaves, ipParts, opParts) ->
        CTfork msgLen pf (zipWith search ipParts opParts)

      -- the single leaf wins?  the search is over!
      (t, _, _) -> t

in search ipSet opSet

```
-- That's all --
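The flavour of the search above can be reproduced in a self-contained toy: one boolean input attribute, a boolean output, and a choice between a single leaf and a 2-way fork, keeping whichever gives the shorter message. The costs used below (1 nat to state a leaf's parameter, log 2 of structure per node) are assumptions made for this sketch, not the lecture's exact MML costings:

```haskell
-- Assumed leaf cost: 1 nat to state the probability parameter, plus the
-- negative log-likelihood of the data under a smoothed estimate of it.
leafCost :: [Bool] -> Double
leafCost os = 1.0 + sum [ negate (log (if o then p else 1 - p)) | o <- os ]
  where
    n = fromIntegral (length os) :: Double
    p = (fromIntegral (length (filter id os)) + 0.5) / (n + 1.0)

-- One leaf, or a 2-way fork on the boolean input?  Return the total
-- message length of the shorter alternative, as the search above does.
chooseSplit :: [Bool] -> [Bool] -> Double
chooseSplit ips ops = min single forked
  where
    single   = log 2 + leafCost ops          -- cost as one leaf
    (fs, ts) = ( [o | (i, o) <- zip ips ops, not i]
               , [o | (i, o) <- zip ips ops, i] )
    forked   = log 2                         -- cost as a fork ...
             + (log 2 + leafCost fs)         -- ... with a leaf for False
             + (log 2 + leafCost ts)         -- ... and a leaf for True
```

When the input predicts the output well, the fork's shorter data part outweighs its extra structure cost and the fork wins; with an uninformative input, the single leaf wins.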

## Summary

Can estimate (fit, infer, learn) a classification-tree given (i) ways of partitioning the data on input attributes and (ii) an estimator for leaf models.

NB. Any kind of leaf model can be used -- discrete, continuous, or multivariate; even...

...FunctionModel- (Regression-) Trees.
Use a conversion function, estFunctionModel2estModel:
```
estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

ft = estCTree (estFunctionModel2estModel estFnMdl)
              splits trainingIp trainingOp
```

See: L. Allison, Models for Machine Learning and Data Mining in Functional Programming, J. Functional Programming (JFP), 15(1), pp. 15-32, January 2005.