[01] >>

## Types and Classes of
[paper], also see [II]

**Abstract:**
The notion of a statistical model, as inferred and used in
statistics, machine learning and data mining,
is examined from a semantic point of view.
Data types and type-classes for models are developed
that allow models to be manipulated in a type-safe yet flexible way.
The programming language Haskell-98, with its system
of polymorphic types and type-classes,
is used as the meta-language for this exercise,
so one by-product is a running program.

<< [02] >>

"... considered as a biological phenomenon, aesthetic preferences stem from a predisposition among animals and men to seek out experiences through which they may *learn to classify* the objects in the world about them. Beautiful 'structures' in nature or in art are those which facilitate the task of classification by presenting evidence of the 'taxonomic' relations between things in a way which is informative and easy to grasp."

-- N. K. Humphrey. The illusion of beauty. *Perception* 2, pp. 429-439, 1972.

<< [03] >>

- Humphrey argues that a sense of beauty is a by-product(?) of the (useful) ability to *classify*.
- Classification is about similarity and difference.
- 1. Unsupervised and supervised classification are important problems in M.L. and D.M.
- 2. Notice the *similarity* of many products and of many activities within M.L. and D.M. research itself.
- Here, the aim is to make these similarities and differences precise. (Efficiency can be addressed, but is a secondary consideration today.)

<< [04] >>

| Word | Sense |
|---|---|
| Class | As in OOP |
| Class | A number of individuals [...] possessing common attributes... |
| Class | A division or order of society... |
| Class | Natural History. One of the highest groups... |
| Model Class | As in Statistics |
| Model [citizen] | An exemplar |
| Model | A person [...] who is employed to display clothes... |
| Model | A summary, epitome, or abstract... |
| Model | A description of structure... |

~ Class as in OOP!

<< [05] >>

- Shall use Haskell 98, a *lazy* functional programming (FP) language, with
  - *polymorphic* types, e.g. `map :: (t->u) -> [t] -> [u]` (t, u type parameters, [...] list, -> function),
  - type *classes*,
  - a type inference algorithm ((abused) types are given here, but really they are inferred automatically),
- to describe "statistical models", for want of a better term.
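A small sketch of the language features listed above; the names `lengths`, `doubleAll`, `Sized` and `size` are made up for illustration. `map` is used at one concrete instance of its polymorphic type, `doubleAll` has no signature so its type is inferred, and `Sized` is a hypothetical type class with an instance for lists built from the instance for their elements.

```haskell
-- map :: (t -> u) -> [t] -> [u], used here with t = String, u = Int.
lengths :: [String] -> [Int]
lengths = map length

-- No signature given: the type Num a => [a] -> [a] is inferred automatically.
doubleAll xs = map (* 2) xs

-- A hypothetical type class: any type with a notion of size.
class Sized a where
  size :: a -> Int

instance Sized Bool where
  size _ = 1

-- A list is Sized whenever its elements are.
instance Sized a => Sized [a] where
  size = sum . map size

main :: IO ()
main = do
  print (lengths ["ab", "c"])       -- [2,1]
  print (doubleAll [1, 2, 3])       -- [2,4,6]
  print (size [True, False, True])  -- 3
```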

<< [06] >>

Most important property of a (class of) statistical model

**class** Model mdl **where**

- `pr :: (mdl dataSpace) -> dataSpace -> Probability`
- `msg2 :: (mdl dataSpace) -> dataSpace -> MessageLength` `--` (2nd part)
- `msg :: . . . (mdl dataSpace) -> dataSpace -> MessageLength`
- `--` a minimum; maybe\*

\* A Model can also do other things.
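A minimal sketch of the class above, under stated assumptions: `Probability` and `MessageLength` are taken to be `Double`, message lengths are in nits, `msg2` defaults to minus the log of `pr`, and `ModelType`/`MPr` (echoing the constructor used on a later slide) is a hypothetical concrete model holding a first-part length and a probability function. The elided context of `msg` is not reconstructed; `msg` is simply written as first part plus second part for this one type.

```haskell
-- Assumed representations; lengths measured in nits (natural logs).
type Probability   = Double
type MessageLength = Double

class Model mdl where
  pr   :: mdl dataSpace -> dataSpace -> Probability
  msg2 :: mdl dataSpace -> dataSpace -> MessageLength
  -- Assumed default: second-part length = -log (probability of the datum).
  msg2 m x = negate (log (pr m x))

-- Hypothetical model type: first-part length plus a probability function.
data ModelType dataSpace = MPr MessageLength (dataSpace -> Probability)

instance Model ModelType where
  pr (MPr _ p) x = p x

-- Assumed: total two-part length = first part (model) + second part (data).
msg :: ModelType dataSpace -> dataSpace -> MessageLength
msg m@(MPr mdlLen _) x = mdlLen + msg2 m x

-- Example: a fair coin with a made-up first-part length of 1 nit.
fairCoin :: ModelType Bool
fairCoin = MPr 1 (const 0.5)

main :: IO ()
main = do
  print (pr fairCoin True)    -- 0.5
  print (msg2 fairCoin True)  -- log 2, about 0.693 nits
  print (msg fairCoin True)   -- 1 + log 2
```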

<< [07] >>

- `normal m s :: Model of Float`
- `freqs2model :: [Int] -> Model of [0..n-1]`
- `bivariate :: (Model of d1) -> (Model of d2) -> Model of (d1, d2)`
- etc.
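The three example models can be sketched as follows, assuming a hypothetical `ModelType` (first-part length plus probability function). The details are assumptions, not the original definitions: `normal` returns a density rather than a probability of a data interval, `freqs2model` uses a Laplace-style (f+1)/(total+n) estimate with first-part length 0, and `bivariate` treats its two components as independent.

```haskell
type Probability   = Double
type MessageLength = Double

-- Hypothetical stand-in for "Model of d": a first-part length and a
-- probability function (a density, for the continuous normal below).
data ModelType d = MPr MessageLength (d -> Probability)

pr :: ModelType d -> d -> Probability
pr (MPr _ p) = p

-- normal m s: mean m, standard deviation s. Assumption: pr gives the
-- density; a full MML treatment would also account for data accuracy.
normal :: Double -> Double -> ModelType Double
normal m s = MPr 0 density
  where density x = exp (negate ((x - m) ^ 2) / (2 * s * s)) / (s * sqrt (2 * pi))

-- freqs2model: counts for the values 0 .. n-1 give a model of those values.
-- Assumption: estimate (f + 1) / (total + n); values outside 0 .. n-1 get
-- probability 0; the first-part length is set to 0 for brevity.
freqs2model :: [Int] -> ModelType Int
freqs2model fs = MPr 0 p
  where
    n     = length fs
    total = sum fs
    p i | i >= 0 && i < n = fromIntegral (fs !! i + 1) / fromIntegral (total + n)
        | otherwise       = 0

-- bivariate: assumes independent components, so the probability of a
-- pair is the product and the first-part lengths add.
bivariate :: ModelType d1 -> ModelType d2 -> ModelType (d1, d2)
bivariate (MPr l1 f1) (MPr l2 f2) = MPr (l1 + l2) (\(x, y) -> f1 x * f2 y)

dice :: ModelType Int
dice = freqs2model (replicate 6 1)

main :: IO ()
main = do
  print (pr dice 3)                        -- (1 + 1) / (6 + 6), i.e. 1/6
  print (pr (bivariate dice dice) (0, 5))  -- 1/36, up to rounding
  print (pr (normal 0 1) 0)                -- 1 / sqrt (2 * pi)
```

Because `bivariate` only needs two models, it composes with any of the others, e.g. `bivariate dice (normal 0 1)`.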

<< [08] >>

FunctionModels

**class** FunctionModel fm **where**

- `condModel :: (fm inSpace opSpace) -> inSpace -> ModelType opSpace`
- `condPr :: (fm inSpace opSpace) -> inSpace -> opSpace -> Probability`
- `condMsg2 :: (fm inSpace opSpace) -> inSpace -> opSpace -> MessageLength`
- e.g. `linear a b eps :: FunctionModel of Float Float`,
  i.e. y ~ a × x + b + (normal 0 eps)
- . . .
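A minimal sketch of the FunctionModel class and the `linear` example, assuming a hypothetical `ModelType` and a hypothetical `FnModel` representation (first-part length plus a function from input to conditional model). The defaults for `condPr` and `condMsg2` in terms of `condModel` are assumptions, and, as with `normal`, the conditional "probability" here is a density.

```haskell
type Probability   = Double
type MessageLength = Double

-- Hypothetical model type: first-part length and probability (or density).
data ModelType d = MPr MessageLength (d -> Probability)

class FunctionModel fm where
  condModel :: fm inSpace opSpace -> inSpace -> ModelType opSpace
  condPr    :: fm inSpace opSpace -> inSpace -> opSpace -> Probability
  condMsg2  :: fm inSpace opSpace -> inSpace -> opSpace -> MessageLength
  -- Assumed defaults, both derived from condModel.
  condPr f i o = let MPr _ p = condModel f i in p o
  condMsg2 f i o = negate (log (condPr f i o))

-- Hypothetical representation: first-part length plus input -> conditional model.
data FnModel i o = FM MessageLength (i -> ModelType o)

instance FunctionModel FnModel where
  condModel (FM _ g) = g

normalDensity :: Double -> Double -> Double -> Probability
normalDensity m s x = exp (negate ((x - m) ^ 2) / (2 * s * s)) / (s * sqrt (2 * pi))

-- linear a b eps: y ~ a*x + b + (normal 0 eps), i.e. y is normal with mean a*x + b.
linear :: Double -> Double -> Double -> FnModel Double Double
linear a b eps = FM 0 (\x -> MPr 0 (normalDensity (a * x + b) eps))

main :: IO ()
main = do
  let f = linear 2 1 0.5
  print (condPr f 3 7)    -- density at the mean, since 2*3 + 1 = 7
  print (condMsg2 f 3 8)  -- longer: 8 lies two eps away from the mean
```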

<< [09] >>

. . . and TimeSeries

**class** TimeSeries tsm **where**

- `predictors :: (tsm dataSpace) -> [dataSpace] -> [ModelType dataSpace]`
- `prs :: (tsm dataSpace) -> [dataSpace] -> [Probability]`
- `msg2s :: (tsm dataSpace) -> [dataSpace] -> [MessageLength]`
- e.g. `markov n :: TimeSeries of someDiscreteType`

(Slight abuse of Haskell type notation.)
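A minimal sketch of the TimeSeries class, assuming a hypothetical `ModelType` and, in place of `markov n`, a hypothetical first-order chain `Markov1` with fixed (not estimated) parameters. The defaults assume each element is scored by the model that predicts it; `predictors` returns one extra model, for the next, unseen element.

```haskell
type Probability   = Double
type MessageLength = Double

data ModelType d = MPr MessageLength (d -> Probability)

prOf :: ModelType d -> d -> Probability
prOf (MPr _ p) = p

class TimeSeries tsm where
  predictors :: tsm d -> [d] -> [ModelType d]
  prs        :: tsm d -> [d] -> [Probability]
  msg2s      :: tsm d -> [d] -> [MessageLength]
  -- Assumed defaults: each element is scored by the model that predicts it.
  prs ts xs   = zipWith prOf (predictors ts xs) xs
  msg2s ts xs = map (negate . log) (prs ts xs)

-- Hypothetical first-order Markov model: a model for the first element
-- and a transition function giving the model for the next element.
data Markov1 d = Markov1 (ModelType d) (d -> ModelType d)

instance TimeSeries Markov1 where
  -- One predictor per element, plus one more for the next, unseen element.
  predictors (Markov1 start trans) xs = start : map trans xs

-- Example: a "sticky" binary chain that repeats its last value w.p. 0.9.
sticky :: Markov1 Bool
sticky = Markov1 (MPr 0 (const 0.5))
                 (\prev -> MPr 0 (\x -> if x == prev then 0.9 else 0.1))

main :: IO ()
main = do
  print (prs sticky [True, True, False])          -- [0.5,0.9,0.1]
  print (sum (msg2s sticky [True, True, False]))  -- total length in nits
```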

<< [10] >>

Our classes have some common properties; we need a super-class. Obviously...

**class** SuperModel sMdl **where**

- `prior :: sMdl -> Probability`
- `msg1 :: sMdl -> MessageLength`
- `mixture :: (Mixture mx, SuperModel (mx sMdl)) => mx sMdl -> sMdl`

**class** Mixture mx **where**

- `mixer :: (SuperModel t) => mx t -> ModelType Int`
- `components :: (SuperModel t) => mx t -> [t]`

**instance** SuperModel (ModelType dataSpace) **where**

- `msg1 (MPr mdlLen p) = mdlLen`
- . . . etc.
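A simplified sketch of the two classes and the instance above. Assumptions: `ModelType` holds a first-part length and a probability function, `Mix` is a hypothetical mixture representation (a model of the component index plus the components), `prior` is tied to `msg1` by msg1 = -log prior, and the `SuperModel (mx sMdl)` constraint on `mixture` is dropped to keep the example small.

```haskell
type Probability   = Double
type MessageLength = Double

-- Hypothetical model type, echoing the slide's MPr mdlLen p.
data ModelType dataSpace = MPr MessageLength (dataSpace -> Probability)

prOf :: ModelType d -> d -> Probability
prOf (MPr _ p) = p

-- Simplified: the slide's extra SuperModel (mx sMdl) constraint is dropped.
class SuperModel sMdl where
  prior   :: sMdl -> Probability
  msg1    :: sMdl -> MessageLength
  mixture :: Mixture mx => mx sMdl -> sMdl

class Mixture mx where
  mixer      :: SuperModel t => mx t -> ModelType Int
  components :: SuperModel t => mx t -> [t]

-- Hypothetical mixture: a model over component indices, and the components.
data Mix t = Mix (ModelType Int) [t]

instance Mixture Mix where
  mixer (Mix w _)       = w
  components (Mix _ cs) = cs

instance SuperModel (ModelType dataSpace) where
  msg1 (MPr mdlLen _) = mdlLen
  -- Assumed relation: msg1 = -log prior.
  prior m = exp (negate (msg1 m))
  -- pr x = sum over i of Pr(component i) * pr_i x; first parts add up.
  mixture mx = MPr len p
    where
      ws  = mixer mx
      cs  = components mx
      len = msg1 ws + sum (map msg1 cs)
      p x = sum (zipWith (\i c -> prOf ws i * prOf c x) [0 ..] cs)

-- Example components: point masses, each with a made-up 1-nit first part.
point :: Int -> ModelType Int
point k = MPr 1 (\x -> if x == k then 1 else 0)

mixed :: ModelType Int
mixed = mixture (Mix (MPr 0.5 (\i -> if i == 0 then 0.25 else 0.75)) [point 0, point 1])

main :: IO ()
main = do
  print (prOf mixed 0)  -- 0.25
  print (prOf mixed 1)  -- 0.75
  print (msg1 mixed)    -- 0.5 + 1 + 1 = 2.5
```

Because the mixture is itself a `ModelType`, it can be used anywhere a plain model can, which is the point of giving every kind of model a common super-class.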

<< [11] >>

<< [12] >>

`estMixture ests dataSet = let ... (22 lines of code) ... in mixture ( ... )`

**estMixture** ::

- `[ [dataSpace] -> [Float] -> Model of dataSpace ]` `--` estimators
- `-> [ dataSpace ]` `--` training data
- `-> (Mixture) Model of dataSpace`
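The 22 elided lines are not reproduced here. As an illustration only, this is an EM-style sketch with the same shape of arguments: each estimator takes the data plus a per-item weight (membership) list. Everything specific is an assumption: `Double` replaces `Float`, the result is a hypothetical `Mix` (mixing weights plus components) rather than a `ModelType`, the loop runs a fixed 20 rounds, items start hard-assigned to component (j mod k), and `estBool` is a made-up weighted estimator.

```haskell
type Probability   = Double
type MessageLength = Double

data ModelType d = MPr MessageLength (d -> Probability)

prOf :: ModelType d -> d -> Probability
prOf (MPr _ p) = p

-- A weighted estimator: data plus per-item membership weights -> model.
type WEstimator d = [d] -> [Double] -> ModelType d

-- Hypothetical fitted mixture: mixing weights and component models.
data Mix d = Mix [Double] [ModelType d]

mixPr :: Mix d -> d -> Probability
mixPr (Mix ws cs) x = sum (zipWith (\w c -> w * prOf c x) ws cs)

-- EM-style sketch: 20 rounds and the initial hard assignment of
-- item j to component (j mod k) are arbitrary assumptions.
estMixture :: [WEstimator d] -> [d] -> Mix d
estMixture ests dataSet = iterate (fit . memberships) start !! 20
  where
    k = length ests
    n = length dataSet
    start = fit [[if j `mod` k == i then 1 else 0 | i <- [0 .. k - 1]] | j <- [0 .. n - 1]]
    -- M-step: mixing weights, and each component from its membership column.
    fit memb = Mix ws cs
      where
        cols = [map (!! i) memb | i <- [0 .. k - 1]]
        ws   = [sum col / fromIntegral n | col <- cols]
        cs   = zipWith (\est col -> est dataSet col) ests cols
    -- E-step: posterior membership of each item in each component.
    memberships (Mix ws cs) = [normalise [w * prOf c x | (w, c) <- zip ws cs] | x <- dataSet]
    normalise ps = map (/ sum ps) ps

-- Hypothetical weighted estimator for Bool data: Pr(True) = (w + 1/2) / (W + 1).
estBool :: WEstimator Bool
estBool xs ws = MPr 0 p
  where
    wTrue = sum [w | (x, w) <- zip xs ws, x]
    pT    = (wTrue + 0.5) / (sum ws + 1)
    p b   = if b then pT else 1 - pT

fitted :: Mix Bool
fitted = estMixture [estBool, estBool] [True, True, True, False, False, True]

main :: IO ()
main = print (mixPr fitted True, mixPr fitted False)
```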

<< [13] >>

`estCTree estLeafMdl splits ipSet opSet = let ... (32 lines of code) ... in ...`

**estCTree** ::

- `( [opSpace] -> Model of opSpace )` `--` leaf model estimator
- `-> ( ipSpace -> [ ipSpace -> Int ] )` `--` partitioning
- `-> [ipSpace] -> [opSpace]` `--` training data
- `-> CTree ipSpace opSpace` `--` an instance of FunctionModel ipSpace opSpace
- `--` roughly (and it works)
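The 32 elided lines are not reproduced here; this is a sketch with the same argument order, under several assumptions: `CTree` is a hypothetical data type, partition functions return 0, 1, ... as subtree indices, a fork costs a made-up fixed 2 nits, and the search is exhaustive (exponential, fine only for tiny data). At each node it keeps whichever of "leaf" or "best fork" gives the shorter total message.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Probability   = Double
type MessageLength = Double

data ModelType d = MPr MessageLength (d -> Probability)

prOf :: ModelType d -> d -> Probability
prOf (MPr _ p) = p

-- Hypothetical tree: a leaf model, or a fork on a partition function
-- whose result (assumed to stay in range) indexes the subtrees.
data CTree i o = CLeaf (ModelType o) | CFork (i -> Int) [CTree i o]

-- The tree used as a FunctionModel: find the leaf, return its model.
condModel :: CTree i o -> i -> ModelType o
condModel (CLeaf m)    _ = m
condModel (CFork f ts) x = condModel (ts !! f x) x

-- Assumed fixed cost, in nits, of stating a fork.
forkCost :: MessageLength
forkCost = 2

-- First part of a leaf model plus the second part of its outputs.
leafCost :: ModelType o -> [o] -> MessageLength
leafCost m@(MPr len _) os = len + sum [negate (log (prOf m o)) | o <- os]

estCTree :: ([o] -> ModelType o) -> (i -> [i -> Int]) -> [i] -> [o] -> CTree i o
estCTree estLeafMdl splits ipSet opSet = fst (grow ipSet opSet)
  where
    grow is os = minimumBy (comparing snd) (leafOpt : forkOpts)
      where
        leafM    = estLeafMdl os
        leafOpt  = (CLeaf leafM, leafCost leafM os)
        cands    = if null is then [] else splits (head is)
        -- Only forks with at least two non-empty parts, so recursion ends.
        forkOpts = [ (CFork f (map fst kids), forkCost + sum (map snd kids))
                   | f <- cands
                   , let groups = partitionBy f is os
                   , length groups > 1, all (not . null . fst) groups
                   , let kids = [grow gi go | (gi, go) <- groups] ]
    partitionBy f is os =
      [ unzip [ (x, y) | (x, y) <- zip is os, f x == g ] | g <- [0 .. maximum (map f is)] ]

-- Hypothetical leaf estimator for Bool outputs: Pr(True) = (t + 1/2) / (n + 1).
estBool :: [Bool] -> ModelType Bool
estBool os = MPr 0 (\b -> if b then pT else 1 - pT)
  where pT = (fromIntegral (length (filter id os)) + 0.5) / (fromIntegral (length os) + 1)

-- Hypothetical candidate splits: thresholds x < t (the sample input is ignored).
thresholds :: Int -> [Int -> Int]
thresholds _ = [\x -> if x < t then 0 else 1 | t <- [1 .. 9]]

tree :: CTree Int Bool
tree = estCTree estBool thresholds [1 .. 8] (map (>= 5) [1 .. 8])

main :: IO ()
main = print (prOf (condModel tree 2) False, prOf (condModel tree 7) True)
```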

<< [14] >>

E.g. CTree is more than a (C5) *classification* tree....

- `estFunctionModel2estModel estFn ipOpPairs = functionModel2model (uncurry estFn (unzip ipOpPairs))`
- `ft = estCTree (estFunctionModel2estModel estFiniteFunction) splits trainingIp trainingOp` `--` e.g.

  `--` in effect a FunctionModel-tree, i.e. a regression tree, automatically, for little effort.
- Turn an estimator for a FunctionModel into an estimator for a Model, for use with estCTree.
  NB. Can use estimators other than estFiniteFunction! (E.g. similarly, FunctionModel-mixtures, etc.)
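The conversion above can be sketched as follows. `estFunctionModel2estModel` follows the slide's one-line definition; the rest is assumed: `FnModel` and `ModelType` are hypothetical representations, `functionModel2model` is taken to give a Model of (input, output) pairs that charges only for the output given the input, and `estFiniteFunction` is a made-up Bool-to-Bool lookup-table estimator.

```haskell
type Probability   = Double
type MessageLength = Double

data ModelType d = MPr MessageLength (d -> Probability)

prOf :: ModelType d -> d -> Probability
prOf (MPr _ p) = p

-- Hypothetical FunctionModel representation: input -> conditional model.
data FnModel i o = FM MessageLength (i -> ModelType o)

condPr :: FnModel i o -> i -> o -> Probability
condPr (FM _ g) i o = prOf (g i) o

-- Assumed meaning: a Model of (input, output) pairs that charges only
-- for the output given the input, keeping the FunctionModel's first part.
functionModel2model :: FnModel i o -> ModelType (i, o)
functionModel2model fm@(FM len _) = MPr len (\(i, o) -> condPr fm i o)

-- As on the slide: unzip the pairs, estimate a FunctionModel, convert it.
estFunctionModel2estModel :: ([i] -> [o] -> FnModel i o) -> [(i, o)] -> ModelType (i, o)
estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

-- Hypothetical estFiniteFunction for Bool -> Bool: for each input value,
-- Pr(True) = (t + 1/2) / (n + 1) over the training outputs with that input.
estFiniteFunction :: [Bool] -> [Bool] -> FnModel Bool Bool
estFiniteFunction ips ops = FM 0 cond
  where
    cond i = MPr 0 (\o -> if o then pT else 1 - pT)
      where
        matching = [o | (i', o) <- zip ips ops, i' == i]
        pT = (fromIntegral (length (filter id matching)) + 0.5)
             / (fromIntegral (length matching) + 1)

pairModel :: ModelType (Bool, Bool)
pairModel = estFunctionModel2estModel estFiniteFunction
              [(True, True), (True, True), (False, False)]

main :: IO ()
main = do
  print (prOf pairModel (True, True))    -- (2 + 0.5) / (2 + 1)
  print (prOf pairModel (False, False))  -- 1 - 0.5 / 2 = 0.75
```

The result has the type of an ordinary leaf-model estimator, `[(i, o)] -> ModelType (i, o)`, which is what lets it be passed to `estCTree` unchanged.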

<< [15]

A good summer collection

**Models**, e.g. probability distributions, mixtures (unsupervised classification).

**FunctionModels**, e.g. curve fitting, regressions, classification trees (supervised classification), regression trees.

**TimeSeries**, e.g. Markov models.

- Operators and conversion functions on the above.

- General, e.g. estimate a mixture of FunctionModels, estimate a FunctionModel- (regression-) tree, etc.

- Have a model of modelling:
a theory, usable in its own right (it runs), and a rapid prototype for a data mining platform.

- [paper]

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.