^CSE454^ [01] >>

Data and Models

This section examines data values and models to learn lessons for some generalised software being developed in the CSSE.

CSE454 2005 : This document is online at http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/ and contains hyper-links to other resources - Lloyd Allison ©.

^CSE454^ << [02] >> Some Types of Values:

Type--|--Scalar--|--Discrete--|--Ints & subranges
      |          |            |
      |          |            |--Symbolic
      |          |
      |          |--Continuous & subranges
      |
      |--Structured  i.e. multivariate
      |
      |--Vector      N.B. homogeneous
      |
      |--Union       i.e. either S1 or S2
      |
      |--Function    i.e. S1->S2
      |
      |--Model...

^CSE454^ << [03] >> Some distributions / models:

Model--|--Discrete----|--Uniform
       |              |
       |              |--Multistate etc.
       |
       |--Continuous--|--Uniform
       |              |
       |              |--Normal(m,s) etc.
       |
       |--Structured--|--Independent
       |              |
       |              |--Factors  etc.
       |
       |--Vector------|--set (independent)
                      |
                      |--series--|--Markov
                                 |
                                 etc.

A Model should be able to give the (-log) probability of a data value, generate (sample) data, ...
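The two abilities named above can be sketched as a small interface. This is a minimal illustration only, not the course software: the names `Model`, `nl_pr`, `sample`, `MultiState` and `Normal` are assumptions made for this example, and for the continuous case `nl_pr` returns a negative log density rather than a true probability.

```python
import math
import random
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical interface for a model over some data space."""

    @abstractmethod
    def nl_pr(self, x) -> float:
        """Negative log probability (natural log, i.e. nits) of data value x."""

    @abstractmethod
    def sample(self, rng: random.Random):
        """Generate one data value from the model."""


class MultiState(Model):
    """Discrete distribution over a finite set of symbols."""

    def __init__(self, probs):
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.probs = dict(probs)

    def nl_pr(self, x):
        return -math.log(self.probs[x])

    def sample(self, rng):
        symbols = list(self.probs)
        weights = [self.probs[s] for s in symbols]
        return rng.choices(symbols, weights=weights)[0]


class Normal(Model):
    """Normal(m, s); nl_pr here is the negative log *density*."""

    def __init__(self, m, s):
        self.m, self.s = m, s

    def nl_pr(self, x):
        z = (x - self.m) / self.s
        return 0.5 * z * z + math.log(self.s) + 0.5 * math.log(2.0 * math.pi)

    def sample(self, rng):
        return rng.gauss(self.m, self.s)
```

Working with negative log probabilities means that the costs of independent data values add, which avoids numerical underflow on large data sets.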

^CSE454^ << [04] >>

                                    parameters
                                        |
                                        |
                                        v

(input space                                                 (output) Sample (Data) Space
 exogenous variables)  ----->  "(Function -) Model"  ----->  endogenous variables

e.g. A classification- (decision-) tree T models life expectancy as N(m,s) given diet, gender and weight, where m and s depend on diet, gender and weight.
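The tree example can be sketched as a function from the input space to a model of the output space. Everything specific below is an assumption made for illustration: the names `Person`, `tree` and `nl_pr_output`, the split tests, and the leaf parameters are all invented and are not taken from any real data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Normal:
    m: float  # mean
    s: float  # standard deviation

    def nl_density(self, x: float) -> float:
        """Negative log density of x under Normal(m, s)."""
        z = (x - self.m) / self.s
        return 0.5 * z * z + math.log(self.s) + 0.5 * math.log(2.0 * math.pi)


@dataclass(frozen=True)
class Person:
    """The input (exogenous) variables."""
    diet: str
    gender: str
    weight: float  # kg


def tree(p: Person) -> Normal:
    """Hypothetical tree T: tests on the inputs lead to a leaf,
    and each leaf holds its own Normal(m, s).
    All split values and leaf parameters are made up."""
    if p.diet == "poor":
        return Normal(68.0, 12.0)
    if p.weight > 100.0:
        return Normal(72.0, 10.0)
    return Normal(80.0, 9.0) if p.gender == "f" else Normal(76.0, 9.0)


def nl_pr_output(p: Person, life_expectancy: float) -> float:
    """Cost of the output (endogenous) value given the input."""
    return tree(p).nl_density(life_expectancy)
```

The point of the design is that the tree itself never scores data directly; it only selects which simpler model does the scoring.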

^CSE454^ << [05] >>

Mixture

Can form a mixture (weighted average) of models M₁, ..., M_n, given weights w₁, ..., w_n, where w₁ + ... + w_n = 1, provided that the types of the models are the same.

I.e. the input spaces, parameter spaces, and data spaces are the same across the M_i.
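A minimal sketch of a mixture, assuming the simplest possible case: each component is a discrete distribution stored as a dictionary from symbol to probability. The function name `mixture` and this representation are assumptions for the example. The check on the data spaces mirrors the requirement that the component models have the same type.

```python
def mixture(weights, models):
    """Weighted average of discrete distributions.

    weights: w_1..w_n, which must sum to 1.
    models:  M_1..M_n, each a dict mapping symbol -> probability.
    """
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    space = set(models[0])
    if any(set(m) != space for m in models):
        raise ValueError("all component models must share one data space")
    # Pr(x) under the mixture is w_1*Pr_1(x) + ... + w_n*Pr_n(x).
    return {x: sum(w * m[x] for w, m in zip(weights, models)) for x in models[0]}
```

For example, mixing a fair coin and a coin with Pr(H) = 0.9 with equal weights gives Pr(H) = 0.5 * 0.5 + 0.5 * 0.9 = 0.7.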

^CSE454^ << [06] >>

(Time-) Series

A model M with data space S trivially induces a model on S^* if the elements of the series are modelled as being independent.
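Under the independence assumption, the cost of a series is just the sum of the costs of its elements. The sketch below assumes an element model supplied as a function returning a negative log probability; the name `nl_pr_series` is hypothetical. It ignores the length of the series, which a complete model on S^* would also have to account for.

```python
import math


def nl_pr_series(nl_pr, xs):
    """Negative log probability of the series xs when its elements are
    modelled as independent draws from one element model.

    nl_pr: function giving the negative log probability of one element.
    """
    return sum(nl_pr(x) for x in xs)


# Usage: a uniform model over the four DNA bases, applied to a short series.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
cost = nl_pr_series(lambda x: -math.log(uniform_dna[x]), "ACGT")
```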

There are more interesting models on S^*: a 1st-order Markov model can be thought of as |S| 0-order Markov models, one for each "context" (the previous value).

(A 0-order Markov model is ~ a multi-state distribution.)

A time-series model can produce a model of the next value given (conditional on) the context of previous values.
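The idea that a 1st-order Markov model is a collection of 0-order models, one per context, can be sketched as follows. The class name `Markov1`, the method `predict`, and the dictionary representation are assumptions for this example; a separate 0-order model is assumed for the first element, which has no context.

```python
import math


class Markov1:
    """1st-order Markov model over a finite data space S."""

    def __init__(self, initial, transitions):
        self.initial = initial          # 0-order model for the first element
        self.transitions = transitions  # |S| 0-order models, keyed by context

    def predict(self, context):
        """Model of the next value, conditional on the previous values.
        Only the most recent value matters for a 1st-order model."""
        return self.transitions[context[-1]] if context else self.initial

    def nl_pr(self, xs):
        """Negative log probability of the whole series xs."""
        total = 0.0
        for i, x in enumerate(xs):
            total -= math.log(self.predict(xs[:i])[x])
        return total


# Usage with made-up probabilities over S = {"s", "r"}.
mm = Markov1(
    initial={"s": 0.5, "r": 0.5},
    transitions={"s": {"s": 0.8, "r": 0.2}, "r": {"s": 0.4, "r": 0.6}},
)
```

Here `mm.nl_pr("ssr")` is -log(0.5 * 0.8 * 0.2), the product of the initial probability and the two transition probabilities.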

^CSE454^ << [07] >>

Complex Models

People use the word "model" to cover anything from a simple probability distribution to "a model of the Australian economy" (MAE). At its most general, the notion is too broad to program with, although any instance, such as MAE, can be programmed from a collection of functions, data structures and simpler models.

Complex, commonly used models, e.g. (hidden) Markov models (HMMs), probabilistic finite state automata (PFSAs), mixture models, classification- (decision-) trees & graphs, phylogenetic (evolutionary) trees, Bayesian networks, causal networks, artificial neural networks (ANNs),
can be built from a "library" of building blocks, e.g. conditional probability tables (CPTs), multi-state distributions, normal distributions,
possibly with some "discrete structure": sequence, tree, graph (network).

See L.Allison, Models for machine learning and data mining in functional programming, J. Functional Programming (JFP), 15(1), pp.15-32, January 2005,
and also [II] inc. TR 2004/153, TR 2003/148, ACSC2003.
