200312, ver.1.1, to do
- It is probably a better standard to estimate a time-series model from
a set of data series, [[dataSpace]], rather than from just one series, [dataSpace].
- It might be better to deal with "weighted data" via a type
data Weighted d = Wt Double d
(although it is estimators that work on weighted data;
does one really have a Model of weighted data?),
as inspired by the success of treating missing data,
Maybe d, and the associated models and operators
in the [Bayes-nets] case study.
(It might be worth treating the measurement accuracy of continuous data
in a similar way, or not?-)
There are two related issues: weighted data, and
the measurement accuracy of continuous data.
Weighted data are needed primarily for
fractional class-membership weights
(and can also represent repeated values).
But being missing is like having a weight of zero,
and also like having vanishingly low measurement accuracy.
Should measurement accuracy be an explicit
component of every continuous datum?
- If so, should sufficient statistics be manipulated
as a single tuple by uncurried functions?
Then such functions could all have the same (polymorphic) type,
and composing the counter-function with the model-builder
would be slightly easier.
Under what conditions is there an operator of type, roughly,
estimator ds -> estWeighted ds?
Must the sufficient statistics (ss) be additive and scalable?
- Generalize the current 0-lookahead search for classification trees to n-lookahead, n > 0.
- Add splitting and merging of components to the mixture-model search.
- Provide some simple I/O support, e.g. for `comma-separated values' (CSV) files.
- Always: Look for places to make better use of the prelude functions, e.g. any, all, fold[l|r], min, max, repeat, scan[l|r], sum, zipWith, etc., to simplify the code.
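On estimating a time-series model from a set of series, [[dataSpace]]: one concrete reason is that concatenating the series into one would invent a transition across each join. A minimal sketch, assuming a first-order Markov chain over a finite alphabet; the names countsMany, countsOne and prob are hypothetical, not part of the existing code. Transitions are counted within each series and the counts are then pooled.

```haskell
import qualified Data.Map.Strict as M

-- Transition counts of a (hypothetical) first-order Markov model.
type Counts a = M.Map (a, a) Int

-- Adjacent pairs within ONE series: n values give n-1 transitions.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (drop 1 xs)

-- Estimate from a set of series, [[dataSpace]]: pairs are formed within
-- each series, so no transition is counted across a series boundary.
countsMany :: Ord a => [[a]] -> Counts a
countsMany series = M.fromListWith (+) [ (p, 1) | s <- series, p <- pairs s ]

-- Estimation from one series, [dataSpace], is then just a special case.
countsOne :: Ord a => [a] -> Counts a
countsOne s = countsMany [s]

-- Maximum-likelihood P(y | x) = count(x, y) / count(x, _);
-- Nothing if x never occurs as the "from" state of a transition.
prob :: Ord a => Counts a -> a -> a -> Maybe Double
prob cs x y
  | total == 0 = Nothing
  | otherwise  = Just (fromIntegral (M.findWithDefault 0 (x, y) cs) / fromIntegral total)
  where
    total = sum [ n | ((u, _), n) <- M.toList cs, u == x ]
```

For example, countsMany ["aab", "ba"] has no 'b'-to-'b' transition, whereas countsOne "aabba" (the concatenation) counts one.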
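The weighted-data idea, data Weighted d = Wt Double d, can be sketched for a simple frequency (multinomial) estimator. This is an illustration under assumptions: estFreq, unweighted and fromMaybes are hypothetical names, weights are assumed non-negative, and a missing datum is simply dropped, which is one reading of "being missing is like having a weight of zero". A weight of 2 then gives the same estimate as a repeated value, and fractional weights act like partial class memberships.

```haskell
-- The weighted-datum type proposed in the note; weights assumed >= 0.
data Weighted d = Wt Double d deriving Show

weight :: Weighted d -> Double
weight (Wt w _) = w

-- Ordinary (unweighted) data are the special case of weight 1.
unweighted :: [d] -> [Weighted d]
unweighted = map (Wt 1)

-- Missing data (Nothing) are dropped, i.e. treated as having weight 0.
fromMaybes :: [Maybe d] -> [Weighted d]
fromMaybes ms = [ Wt 1 x | Just x <- ms ]

-- Weighted frequency estimate of a multinomial over the given values.
-- The result contains NaNs if the total weight is zero.
estFreq :: Eq d => [d] -> [Weighted d] -> [(d, Double)]
estFreq values wds = [ (v, tot v / total) | v <- values ]
  where
    tot v = sum [ w | Wt w x <- wds, x == v ]
    total = sum (map weight wds)
```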
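On holding sufficient statistics as a single tuple: a minimal sketch for a Gaussian, assuming the statistics are (count, sum, sum of squares) and using the hypothetical names gaussCount, gaussBuild and estimator. Because the model-builder is uncurried, every estimator has the same shape, build . count, whatever the model.

```haskell
-- Sufficient statistics for a Gaussian, held as ONE tuple:
-- (count, sum, sum of squares).
type GaussSS = (Double, Double, Double)

-- Counter-function: data -> sufficient statistics.
gaussCount :: [Double] -> GaussSS
gaussCount xs = (fromIntegral (length xs), sum xs, sum (map (^ 2) xs))

-- Model-builder, uncurried: it takes the whole tuple, not three arguments.
-- Returns the maximum-likelihood (mean, variance); assumes a non-empty data set.
gaussBuild :: GaussSS -> (Double, Double)
gaussBuild (n, s, s2) = (m, s2 / n - m * m)
  where
    m = s / n

-- With statistics passed as a single value, an estimator is plain
-- composition, and this combinator has the same type for every model.
estimator :: (ds -> ss) -> (ss -> model) -> ds -> model
estimator count build = build . count

estGauss :: [Double] -> (Double, Double)
estGauss = estimator gaussCount gaussBuild
```

With a curried builder, Double -> Double -> Double -> (Double, Double), each composition would need an extra uncurrying step (the Prelude's uncurry only handles pairs). The variance formula used here is the simple one and is not numerically robust for large values.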
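On the operator estimator ds -> estWeighted ds: one possible sufficient condition, sketched here under assumptions, is that the statistics of a single datum can be added (to combine data) and scaled by a weight. The class SuffStat and the functions estimatorFrom and estWeighted are hypothetical names. Given such statistics, the weighted estimator comes for free; an integer weight reproduces repeated values, and weight 0 reproduces a missing datum.

```haskell
-- Hypothetical class: sufficient statistics that are additive (statistics
-- of two data sets combine by addition) and scalable (by a weight).
class SuffStat s where
  zeroSS  :: s
  addSS   :: s -> s -> s
  scaleSS :: Double -> s -> s

-- The weighted-datum type from the note.
data Weighted d = Wt Double d

-- Ordinary estimator: sum the statistics of the individual data, then build.
estimatorFrom :: SuffStat s => (d -> s) -> (s -> m) -> [d] -> m
estimatorFrom one build ds = build (foldr (addSS . one) zeroSS ds)

-- Weighted estimator: scale each datum's statistics by its weight first.
estWeighted :: SuffStat s => (d -> s) -> (s -> m) -> [Weighted d] -> m
estWeighted one build wds =
  build (foldr (\(Wt w x) acc -> addSS (scaleSS w (one x)) acc) zeroSS wds)

-- Example instance: Gaussian statistics (total weight, sum, sum of squares).
newtype GaussSS = GaussSS (Double, Double, Double)

instance SuffStat GaussSS where
  zeroSS = GaussSS (0, 0, 0)
  addSS (GaussSS (a, b, c)) (GaussSS (a', b', c')) = GaussSS (a + a', b + b', c + c')
  scaleSS w (GaussSS (a, b, c)) = GaussSS (w * a, w * b, w * c)

-- Statistics of a single datum.
gaussOne :: Double -> GaussSS
gaussOne x = GaussSS (1, x, x * x)

-- Maximum-likelihood (mean, variance); assumes a positive total weight.
gaussBuild :: GaussSS -> (Double, Double)
gaussBuild (GaussSS (n, s, s2)) = (m, s2 / n - m * m)
  where
    m = s / n
```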
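The n-lookahead search could take a generic form, independent of classification trees. This is a sketch only, not the existing tree code: moves lists the candidate refinements of a state (e.g. the possible splits of a leaf), cost gives the cost of a state (e.g. a message length, lower is better), and lookaheadCost and chooseMove are hypothetical names. With n = 0 each candidate is scored by its own cost, i.e. the greedy 0-lookahead choice. The work grows exponentially with n.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Best cost reachable from state s by making at most n further moves
-- (stopping early is allowed). Lower cost is better.
lookaheadCost :: Int -> (s -> [s]) -> (s -> Double) -> s -> Double
lookaheadCost n moves cost s
  | n <= 0    = cost s
  | otherwise = minimum (cost s : map (lookaheadCost (n - 1) moves cost) (moves s))

-- Pick the next move by scoring each candidate with n-lookahead beyond it.
-- n = 0 scores a candidate by its own cost only, i.e. a greedy choice.
-- Nothing if there are no moves or none is expected to improve on s.
chooseMove :: Int -> (s -> [s]) -> (s -> Double) -> s -> Maybe s
chooseMove n moves cost s =
  case [ (lookaheadCost n moves cost s', s') | s' <- moves s ] of
    []    -> Nothing
    cands ->
      let (c, best) = minimumBy (comparing fst) cands
      in  if c < cost s then Just best else Nothing
```

For instance, if the current state costs 10, its two moves lead to states costing 11 and 9, and only the 11-cost state has a further move, to a state costing 1, then chooseMove 0 takes the 9-cost state while chooseMove 1 takes the 11-cost one.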
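For simple `comma-separated values' input, a deliberately minimal sketch using only the Prelude. It splits each line at commas and handles neither quoted fields nor embedded commas, which a full CSV reader would need. splitCommas, parseCSV and readColumn are hypothetical names, and the file name in the commented-out usage is illustrative only.

```haskell
-- Split one line at commas. Minimal: no quoted fields, no escaped
-- commas, no trimming of spaces; empty fields are kept.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- Parse a whole file's contents into rows of fields.
parseCSV :: String -> [[String]]
parseCSV = map splitCommas . lines

-- Read column i of every row as a number, e.g. to pass to an estimator.
-- Partial: fails at run time on a short row or a malformed number.
readColumn :: Int -> [[String]] -> [Double]
readColumn i rows = [ read (row !! i) | row <- rows ]

-- Possible use (hypothetical file name):
--   main = do
--     txt <- readFile "data.csv"
--     print (readColumn 0 (parseCSV txt))
```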
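As an illustration of the last item, a sketch of explicit recursion next to equivalent code built from Prelude functions (sum, zipWith, scanl1, all). The function names are hypothetical, not taken from the existing code.

```haskell
-- Explicit recursion, the kind of code worth looking for:
dotRec :: [Double] -> [Double] -> Double
dotRec (x : xs) (y : ys) = x * y + dotRec xs ys
dotRec _        _        = 0

-- The same with Prelude functions; stops at the shorter list, as dotRec does.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- Running totals (e.g. cumulative counts) via scanl1, a variant of scanl.
cumulative :: [Int] -> [Int]
cumulative = scanl1 (+)

-- A property of every datum, checked with all instead of a loop.
allNonNegative :: [Double] -> Bool
allNonNegative = all (>= 0)
```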
School of Computer Science and Software Engineering,
Monash University, Australia 3800.