Mixture Modelling page
Welcome to
David Dowe's
clustering, mixture modelling and unsupervised learning page.
Postdoc available (Postdoctoral Fellowship job available, deadline: 31 July 2016):
Research Fellow in Statistics, Machine Learning, Mixture Modelling, Latent Factor Analysis and Astrophysics
Mixture modelling (or mixture modeling, or finite mixture modelling,
or finite mixture modeling) concerns modelling a statistical
distribution by a mixture (or weighted sum) of other distributions.
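As a minimal sketch of the "weighted sum" in the definition above, a mixture density can be evaluated as shown here. The two-component Normal example, its weights and the helper names normal_pdf and mixture_pdf are illustrative assumptions, not taken from any package listed on this page.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, components):
    """Weighted sum of component densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * pdf(x) for w, pdf in zip(weights, components))

# Hypothetical example: 30% N(0, 1) and 70% N(5, 2^2).
weights = [0.3, 0.7]
components = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 5.0, 2.0),
]

# Density of the mixture at x = 1.
density_at_1 = mixture_pdf(1.0, weights, components)
```

Because the weights are non-negative and sum to 1, the weighted sum is itself a valid density. Any other component density (Poisson, von Mises, and so on) could be passed in place of the Normal ones.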
Mixture modelling is also known as
unsupervised concept learning or unsupervised learning
(in Artificial Intelligence)
intrinsic classification (in Philosophy), or simply classification
clustering
numerical taxonomy
In 1995, an International Workshop on Mixtures (also here) was held.
Also, an e-mailing list exists for "Classification, clustering, and phylogeny estimation", namely (CLASS-L@CCVM.SUNYSB.EDU or) owner-class-l@CCVM.SUNYSB.EDU, as does
a WWW site for the International Federation of Classification Societies (IFCS),
a WWW site for the Classification Society of North America (CSNA),
a WWW site for the Societe Francophone de Classification (SFC),
a WWW site for the (Polish) Sekcja Klasyfikacji i Analizy Danych PTS (SKAD) and
a WWW site for the (Dutch) Vereniging voor Ordinatie en Classificatie (VOC).
In 2001, there was:
Mixtures 2001,
Recent Developments in Mixture Modelling, 23 - 28 July 2001,
Universität der Bundeswehr,
Hamburg,
Germany.
Deadline December 31, 2000.
[See also the Ray Solomonoff (1926-2009) 85th memorial conference (Wed 30 Nov - Fri 2 Dec 2011), 1st Call for Papers.]
Most mixture modelling is done for mixtures of
Normal (or Gaussian) distributions.
However, other distributions for which mixture modelling
has been done include (e.g.) :
the multinomial (Bernoulli or multi-category) distribution,
the Poisson distribution and
the von Mises circular distribution.
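Since most mixture modelling uses Normal components, the following is a minimal sketch of fitting a two-component, one-dimensional Gaussian mixture by the EM algorithm (maximum likelihood). It is an illustration under stated assumptions, not the method of any particular program on this page: the function name em_two_gaussians, the initialisation, the fixed iteration count and the synthetic data are all hypothetical. The number of components is fixed at two here, whereas Bayesian and MML approaches such as Snob also address choosing that number.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(data, iterations=100):
    """Maximum-likelihood fit of a two-component 1-D Gaussian mixture by EM."""
    # Crude initialisation (an assumption of this sketch): start the means at
    # the extremes of the data and give both components the overall spread.
    mean = sum(data) / len(data)
    spread = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    weights = [0.5, 0.5]
    mus = [min(data), max(data)]
    sigmas = [spread, spread]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each datum.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weights, mus, sigmas

# Synthetic data (hypothetical): 300 draws from N(0, 1), 200 from N(10, 1).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)] + \
       [random.gauss(10.0, 1.0) for _ in range(200)]
weights, mus, sigmas = em_two_gaussians(data)
```

With components this well separated, the estimates should land close to the generating values. With overlapping components, EM can converge to a poor local maximum, so several random restarts are usually advisable.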
Bibliographies
Chris Fraley's Classification Bibliography.
Peter Macdonald's mixture distribution bibliography.
Fionn Murtagh (and CSNA)'s
Classification Bibliographies.
Warren S. Sarle's selected
Bibliography on Cluster Analysis.
Luis Talavera's
Bibliography
of Conceptual Clustering.
John Uebersax's
Latent Class Analysis
bibliography.
(Also here.)
Below we give lists of some available mixture modellers of various
distributions:
On-Line Software for Clustering and Multivariate Analysis listed by the CSNA.
Fionn Murtagh's list of Multivariate Data Analysis Software and
Fionn Murtagh's pointers to, and addresses of, lots of multivariate data analysis code.
S*i*ftware's links to clustering software.
Mixture modellers of Binomial distributions
See "Mixture modellers of Multinomial (or Bernoulli or multi-category) distributions" below.
Mixture modellers of Gamma distributions
"MIX". Commercial (see below).
Yudi Agusta and
David Dowe, using MML.
See, e.g., publications
(2003,
.pdf).
Mixture modellers of Gaussian distributions
(Finite Gaussian mixture models)
AutoClass
(and Peter Cheeseman). Method: Bayesian.
Clustan:
www.clustan.com.
COBWEB, by Doug H. Fisher.
ECOBWEB
concept formation program.
John Wolfe's
Normix
(was here).
"MIX" Software Home Page (and About MIX) and mixture distribution bibliography. Commercial.
Snob (software, ReadMe and documentation files), and latest paper [pp73-83 (Jan. 2000)]; by C. Wallace and D. Dowe - finite mixture model(s) by MML.
Snob Method: Bayesian,
Minimum Message Length
(MML)
finite mixture model,
information theory and
Kolmogorov
complexity
- see
"Minimum
Message Length and Kolmogorov complexity",
Comp.
J., 42:4.
Snob Features: Deals with a variety of distributions and missing data.
A
S. Akaho's
EM algorithm
(was here)
with link to paper.
Features: scale and shift parameters,
Java demo.
S. Akaho's program also does "line mixing".
Mike Alder
(from CIIPS,
U.W.A.)'s
book
(including some examples of the EM algorithm used for
Gaussian mixture modelling).
C. Ambroise et al.'s
Constrained clustering and the EM algorithm
software for spatial clustering.
S. Aylward's Mixture Modeling for Medical Image Segmentation. Method: Permits mixture models comprising infinitely many Gaussian components with continuous collective parameterizations.
B
Kaye Basford
(co-author of mixture modelling book with (below) Geoff McLachlan)'s home page and
The Biometrics Unit (University of Queensland)'s publications.
R. A. Baxter and J. J. Oliver,
Finding
overlapping components with MML - see also earlier related work on
doing finite mixture models using MML by
Wallace and Dowe (1994)
and Wallace and Dowe (1997) and
contemporaneous work by Wallace and Dowe (2000).
Hamparsum Bozdogan's
home page.
C
Dr Carroll's
Quasilikelihood estimation in measurement error models with correlated replicates paper.
Dr Carroll's
Method: quasilikelihood estimation.
Complex Systems Computation group (CoSCo), U. of Helsinki. Home page and research projects.
D
D. Dacunha-Castelle and E. Gassiat's work, papers nos. 25 and 44. Method: Maximum Likelihood.
Petros Dellaportas (and Dimitris Karlis)'s (mixture modelling) papers. Dellaportas-Karlis mixture modelling Method: Hierarchical, empirical Bayes, method of moments and simulation techniques.
David Dowe
(and
publications):
See Snob by
C. Wallace
and D. Dowe.
Has published on mixture modelling of
Gaussians with factor analysis
(with R. Edwards, 1998),
and (with Y. Agusta)
other correlated
Gaussians
(2003,
.pdf),
t distributions
(2002)
and
Gamma distributions
(2003).
E
Russell Edwards and David Dowe have extended
Snob to deal with single Gaussian factor
analysis (assuming total assignment)
using MML.
G
Peter Green.
H
Cem Hocaoglu.
HTK Book (and links to chapters). Commercial. Entropic Cambridge Research Laboratory Ltd.
J
Michael Jordan's projects.
Murray
Jorgensen's home page
(link to
MULTIMIX).
M
Geoff McLachlan is the author of several articles and a joint book on mixture modelling (with (above) Kaye Basford) and is currently completing
EMMIX
(MIXFIT)
software,
suitable for maximum likelihood fitting of Gaussians in discriminant and cluster analyses and many experimental situations. Permits re-sampling-based tests and bootstrap-based standard error assessment.
Some of
G. McLachlan and David Peel's
data sets.
Boris Mirkin's
publications
and
current projects.
N
Radford Neal's Bayesian Mixture Modeling by Monte Carlo Simulation and Markov Chain Sampling Methods for Dirichlet Process Mixture Models.
R. Neal's Method: Exhibits the true Bayesian predictive distribution, without needing to decide on a "correct" number of components.
R
Adrian Raftery's and
Chris Fraley's Model-Based Clustering Software (MCLUST).
Christian Robert's
ftp site.
S
Arthur C. Sanderson.
T
Rob Tibshirani's research, and
T. Hastie & R. Tibshirani Gaussian mixture paper.
T. Hastie
& R. Tibshirani's
Method(s): Linear discriminant analysis, Maximum Likelihood, non-parametric.
V
Gerhard Visser's and David Dowe's
(2007)
"Minimum Message
Length Clustering Of Spatially-Correlated Data with Varying Inter-Class
Penalties"
(and
here),
and
(with J. P. Uotila,
2009)
"Enhancing
MML Clustering using Context Data with Climate Applications"
(and here).
W
Chris Wallace
and David Dowe's
Snob work (and
software and
ReadMe),
and
latest paper
- see
Snob
above.
Uses Minimum Message Length
(MML).
[See also
Wallace (1998)
and
Visser & Dowe (2007)
on spatial correlation.]
Mike West's publications.
Mixture modellers of logistic distributions
Dr Carroll's
A nonparametric mixture approach to case-control studies with errors in covariables paper.
Dr Carroll's
Method: nonparametric.
Dr Carroll's
Segmented regression with errors in predictors paper.
Dr Carroll's
Method: semiparametric & parametric. Linear and logistic distributions.
Mixture modellers of log-Normal distributions
See "Mixture modellers of Gaussian distributions" above.
Mixture modellers of Multinomial (or Bernoulli or multi-category) distributions
Snob, by
Chris Wallace
and David Dowe
- see
Snob
above,
under "Gaussian".
Uses Minimum Message Length
(MML).
Murray
Jorgensen's home page (see above, or link to
MULTIMIX).
Martin Puterman's home page, with several of his papers, data and codes.
M. Puterman has worked on mixture models for discrete data. Method: Maximum Likelihood and penalised likelihood.
John Uebersax's
Latent Class Analysis page
has
FAQs,
bibliographies,
software links,
examples, and some of his
papers and programs
(including
MIXBIN,
which estimates a mixture of binomials).
Mixture modellers of Normal distributions
See "Mixture modellers of Gaussian distributions" above.
Mixture modellers of Poisson distributions
Snob, by
Chris Wallace
and David Dowe
- see
Snob
above,
under "Gaussian".
Uses Minimum Message Length
(MML).
Petros Dellaportas's
home page.
Mixture modellers of t distributions
Yudi Agusta and
David Dowe, using MML.
See, e.g., publications.
Mixture modellers of von Mises circular distributions
Snob, by
Chris Wallace
and David Dowe
- see
Snob
above,
under "Gaussian".
Uses Minimum Message Length
(MML).
Mixture modellers of von Mises Fisher spherical distributions
Oliver, J.J. and D.L. Dowe,
1996.
Uses Minimum Message Length
(MML).
Mixture modellers of Weibull distributions
"MIX". Commercial (see above).
Mixture modellers of Other distributions and Miscellaneous
Shotaro Akaho's EM algorithm (with link to paper) for "line mixing" (see above).
M. Black and A. Jepson's Mixture Models for Optical Flow Computation. Explores use of mixture models to represent optical flow in image regions containing multiple motions due to occlusion and transparency.
Vincent Garcia
and
Frank Nielsen's
jMEF
("A Java library to create, process and manage mixtures of exponential families").
Sara van de Geer's home page. Method: General mixing models, maximum likelihood, asymptotic normality of linear functionals of the mixing distribution.
IBM's
CViz.
D. Laidlaw, K. Fleischer and A. Barr, Classification of MRI Data for Geometric Modeling and Visualization.
Laidlaw, Fleischer and Barr's Method: Bayesian Mixture Classification.
Christian Lenart's (fuzzy) clustering page and description of software.
MEME software for finding patterns in DNA and protein sequences.
MIT (Germany)'s DataEngine Product Family page. Method: Fuzzy clustering. Commercial.
NSWC Advanced Computational Technology Group's pattern recognition and classification, including work on mixture-based density estimation applied to statistical pattern recognition and image processing, e.g. J. Solka and W. Poston's Visualization of Finite and Adaptive Mixtures Models - Univariate Examples.
Adrian Raftery's clustering and spatial point pattern research and group on clustering and Bayesian model selection.
SPIDER is a large image processing system for electron microscopy, including multivariate statistical classification and cluster analysis. Commercial.
SUBDUE, by
Diane J. Cook and
Lawrence B. Holder.
Method: Hierarchical clustering using an
MDL
heuristic to iteratively identify the subgraphs within a graph that best compress the input graph (i.e., minimise its description length).
M. Afzal Upal's publications on comparison(s) of non-hierarchical unsupervised classification algorithms.
Data links
Some data links
(and some medical data links);
and
Geoff McLachlan
and
David Peel's
"Finite Mixture Models"
and
data sets.
Of possible interest
Statistical Society of Canada
Case
Studies in Data Analysis for 2000 and
Mixtures Plus - Case Studies.
StatLib Index (from the Carnegie Mellon University Statistics Department).
Tjen-Sien Lim's
"Tree-Structured & Rules
Induction Programs Homepage"
Kevin Murphy's
list of
free Bayes net software.
"Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric
Languages",
by J. W. Comley and D.L. Dowe;
Chapter 11
(pp265-294)
in P. Grunwald, M. A. Pitt and I. J. Myung (eds.),
Advances in Minimum Description Length:
Theory and Applications,
M.I.T. Press, April 2005, ISBN 0-262-07262-9.
{This is about Generalised Bayesian nets, generalising MML Bayesian nets or
MML Bayesian networks or
MML Bayes nets
(or Generalised directed graphical
models, generalising MML directed graphical models); and it deals with
a mix of both continuous and discrete variables.
(See also
Comley and Dowe
(2003),
.pdf.)}
Data Mining Information, maintained by Graham Williams.
Online Machine Learning Resources, maintained by the ML Group at the Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria.
Artificial Intelligence Resources, maintained by NRC-CNRC Institute for Information Technology.
A Guide to the Web for Statisticians,
maintained by Gordon Smyth.
Autonomous Agents '97 Related Sites.
AI Intelligence (and here)'s AI Information Bank. Commercial.
International Rough Set Society, U. Regina's Electronic Bulletin of the Rough Set Community pages.
Bayesian Knowledge Discoverer (BKD), by Marco Ramoni and Paola Sebastiani: A program for model
selection with missing data using directed graphical models and discrete
variables.
http://www.gamma.rug.nl
iec ProGAMMA.
http://www.eco.rug.nl/medewerk/WEDEL/slides/segmenta/sld001.htm
slides on
Market segmentation with mixture models.
NASA Data Archive and Distribution Service.
Michael Carley's (acoustics and acoustic mixing) home page.
Minimum message length
(MML),
Chris Wallace (1933-2004)
(developer of MML in
1968),
Bayesian Nets using
Minimum message length
(MML),
data repositories,
decision trees and
decision graphs
using MML,
Occam's razor
(Ockham's razor),
Snob
(program for MML
clustering and mixture modelling, MML finite mixture models),
(econometric)
time series
using MML,
medical research,
a probabilistic sports prediction
competition
(and further reading on probabilistic
scoring),
chess and game theory research;
Feeding the world
(TheHungerSite),
TheRainforestSite,
"do-goody"/"do-goody stuff, improving the world and saving the planet".
This (mixture modelling, clustering, unsupervised concept learning,
intrinsic classification and numerical taxonomy) page
http://www.csse.monash.edu.au/~dld/mixture.modelling.page.html
was put together by
Dr David Dowe,
Dept. of Computer Science, Monash University, Clayton, Vic. 3168, Australia
e-mail:
d l d XX cs.monash.edu.au
(Fax: +61 3 9905-5146)
(and was started on Sun 26th Jan. 1997) and was last updated no earlier than
Fri 5th Feb. 1999.
Copyright
David L. Dowe,
Monash University, Australia,
26 Jan 1997, 3 Mar 1998, 7 May 1998, etc.
Copying is not permitted without express permission from
David L. Dowe.