MML, MDL, Minimum Encoding Length Inference
Postdoc available: Research Fellow (Postdoctoral Fellowship) in Statistics,
Machine Learning, Mixture Modelling, Latent Factor Analysis and Astrophysics
(deadline: 31 July 2016).
Welcome to
David Dowe's
minimum encoding length inference page.
This page discusses primarily the information-theoretic inference methods of:
Minimum Description Length
(MDL)
(Rissanen, 1978;
Rissanen, 1989;
Rissanen,
1999)
and
the earlier
Minimum Message Length (MML)
(Wallace
and Boulton, Comp. J. 1968; Wallace and Freeman, 1987;
Wallace
and Dowe, 1999a;
Wallace, 2005).
Related earlier works were
Solomonoff,
1964;
Kolmogorov, 1965; and
Chaitin,
1966.
Minimum Message Length (MML) is an information-theoretic Bayesian principle
of inductive inference. It was first published in
C. S. Wallace
and D.M. Boulton,
Comp. J.,
1968.
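The two-part idea behind MML can be illustrated with a toy sketch (this is only an illustrative discrete-hypothesis version, not the Strict MML construction; the hypothesis set, priors and data here are invented for the example): the message length for hypothesis H given data D is -log2 P(H) - log2 P(D|H) bits, and the preferred hypothesis is the one with the shortest total message.

```python
import math

def message_length_bits(prior, likelihood):
    """Two-part code length in bits: assert H, then encode D given H."""
    return -math.log2(prior) - math.log2(likelihood)

def mml_choice(data, hypotheses):
    """hypotheses: dict name -> (prior, per-flip P(heads)); data: 'H'/'T' string.
    Returns (name, total message length in bits) for the shortest message."""
    best = None
    for name, (prior, p_heads) in hypotheses.items():
        lik = 1.0
        for flip in data:
            lik *= p_heads if flip == 'H' else (1.0 - p_heads)
        total = message_length_bits(prior, lik)
        if best is None or total < best[1]:
            best = (name, total)
    return best

# A fair coin vs a heads-biased coin, with a prior favouring "fair".
hyps = {'fair': (0.7, 0.5), 'biased': (0.3, 0.9)}
print(mml_choice('HHHHHHHHHH', hyps))  # a long run of heads -> 'biased' wins
print(mml_choice('HTHTHTHTHT', hyps))  # mixed data -> 'fair' wins
```

Note how the prior term penalises the less plausible hypothesis, so "biased" only wins once the data compress markedly better under it.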
The
first
MML papers
were
C. S. Wallace
and D.M. Boulton,
Comp. J.,
1968;
D.M. Boulton and C. S. Wallace,
1969;
D.M. Boulton and C. S. Wallace,
1970;
D.M. Boulton and C. S. Wallace,
1973;
D.M. Boulton and C. S. Wallace,
1975;
C. S. Wallace and D.M. Boulton,
1975;
and David M. Boulton's 1975 Ph.D. thesis.
[See also
Ray Solomonoff (1926-2009)
85th memorial
conference (Wedn 30 Nov - Fri 2 Dec 2011),
3rd
Call for Papers,
2nd
Call for Papers,
1st
Call for Papers,
Invited
speakers,
accepted
papers.]
Pieces of literature I particularly recommend include:
Wallace, C.S. (2005) [posthumous],
Statistical and Inductive Inference by Minimum Message
Length, Springer (Series: Information Science and Statistics), 2005, XVI,
432 pp., 22 illus., Hardcover, ISBN: 0-387-23795-X
[table of
contents,
chapter headings,
preface
(and
page vi,
also
here)
and
more];
and
D. L. Dowe,
"Foreword
re C. S. Wallace",
Computer Journal,
Vol. 51, No. 5
(Sept. 2008)
[Christopher
Stewart WALLACE (1933-2004) memorial special issue],
pp523-560;
and
C.S. Wallace and
D. L. Dowe,
"Minimum
Message Length and Kolmogorov complexity",
Comp.
J., Vol. 42, No. 4 (1999a),
pp270-283
[which is also Chris Wallace's most cited work co-authored with a still-active
MML researcher]
and also
other articles in that
special issue
of the
Comp. J. on
Kolmogorov complexity;
and
Comley, Joshua W. and D.L. Dowe (2005).
Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages,
Chapter 11 (pp265-294) in P. Grünwald, I. J. Myung and M. A. Pitt (eds.),
Advances in Minimum Description Length: Theory and Applications,
M.I.T. Press (MIT Press), April 2005, ISBN 0-262-07262-9
[Final camera ready copy was submitted in October 2003]
[pp265-284
and
pp285-294;
individual page scans p265 through p294]
(and other chapters in that book);
and
Dowe, D.L., S. Gardner and G. Oppy (December 2007).
"Bayes Not Bust! Why Simplicity is no problem for
Bayesians",
British Journal for the Philosophy
of Science
(BJPS):
Abstract,
full text,
.pdf;
doi:10.1093/bjps/axm033 .
[accepted in 2006; an earlier, near-final version of the paper.
See also
here.];
and
D. L. Dowe (2011 [was 2010]),
"MML, hybrid Bayesian network graphical models,
statistical consistency, invariance and uniqueness",
Handbook of the Philosophy of Science
- (HPS Volume 7) Philosophy of Statistics,
Elsevier
[ISBN: 978-0-444-51862-0 {ISBN 10: 0-444-51542-9 / ISBN 13: 978-0-444-51862-0}],
pp901-982.
Below are some frequently asked questions (FAQs):
What are the differences between MML and MDL?
What are the similarities between MML and MDL?
Can you give a comparison between MML and MDL?
``Answers'': Have a look at
Wallace (2005), sec. 10.2;
Comley and Dowe (2005), sec. 11.4.3, pp272-273;
and
Dowe (2010), sec. 6.7;
and
the references within these,
and
also have a look at
Comp.
J., Vol 42, No. 4,
Wallace and Dowe (1999a),
and anything else you can find.
There are also links below to related topics and to people doing related
research. Please e-mail me if you'd like to be included.
MDL, MML and algorithmic information theory links
Computer Journal
special
issue on Kolmogorov complexity
and algorithmic information theory,
Volume 42, Issue 4: 1999.
This
issue
features contributions by many authors - including
articles
by Rissanen,
by Solomonoff,
by
Wallace
and Dowe,
and by
others;
and responses and rejoinders on
MDL
and MML
by Rissanen
and by
Wallace
and Dowe
respectively.
Calendar
of
Machine
Learning,
RUUG,
CSSE,
Monash Univ.
MML talks
and
CSE455
Minimum Message Length
(formerly Learning and Prediction II:
MML Data Mining
[formerly before that
CSC423 Learning and Prediction
course]),
CSSE,
Monash Univ.
Information,
Statistics and Induction in Science
(ISIS)
conference,
Aug. 1996.
"Minimum
Message Length and Kolmogorov complexity",
Comp.
J., Vol 42, No. 4 (1999a),
pp270-283,
by
C. Wallace
and
D. Dowe
[which is the
Computer Journal's
most downloaded ``full text as .pdf'' article -
see, e.g.,
here].
"Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages",
by J. W. Comley and D.L. Dowe;
Chapter 11
(pp265-294)
in P. Grünwald, I. J. Myung and M. A. Pitt (eds.),
Advances in Minimum Description Length:
Theory and Applications,
M.I.T. Press, April 2005, ISBN 0-262-07262-9.
{This is about Generalised Bayesian nets (including the special case of hybrid
Bayesian nets), generalising MML Bayesian networks; and it deals with
a mix of both continuous and discrete variables.
(See also
Comley and Dowe
(2003),
.pdf.)}
Related links and, below, People
Artificial Intelligence
(AI) on the Web:
Machine
Learning
and
Artificial Intelligence Resources.
Bayesian Net
Repository - see also
MML Bayes Nets:
Comley & Dowe (2005)
and
(2003,
.pdf).
Bayesian Statisticians
worldwide
(was
here
and was previously
here),
a link re
Rev. Thomas Bayes,
a link to
a small drawing
of Rev. Thomas Bayes and (his posthumously published)
"An Essay towards solving
a Problem in the Doctrine of Chances".
MML uses Bayesian information theory.
Bayesianism,
compression
and intelligence
[Dowe and Hajek
(1997,
1998)].
Boosting Research Site: boosting.org.
Clustering,
mixture modelling
(or finite
mixture modeling)
and/or
unsupervised learning
using MML:
Snob
- see, e.g.,
Wallace and Dowe (2000)
(was here).
Compression
and intelligence
[Dowe and Hajek
(1997,
1998)],
and other
compression
pointers.
Computational complexity links.
Computational Learning Theory
(CoLT)
page:
theorists
and
resources,
and
Thomas Zeugmann's
COLTBIB.
CSE455,
4th Year Hons course entitled
Minimum Message Length,
Monash University.
Data collections
and
Vlad's
KD and
data mining page.
Kernel machines WWW site:
www.kernel-machines.org;
an
Alex Smola
SVM
talk;
and
Support
Vector Machine mailing list.
Machine Learning
and CBR people
and
Mach. Learning Resources
(maintained by
David Aha).
Online Mach.
Learn. Resources.
Nit -
a nit
is a unit of information
(1 nit = log2(e) bits,
1 bit = ln(2) nits)
going back at least
as far as Boulton and Wallace (1970).
See sec. 11.4.1,
p271
of
Comley and Dowe (2005) (MIT Press,
pp265-294)
for a very concise history.
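The nit/bit conversion given above is a one-liner either way; as a minimal sketch (function names are my own):

```python
import math

# Conversions per the definition above:
# 1 nit = log2(e) bits ~= 1.4427 bits; 1 bit = ln(2) nits ~= 0.6931 nits.

def nits_to_bits(x):
    return x * math.log2(math.e)   # equivalently x / math.log(2)

def bits_to_nits(x):
    return x * math.log(2)

print(nits_to_bits(1.0))   # ~1.4427
print(bits_to_nits(1.0))   # ~0.6931
```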
Occam's razor links:
Minimum Encoding Length Inference is an operational form of
Ockham's razor.
Probabilistic
prediction,
probabilistic prediction competition
and
Gaussian prediction competition
for Australian football,
and
history.
Statistics:
University of Florida
Department of Statistics's
Statistics page;
and
AskDrMath's
Prob/Stat
and
Statistics.
Support
Vector Machine mailing list;
and
Kernel machines WWW site,
www.kernel-machines.org.
David Albrecht's
SVM site,
another,
and
another;
and
Tan and Dowe (2004, .pdf) on MML SVMs.
People interested in MDL, MML, algorithmic probability and algorithmic
information theory
[A,
B,
C,
D,
E,
F,
G,
H,
I,
J,
K,
L,
M,
N,
O,
P,
Q,
R,
S,
T,
U,
V,
W,
X,
Y,
Z]
A
Yudi Agusta.
Akaike Information Criterion (AIC):
To
compare
MML and AIC, see, e.g.,
here.
David Albrecht
(and his SVM site).
Lloyd Allison's
MDL
(stochastic
complexity)
and
MML
pages.
Christoph Arndt's
information theory notes
and
"Information measures"
(in English)
book.
Azat Arslanov
B
Andrew Barron
Rohan Baxter
Rev. Thomas Bayes
(dec. 1761),
a small drawing
of Rev. Thomas Bayes,
a German article
on him;
and
Bayesian
Statisticians worldwide.
MML uses Bayesian information theory.
Bayesian Nets using
Minimum message length
(MML).
Adrian Bickerstaffe.
Olivier Bousquet's
Kolmogorov
complexity resources,
Kolmogorov complexity mail list
and
list archive.
David M. Boulton
Matthew Brand
Peter G. Bryant
C
Cristian S. Calude
CDMS.
Greg Chaitin
Bertrand Clarke
John G. Cleary
Clustering,
mixture modelling
(or finite
mixture modeling)
and/or
unsupervised learning
using MML:
Snob
- see, e.g.,
Wallace and Dowe (2000).
Josh Comley
and
(with D. L. Dowe)
"Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric
Languages",
Chapter 11
(pp265-294)
in P. Grünwald, I. J. Myung and M. A. Pitt (eds.),
Advances in Minimum Description Length:
Theory and Applications,
M.I.T. Press, April 2005, ISBN 0-262-07262-9.
{This is about Generalised Bayesian nets (or generalised MML directed
graphical models), generalising MML Bayesian networks; and it deals with
a mix of both continuous and discrete variables.
(See also
Comley and Dowe
(2003),
.pdf.)}
Consistency:
The Wallace-Freeman (1987) version of MML has been shown to be statistically
consistent on the Neyman-Scott problem - see
Dowe & Wallace (1997).
We have conjectured (or asked the question)
[Dowe, Baxter, Oliver & Wallace
(1998, sec. 6,
p93),
Edwards & Dowe (1998, sec. 5.3, p104),
Wallace & Dowe
(1999a, p282),
Wallace & Dowe (2000, sec. 5,
p78),
Comley & Dowe
(2005, sec. 11.3.1,
p269), and
Dowe, Gardner & Oppy (2007) (sec. 8 (Conclusion), currently(?) p52)]
whether or not only MML
and closely related Bayesian methods can, in general, infer fully-specified
models with both statistical consistency and
invariance.
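The Neyman-Scott inconsistency referred to above can be seen in a small simulation (a sketch with made-up simulated data, not the MML analysis itself): with many pairs (x1, x2) drawn from N(mu_i, sigma^2), each pair bringing its own nuisance mean mu_i, the maximum-likelihood estimate of sigma^2 converges to sigma^2/2 rather than sigma^2.

```python
import random

random.seed(0)
sigma2 = 1.0          # true variance
J = 100_000           # number of pairs, each with its own unknown mean
ss = 0.0
for _ in range(J):
    mu = random.uniform(-10, 10)          # arbitrary per-pair nuisance mean
    x1 = random.gauss(mu, sigma2 ** 0.5)
    x2 = random.gauss(mu, sigma2 ** 0.5)
    m = (x1 + x2) / 2                     # MLE of this pair's mean
    ss += (x1 - m) ** 2 + (x2 - m) ** 2
mle_sigma2 = ss / (2 * J)                 # MLE of sigma^2 over all 2J points
print(mle_sigma2)                          # ~0.5, i.e. sigma^2 / 2, not sigma^2
```

The estimate halves because each pair's fitted mean absorbs half the within-pair variability; the number of nuisance parameters grows with the data, which is exactly the regime where MML remains consistent.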
Thomas M. Cover
D
Ian Davidson.
A. Phil Dawid
Ivanoe De Falco
(and colleagues)'s
papers
and
Genetic Programming Approach
to Kolmogorov Complexity page.
Trevor Dix
Byron Dom's
publications.
David Dowe
and
(with
C. S. Wallace):
"Minimum
Message Length and Kolmogorov complexity"
Comp.
J., Vol 42, No. 4 (1999a),
pp270-283
[this article is the Computer Journal's most downloaded ``full text as
.pdf'' - see, e.g.,
here];
and
(with J. W. Comley)
"Minimum Message
Length, MDL and Generalised Bayesian Networks with Asymmetric Languages",
Chapter 11
(pp265-294)
in P. Grünwald, I. J. Myung and M. A. Pitt (eds.),
Advances in Minimum Description Length:
Theory and Applications,
M.I.T. Press, April 2005, ISBN 0-262-07262-9;
and
D. Dowe publications
(or other D. Dowe publications).
Mike Dowman.
E
Bruce Edmonds's
publications and
Complexity Related
Links.
Russell Edwards,
Astronomical Institute
"Anton Pannekoek",
Faculty of Mathematics, Computer Science, Physics and Astronomy
(WINS),
University of Amsterdam.
Mark Ellison
(was here).
Vladimir Estivill-Castro
F
Graham Farr's
publications.
Ronald Aylmer
Fisher
(1890-1962): While R. A. Fisher is generally considered far from
Bayesian, both
Rissanen's
MDL work (e.g., 1996) and
Wallace's
(Bayesian) MML work (e.g., with Freeman, 1987)
use the notion of Fisher information.
Image of R. A. Fisher
(was here).
Leigh Fitzgibbon's
publications.
Simon Fox
Roberto Fraile
(and here).
Pasi Fränti.
Norio Fukuda
G
Peter Gacs
Alex Gammerman
Andres Garcia
Russell Greiner
Peter Grünwald
and
here;
and
(MDL source book)
``Advances in Minimum Description
Length'', April 2005, MIT Press.
H
Alan Hajek
Brian Hanlon
Hongxing He
José Hernàndez i Orallo
(and
in English,
Jose
Hernandez-Orallo).
Phil Hingston
Marcus Hutter
I
Information theory:
Min. Enc. Length Inference, MDL
and MML are based in information theory - as is also
AIC.
See also quantum info. theory.
Invariance: Strict MML and many of its approximations - such as
Wallace & Freeman (JRSS, 1987) and MMLD (or I1D) - are statistically
invariant.
See, e.g.,
Dowe, Gardner
& Oppy (2007) (page ?7? and elsewhere)
and
Comley & Dowe (2005)
(sec. 11.3
[pp268,
269
and
270]).
See also invariance and consistency
conjecture.
Laurence Irlicht
J
Bruno Janvier.
Tao Jiang
Murray Jorgensen
K
Paul Kabaila
Kenichi
Kanatani's
recent
publications.
Aram Karalic
Gary Warren King
Andrei
N. Kolmogorov (1903-1987) sites:
http://www.Kolmogorov.com/Kolmogorov.html,
a c.v.
(in Russian),
http://www.pms.ru/kolmogorov
(in Russian),
exploratorium.edu,
dcs.st-and.ac.uk
and
(by Paul Vitanyi)
a biography.
Petri Kontkanen
L
Aaron Lanterman
(was here).
Thomas Lee's
publications and
technical reports.
Shane Legg
Leonid Levin
Mengxiang Li
Ming Li
Xiaodong Li
M
Dean McKenzie's
publications.
Enes Makalic.
Steve Maybank
(and here).
www.MDL-research.org,
people,
reading,
demonstrations
and
related topics;
and
(MDL source book)
``Advances in Minimum Description
Length'', April 2005, MIT Press.
Paul Ming's
Virtual Turing Machine
(VTM)
and
Virtual Turing Machine 2
(VTM2).
Minimum Description Length
(MDL) and
comparisons
with MML
(on
pp270-283
and elsewhere)
in
Comp.
J., Vol 42, No. 4, 1999.
Mixture modelling
(or finite
mixture modeling),
clustering
and/or
unsupervised learning
using MML:
Snob
- see, e.g.,
Wallace and Dowe (2000).
MML and the
"Turing test".
Suzie Molloy.
Robert J. Munro
Petri Myllymäki
N
Julian R. Neil
Doug Newlands
O
Ockham, William of (c. 1280 or 1285 to
c. 1347 or 1349, apparently 10th April 1349),
Ockham's razor
and town of Ockham.
R. D. Ogden,
Computer Science Department, Southwest Texas State University,
San Marcos, TX 78666, USA; ro01@swt.edu.
Jon Oliver
Graham Oppy
P
Jon Patrick.
Philosophy, Stanford Encyclopaedia of.
Q
Guoqi Qian
Quantum information theory -
Quantum
Computer Technology
(U Melb,
U NSW,
U Qld).
R
Anand Venkataraman
(was Anand (Venkt) Raman).
Fengrong Ren
Jorma Rissanen
(and
publications,
although the links seem dead)
and
some
sample publications.
Ricardo Rocha
Teemu Roos
Juho Rousu.
S
Pritika Sanghi.
J. G. Sanjayan
Jürgen Schmidhuber
Stanley L. Sclove's
journal publications
and
working papers.
Claude Shannon ("father of
information theory")'s
obituary
(and another
obituary),
1916-2001.
Alexander Shen
Tomi Silander
Jatinder Singh.
Snob
(software):
mixture modelling
(to infer a
finite
mixture model)
using MML.
Ray Solomonoff's
biography,
publications
and
obituary (New York Times, Wedn 13/Jan/2010):
online
and
scanned
- and
Solomonoff 85th
memorial conference (Wedn 30 Nov - Fri 2 Dec 2011) :
3rd
Call for Papers,
2nd
Call for Papers,
1st
Call for Papers,
Invited
speakers,
accepted
papers.
David Suter.
T
Peter Jing Tan;
and
P. J. Tan and
D. L. Dowe
(2003),
"MML Inference of
Decision Graphs with Multi-Way Joins and Dynamic Attributes",
Proc. 16th Australian Joint
Conference on Artificial Intelligence (AI'03), Perth, Australia,
3-5 Dec. 2003, Published in Lecture Notes in Artificial Intelligence
(LNAI) 2903, Springer-Verlag,
pp269-281.
Hiroshi Tanaka
Kai Ming Ting
Henry Tirri
Peter Tischer
Phil H. S. Torr
(was here).
John Tromp's Home Page (was
here).
Alan Turing (1912-1954),
developer of
(Universal)
Turing Machines,
among many other things.
Sites
maintained by
Andrew Hodges
and
dcs.st-and.ac.uk;
The Turing archive for the history of
computing
(maintained by Jack
Copeland and Gordon
Aston);
and
image of Alan Turing.
A.M. Turing's (1950)
"Computing
Machinery and Intelligence", Mind, 59, pp433-460 -
which is the paper which introduced the
imitation game
or
"Turing Test".
Turing
Machine simulator:
http://wap03.informatik.fh-wiesbaden.de/weber1/turing/tm.html
and
documentation.
Virtual Turing Machine
(VTM)
and
Virtual Turing Machine 2
(VTM2),
by
Paul Ming.
The "Turing test"
and MML.
Charles Twardy
and
Bayesian Models for Search & Rescue.
U
Unsupervised learning,
mixture modelling
(or finite
mixture modeling)
and/or
clustering
using MML:
Snob
- see, e.g.,
Wallace and Dowe (2000).
Aleksey M. Urmanov.
William Uther
(was here).
V
Farshid Vahid
Manuela Veloso.
Brani Vidakovic.
Murli Viswanathan
(and here)'s
publications.
Paul Vitanyi's
Publications & Areas of Interest,
Kolmogorov complexity
publications,
selected recent papers
and
publications.
Volodya Vovk
V. V. V'yugin
W
Chris Wallace's publications,
including
"Minimum
Message Length and Kolmogorov complexity" (with D. L. Dowe),
Comp.
J., Vol 42, No. 4 (1999a),
pp270-283
[this article is the Computer Journal's most downloaded ``full text as
.pdf'' - see, e.g.,
here];
and including his (posthumous) book:
Wallace, C.S. (2005) [posthumous],
Statistical and Inductive Inference by Minimum Message
Length, Springer (Series: Information Science and Statistics), 2005, XVI,
432 pp., 22 illus., Hardcover, ISBN: 0-387-23795-X
[table of
contents,
chapter headings
(was here)
and
more];
and
also including
Chris Wallace MML publications, 1990-
and also
Chris
Wallace MML publications, 1968-1991.
See also
D. L. Dowe,
"Foreword
re C. S. Wallace",
Computer Journal,
Vol. 51, No. 5
(Sept. 2008)
[Christopher
Stewart WALLACE (1933-2004) memorial special issue],
pp523-560
(and
here).
Ian H. Witten
Y
Kenji Yamanishi
Bin Yu's
publications.
Z
Alexander K. Zvonkin
Miscellaneous, other, links
Chris Wallace (1933-2004)
(developer of MML in
1968)
and
the Computer J's
Christopher
Stewart WALLACE (1933-2004) memorial special issue
[Vol. 51, No. 5
(Sept. 2008)],
Bayesian Nets
using
Minimum message length
(MML),
clustering and mixture modelling,
data repositories,
decision trees using Minimum message length (MML),
Occam's razor
(Ockham's razor),
Snob
(program for MML
clustering and mixture modelling)
to infer
finite mixture models,
(econometric)
time series
using MML,
medical research,
a probabilistic sports prediction
competition
(and further reading on probabilistic
scoring),
chess and game theory research;
Feeding the world
(TheHungerSite),
TheRainforestSite,
"do-goody" stuff: improving the world and saving the planet.
Please e-mail me if
you would like to know more.
This page is
http://www.csse.monash.edu.au/~dld/MELI.html.
Copyright
David L. Dowe,
Monash University, Australia,
31 Jan. 2000, etc.
Copying is not permitted without express permission from
David L. Dowe.