David Dowe's data links
[See also Ray Solomonoff (1926-2009) 85th memorial conference (Wed 30 Nov - Fri 2 Dec 2011), 1st Call for Papers.]
Machine learning, statistics and "data mining" data
U. Calif. Irvine (UCI) ICS KDD Archive, Machine Learning Repository and other machine learning repositories and sites.
NIST (U.S.) Information Technology Laboratory's Statistical Reference Datasets (StRD) and dataset archives.
CMU Dept. of Statistics' StatLib links, Datasets Archive, and "other places" and statistical archives.
Baylor University Libraries: Computer Science Data Repositories.
Machine Learning Resources - Data Repositories and competitions, maintained by David Aha.
Online Machine Learning Resources: ML Benchmarks and other Data Sources.
Bayesian Network data sets (or Bayes Net data sets) - see also Bayesian Networks using MML.
Dept. of Computer Science, University of Toronto: Data for Evaluating Learning in Valid Experiments (DELVE) Datasets Summary Table, including the Titanic dataset.
KDnuggets' Datasets for "Data Mining" and "Data Mining" Competitions.
"The Data Mine"'s Data Sources.
Rob Hyndman's Time Series Data Library and CEC2000's Time series prediction competitions.
UCR Time Series Data Mining Archive, linked to by Eamonn Keogh.
Data on the Web - Faculty of Business and Economics, University of Sydney, Australia.
AskDrMath (The Math Forum - Math Library)'s Data Sets, Prob/Stat and Statistics: Data Sets.
Brookhaven Protein Database (old site), Gopher; SWISS-PROT Protein Sequence Database and CSSE Contig Restriction Site Mapping and links (human genome project, etc.).
Kathleen Cuningham Foundation Consortium for research into Familial Breast cancer (http://www.kconfab.org)'s policies and procedures for accessing kConFab data.
European Pulsar Network Data Archive (and mirror site, and disclaimer) index (and Russell Edwards's comments): Data Archive.
Statistical Society of Canada's Case Studies in Data Analysis for 2000.
Bayesian networks repository (started by Nir Friedman); Bayesian networks and related sites.
University of Fribourg Section of Chemistry's Useful Chemistry Links and databases.
ICMAS-2000: market game and ICMAS-00 Trading Agent Competition Overview.
Linguistic Data Consortium (LDC): LDC-Online, LDC Catalog(ue), Obtaining corpora and Search LDC Web site.
Links to text analysis resources.
Geoff McLachlan and David Peel's "Finite Mixture Models" and data sets.
Australian Antarctic Division (AAD) and Australian Antarctic Data Centre.
Search and Rescue data collection form (HTML, Word97, PostScript, PDF) - Charles Twardy.
Competitions
Machine Learning Resources - Data Repositories and competitions, maintained by David Aha.
KDnuggets' Datasets for "Data Mining" and "Data Mining" Competitions.
Rob Hyndman's Time Series Data Library and CEC2000's Time series prediction competitions.
ICMAS-2000: market game and ICMAS-00 Trading Agent Competition Overview.
KDD Cup 2000, e-mail: kddcup2000@bluemartini.com.
The Insurance Company (TIC) Benchmark homepage.
Other data
Some links to chess and games data.
Sports: Australian Rules football with data since 1993, data since 1998, other footy statistics and some other sports data.
Medical links (with some medical data links), and EEG (electroencephalogram) data from the UCI KDD Archive (http://kdd.ics.uci.edu).
www.statoo.com: "the portal to statistics on the internet" (so they say).
Links to Random number generation software
(Pseudo-)random number generation software in Fortran: uniform (for multinomial), Gaussian (Normal), von Mises (circular) and Poisson.
Random number generation (and other) publications by Chris Wallace: TR #89/123 (Feb. 1989), 1990, 1996.
http://www.almaden.ibm.com/cs/quest: synthetic market-basket dataset generator.
http://www.almaden.ibm.com/cs/people/bayardo/vinci/maxminer.html: the Max-Miner algorithm, which generates frequent itemsets and can be used to check the output of your own algorithm. (Use the FINDALL option unless you want only the maximal frequent itemsets.)
http://lib.stat.cmu.edu/DASL/DataArchive.html.
Random number (generator)s and Monte Carlo methods: Information Servers, Theory, Applications and Software.
Other RNG software: "C Programming"; "Code Snippets"; "Portable functions and headers"; "Random number functions"; "Rand1.C".
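The generators listed above are linked rather than explained, so as a rough illustration only, here is a minimal Python sketch of three standard textbook ways of turning uniform variates into other distributions: inverse-CDF lookup for a multinomial (one category per uniform draw), the Box-Muller transform for a Gaussian, and Knuth's multiplication method for a Poisson. These are not the Fortran routines linked above, the function names are hypothetical, and von Mises (circular) sampling is omitted.

```python
import math
import random


def sample_multinomial(probs, rng):
    """Draw one category index i with probability probs[i] (inverse-CDF lookup).

    Assumes probs are non-negative and sum to 1.
    """
    u = rng.random()  # uniform on [0, 1)
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against round-off when the sum is slightly below 1


def sample_gaussian(rng, mu=0.0, sigma=1.0):
    """Box-Muller transform: two uniforms give one N(mu, sigma^2) draw.

    The second normal variate (the sin term) is discarded for simplicity.
    """
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z


def sample_poisson(lam, rng):
    """Knuth's method: count uniforms until their running product drops to exp(-lam).

    Expected work grows linearly with lam, so this suits small means only.
    """
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(2000)  # seeded, so runs are reproducible
    n = 20000
    counts = [0, 0, 0]
    for _ in range(n):
        counts[sample_multinomial([0.2, 0.5, 0.3], rng)] += 1
    print("multinomial frequencies:", [c / n for c in counts])
    print("Gaussian sample mean:", sum(sample_gaussian(rng) for _ in range(n)) / n)
    print("Poisson(3) sample mean:", sum(sample_poisson(3.0, rng) for _ in range(n)) / n)
```

These methods are chosen for clarity rather than speed; for large Poisson means or high-volume normal sampling, faster methods are normally used.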
Data analysis and "data mining"
Minimum Message Length (MML), an operational form of Occam's razor [see also Minimum Description Length, MDL].
Clustering, mixture modelling and unsupervised learning.
Miscellaneous other links
Chris Wallace (1933-2004) (developer of MML in 1968),
Wallace, C.S. (2005) [posthumous], Statistical and Inductive Inference by Minimum Message Length, Springer (Series: Information Science and Statistics), 2005, XVI, 432 pp., 22 illus., Hardcover, ISBN: 0-387-23795-X [table of contents, chapter headings and more],
Wallace, C.S. (with D. L. Dowe), "Minimum Message Length and Kolmogorov complexity", Comp. J., Vol. 42, No. 4 (1999), pp. 270-283 [this article is the Computer Journal's most downloaded "full text as .pdf"; see, e.g., here],
Bayesian networks using MML, clustering and mixture modelling, decision trees and decision graphs using MML,
"Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", by J. W. Comley and D.L. Dowe; Chapter 11 (pp. 265-294) in P. Grunwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9. {This is about Generalised Bayesian nets (or even the special case of hybrid Bayesian nets), generalising MML Bayesian nets (also called MML Bayesian networks or MML Bayes nets); it deals with a mix of continuous and discrete variables. (See also Comley and Dowe (2003), .pdf.)},
Occam's razor (Ockham's razor), Snob (program for MML clustering and mixture modelling), (econometric) time series using MML, medical research, a probabilistic sports prediction competition (and further reading on probabilistic scoring), chess and game theory research, TheHungerSite, TheRainforestSite, "do-goody"/"do-goody stuff, improving the world and saving the planet".
Please e-mail me if you would like to know more.
This page, http://www.csse.monash.edu.au/~dld/datalibrary.html , was last updated no earlier than 18th Apr. 2000.
Copyright David L. Dowe, Monash University, Australia, 15 March 2000.