(1) David Dowe's research interests and scope of Hons. projects offered for 2005.
(Last updated Thu. 7th Apr. 2005.)

David Dowe is interested in Minimum Message Length (MML) inductive inference. The MML principle is particularly useful in machine learning, statistics, econometrics, "knowledge discovery", "data mining" and philosophy of science.

Both theoretical and applied projects are available, some of which are listed below, and all of which you should feel free to discuss with David Dowe. Areas of interest include clustering and mixture modelling, the von Mises circular distribution, single and multiple factor analysis, supervised learning, decision trees and decision graphs with or without leaf regressions, sequentially and spatially correlated data, protein folding, DNA string alignment, the human genome project and market forecasting; ... and pretty well any inference problem. All of these would be done by MML, and all would require at least fairly good mathematics. There is no need to have done any 3rd year subject on artificial intelligence (AI) for any of these projects.

I am not listing all my available 20-point projects here, so if your mathematics is at least good and if you would like to know about other possible projects, please feel welcome to ask me. In short, other possible 20-point projects include:

o Inductive logic programming (ILP),
o Generalised Bayesian Networks (see, e.g., Comley and Dowe (2003, 2005)) with medical, oncological, epidemiological and other applications,
o Evolving and quantifying intelligence,
o Correcting theoretical and empirical misconceptions in Ockham's razor,
o Hypothesis testing and model selection by MML,
o Analysis of 2dF Galaxy Redshift Survey data.

(2) Phylogenetic modelling of indigenous, endangered and other languages (20 point)
David Dowe

This project concerns phylogenetic modelling of languages. In other words, we wish to model how languages have descended from and evolved from one another.
There will be an emphasis on endangered indigenous languages, largely as a way to help preserve them. The modelling should be carried out using Minimum Message Length (MML), largely because of its theoretical optimality properties and its achievements across a wide range of inference problems. The project could be taken in the direction of using probabilistic finite state automata/machines (PFSAs) to model grammars or syntax.

A good (or better) mathematical background will be necessary, and an interest in languages, linguistics or indigenous languages would be a welcome bonus. If doing this project, you are strongly encouraged to talk to me (the supervisor) first.

References:
-----------

The Abbadingo competition for grammar induction [Abbadingo One: DFA Learning Competition]
  http://abbadingo.cs.unm.edu

ASEDA - the electronic data archive of the Australian Institute of Aboriginal and Torres Strait Islander Studies:
  http://coombs.anu.edu.au/SpecialProj/ASEDA/ASEDA.html

D. Benedetto, E. Caglioti and V. Loreto (2002), "Language Trees and Zipping", Physical Review Letters (Phys. Rev. Lett.) 88, 048702.
  [URL: http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PRLTAO000088000004048702000001&idtype=cvips&gifs=yes
   or http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=PRLTAO000088000004048702000001&idtype=cvips]

C.H. Bennett, Ming Li and B. Ma (2003), Chain Letters and Evolutionary Histories, Scientific American, June 2003, pp64-69.

S. Bird (in progress), Collaborative Annotation of Indigenous Audio Recordings
  http://www.cs.mu.oz.au/research/lt/student-projects.html

Carrington, Lois and Geraldine Triffett (1999). OZBIB: a linguistic bibliography of Aboriginal Australia and the Torres Strait Islands. (Matheson Library Main Collection, Monash University, Clayton. Call number 499.15016 C318.0.)

Comley, Joshua W. and D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11, P. Grünwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9.

Ellison, T. Mark (1992). The Machine Learning of Phonological Structure. Doctor of Philosophy (Ph.D) Thesis, University of Western Australia.

Finegan, E., D. Blair and P. Collins (c1997). Language: its structure and use, 2nd edn., Harcourt Brace, Sydney, 526pp.

http://en.wikipedia.org/wiki/Category:Languages
http://en.wikipedia.org/wiki/Category:Linguistics
http://research.microsoft.com/~ringger/FeatureEngineeringWorkshop/
http://www.anu.edu.au/linguistics/nash/aust and links
http://www.cs.usyd.edu.au/~jonpat/az_an_files/basque.html
http://www.cs.usyd.edu.au/~jonpat/az_an_files/azkue_analysis.html
http://www.cs.usyd.edu.au/~mike/
http://www.cs.usyd.edu.au/~mike/abstract.html
http://www.dnathan.com/VL/austLang.htm and links
http://www.isda2005.pwr.wroc.pl/events_workshops.html#EGI

Ishibuchi, Hisao, Tomoharu Nakashima and Manabu Nii (2005). Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining. Series: Advanced Information Processing, XI, 307 p., 217 illus., Hardcover, ISBN: 3-540-20767-8.

Languages of Australia [Australia's indigenous languages]
  http://www.ethnologue.com/show_country.asp?name=Australia
  http://www.ethnologue.com/show_map.asp?name=Australia&seq=1
  http://www.ethnologue.com/show_map.asp?name=Australia&seq=2

Lixi, "The illustrated international phrase book: how to get what you want in eight languages, English, German, French, Italian, Spanish, Greek, Japanese, Dutch", Mayflower Books, New York, 1st American ed., c1979, 192 p.
  [www.abetitles1.com/Title/1389337/Illustrated+International+Phrase+Book.html]

@inproceedings{mahoney99text,
  author    = "Matthew V. Mahoney",
  title     = "Text Compression as a Test for Artificial Intelligence",
  booktitle = "{AAAI}/{IAAI}",
  pages     = "970",
  year      = "1999",
  url       = "citeseer.ist.psu.edu/171781.html",
  note      = "http://www.cs.fit.edu/~mmahoney/poster.ps.Z"
}

"The gift of the gab", New Scientist (www.NewScientist.com), 8 January 2005, pp40-43.

Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283.
  http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/
  http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf

(3) Time-series autoregression using MML with econometric applications (20 point)
David Dowe

In this project, we adapt a Bayesian modelling strategy, namely the minimum message length (MML) principle, to the problem of efficiently partitioning economic units, such as firms or countries, into groups (or regions) whose behavioural patterns are similar within each group but distinct across groups. This methodology is superior to classical model selection methods partly because it can incorporate the requirements of economic theory.

The approach to the partitioning or "pooling" will most probably be MML multi-way join decision graphs (Tan and Dowe, 2002). We will develop an autoregression model in the leaves of the decision graph. Time permitting, this leaf (auto)regression model may be augmented with a variety of lags (or time delays). The resulting software will ideally, but not necessarily, be developed in Java.

If time again permits, we hope to consider two specific applications, namely: modelling gasoline demand in OECD countries, and finding the foreign factor with the most predictive power for the growth rate of the Australian economy. Variations of this project could explore other ways of analysing time-series data.

A good mathematical background will be necessary.
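As a rough illustration of the "to pool or not to pool" question behind project (3), the sketch below fits a zero-mean AR(1) model by least squares and compares the cost of giving two series one shared coefficient against giving each series its own. Everything in it is an assumption made for illustration: the function names (`simulate_ar1`, `lagged_pairs`, `fit_ar1`, `code_length`, `pooled_vs_separate`), the synthetic data, and in particular the BIC-style penalty, which is only a crude stand-in for a two-part message length. It is not the MML estimator, and it does not use the decision graphs the project would.

```python
import math
import random


def simulate_ar1(phi, n, seed):
    """Generate n points of x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def lagged_pairs(series):
    """Return the (x[t-1], x[t]) pairs used to fit a lag-1 autoregression."""
    return list(zip(series[:-1], series[1:]))


def fit_ar1(pairs):
    """Least-squares estimate of phi and the residual sum of squares."""
    sxy = sum(prev * cur for prev, cur in pairs)
    sxx = sum(prev * prev for prev, _ in pairs)
    phi = sxy / sxx
    rss = sum((cur - phi * prev) ** 2 for prev, cur in pairs)
    return phi, rss


def code_length(pairs, n_params=2):
    """BIC-style cost in nats: data-fit term plus (k/2) log n.

    Hypothetical stand-in for a two-part message length; NOT MML.
    The two parameters counted are phi and the noise variance.
    Constants that are identical for both alternatives are dropped.
    """
    n = len(pairs)
    _, rss = fit_ar1(pairs)
    return 0.5 * n * math.log(rss / n) + 0.5 * n_params * math.log(n)


def pooled_vs_separate(series_a, series_b):
    """Cost of one shared AR(1) model versus one model per series."""
    pairs_a, pairs_b = lagged_pairs(series_a), lagged_pairs(series_b)
    pooled = code_length(pairs_a + pairs_b)
    separate = code_length(pairs_a) + code_length(pairs_b)
    return pooled, separate


if __name__ == "__main__":
    a = simulate_ar1(0.9, 400, seed=1)
    b = simulate_ar1(-0.5, 400, seed=2)
    pooled, separate = pooled_vs_separate(a, b)
    print("pooled cost:", round(pooled, 1), "separate cost:", round(separate, 1))
```

With coefficients as different as 0.9 and -0.5, the separate models should come out cheaper. For series with near-identical dynamics, the extra parameters would typically make pooling the cheaper choice.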
If doing this project, you are strongly encouraged to talk to me (the supervisor) first.

References:
-----------

Comley, Joshua W. and D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11, P. Grünwald, M. A. Pitt and I. J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9.

Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004). Minimum Message Length Autoregressive Model Order Selection. In M. Palanaswami, C. Chandra Sekhar, G. Kumar Venayagamoorthy, S. Mohan and M. K. Ghantasala (eds.), International Conference on Intelligent Sensing and Information Processing (ICISIP), Chennai, India, 4-7 January 2004 (ISBN: 0-7803-8243-9, IEEE Catalogue Number: 04EX783), pp439-444.
  www.csse.monash.edu.au/~dld/Publications/2004/Fitzgibbon+Dowe+Vahid2004.ref

@InProceedings{LJF2000,
  author    = "L. J. Fitzgibbon and L. Allison and D. L. Dowe",
  title     = "Minimum Message Length Grouping of Ordered Data",
  editor    = "H. Arimura and S. Jain",
  booktitle = "Proceedings of the 11th International Conference on Algorithmic Learning Theory (ALT2000)",
  series    = "LNCS",
  publisher = "Springer-Verlag Berlin",
  year      = "2000",
}

@inproceedings{Oliver:91b,
  author    = "Oliver, J.J. and Wallace, C.S.",
  title     = "Inferring Decision Graphs",
  booktitle = "Proceedings of Workshop 8 --- Evaluating and Changing Representation in Machine Learning IJCAI-91",
  year      = "1991"
}

P. J. Tan and D. L. Dowe (2003). MML Inference of Decision Graphs with Multi-Way Joins and Dynamic Attributes, Proc. 16th Australian Joint Conference on Artificial Intelligence (AI'03), Perth, Australia, 3-5 Dec. 2003. Published in Lecture Notes in Artificial Intelligence (LNAI) 2903, Springer-Verlag, pp269-281.
  http://www.csse.monash.edu.au/~dld/Publications/2003/Tan+Dowe2003.ref

@incollection{Vahid:99,
  author    = "Vahid, F.",
  title     = "Partial pooling: {A} possible answer to ``{T}o pool or not to pool''",
  editor    = "Engle, R.F. and White, H.",
  booktitle = "Festschrift in honor of Clive Granger",
  note      = "Chapter 17",
  year      = 1999
}

Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283.
  http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/
  http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf

@article{Wallace:93,
  author  = "Wallace, C.S. and Patrick, J.D.",
  title   = "Coding Decision Trees",
  journal = "Machine Learning",
  volume  = 11,
  pages   = "7-22",
  year    = 1993,
}

This WWW page is http://www.csse.monash.edu.au/~dld/Hons/2005/dld2005_projects .