Below are David Dowe's 15 Hons projects on offer for 2012:

------------------------------**----------------------------
(1)
David Dowe 2012 Hons project #1

Title: Re-visiting entropy as time's arrow

Supervisors: David Dowe and David Paganin (Physics)

Wallace (2005, chapter 8) argues that Charge, Parity and Time invariance (or CPT invariance) in physics suggests that the entropy of a closed system should be just as likely to increase when going forward in time as when going backwards in time. This flies in the face of the conventional wisdom that entropy is time's arrow. We explore this by repeating and extending Wallace's simulations, also noting our penchant for predicting the future and that the past can be inferred by Minimum Message Length (MML).

References:
C. S. Wallace (2005), ``Statistical and Inductive Inference by Minimum Message Length'', Springer.
``Big bang re-spun'' (or ``Original spin''), New Scientist, 15/Oct/2011, pp44-47

End David Dowe 2012 Hons project #1
------------------------------**----------------------------
------------------------------**----------------------------
(2)
David Dowe 2012 Hons project #2

Title: Non-standard models of computation and universality

Supervisor: David Dowe

Zvonkin and Levin (1970) (and possibly earlier, Martin-Löf (1966)) consider the probability that a Universal Turing Machine (UTM), U, will halt given infinitely long random input (where each bit of the input string is a 0 or a 1 with probability 0.5 each). Chaitin (1975) would later call this the halting probability, Omega, or Omega_U. Following an idea of C. S. Wallace's in private communication (Dowe 2008a, Dowe 2011a), Barmpalias & Dowe (to appear) consider the universality probability - namely, the probability that a UTM, U, will retain its universality. If some input x to U has a suffix y such that Uxy simulates a UTM, then U has not lost its universality after input x.
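As a rough sketch in symbols, the two probabilities just described can be written as below. The notation is our own assumption and not necessarily that of the cited papers: U is taken to be a prefix-free machine, \mu is the uniform (fair-coin) measure on infinite binary sequences, and X|n denotes the first n bits of X.

```latex
% Halting probability (Chaitin's Omega) of a prefix-free UTM U:
% sum over all halting programs p, each weighted by 2^{-|p|}.
\Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}

% Universality probability: the measure of the infinite inputs X such that
% no finite prefix of X has lost universality, i.e. every prefix X|n
% has some suffix y after which U still simulates a UTM.
P_U \;=\; \mu\Bigl(\bigl\{\, X \in \{0,1\}^{\omega} \;:\;
      \forall n \;\, \exists y \;\, \text{such that } U \text{ given } (X{\restriction}n)\,y
      \text{ simulates a UTM} \,\bigr\}\Bigr)
```

Read this way, universality is lost at the first prefix x of the input for which no suffix y restores a UTM, and P_U is the probability that this never happens.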
Barmpalias, Levin (private communication) and Dowe (in a later simpler proof) have shown that the universality probability, P_U, satisfies 0 < P_U < 1 for all UTMs U and that the set of universality probabilities is dense in the interval (0, 1). We examine properties of the universality probability for non-standard models of computation (e.g., DNA computing).

Reference:
G. Barmpalias and D. L. Dowe, "Universality probability of a prefix-free machine", accepted, Philosophical Transactions of the Royal Society A

End David Dowe 2012 Hons project #2
------------------------------**----------------------------
------------------------------**----------------------------
(3)
David Dowe 2012 Hons project #3

Title: Database normalisation by Minimum Message Length inference

Supervisor: David Dowe

Minimum Message Length (MML) (Wallace and Boulton, 1968) is a universal principle in machine learning, statistics and ``data mining'' which, like Ockham's razor, gives us a theory which optimises the trade-off between simplicity and goodness of fit. It also predicts near optimally. Dowe and Zaidi (2010) show how to achieve database normalisation by following the principles of MML, given sufficient data - but they took the work only as far as 1NF, 2NF and 3NF. We extend this work to higher normal forms such as BCNF, 4NF and 5NF.

References:
Dowe and Zaidi (2010).
D. L. Dowe (2011a), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science - (HPS Volume 7) Philosophy of Statistics, pp901-982, 1/June/2011.

End David Dowe 2012 Hons project #3
------------------------------**----------------------------
------------------------------**----------------------------
(4)
David Dowe 2012 Hons project #4

Title: MML time series and Bayesian nets with discrete and continuous attributes

Supervisor: David Dowe

The first application of MML to Bayesian nets including both discrete and continuous-valued attributes was in Comley & Dowe (2003), refined in Comley & Dowe (2005) [whose final camera-ready version was submitted in Oct 2003], based on an idea in Dowe & Wallace (1998). We seek to enhance this original work to Bayesian nets which can change with time, using the mathematics of MML time series in Fitzgibbon, Dowe et al. (2004).

Comley, J. & D.L. Dowe (2003). General Bayesian Networks and Asymmetric Languages, Proc. 2nd Hawaii International Conf' on Statistics and Related Fields, 5-8 June, 2003.
Comley, J. & D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P. Grunwald et al. (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9
Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560
D L Dowe & C S Wallace (1998). Kolmogorov complexity, minimum message length and inverse learning, abstract, page 144, 14th Australian Statistical Conf' (ASC-14), Qld, 6 - 10 July 1998.
Fitzgibbon, L.J., D. L. Dowe & F. Vahid (2004). Minimum Message Length Autoregressive Model Order Selection. In M. Palanaswami et al.
(eds.), International Conf' on Intelligent Sensing and Information Processing (ICISIP), Chennai, India, Jan 2004, pp439-444

End David Dowe 2012 Hons project #4
------------------------------**----------------------------
------------------------------**----------------------------
(5)
David Dowe 2012 Hons project #5

Title: MML clustering and mixture modelling, re-visiting Snob

Supervisor: David Dowe

The Snob program for clustering and mixture modelling using Minimum Message Length (MML) dates back to the seminal Wallace & Boulton (1968) paper. Up until Wallace (1990), all the Snob work on MML clustering and mixture modelling represented the data by clusters (or groups, or components) of multinomial and/or Normal (or Gaussian) distributions. This was extended in a series of papers (Wallace & Dowe 1994, 1996, 1997, 2000) to include Poisson distributions (for counts) and the von Mises circular distribution (for angular data, such as protein angles). Other extensions have included latent factor analysis for modelling correlations within classes (Edwards & Dowe, 1998) and varieties of spatial image models (Wallace 1998; Visser & Dowe 2007; Visser, Dowe & Uotila, 2009) where we expect the class of a pixel to be influenced by the classes of the neighbouring pixels - and where we typically draw on approximations from thermal physics. Surveys of this work are given in parts of Wallace (2005) and Dowe (2008a).

This project will involve extending the program in one or more directions.

Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51 No.
5 [Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560
Edwards & Dowe (1998)
Visser & Dowe (2007)
Visser, Dowe & Uotila (2009)
Wallace (1990)
Wallace (1998)
Wallace & Boulton (1968)
Wallace & Dowe (1994)
Wallace & Dowe (2000)

End David Dowe 2012 Hons project #5
------------------------------**----------------------------
------------------------------**----------------------------
(6)
David Dowe 2012 Hons project #6

Title: (Algorithmic) Information Theory and Measures of Intelligence

Supervisor: David Dowe

The first work devoted to the relationship between (algorithmic) information theory (equivalently, Minimum Message Length [MML]) and intelligence appears to be Dowe and Hajek (1997, 1998), partly in response to Searle's ``Chinese room'' argument. More recently, Hernandez-Orallo and Dowe (Artificial Intelligence, 2010) outlined how to use (algorithmic) information theory to devise an anytime universal intelligence test for any subject agent ( http://users.dsic.upv.es/**proy/anynt ), attracting many downloads and articles in "The Economist", "New Scientist" and much other media.

There is room for several more people in one aspect or another of this active research project.

References:
-----------------
J. Hernandez-Orallo and D. L. Dowe (2010), "Measuring Universal Intelligence: Towards an Anytime Intelligence Test", (the) Artificial Intelligence journal (AIJ), Volume 174, Issue 18, December 2010, pp1508-1539. [www.doi.org: 10.1016/j.artint.2010.09.006.]
D. L. Dowe and J. Hernandez-Orallo (2012), "IQ tests are not for machines, yet", accepted, to appear, Intelligence journal.

End David Dowe 2012 Hons project #6
------------------------------**----------------------------
------------------------------**----------------------------
(7)
David Dowe 2012 Hons project #7

Title: Inferring evolution of languages

Supervisor: David Dowe

The evolution of human languages raises several interesting issues, such as how languages evolve, how the evolution of spoken language relates to the evolution of written language, how it is that geographical regions of related languages can surround one or more regions of unrelated languages, how populations of language speakers migrated eons ago and possibly also how spoken language relates to DNA. Study of this area also helps with the inference of now-extinct ancestral languages and at least indirectly with the preservation of dying languages.

The project will use the Minimum Message Length (MML) principle (Wallace and Boulton, 1968) (Wallace and Freeman, 1987) (Wallace and Dowe, 1999a) (Wallace, posthumous, 2005) (Comley and Dowe, 2005), building upon earlier work in (Ooi and Dowe, 2005).

The project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.

References:
-----------------
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
OoDo2005 Ooi, J.N. and D. L. Dowe, Inferring Phylogenetic Graphs of Natural Languages using Minimum Message Length, Proc. CAEPIA 2005, Vol. 1, pp I:143 - I:152, Nov. 2005.
``The language paradox - why one species speaks in so many different ways'' (or ``Powers of Babel''), New Scientist, 10/December/2011, pp34-37.
Wall2005
WaBo1968 Wallace C.S. & Boulton, D.M. (1968)
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283.
WaFr1987

End David Dowe 2012 Hons project #7
------------------------------**----------------------------
------------------------------**----------------------------
(8)
David Dowe 2012 Hons project #8

Title: MML inference of support vector machines

Supervisor: David Dowe

Support Vector Machines (SVMs) are a popular approach to classification in machine learning and "data mining". They are usually only used to divide between two classes ("yes"/"no" or "positive"/"negative"), and they are typically unable to give probabilities with their predictions. They also have some arbitrariness in the choice of "kernel" functions for specifying non-linear boundaries.

Using Minimum Message Length (MML) approaches such as those in Tan & Dowe (2004), other notions intimated in Dowe (2007) and some previously overlooked coding inefficiencies, we will be able to overcome all these shortcomings. This will enable us to come up with comparatively simple SVMs which give excellent (probabilistic) predictions on multi-class problems, possibly using non-linear cuts.

The mathematics in this project will not be trivial.

References:
-----------------
ComleyDowe2005
D. L. Dowe (2007), Discussion following "Hedging Predictions in Machine Learning, A. Gammerman and V. Vovk", Computer Journal, Vol. 50, No. 2, March 2007, pp167-168
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why Simplicity is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec. 2007, pp709-754.
P. J. Tan and D. L. Dowe (2004). MML Inference of Oblique Decision Trees, Proc. 17th Australian Joint Conf. on Artificial Intelligence (AI'04), Dec. 2004, Lecture Notes in Artificial Intelligence (LNAI) 3339, Springer, pp1082-1088.
Wallace2005

End David Dowe 2012 Hons project #8
------------------------------**----------------------------
------------------------------**----------------------------
(9)
David Dowe 2012 Hons project #9

Title: MML inference of systems of differential equations

Supervisor: David Dowe

Many simple and complicated systems in the real world can be described using systems of differential equations (Bernoulli, Navier-Stokes, etc). Despite the fact that we can accurately describe and solve those equations, they often fail to produce accurate predictions.

In this project, our goal is to create a way of inferring the system of (possibly probabilistic or stochastic) partial or ordinary differential equations (with a quantified noise term accounting for any inexactness) that describes a real-world system based on a set of given data. Initially we can begin by working on a single equation with one unknown. (The noise could be due to a number of effects such as measurement inaccuracies or oversimplified models used.) From there, we can move progressively to more complicated equations.

Minimum Message Length (MML) will be one of the tools used for modelling, as it can provide ways of producing simpler models that actually fit more closely than their more complicated counterparts produced by other methods. The project will become increasingly CPU-intensive but will ultimately have many real-world applications in a wide range of areas.

References:
-----------------
Wallace (2005)
Dowe (2011a)

End David Dowe 2012 Hons project #9
------------------------------**----------------------------
------------------------------**----------------------------
(10)
David Dowe 2012 Hons project #10

Title: Econometric, statistical and financial time series modelling using MML

Supervisor: David Dowe

Time series are sequences of values of one or more variables.
They are much studied in finance, econometrics, statistics and various branches of science (e.g., meteorology). Minimum Message Length (MML) inference (Wallace and Boulton, 1968) (Wallace and Freeman, 1987) (Wallace and Dowe, 1999a) (Wallace, posthumous, 2005) (Comley and Dowe, 2005) has previously been applied to autoregressive (AR) time series (Fitzgibbon et al., 2004), other time series (Schmidt et al., 2005) and (at least in preliminary manner) both AR and Moving Average (MA) time series (Sak et al., 2005).

In this project, we apply MML to the Autoregressive Conditional Heteroskedasticity (ARCH) model, in which the (standard deviations and) variances also vary with time. Depending upon progress, we can move on to the GARCH (Generalised ARCH) model or Peiris's Generalised Autoregressive (GAR) models, or to inference of systems of differential equations.

This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.

References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005
ScPL2005
Wall2005
WaBo1968
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283.
WaFr1987

End David Dowe 2012 Hons project #10
------------------------------**----------------------------
------------------------------**----------------------------
(11)
David Dowe 2012 Hons project #11

Title: Probabilistic machine learning (and copula selection) using MML

Supervisor: David Dowe

All too many machine learning algorithms have their success measured by how many classifications they got right and how many they got wrong. This common-sense measure is a fine and reasonable place to start, but it is highly dependent upon the framing of the problem (Dowe 2011a, sec. 3).
Consider, for example, two different multiple-choice exams where one of them combines the questions of the other into one (big) question. It turns out that the only scoring system which remains invariant to re-framing of the questions is log-loss scoring [where probabilities are allocated to each prediction and the score given is the logarithm of the probability allocated to the outcome that actually occurs] (Dowe 2008a, footnote 175; Dowe 2008b; Dowe 2011a, sec. 3).

One possible direction in which to take this project is using Minimum Message Length (MML) for copula selection - a quite general modelling approach.

References:
-----------------
D. L. Dowe (2008a), ``Foreword re C. S. Wallace'', Computer J.
D. L. Dowe (2008b).
D. L. Dowe (2011a), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science - (HPS Volume 7) Philosophy of Statistics, pp901-982, 1/June/2011.
R. Fujimaki, Y. Sogawa and S. Morinaga (2011), ``Online heterogeneous mixture modeling with marginal and copula selection'', Proc. KDD '11, Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.

End David Dowe 2012 Hons project #11
----------------------------------------------------------
----------------------------------------------------------
(12)
David Dowe 2012 Hons project #12

Title: Vision enhancement and spike sorting algorithms for multiple electrodes using MML

24-point project

Supervisors: David Dowe and Ramesh Rajan (Physiology)

Spike signals obtained from one or more electrodes receiving input from the brain are much studied. The signals are typically a mix of one or more component signals, and both frequencies and certainly amplitudes can vary - and can be affected by local factors such as capacitance.
Where there are several well-separated electrodes, there might be weak correlation between the signals, which can possibly be modelled by latent factor analysis and/or some degree of time series delay. A typical data-set might be large, with data from over 100 electrodes, modelling neural responses to visual stimuli.

The project will use Minimum Message Length (MML), whose generality enables it to be used to infer any computable function (Wallace & Dowe 1999a; chap. 2 of Wallace 2005; Dowe 2011a).

Co-supervision will come from Physiology. The Department of Physiology will provide data with varying degrees of complexity: ranging from single-electrode data with a known number of sources, through single-electrode data with an unknown number of sources, to electrode array data. The project will start with a simple problem and work towards more challenging problems, with our developing mathematical MML models as elaborate as time permits.

A knowledge of physiology and/or brain signal data will be an advantage, and strong mathematics will be essential.

References:
Dowe (2011a)
www.frontiersin.org/SearchData.aspx?sq=Spike+sorting
Quiroga (2012) http://www.scholarpedia.org/article/Spike_sorting
Wallace (2005)
Wallace & Dowe (1999a)
Wild, Prekopcsak, Sieger, Novak and Jech (2012)

End David Dowe 2012 Hons project #12
----------------------------------------------------------
----------------------------------------------------------
(13)
David Dowe 2012 Hons project #13

Title: Model selection and parameter estimation for influenza infection

24-point project

Supervisors: David Dowe and James McCaw

Influenza viruses enter the body and replicate by infecting susceptible cells (tissue) in the airways. Infected cells release multiple progeny virus, leading to initial exponential growth in the viral load.
Under the classical 'target cell limited' model of within-host dynamics, this growth gives way to a peak and then decline in viral load once the number of susceptible (target) cells has declined to such an extent that any given influenza virus is unlikely to find and infect one during its lifetime. The process of invasion, replication and decline of virus is described by a set of coupled non-linear 1st order ordinary differential equations. The model paradigm has been widely used to explain key biological and epidemiological phenomena such as susceptibility, infectiousness, drug efficacy and the emergence and consequences of drug-resistance.

However, we know that this simple model is wrong - we have the counter-example of influenza infections that do not resolve in immunocompromised individuals, indicating that processes other than target cell depletion also contribute to control and eventual suppression of viral load. Biologically, it is clear that these 'other processes' originate from the action of the innate and adaptive immune systems. However, collecting relevant time-series data to differentiate between models that include immune responses to varying degrees is i) extremely time-consuming, ii) costly and iii) difficult to justify if one does not know what data one really needs to accept/reject different model constructs.

We will use the information-theoretic techniques behind Minimum Message Length (MML) to develop optimised strategies for choosing what experiments to run. Through collaborations with colleagues in virology and immunology we will then run those experiments, and use the data and our knowledge of biological plausibility to find the best MML inference.

The student will have some knowledge of complex systems and non-linear analysis, gained through study in mathematics, physics, computer science, statistics, econometrics, (electrical) engineering or a related field. The ability to program is essential.

References:
Dowe (2011a)
Amber M.
Smith and Alan S. Perelson, Influenza A virus infection kinetics: quantitative data and models, Wiley Interdiscip Rev Syst Biol Med (2010).
Wallace (2005)

End David Dowe 2012 Hons project #13
----------------------------------------------------------
----------------------------------------------------------
(14)
David Dowe 2012 Hons project #14

Title: Software testing as we approach the technological singularity

Alternative Title: *Can software testing stop Terminator's SkyNet from being released into the wild?*

Supervisors: Robert Merkel and David Dowe

Hernandez-Orallo and Dowe have proposed a universal intelligence test that can, at least theoretically, provide a measure of intelligence for any entity, computational or biological. In a nutshell, they define intelligence as the ability to maximise rewards across a broad range of complex environments.

Their test assumes that "rewards" are perceived in the same way by the tester and the test participant, which may not always be the case. For instance, in their paper, they use the example of a chimpanzee; a "correct answer" in the test is rewarded with a banana, which is always perceived as a greater reward than not getting a banana. But what if our chimp has had its fill of bananas? What if it starts pushing buttons at random, just to see what happens? Or deliberately chooses the wrong answer - perhaps with the hope of ending the test early? Or perhaps a particularly cunning chimp chooses any one of the above courses of action because it wants to disguise its true intelligence, to lull its human jailers into a false sense of security and give it the opportunity to escape the lab? On the other hand, what about a chimp that wants to send a message to its tester by, for instance, alternating between very good and obviously incorrect answers?

J. Hernandez-Orallo and D. L.
Dowe (2010), "Measuring Universal Intelligence: Towards an Anytime Intelligence Test", (the) Artificial Intelligence journal (AIJ), Volume 174, Issue 18, December 2010, pp1508-1539. [www.doi.org: 10.1016/j.artint.2010.09.006.]
D. L. Dowe and J. Hernandez-Orallo (2012), "IQ tests are not for machines, yet", accepted, to appear, Intelligence journal.

End David Dowe 2012 Hons project #14
----------------------------------------------------------
----------------------------------------------------------
(15)
David Dowe 2012 Hons project #15

Title: Inference of Liquid State Machines

Supervisors: Asad Khan and David Dowe

See Asad Khan's Hons projects

End David Dowe 2012 Hons project #15
----------------------------------------------------------
----------------------------------------------------------
(16)
David Dowe 2012 Hons project #16

Title: Other

Supervisor: David Dowe

Please see me if you have good mathematics and are interested in issues pertaining to machine learning, statistics, information theory, ``data mining'' and/or quantifying the notion of intelligence.

End David Dowe 2012 Hons project #16
----------------------------------------------------------
------------------------------**----------------------------