Lloyd Allison - Publications.

home1 home2
and the



%A D. Sumanaweera
%A L. Allison
%A A. S. Konagurthu
%T Bridging the Gaps in Statistical Models of Protein Alignment
%J Bioinformatics
%V 38
%N issue supplement_1
%P i229–i237
%D 2022
%O ISMB July 2022, Madison, USA
%K conf, ISMB, MolBio, c2022, c202x, c20xx, zz0722, LAllison, ArunK, Dinithi,
   sequence alignment, gap, indel, model, substitution, PAM, BLOSUM, MML, MDL
%X "... To overcome this gap, this article demonstrates how a complete stat.
   model quantifying the evolution of pairs of aligned proteins can be
   constructed using a time-parameterized substitution matrix &
   a time-parameterized alignment state machine. Methods to derive all
   parameters of such a model from any benchmark collection of aligned protein
   seqs. are described here. This has not only allowed us to generate a unified
   stat.model for each of the nine widely used subst.matrices (PAM, JTT, BLOSUM,
   JO, WAG, VTML, LG, MIQS & PFASUM), but also resulted in a new unified model,
   MMLSUM. Our underlying methodology measures the Shannon info. content using
   each model to explain losslessly any given collection of alignments, which
   has allowed us to quantify the performance of all the above models on six
   comprehensive alignment benchmarks. Our results show that MMLSUM results in a
   new & clear overall best performance ..."
   -- [doi:10.1093/bioinformatics/btac246]['22],
    & 2010.00855@[arXiv]['20].

%A S. Rajapaksa
%A D. Sumanaweera
%A A. M. Lesk
%A L. Allison
%A P. Stuckey
%A M. Garcia de la Banda
%A P. Stuckey
%A D. Abramson
%A A. S. Konagurthu
%T On the Reliability and Limits of Protein Sequence Alignments
%J Bioinformatics
%V 38
%N issue supplement_1
%P i255–i263
%D 2022
%O ISMB July 2022, Madison, USA
%K conf, ISMB, MolBio, c2022, c202x, c20xx, zz0722, sequence alignment,
   protein, structure, sequence, proximity, twilight, midnight, zone,
   AMLesk, LAllison, ArunK, Sandun
%X "... Using techniques not prev. applied to these questions, by weighting
   every possible seq. alignment by its posterior prob. we derive a formal
   math. expectation, & develop an efficient alg. for computation of the
   distance between alternative alignments ... By analyzing the seqs. &
   structures of 1 million protein domain pairs, we report the variation of the
   expected distance between seq.-based & structure-based alignments, as a fn
   of (Markov time of) seq. divergence. Our results clearly demarcate the
   'daylight', 'twilight' & 'midnight' zones for interpreting residue-residue
   correspondences from seq. information alone."
   -- [doi:10.1093/bioinformatics/btac247]['22].

%A L. Allison
%T Subclasses of Class Function used to Implement Transformations of
   Statistical Models
%J arXiv
%D 2022
%K TR, c2022, c202x, c20xx, zz0722, type, class, function, FP, subclass, MDL,
   model, MML, OOP, AI, inference, II, continuous, inverse, transform,
%X "A library of s/w for inductive inference guided by the Minimum Message
   Length (MML) principle was created previously. It contains various
   (object-oriented-) classes & subclasses of statistical Model & can be used to
   infer Models from given data sets in machine learning problems. Here
   transformations of statistical Models are considered & implemented within the
   library so as to have desirable properties from the object-oriented
   programming & mathematical points of view. The subclasses of class Function
   needed to do such transformations are defined."
   -- 2207.04218@[arXiv]['22].

%A S. Rajapaksa
%A D. Sumanaweera
%A M. Garcia de la Banda
%A P. Stuckey
%A D. Abramson
%A L. Allison
%A A. Lesk
%A A. Konagurthu
%Y On identifying statistical redundancy at the level of amino acid subsequences
%J Int. Conf. on Bioinformatics and Biomedicine (BIBM)
%D 2021
%K conf, c2021, c202x, c20xx, zz0122, MolBio, LAllison, AMLesk, ArunK,
   protein, information content, stats, compress, compression, library,
   fragments, sequence, subsequence, structure, indels, source code
%X "... presents a framework to characterize & identify local sequences of
   proteins that are statistically redundant under the measure of Shannon
   information content while accounting for variations in their occurrences over
   evolutionary insertions, deletions, & substitutions of amino acids. The
   identification of such local seqs. provides insights for downstream studies
   on proteins. Here, we have applied our methods to amino acid seq. data sets
   derived from a database corr.to 935,552 substructural regions of varying
   sizes, covering 113,724 proteins from the protein data bank. The results
   identify, among others, a surjective mapping between 110,598 local seqs.
   (with an avg. length of 82 AAs/seq.) & 1,493 topological shapes. ..."
   -- [doi:10.1109/BIBM52615.2021.9669282]['22],
   Also see supplementary@[lcb]['22].
   [Also search for: MolBio protein compression].

%A Yang Li
%A L. Allison
%A K. B. Korb
%T The difficulty of being moral
%J Theor. Comp. Sci.
%V 885
%P 77-90
%D 2021
%K jrnl, TCS, c2021, c202x, c20xx, zz1121, kelvin, LAllison, KBKorb,
   graph, maths, moral, complexity, Markov blanket, Bayesian network,
   BNet, Bnets, AI, NPC
%X -- [doi:10.1016/j.tcs.2021.06.024]['21].

%A L. Allison
%A A. S. Konakurthu
%A D. F. Schmidt
%T On universal codes for integers: Wallace Tree, Elias Omega and beyond
%J Data Compression Conference
%P 313-322
%D 2021
%K conf, DCC, DCC21, DCC2021, c2021, c202x, c20xx, zz0321, universal code,
   ints, integer, Elias omega, omega2, omega*, omega star, Fibonacci,
   Wallace Tree code, WTC, probability distribution, compression, entropy,
   LAllison, ArunK
%X "A universal code for the (positive) integers is a variable length code that
   can be used to store or compress a sequence of integers. It also implies a
   probability distribution on integers which can be a natural choice when the
   true distribution of a source of integers is unknown; such a code and
   distribution may be useful in statistical inference. This paper provides two
   improvements to the theory and practice of universal codes. First, it defines
   and examines a new universal code omega* (omega-star) that asymptotically
   beats the Elias omega code.  Second, it analyses the properties of a code
   proposed by Wallace based on trees, and shows it to be a universal code, to
   have desirable properties for use in inference, and to beat the Elias omega
   code on almost all integers up to the 1697-bit code-word mark. Encoding and
   decoding routines for the codes described here are implemented and available
   for interactive use[1]." Also see [www].
   -- [more]['21] inc. source code,

%A A. S Konagurthu
%A R. Subramanian
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A M. Garcia de la Banda
%A A. M. Lesk
%T Universal architectural concepts underlying protein folding patterns
%J Front. Mol. Biosci.,
   A Journey Through 50 Years of Structural Bioinformatics in
   Memoriam of Cyrus Chothia
%P ?-?
%D 2021
%K jrnl, FMB, MolBio, bioinformatics, c2021, c202x, c20xx, zz0121,
   procodic, prosodic, 3D protein structure, fold, folds, tableau, motif,
   motifs, pattern, model, concept, structural, terms, fragment, supersecondary,
   local, dictionary, recurring, library, MML, II, ArunK, LAllison, AMLesk
%X "What is the architectural 'basis set' of the observed universe of protein
   structures? Using information-theoretic inference, we answer this Q. with a
   comprehensive dictionary of 1,493 substructures - called concepts - at a
   sub-domain level, based on an unbiased subset of known protein structs. ...
   An interactive site, PROCODIC, at ...[prosodic] provides access to
   & navigation of the entire dictionary of concepts, & all assoc. info.."
   -- [doi: 10.3389/fmolb.2020.612920]['21],

%A D. Sumanaweera
%A L. Allison
%A A. S. Konagurthu
%T Bridging the gaps in statistical models of protein alignment
%J arXiv
%D 2020
%K TR, MolBio, c2020, c202x, c20xx, zz1020, MML, protein alignment, sequence,
   DPA, stats, model, gap, indel, indels, substitution, scoring, matrix,
   proteins, PAM, BLOSUM, PFASUM, MMLSUM, HMM, minimum message length, MDL,
   information, Dinithi, LAllison, ArunK
%X "... demonstrates how a complete statistical model quantifying the evolution
   of pairs of aligned proteins can be constructed from a time-parameterised
   substitution matrix and a time-parameterised 3-state alignment machine. All
   parameters of such a model can be inferred from any benchmark data-set of
   aligned protein seqs.. This allows us to examine nine well-known sub.matrices
   on six benchmarks curated using various structural alignment methods; any
   matrix that does not explicitly model a 'time'-dependent Markov process is
   converted to a corr. base-matrix that does. [&] a new optimal matrix is
   inferred for each of the six benchmarks. Using Minimum Message Length (MML)
   inference, all 15 matrices are compared in terms of measuring the Shannon
   information content of each benchmark. This has resulted in a new & clear
   overall best performed time-dependent Markov matrix, MMLSUM, & its assoc.
   3-state m/c, whose properties we have analysed in this work. For std use, the
   MMLSUM series of (log-odds) scoring matrices derived from the above Markov
   matrix, are available at lcb.infotech.monash.edu.au/mmlsum "
   -- 2010.00855@[arXiv]['20].

%A A. M. Lesk
%A A. S. Konagurthu
%A L. Allison
%A M. Garcia de la Banda
%A P. J. Stuckey
%A D. Abramson
%T Computer modelling of a potential agent against SARS-Cov-2 (COVID-19)
%J Proteins: Structure, Function and Bioinformatics
%V 88
%N 12
%P 1557-1558
%D 2020
%K jrnl, MolBio, c2020, c202x, c20xx, zz0720, SARS-CoV-2, SARSCoV2, protease,
   protein, Mpro, 3CLpro, nsp5, covid19, covalent, inSilico, ligand,
   inhibitor, Cys145, AMLesk, LAllison, ArunK
%X "We have modelled modifications of a known ligand to the SARS‐CoV‐2
   (COVID‐19) protease, that can form a covalent adduct, plus additional
   ligand‐protein hydrogen bonds."
   -- [doi:10.1002/prot.25980]['20]  (online 14/7/2020).
   [Also search for: SARSCoV2 protease].

%A D. Sumanaweera
%A L. Allison
%A A. Konagurthu
%T Statistical compression of protein sequences and inference of marginal
   probability landscapes over competing alignments using finite state models
   and Dirichlet priors
%J Bioinformatics
%V 35
%N 14
%P i360–i369
%D 2019
%O ISMB/ECCB, Basel, .ch, July 2019
%K jrnl, conf, MolBio, c2019, c201x, c20xx, zz0719, LAllison, ArunK,
   bioinformatics, sequence, alignment, landscape, DPA, homology, protein,
   proteins, probability, probabilistic, Bayesian, information, MML, MDL,
%X "The information criterion of minimum message length (MML) provides a
   powerful statistical framework for inductive reasoning from observed data. We
   apply MML to the problem of protein sequence comparison using finite state
   models with Dirichlet distributions. The resulting framework allows us to
   supersede the ad hoc cost functions commonly used in the field, by
   systematically addressing the problem of arbitrariness in alignment
   parameters, and the disconnect between substitution scores and gap costs.
   Furthermore, our framework enables the generation of marginal probability
   landscapes over all possible alignment hypotheses, with potential to
   facilitate the users to simultaneously rationalize and assess competing
   alignment relationships between protein sequences, beyond simply reporting a
   single (best) alignment. We demonstrate the performance of our program on
   benchmarks containing distantly related protein sequences."
   -- [doi:10.1093/bioinformatics/btz368]['19].
   Also see [protein].

%A L. Allison
%A A. Konagurthu
%A D. Schmidt
%T On universal codes for integers: Wallace Tree, Elias Omega and variations
%J arXiv
%D 2019
%K TR, c2019, c201x, c20xx, zz0619, LAllison, ArunK, integer, universal code,
   codes, compression, Elias, omega, omega2, omega*, omegastar, WTC, II, MML,
   Rissanen, logstar, log*, Fibonacci, asymptotic, optimality, MDL, robustness
%X "A universal code for the (positive) integers can be used to store or
   compress a sequence of integers. Every universal code implies a probability
   distribution on integers. This implied distribution may be a reasonable
   choice when the true distribution of a source of integers is unknown.
   Wallace Tree Code (WTC) is a universal code for integers based on binary
   trees. We give the encoding and decoding routines for WTC and analyse the
   properties of the code in comparison to two well-known codes, the Fibonacci
   and Elias omega codes. Some improvements on the Elias omega code are also
   described and examined."
   -- 1906.05004@[arXiv]['19].
   Also see DCC21 and

%A A. S. Konagurthu
%A R. Subramanian
%A L. Allison
%A D. Abramson
%A M. Garcia de la Banda
%A P. J. Stuckey
%A A. M. Lesk
%T Information-theoretic inference of an optimal dictionary of protein
   supersecondary structures
%B Protein Supersecondary Structures
%I Springer
%P 123-131
%D 2019
%K chapter, c2019, c201x, c20xx, zz0519, LAllison, ArunK, AMLesk,
   bioinformatics, protein, structure, proteins, supersecondary, fold, tableau,
   dictionary, library, MML, minimum message length, mdl, AI, inference
%X "We recently developed an unsupervised Bayesian inference methodology to
   automatically infer a dictionary of protein supersecondary structures
   (Subramanian et al., IEEE data compression conference proceedings (DCC),
   340-349, 2017). Specifically, this methodology uses the information-theoretic
   framework of minimum message length (MML) criterion for hypothesis selection
   (Wallace, Statistical and inductive inference by minimum message length,
   Springer Science & Business Media, New York, 2005). The best dictionary of
   supersecondary structures is the one that yields the most (lossless)
   compression on the source collection of folding patterns represented as
   tableaux (matrix representations that capture the essence of protein folding
   patterns (Lesk, J Mol Graph. 13:159–164, 1995). This book chapter outlines
   our MML methodology for inferring the supersecondary structure dictionary.
   The inferred dictionary is available at [www(click)][/4/2019]."
   -- [doi:10.1007/978-1-4939-9161-7_6]['19].
   (Also see [protein].)

%A L. Allison
%T Coding Ockham's Razor
%I Springer
%D 2018
%K book, text, c2018, c201x, c20xx, zz0618, LAllison, Occam, Ockham razor, MDL,
   minimum message length, description, MML, AIC, BIC, statistics, probability,
   likelihood, inductive inference, Bayesian, artificial intelligence, AI,
   algorithm, decision, tree, Dtree, nml, mmld, kdd, clustering, snob, L4ref,
   mixture model, Gaussian, multistate, multinomial, vonMises, Fisher, Poisson,
   geometric, modelling, iris data, regression, stats, machine learning, L5ref,
   II, AI, 27 club, 27club, musician, graph, motif, pattern, graphs, graphModel,
   information, compression, compress, model, code, entropy, maths, statistics
%X "... inductive inference using the minimum message length (MML) principle and
   a computer. It is accompanied by a library of software to help an
   applications programmer, student or researcher in the fields of data analysis
   or machine learning to write computer programs of this kind. ..."
   -- [doi:10.1007/978-3-319-76433-7].
   uk us isbn:3319764322; uk us isbn13:978-3319764320.
     1 Introduction, 2 Discrete, 3 Integers, 4 Continuous, 5 Function-Models,
     6 Multivariate, 7 Mixture Models, 8 Function Model 2, 9 Vectors,
    10 Linear Regression, 11 Graphs, 12 Bits and Pieces, 13 An Implementation.
   (Also see the [mmlist] inc. doc and source code.)

%A D. Sumanaweera
%A L. Allison
%A A. S. Konagurthu
%T The bits between proteins
%J Data Compression Conference (DCC)
%W Snowbird, Utah, USA
%D 2018
%K conf, DCC, DCC2018, MolBio, c2018, c201x, c20xx, zz0418, bioinformatics,
   LAllison, ArunK, Dinithi, DCC, DCC2018, protein, sequence, alignment, DPA,
   algorithm, evolutionary, model, minimum message length, MDL, MML, AIC,
   homology, information, similarity
%X "Comparison of protein sequences via alignment is an important routine in
   modern biological studies. Although the technologies for aligning proteins
   are mature, the current state of the art continues to be plagued by many
   shortcomings, chiefly due to the reliance on: (i) naive objective functions,
   (ii) fixed substitution scores independent of the sequences being considered,
   (iii) arbitrary choices for gap costs, and (iv) reporting, often, one
   optimal alignment without a way to recognise other competing sequence
   alignments. Here, we address these shortcomings by applying the
   compression-based Minimum Message Length (MML) inference framework to the
   protein sequence alignment problem. This grounds the problem in statistical
   learning theory, handles directly the complexity-vs-fit trade-off without
   ad hoc gap costs, allows unsupervised inference of all the statistical
   parameters, and permits the visualization and exploration of competing
   sequence alignment landscape."
   -- [more],
   (Also see [protein].)

%A A. M. Lesk
%A R. Subramanian
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A M. G. de la Banda
%A K. S. Konagurthu
%T Universal architectural concepts underlying protein folding patterns
%J bioRxiv
%N 480194
%D 2018
%K TR, MolBio, c2018, c201x, c20xx, zz1120, AMLesk, LAllison, ArunK, protein,
   3D, structure, procodic, prosodic, fold, folding, concept, concepts,
   motif, pattern, substructure
%X "What is the architectural 'basis set' of the observed universe of protein
   structures? Using information-theoretic inference, we answer this question
   with a comprehensive dictionary of 1,493 substructural concepts. Each concept
   represents a topologically-conserved assembly of helices and strands that
   make contact. Any p.structure can be dissected into instances of concepts
   from this dictionary. We dissected the world-wide protein data bank ..."
   -- 480194@[bioRxiv]['18].

%A R. Subramanian
%A L. Allison
%A P. J. Stuckey
%A M. Garcia de la Banda
%A D. Abramson
%A A. M. Lesk
%A A. S. Konagurthu
%T Statistical compression of protein folding patterns for inference of
   recurrent substructural themes
%J Data Compression Conf. (DCC)
%W Snowbird, Utah, USA
%P 340-349
%D 2017
%K conf, DCC, MolBio, c2017, c201x, c20xx, zz0517, protein, tertiary, 3D,
   super secondary, structure, motif, motifs, pattern, blocks, library,
   discovery, description, aic, MDL, minimum message length, MML,
   LAllison, AMLesk, ArunK
%X "Computational analyses of the growing corpus of three-dimensional (3D)
   structures of proteins have revealed a limited set of recurrent substructural
   themes, termed super-secondary structures. Knowledge of super-secondary
   structures is important for the study of protein evolution and for the
   modeling of proteins with unknown structures. Characterizing a comprehensive
   dictionary of these super-secondary structures has been an unanswered
   computational challenge in protein structural studies. This paper presents an
   unsupervised method for learning such a comprehensive dictionary using the
   statistical framework of lossless compression on a database comprised of
   concise geometric representations of protein 3D folding patterns. The best
   dictionary is defined as the one that yields the most compression of the
   database. Here we describe the inference methodology and the statistical
   models used to estimate the encoding lengths. An interactive website for this
   dictionary is available ..."
   -- [more]
   &  [doi:10.1109/DCC.2017.46]['17].
   (Also see [protein].)

%A J. H. Collier
%A L. Allison
%A A. M. Lesk
%A P. J. Stuckey
%A M. Garcia de la Banda
%A A. S. Konagurthu
%T Statistical inference of protein structural alignments using information and
%J J. Bioinformatics
%V 33
%N 1
%P 1005-1013
%D 2017
%O bioRxiv, June 2016
%K jrnl, OUP, MolBio, c2017, c201x, c20xx, zz0317, protein, alignment,
   tertiary structure, 3D, information, MML, MMLigner, software,
   JHCollier, ArunK, LAllison, AMLesk, AIC, bic, mdl
%X "... present here a statistical framework for the precise inference of
   structural alignments, built on the Bayesian and information-theoretic
   principle of Minimum Message Length (MML). The quality of any alignment is
   measured by its explanatory power—the amount of lossless compression achieved
   to explain the protein coordinates using that alignment. ..."
   -- [doi:10.1093/bioinformatics/btw757][2017] (online January 2017),
   (Also see MMLigner@[LCB][2016].)

%A M. Duc Cao
%A L. Allison
%A T. I. Dix
%A M. Boden
%T Robust estimation of evolutionary distances with information theory
%J Mol. Biol. Evol.
%V 33
%N 5
%P 1349-1357
%D 2016
%K jrnl, MBE, MolBio, c2016, c201x, c20xx, zz0816, bioinformatics, evolutionary,
   phylogenetic, distance, trees, sequences, information, maximum, parsimony,
   DNA, likelihood, MML, MDL, AIC, Minh Duc Cao, LAllison, TIDix, MBoden
%X "Methods for measuring genetic dist. in phylogenetics are known to be
   sensitive to the evol. model assumed. However, there is a lack of established
   methodology to accommodate the trade-off between incorporating suff. biol.
   reality & avoiding model overfitting. [&] as trad. methods measure dists.
   based on the observed # of substitutions, they tend to underestimate dist.
   between diverged seqs. due to backward & parallel subsns. Various techniques
   were proposed to correct this, but they lack the robustness [v.] sequs. that
   are distantly related & of unequal base freqs.. ... we present a novel
   genetic dist. est. based on inf.theory that overcomes the above two hurdles.
   Instead of examining the obs. # of subsns, [it] ests. genetic dist. using
   Shannon's mutual information. This naturally provides an effective framework
   for balancing model complexity & goodness of fit. Our dist. est. is shown to
   be approx. linear to elapsed time & hence is less sensitive to the divergence
   of seq. data & compositional biased seqs.. Using extensive simulation data,
   we show that our method 1) consistently reconstructs more accurate phylogeny
   topologies than existing methods, 2) is robust in extreme condns such as
   diverged phylogenies, unequal base freqs. data, & heterogeneous mutation
   patterns, & 3) scales well with large phylogenies."  [online Feb 2016.]
   -- [doi:10.1093/molbev/msw019]['16].
   [Also search for: MolBio evolutionary MML].

%A A. S. Konagurthu
%A P. Kasarapu
%A L. Allison
%A J. H. Collier
%A A. M. Arthur
%T On sufficient statistics of least-squares superposition of vector sets
%J J. Comp. Biol.
%V 22
%N 6
%P 487-497
%D 2015
%K jrnl, JCB, MolBio, bioinformatics, ArunK, LAllison, AMLesk, JHCollier,
   c2015, c201x, c20xx, zz0116, 3D, point set, tertiary, protein, structure,
   structural alignment, superposition, match, matching, stats
%X "The problem of superposition of two corr. vector sets by minimizing their
   sum-of-squares error under orthogonal transformation ... can be solved
   exactly using an alg. whose time complexity grows linearly with the # of
   correspondences. ... particularly in studies involving macromolecular
   structs.. ... formally derives a set of suff.stats. for the least-squares
   superposition problem. These s. are additive. This permits a highly efficient
   (const. time) computation of superpositions (& s.stats.) of vector sets that
   are composed from its constituent v.sets under addition or deletion op.,
   where the s.stats. of the constituent sets are already known (that is, [they]
   have been previously superposed). ... a drastic improvement in the run time
   of the methods that commonly superpose v.sets under addition or deletion
   ops., where previously these ops. were carried out ab initio (ignoring the
   s.stats.). ... demonstrate the improvement our work offers in the context of
   protein structural alignment programs that assemble a reliable structural
   alignment from well-fitting (substructural) fragment pairs. A C++ library for
   this task is available online under an open-source license."
   -- [doi:10.1089/cmb.2014.0154]['16].
   (Based on the 2014 RECOMB paper.)

%A P. Kasarapu
%A L. Allison
%T Minimum message length estimation of mixtures of multivariate Gaussian and
   von Mises-Fisher distributions
%J Machine Learning
%V 100
%N 2
%P 333-378
%D 2015
%O ECML / PKDD, Porto, Portugal, 7-11 Sept. 2015
%K jrnl, c2015, c201x, c20xx, zz0415, AI, II, Parthan, LAllison, MDL, AIC,
   MML, minimum message length, mixture modelling, model, clustering, vonMises,
   multivariate, vMF, Fisher, protein, structure, description, classification,
   Bayesian, directional, probability distribution, mmld, Figuerdo
%X "Mixture modelling involves explaining some observed evidence using a
   combination of probability distributions. The crux of the problem is the
   inference of an optimal number of mixture components and their corr.
   parameters. This paper discusses unsupervised learning of mixture models
   using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the
   effectiveness of search & inference of mixture parameters using the proposed
   approach, we select two key probability distributions, each handling
   fundamentally different types of data: the multivariate Gaussian distribution
   to address mixture modelling of data distributed in Euclidean space, & the
   multivariate von Mises-Fisher (vMF) distribution to address mixture modelling
   of directional data distributed on a unit hypersphere. The key contributions
   of this paper, in addition to the general search & inference methodology,
   include the derivation of MML expressions for encoding the data using
   multivariate Gaussian & von Mises-Fisher distributions, & the analytical
   derivation of the MML estimates of the parameters of the two distributions.
   Our approach is tested on simulated & real world data sets. For instance, we
   infer vMF mixtures that concisely explain experimentally determined 3D
   protein conformations, providing an effective null model description of
   protein structures that is central to many inference problems in structural
   bioinformatics. The experimental results demonstrate that the performance of
   our proposed search & inference method along with the encoding schemes
   improve on the state of the art mixture modelling techniques."
   -- [doi:10.1007/s10994-015-5493-0]['15],

%A F. Petitjean
%A L. Allison
%A G. Webb
%T A statistically efficient and scalable method for log-linear analysis of
   high-dimensional data
%W Shenzhen (.cn)
%D 2014
%K proc, conf, ICDM, DM2018, c2014, c201x, c20xx, zz1214, data mining, models,
   graphical model, log linear, chordal graph, chordalysis, MML, AIC, mdl, chi2,
   chi squared, II, AI, maths, stats, decomposable, FPj, LAllison, GWebb, Monash
%X "Log-linear analysis is the primary statistical approach to discovering
   conditional dependencies between the variables of a dataset. A good
   log-linear analysis method requires both high precision & statistical
   efficiency. High precision means that the risk of false discoveries should be
   kept very low. Statistical efficiency means that the method should discover
   actual associations with as few samples as possible. Classical approaches to
   log-linear analysis make use of chi^2 tests to control this balance between
   quality & complexity.  We present an information-theoretic approach to
   log-linear analysis. We show that our approach 1) requires significantly
   fewer samples to discover the true associations than statistical approaches
   - statistical efficiency - 2) controls for the risk of false discoveries as
   well as statistical approaches - high precision - and 3) can perform the
   discovery on datasets with hundreds of variables on a standard desktop
   computer - computational efficiency."
   -- [more],

%A J. Collier
%A L. Allison
%A A. Lesk
%A M. Garcia de La Banda
%A A. Konagurthu
%T A new statistical framework to assess structural alignment quality using
   information compression
%W Strasbourg
%D 2014
%K conf, ECCB 14, MolBio, c2014, c201x, c20xx, zz0914, LAllison, ArunK, AMLesk,
   JHCollier, protein, 3D, similar, structure, alignment, match, MML, MDL, AIC,
   complexity, bioinformatics, 13th Euro, Conf, Comp, Biology, I value, Ivalue
%X "... proposes a new statistical framework to assess structural alignment
   quality and significance based on lossless information compression. This is
   a radical departure from the traditional approach of formulating scoring
   functions. It links the structural alignment problem to the general class of
   statistical inductive inference problems, solved using the
   information-theoretic criterion of minimum message length. Based on this, we
   developed an efficient and reliable measure of structural alignment quality,
   I-value. The performance of I-value is demonstrated in comparison with a
   number of popular scoring functions, on a large collection of competing
   alignments. Our analysis shows that I-value provides a rigorous and reliable
   quantification of structural alignment quality, addressing a major gap in
   the field."
   -- [doi:10.1093/bioinformatics/btu460]['14],

%A L. Allison
%T On the complexity of graphs (networks) by information content, and
   conditional (mutual) information given other graphs
%I Clayton School of the Faculty of Info. Tech., Monash Uni.
%R 2014/277
%P 14
%D 2014
%K c2014, c201x, c20xx, zz0614, LAllison, TR, complex, compress, compression,
   graph, network, model, graphModel, complexity, minimum message length, MML,
   simple, random, description, AI, information content, AIC, MDL, mutual,
   similarity, viagra, cialis, xanax, valium, structure, SMILES, organic, maths,
   chemical compound, MolBio, molecule, probability, likelihood, LAllison
%X '... concerns the information content of a graph, optionally conditional on
   one or more background, "common knowledge" graphs. It describes an algorithm
   to estimate this information content, and includes some examples based on
   chemical compounds.'
   -- [more],
   &  2008.04744@[arXiv][2020].

%A A. S. Konagurthu
%A P. Kasarapu
%A L. Allison
%A J. H. Collier
%A A. M. Lesk
%T On sufficient statistics of least-squares superposition of vector sets
%I SpringerVerlag
%V 8394
%P 144-159
%D 2014
%K conf, RECOMB, MolBio, c2014, c201x, c20xx, zz0514, ArunK, LAllison, AMLesk,
   JHCollier, bioinformatics, RECOMB18, protein, structure, alignment,
   least squares, RMS, error, 3D, match, matching, additive, orthogonal, rigid,
   vector set, Kearsley, algorithm
%X "Superposition by orthogonal transformation of vector sets by minimizing the
   least-squares error is a fundamental task in many areas of science, notably
   in structural molecular biology. Its widespread use for structural analyses
   is facilitated by exact solns of this problem, computable in linear time.
   However, in several of these analyses it is common to invoke this
   superposition routine a very large number of times, often operating (through
   addition or deletion) on previously superposed vector sets. This paper
   derives a set of sufficient statistics for the least-squares orthogonal
   transformation problem. These sufficient statistics are additive. This
   property allows for the superposition parameters (rotation, translation, &
   root mean square deviation) to be computable as constant time updates from
   the statistics of partial solutions. We demonstrate that this results in a
   massive speed up in the computational effort, when compared to the method
   that recomputes superpositions ab initio .  Among others, protein structural
   alignment algorithms stand to benefit from our results."
   -- [doi:10.1007/978-3-319-05269-4_11]['14],

%A A. S. Konagurthu
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A A. M. Lesk
%T How precise are reported protein coordinate data?
%J Acta Cryst.
%V D70
%N 3
%P 904-906
%D 2014
%K jrnl, MolBio, c2014, c201x, c20xx, zz0314, protein, 3D, tertiary, structure,
   precision, accuracy, PDB, ArunK, LAllison, AMLesk
%X "Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally
   reported to greater precision than the experimental structure determinations
   have actually achieved. By using information theory & data compression to
   study the compressibility of protein atomic coordinates, it is possible to
   quantify the amount of randomness in the coordinate data & thereby to
   determine the realistic precision of the reported coordinates. On avg., the
   value of each C_alpha coordinate in a set of selected p.structures solved at
   a variety of resolutions is good to about 0.1A."
   -- [doi:10.1107/S1399004713031787]['14],

%A A. S. Konagurthu
%A A. M. Lesk
%A D. Abramson
%A P. J. Stuckey
%A L. Allison
%T Statistical inference of protein "LEGO bricks"
%D 2013
%K conf, ICDM, ICDM13, MolBio, c2013, c201x, c20xx, zz1213, LAllison, ArunK,
   AMLesk, protein, tertiary, 3D, MML, structure, structures, recurrent,
   structural, motifs, backbone, folds, MDL, library, dictionary, fragment,
   blocks, fragments, bioinformatics, data mining
%X "Proteins are biomolecules of life. They fold into a great variety of
   three-dimensional (3D) shapes. Underlying these folding patterns are many
   recurrent structural fragments or building blocks (analogous to "LEGO(r)
   bricks"). This paper reports an innovative statistical inference approach to
   discover a comprehensive dictionary of protein structural building blocks
   from a large corpus of experimentally determined protein structures. Our
   approach is built on the Bayesian and information-theoretic criterion of
   minimum message length [MML]. To the best of our knowledge, this work is the
   first systematic and rigorous treatment of a very important data mining
   problem that arises in the cross-disciplinary area of structural
   bioinformatics. The quality of the dictionary we find is demonstrated by its
   explanatory power - any protein within the corpus of known 3D structures can
   be dissected into successive regions assigned to fragments from this
   dictionary. This induces a novel one-dimensional representation of three-
   -dimensional protein folding patterns, suitable for application of the rich
   repertoire of character-string processing algorithms, for rapid
   identification of folding patterns of newly determined structures. This paper
   presents the details of the methodology used to infer the dictionary of
   building blocks, and is supported by illustrative examples to demonstrate its
   effectiveness and utility."
   -- [doi:10.1109/ICDM.2013.73]['14],
      [more], and

%A A. S. Konagurthu
%A A. M. Lesk
%A L. Allison
%T Minimum message length inference of secondary structure from protein
   coordinate data
%J J. Bioinformatics
%V 28
%N 12
%P i97-i105
%D 2012
%O ISMB, Long Beach
%K conf, ISMB12, MolBio, c2012, c201x, c20xx, zz0612, LAllison, ArunK, AMLesk,
   SST, bioinformatics, protein, secondary structure, DSSP, assignment, helix,
   extended strand, sheet, coil, mmld, fold, MML, MDL, model
%X "Motivation: Secondary structure underpins the folding pattern and
   architecture of most proteins. Accurate assignment of the SS elts is
   therefore an important problem. Although many approx. solns of the SS
   assignment problem exist, the statement of the problem has resisted a
   consistent & math. rigorous defn. A variety of comparative studies have
   highlighted major disagreements in the way the available methods define &
   assign SS to coord.data.
   Results: We report a new method to infer SS based on the Bayesian method of
   Minimum Message Length (MML) inference. It treats assignments of SS as
   hypotheses that explain the given coord.data. The method seeks to maximise
   the joint probability of a hypothesis & the data. There is a natural null
   hypothesis & any assignment that cannot better it is unacceptable. We
   developed a program SST based on this approach & compared it to popular
   programs such as DSSP & STRIDE amongst others. Our evaln suggests that SST
   gives reliable assignments even on low resolution structures."
   -- [doi:10.1093/bioinformatics/bts223]['12].
   More: [www]['12].

%A E. Makalic
%A L. Allison
%T MMLD inference of multilayer perceptrons
%J Solomonoff 85th Memorial Conference
%W Melbourne
%I SpringerVerlag
%V 7070
%P 261-272
%D 2011
%K conf, Ray, c2011, c201x, c20xx, AI, II, Enes, enesm, MML, mmld, LAllison,
   perceptron, NN, ANN, neural network, information, MDL, complexity
%X "A multilayer perceptron comprising a single hidden layer of neurons with
   sigmoidal transfer fns can approx. any computable fn to arbitrary accuracy.
   The size of the hidden layer dictates the approximation capability of the
   multilayer perceptron & automatically determining a suitable network size for
   a given data set is an interesting question. ... considers the problem of
   inferring the size of multilayer perceptron networks with the MMLD model
   selection criterion which is based on the MML principle. The two main
   contributions of the paper are: (1) a new model seln criterion for inference
   of fully-connected multilayer perceptrons in regression problems, & (2) an
   efficient alg. for computing MMLD-type codelengths in mathematically
   challenging model classes. Empirical performance of the new alg. is
   demonstrated on artificially generated & real data sets."
   -- [doi:10.1007/978-3-642-44958-1_20]['14]; uk us isbn13:9783642449574.
   (Also see [MML].)

%A A. S. Konagurthu
%A L. Allison
%A P. J. Stuckey
%A A. M. Lesk
%T Piecewise linear approximation of protein structures using the principle of
   minimum message length
%J J. Bioinformatics
%V 27
%N 13
%P i43-i51
%D 2011
%K conf, MolBio, MML, c2011, c201x, c20xx, zz0711, ISMB, LAllison, ArunK,
   AMLesk, protein, fold, cartoon, description, ribbon diagram, structure,
   segmentation, minimum message length, MDL, information theoretic
%X "Simple & concise representations of protein-folding patterns provide
   powerful abstractions for visualizations, comparisons, classifications,
   searching & aligning structural data. Structures are often abstracted by
   replacing standard secondary structural features - that is, helices & strands
   of sheet - by vectors or linear segments. Relying solely on std secondary
   structure may result in a sig. loss of structural information. Further,
   traditional methods of simplification crucially depend on the consistency &
   accuracy of external methods to assign SS to protein coord.data. Although
   many methods exist automatically to identify SS, the impreciseness of
   definitions, along with errors & inconsistencies in experimental structure
   data, drastically limit their applicability to generate reliable simplified
   representations, especially for structural comparison.
   This article introduces a mathematically rigorous alg. to delineate protein
   structure using the elegant statistical & inductive inference framework of
   minimum message length (MML). Our method generates consistent & statistically
   robust piecewise linear explanations of protein coordinate data, resulting in
   a powerful & concise representation of the structure. The delineation is
   completely independent of the approaches of using hydrogen-bonding patterns
   or inspecting local substructural geometry that the current methods use.
   Indeed, as is common with applications of the MML criterion, this method is
   free of parameters & thresholds, in striking contrast to the existing
   programs which are often beset by them.
   The analysis of results over a large number of proteins suggests that the
   method produces consistent delineation of structures that encompasses, among
   others, the segments corresponding to standard secondary structure."
   -- [doi:10.1093/bioinformatics/btr240]['11].
   (Also see [more].)

%A M. D. Cao
%A T. I. Dix
%A L. Allison
%T A biological compression model and its applications
%B Software Tools and Algorithms for Biological Systems
%E H. R. Arabnia
%E Q.-N. Tran
%I SpringerVerlag
%S AEMB (Advances in Experimental Medicine and Biology)
%V 696
%P 657-666
%D 2011
%K MolBio, AEMB, book chapter, c2011, c201x, zz0411, c20xx, LAllison, TIDix,
   Minh Duc Cao, bioinformatics, entropy, compression, DNA, protein, sequence,
   data, compress, MML, MDL
%X "A biological compression model, expert model, is presented which is superior
   to existing compression algorithms in both compression performance and speed.
   The model is able to compress whole eukaryotic genomes. Most importantly, the
   model provides a framework for knowledge discovery from biological data. It
   can be used for repeat element discovery, sequence alignment and phylogenetic
   analysis. We demonstrate that the model can handle statistically biased
   sequences and distantly related sequences where conventional knowledge
   discovery tools often fail."
   -- [more],
      chapter 67 [doi:10.1007/978-1-4419-7046-6_67]['11],
      in: hb us$204; uk us isbn:1441970452; uk us isbn13:978-1441970459.

%A M. D. Cao
%A T. I. Dix
%A L. Allison
%T A genome alignment algorithm based on compression
%J BMC Bioinformatics
%V 11
%N 1
%P 599
%D 2010
%K jrnl, eJrnl, MolBio, c2010, c201x, c20xx, Minh Duc Cao, LAllison, TIDix,
   highly accessed paper, DNA, whole genome alignment, zz0111, local alignment,
   sequence, long, compress, compression, MML, MDL, BIC, poa, XM, expert model,
%X "... Since genomic sequences carry genetic info., this article proposes that
   the info. content of each nucleotide in a posn should be considered in
   sequence alignment. An info.-theoretic approach for pairwise genome local
   alignment, namely XMAligner, is presented. Instead of comparing sequences at
   the character level, XMAligner considers a pair of nucleotides from two
   seqs. to be related if their mutual info. in context is significant. The
   info.content of nucleotides in sequences is measured by a lossless
   compression technique. ... Experiments on both simulated data & real data
   show that XMAligner is superior to conventional methods especially on
   distantly related seqs. & statistically biased data. XMAligner can align
   seqs. of eukaryote genome size with only a modest hardware requirement. ..."
   -- [more],
      [doi:10.1186/1471-2105-11-599][12/'10] (BMC 17/2/12 mail: 3372 accesses),

%A M. B. Dale
%A L. Allison
%A P. E. R. Dale
%T Model selection using Minimal Message Length: an example using pollen data
%J Community Ecology
%V 11
%N 2
%P 187-201
%D 2010
%K jrnl, c2010, c201x, c20xx, LAllison, MML, mdl, pollen, data, segmentation,
   model, clustering, time series, timeSeries, plant, ecology, vegetation,
   cores, sediment
%X "... use of the minimum message length criterion [for] evaluating alternative
   models of data when the samples are serially ordered in space & implicitly
   in time. Much data from vegetation studies can be arranged in a sequence &
   in such cases the user may elect to constrain the clustering by zones, in
   preference to an unconstrained clustering. We use the MML principle to
   determine if such a choice provides an effective model of the data. Pollen
   data provide a suitably organised set of samples, but have other properties
   which make it desirable to examine several different models for the
   distribution of palynomorphs within the clusters. The results suggest that
   zonation is not a particularly preferred model since it captures only a small
   part of the patterns present. It represents a user expectation regarding the
   nature of variation in the data & results in some patterns being neglected.
   By using unconstrained clustering within zones, we can recover some of this
   overlooked pattern. We then examine other evidence for the nature of change
   in vegetation & finally discuss the usefulness of the MML as a guiding
   principle in model choice & its relationship to other possible criteria."
   -- [doi:10.1556/ComEc.11.2010.2.7]['11].
   (Also see [IP].)

%A A. Konagurthu
%A L. Allison
%A T. Conway
%A B. Beresford-Smith
%A J. Zobel
%T Design of an efficient out-of-core read alignment algorithm
%I SpringerVerlag
%V 6293
%P 189-201
%D 2010
%K wShop, MolBio, c2010, c201x, c20xx, zz1010, WABI, WABI10, nicta, LAllison,
   Syzygy, ArunK, Algorithms in Bioinformatics, short read, reads, NGS, align,
   mapping, next generation, DNA, sequencing, algorithm
%X "New genome sequencing technologies are poised to enter the sequencing
   landscape with significantly higher throughput of read data produced at
   unprecedented speeds & lower costs per run. However, current in-memory
   methods to align a set of reads to one or more reference genomes are
   ill-equipped to handle the expected growth of read-throughput from newer
   technologies.  ... reports the design of a new out-of-core read mapping alg.,
   Syzygy, which can scale to large volumes of read & genome data. The alg. is
   designed to run in a constant, user-stipulated amount of main memory -
   small enough to fit on standard desktops - irrespective of the sizes of read
   & genome data. Syzygy achieves a superior spatial locality-of-reference that
   allows all large data structures used in the alg. to be maintained on disk.
   We compare our prototype implementation with several popular read alignment
   programs.  Our results demonstrate clearly that Syzygy can scale to very
   large read volumes while using only a fraction of memory in comparison,
   without sacrificing performance."
   -- [more],
   (In: uk us isbn:3642152937; uk us isbn13:978-3-642-15293-1.)

%A M. B. Dale
%A L. Allison
%A P. E. R. Dale
%T A model for correlation within clusters and its use in pollen analysis
%J Community Ecology
%V 11
%N 1
%P 51-58
%D 2010
%K jrnl, c2010, c201x, c20xx, LAllison, MML, mdl, pollen, data, plants, ecology,
   model, segmentation, clustering, sediments, time
%X "... determine the # of clusters & also whether the model of any cluster is
   improved by introducing a factor. ... allows us to seek clusters which
   reflect directional changes rather than imposing a zonation constrained by
   spatial (& implicitly temporal) position. Minimal message length is a means
   of utilising Okham's Razor in inductive analysis. The "best" model is that
   which allows most compression of the data, which results in a minimal
   message length for the description. Fit to the data is not a sufficient
   criterion for choosing models because more complicated models will almost
   always fit better. MML combines fit to the data with an encoding of the model
   & provides a Bayesian probability criterion as a means of choosing between
   models (& classes of model). Applying the analysis to a pollen diagram from
   Southern Chile, we find that the introduction of factors does not improve the
   overall quality of the mixture model. The solution without axes in any
   cluster provides the most parsimonious solution. Examining the cluster with
   the best case for a factor to be incorporated in its description shows that
   the attributes highly loaded on the axis represent a contrast of herbaceous
   vegetation & dominant forests types. This contrast is also found when fitting
   the entire population, & in this case the factor solution is the preferred
   model. Overall, the cluster solution without factors is much preferred. Thus,
   in this case classification is preferred to ordination although more data are
   desirable to confirm such a conclusion."
   -- [doi:10.1556/ComEc.11.2010.1.8][11].
   (Also see [IP].)

%A M. D. Cao
%A L. Allison
%A T. I. Dix
%T A distance measure for genome phylogenetic analysis
%J AI09
%I SpringerVerlag
%V 5866/2009
%P 71-80
%D 2009
%K conf, MolBio, c2009, c200x, c20xx, zz1209, Minh Duc Cao, TIDix, AI, II, AI09,
   LAllison, NICTA, bioinformatics, MML, MDL, phylogenetic, tree, evolutionary,
   XM, trees, distance, measure, matrix, information, maximum parsimony,
   maximum likelihood, method, methods, construction, reconstruction, inference,
   Plasmodium, Plasmodia
%X "Phylogenetic analyses of species based on single genes or parts of the
   genomes are often inconsistent because of factors such as variable rates of
   evolution & horizontal gene transfer.  ... phylogeny construction from
   complete genomes that is less sensitive to such inconsistency. For such long
   seqs., construction methods like maximum parsimony & maximum likelihood are
   often not possible due to their intensive computational requirement. Another
   class of tree construction methods, namely distance-based methods, require a
   measure of distances between any 2 genomes. ...  presents an information
   theoretic measure of genetic distances between genomes based on the
   biological compression algorithm 'expert model' [XM]. ...  tree of a number
   of Plasmodium parasites from their genomes ... successfully construct a
   plausible evolutionary tree for the gamma-Proteobacteria group whose genomes
   are known to contain many horizontally transferred genes."
   -- [doi:10.1007/978-3-642-10439-8_8]['09],
   &  [phylogenetics].
   (2009; uk us isbn:364210438X; uk us isbn13:978-3642104381).

%A J. Bernal
%A T. I. Dix
%A L. Allison
%A A. Bartholomeusz
%A L. Yuen
%T Modelling Hepatitis B virus antiviral therapy and drug resistant mutant
%I SpringerVerlag
%V 5865
%P 159-168
%D 2009
%K conf, ACAL, MolBio, c2009, c200x, c20xx, zz1109, TIDix, LAllison, NICTA,
   vidrl, Hepatitis B, Hep B, HepB, HBV, bioinformatics, drug resistance, model,
%X "... an individual-based model of HBV inside an artificial liver, &
   associated blood serum, undergoing antiviral therapy. This model aims to
   provide insights into the evolution of the HBV quasispecies & the
   individual contribution of HBV mutations in the outcome of therapy."
   -- [doi:10.1007/978-3-642-10427-5_16]['09].
   (2009; uk us isbn:3642104266; uk us isbn13:978-3642104268.)
   (Also see [Bioinformatics].)

%A M. D. Cao
%A T. I. Dix
%A L. Allison
%T Computing substitution matrices for genomic comparative analysis
%J Advances in Knowledge Discovery and Data Mining
%I SpringerVerlag
%V 5476 / 2009
%P 647-655
%D 2009
%K conf, PAKDD, MolBio, c2009, c200x, c20xx, zz0509, bioinformatics, PAKDD09,
   PAKDD13, DNA, genome, comparative analysis, comparison, substitution matrix,
   PAM, blosum, estimation, alignment, free, homology, MML, MDL, information,
   XM, Minh Duc Cao, TIDix, LAllison, (NICTA), malaria genome
%X "Substitution matrices ... and are important for many knowledge discovery
   tasks such as phylogenetic analysis and sequence alignment. ... present a
   novel algorithm that addresses this by computing a nucleotide substitution
   matrix specifically for the two genomes being aligned. ... uses compression
   ...  reconstructs, with high accuracy, the substitution matrix for
   synthesised data generated from a known matrix with introduced noise. ...
   successfully applied to real data for various malaria parasite genomes,
   which have differing phylogenetic distances and composition that lessens the
   effectiveness of standard statistical analysis techniques."
   pdf@[doi:10.1007/978-3-642-01307-2_64]['09], and
   [substitution matrices].

%A M. D. Cao
%A T. I Dix
%A L. Allison
%T A genome alignment algorithm based on compression
%R 2009/233
%I Faculty of Info. Tech. (Clayton), Monash University
%D 2009
%K TR, TR233, MolBio, c2009, c200x, c20xx, bioinformatics, zz0109, FIT, Monash,
   Minh Duc Cao, TIDix, LAllison, data, DNA, sequence, compression, compressed,
   compress, MML, MDL, align, strings, information, content, free, XM, XMAligner
%X "Traditional genome alignment methods based on dynamic programming are often
   a. computational expensive, b. unable to compare the genomes of distant
   species, & c. unable to deal with low information regions. ... information-
   -theoretic approach for pairwise genome local alignment. ... the expert model
   aligner, the XMAligner, relies on the expert model compression alg.. To align
   2 seqs., XMAligner 1st compresses one sequence to measure the info. content
   at each posn in the seq.. Then the seq. is compressed again but this time
   with the background knowledge from the other seq. to obtain the conditional
   info. content. The info. content & the conditional info. content from the
   2 compressions are examined. Similar regions in the compressed seq. should
   have the conditional info. content lower than the individual info. content.
   ... applied to align the genomes of Plasmodium falciparum & P. knowlesi v.
   other 3 P. genomes with different levels of diversity. Despite the
   differences in nucleotide composition of the reference seqs., the conserved
   regions found by XMAligner in 3 alignments are relatively consistent. A
   strong correlation was found between the similar regions detected by the
   XMAligner & the hypothetical annotation of Plasmodium species. The alignment
   results can be integrated into the DNAPlatform for visualisation."
   -- [abs]['09].
   [Also search for: Cao Dix Allison Bioinformatics c2010],
   and see [compression]['09].

%A L. Allison
%T Added distributions for use in clustering (mixture modelling), function
   models, regression trees, segmentation, and mixed Bayesian networks in
   Inductive Programming 1.2
%R 2008/224
%I Faculty of Info. Tech., Monash University, Australia 3800
%D 2008
%K IP 1.2, IP1.2, TR 224, TR224, c2008, c200x, c20xx, zz0508, LAllison,
   machine learning, Geometric, Poisson, Gaussian, probability, stats, model,
   tree, segment, network, FP, MDL, minimum message length, MML, description,
   inductive inference, II, Student's t-Distribution, tDistribution
%X "Inductive programming is a machine learning paradigm combining functional
   programming (FP) with the information theoretic criterion, Minimum Message
   Length (MML). IP 1.2 now includes the Geometric and Poisson distributions
   over non-negative integers, and Student's t-Distribution over continuous
   values, as well as the Multinomial and Normal (Gaussian) distributions from
   before. All of these can be used with IP's model-transformation operators,
   and structure-learning algorithms including clustering (mixture-models),
   classification- (decision-) trees and other regressions, and mixed Bayesian
   networks, provided only that the types match between each corresponding
   component Model, transformation, structured model, and variable -- discrete,
   continuous, sequence, multivariate, and so on."
   -- [Ind.Prog.(IP)] inc. source code,

%A T. I. Dix
%A D. R. Powell
%A L. Allison
%A J. Bernal
%A S. Jaeger
%A L. Stern
%T Comparative analysis of long DNA sequences by per element
   information content using different contexts
%J BMC Bioinformatics
%V 8(S2):S10
%D 2007
%K jrnl, ejrnl, MolBio, c2007, c200x, c20xx, zz0407, TIDix, LAllison, sequence,
   analysis, data, compression, compress, low, mutual, information content,
   simple, complex, DNA, plot, II, MDL, AI, context, description, profile,
   signature, pattern discovery, genomic, minimum message length, MML,
   Cyanidioschyzon merolae, alga, algae, Plasmodium falciparum, gui, browser
%X "... The paper presents a methodology for exploring long DNA sequences, of
   the order of millions of bases, by means of their information content. ..."
   -- [long DNA],

%A M. D. Cao
%A T. I. Dix
%A L. Allison
%A C. Mears
%T A simple statistical algorithm for biological sequence compression
%J Data Compression Conf.
%W Snowbird, Utah
%P 43-52
%D 2007
%K conf, MolBio, II, DCC, DCC07, c2007, c200x, c20xx, zz0307, DNA, protein,
   minh, minhc, Minh Duc Cao, LAllison, TIDix, MML, MDL, compress, entropy,
   information theory, expert model, XM, profile, models, bioinformatics,
   dnacompress, pattern discovery, protein, compressible
%X "This paper introduces a novel algorithm for biological sequence compression
   that makes use of both statistical properties & repetition within sequences.
   A panel of experts is maintained to estimate the probability distribution of
   the next symbol in the sequence to be encoded. Expert probabilities are
   combined to obtain the final distribution. The resulting information sequence
   provides insight for further study of the biological sequence. Each symbol is
   then encoded by arithmetic coding. Experiments show that our algorithm
   outperforms existing compressors on typical DNA & protein sequence datasets
   while maintaining a practical running time."
   -- [doi:10.1109/DCC.2007.7]['07].

%A R. T. O'Donnell
%A K. B. Korb
%A L. Allison
%T Causal KL: Evaluating causal discovery
%I Faculty of Info. Tech., Monash University, Clayton, Australia 3800
%R 2007/207
%P 26
%D 2007
%O arXiv, Nov., 2021, 2111.06029@[arXiv][2021]
%K TR 207, TR207, c2007, c200x, c20xx, zz0522, LAllison, rodo, KBKorb, AI, II,
   causal model, stats, intervention, interventions, Bayesian network, networks,
   BN, models, cause, Causal Kullback Leibler distance, KL, CKL, C-KL,
   Bnet, Bnets
%X "The two most commonly used criteria for assessing causal model discovery
   with artificial data are edit-distance & Kullback-Leibler divergence,
   measured from the true model to the learned model. Both of these metrics
   maximally reward the true model. However, we argue that they are both
   insufficiently discriminating in judging the relative merits of false models.
   Edit distance, for example, fails to distinguish between strong & weak
   probabilistic dependencies. KL divergence, on the other hand, rewards
   equally all statistically equivalent models, regardless of their different
   causal claims. We propose an augmented KL divergence, which we call Causal
   KL (CKL), which takes into account causal relationships which distinguish
   between observationally equivalent models. Results are presented for three
   variants of CKL, showing that Causal KL works well in practice."
   -- cached@[c'teseer]['21],
   Also see [MML].

%A M. B. Dale
%A L. Allison
%A P. E. R. Dale
%T Segmentation and clustering as complementary sources of information
%J Acta Oecologica
%I Elsevier
%V 31
%N 2
%P 193-202
%D 2007
%K jrnl, ecology, LAllison, c2007, c200x, c20xx, segment, cluster, MML, MDL,
   II, minimum message length, description, classification, mixture model,
   modelling, BIC, ecological, vegetation, spatial, series, scale,  zz0307
%X  "... examines the effects of using a
   segmentation method to identify change-points or edges in vegetation. It
   identifies coherence (spatial or temporal) in place of unconstrained
   clustering. ..."
   Auths 1 & 3 @ Griffith U., auth 2 @ Monash U.
   Also see [MML].

%A R. T. O'Donnell
%A L. Allison
%A K. B. Korb
%T Learning hybrid Bayesian networks by MML
%J Advances in Artificial Intelligence
%I SpringerVerlag
%V 4304
%P 192-203
%D 2006
%O (19th ACS Australian Joint Conf. on Artificial Intelligence, AI06)
%K conf, c2006, c200x, c20xx, AI06, minimum message, MDL, rodo, description,
   LAllison, KBKorb, MML, length, AI, II, classification, CaMML,
   Markov Chain Monte Carlo, MCMC, graphical, BDE, graph, network, zz1206,
   local structure, CPT, decision tree, trees, Dtree, logit, logistic,
   stats, model, models, Bnet, Bnets, net, nets
%X "We use a Markov Chain Monte Carlo (MCMC) MML algorithm to learn hybrid
   Bayesian networks from observational data. Hybrid networks represent local
   structure, using conditional probability tables (CPT), logit models,
   decision trees or hybrid models, i.e., combinations of the three. We compare
   this method with alternative local structure learning algorithms using the
   MDL and BDe metrics. Results are presented for both real and artificial data
   sets. Hybrid models compare favourably to other local structure learners,
   allowing simple representations given limited data combined with richer
   representations given massive data". isbn:3540497870; isbn13:978-3540497875.
   -- [Bayes nets],
   (Also see [MML].)

%A T. I. Dix
%A D. R. Powell
%A L. Allison
%A S. Jaeger
%A J. Bernal
%A L. Stern
%T Exploring long DNA sequences by information content
%I Probabilistic Modeling and Machine Learning in Structural and
   Systems Biology, Workshop Proc.
%W Tuusula, Finland
%P 97-102
%D 2006
%K MolBio, wShop, c2006, c200x, c20xx, modelling, models, sequence, MML, MDL,
   LAllison, minimum message length, bioinformatics, comparative, chromosome,
   description, 1D, TIDix, zz0806, approximate repeats, model, ARM, dotter,
   dot plot, plots, 2D
%X  ... long ... order of millions of bases, ...
   Also see Dix et al, BMC Bioinf., 8(S2):S10, May '07,
   and [Bioinformatics].
   CR classn '98: F.2, I.2.6, G.3, J.3.; isbn:9521032774.

%A L. Allison
%T A programming paradigm for machine learning with a case study of
   Bayesian networks
%J ACSC2006
%P 103-111
%D 2006
%K conf, ACSC, ACSW, LAllison, Inductive Programming, IP, c2006, c200x, c20xx,
   MML FP, MMLF P, MMLFP, AI, minimum message length, description, MDL, zz0106,
   inference, network, net, nets, mixed, generalized, SAR, search and rescue,
   SARbayes, general, models, stats, model, trees, II, missing data, Bnet, Bnets
%X "... combines functional programming for writing statistical models &
   information theory to prevent overfitting. ... Inductive programming is
   illustrated by a case study of Bayesian networks. Networks are built from
   classification- (decision-) trees. Trees are built from partitioning
   functions & models on data-spaces. Trees, & hence networks, are general as a
   natural consequence of the method. Discrete & continuous variables, &
   missing values are handled by the networks. Finally the Bayesian networks
   are applied to a challenging data set on lost persons."
   -- [Bayesian networks],
      [Bayesian networks];  also see TR153 c2004,
      [paper.pdf] isbn:1920682309,
   (Also see [MML].)

%A L. Allison
%T Inductive inference 1.1.2:  Inductive programming and a case study of
   Bayesian networks
%I Faculty of Info. Tech., Monash University, Clayton, Australia 3800
%R 2005/177
%P 18
%D 2005
%K TR 177, TR177, network, c2005, c200x, c20xx, zz1105, minimum message length,
   MML, LAllison, description, machine learning, AI, II, csse, lost person,
   FP, MMLFP, MMLF P, missing data, mixed, net, nets, tree, trees, dTree, mdl,
   generalized, missing data, TR153, search and rescue, SAR, Bnet, Bnets
%X Builds on the mixed Bayesian nets introduced in TR2004/153.
   Also see [Ind.Inf.],
            [Ind.Inf.] pages.

%A L. Allison
%T Models for machine learning and data mining in functional programming
%J J. Functional Programming
%V 15
%N 1
%P 15-32
%D 2005
%O online at JFP 23/7/2004
%K jrnl, JFP, LAllison, FP, inductive inference, II, AI, c2005, c200x, c20xx,
   Haskell, unsupervised, supervised, classification, mixture modelling, stats,
   decision tree, dTree, cTree, model trees, minimum message length, MML, type,
   zz0105, description, probability, statistical model, MDL, mmld, MMLFP,
   MMLF P, Bayesian
%X  "... Haskell & its type system are used to define & analyse the
   nature of some problems & tools in machine learning & data mining. Data
   types & type-classes for statistical models are developed that allow models
   to be manipulated in a precise, type-safe & flexible way. [SMs] considered
   inc. prob. distributions, mixture models, function-models, time-series, &
   classification- & function-model-trees. The aim is to improve ways of
   designing & programming with models, not only of applying them."
   -- [doi:10.1017/S0956796804005301][23/7/2004].
   Also see [IP],
            [II] & [JFP] and

%A L. Allison
%T Finding approximate palindromes in strings quickly and simply
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3800
%R 2004/162
%D 2004
%K TR 162, TR162, palindrome, MolBio, bioinformatics, algorithm, string,
   LAllison, c2004, c200x, c20xx, zz1204
%X Described are two algorithms to find long approximate palindromes in
   a string, for example a DNA sequence.  A simple algorithm requires
   O(n)-space and almost always runs in O(k.n)-time where n is the
   length of the string and k is the number of ``errors'' allowed in the
   palindrome.  Its worst-case time-complexity is O(n^2) but this does
   not occur with real biological sequences.  A more complex algorithm
   guarantees O(k.n) worst-case time complexity.
   -- [arxiv...pdf][12/'04],

%A D. R. Powell
%A L. Allison
%A T. I. Dix
%T Modelling alignment for non-random sequences
%J Advances in Artificial Intelligence
%I SpringerVerlag
%V 3339
%P 203-214
%D 2004
%O 17th ACS Australian Joint Conf. on Artificial Intelligence (AI2004)
%K conf, AI, MolBio, dynamic programming algorithm, DPA, c2004, c200x, c20xx,
   similar, sequence, homology, Malignment, minimum message length, MML, MDL,
   PRSS, FASTA, BLAST, Smith Waterman, context, significance test, P values,
   Pvalue, biased, DNA, bias, plasmodium falciparum, low information content,
   score, dependent, pattern, repeats, repetitive, shuffling, masking, scoring,
   quality, Markov, model, models, hidden, HMM, PHMM, HMMER, description, pair,
   strings, DRPowell, LAllison, TIDix, bioinformatics, probabilistic, total,
%X "Populations of biased, non-random seqs. may cause standard alignment
   algorithms to yield false-positive matches & false-negative misses. A std
   significance test based on the shuffling of sequences is a partial solutions
   applicable to pop'ns that can be described by simple models. Masking-out
   low information content intervals throws information away. ... new & general
   method, modelling alignment: Population models are incorporated into the
   alignment process, which can (& should) lead to changes in the rank-order of
   matches between a query seq. & a collection of seqs., compared to results
   from std algorithms. The new method is general & places very few conditions
   on the nature of the models that can be used with it. We apply modelling-
   alignment to local alignment, global alignment, optimal alignment & the
   relatedness problem.    Results: As expected, modelling-alignment & the
   standard PRSS program from the FASTA package have similar accuracy on
   sequence populations that can be described by simple models, e.g. 0-order
   Markov models. However, modelling-alignment has higher accuracy on popns that
   are mixed or that are described by higher-order models: It gives fewer false
   positives & false negatives as show by ROC curves & other results from tests
   on real and artificial data". isbn:3540240594.
   -- [doi:10.1007/978-3-540-30549-1_19]['11],
      [mAlign] inc software.
   [Also search for: Allison COMPJ c1999].

%A E. Makalic
%A L. Allison
%A A. Paplinski
%T MML inference of RBF neural networks for regression
%J Brazilian Symp. on Artificial Neural Networks (SBRN)
%I IEEE & Brazillian Computer Soc.
%W Sao Luis
%P 6
%D 2004
%K conf, ANN, network, c2004, c200x, c20xx, Enes, enesm, MML, zz1104, AI, MDL,
   minimum message length, radial basis function, RBFs, functions, description,
   LAllison, artificial, AIC
%X Also see [MML],
   Conf [sbrn] 29/9-1/10/2004, isbn:8589029042.

%A L. Allison
%T Inductive inference 1.1
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3800
%R 2004/153
%P 16
%D 2004
%K TR 153, TR153, TR148, 148, MML, MDL, MMLFP, MMLF, minimum message length,
   functional programming, FP, description, CSSE, zz0504, LAllison, AI, hybrid,
   Bayesian networks, network, mixture, classification, cluster, clustering,
   sequence, classification decision trees, tree, dTree, mixed, model, II,
   inductive inference, graphical, statistical model, machine learning, general,
   tree, models, net, nets, TR177, 177, c2004, c200x, c20xx, Bnet, Bnets
%X "... Case studies illustrate the [F] style of programming for [II]. ...
    Mixtures of Markov Models, Stateful Time-Series, Bayesian network ..."
   See [IP],
   Also see TR2003/148, TR2005/177 and ACSC2006.

%A L. Allison
%T Inductive inference 1
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3800
%R 2003/148
%D 2003
%K TR148, TR 148, MML, MDL, II, MMLFP, MMLF, minimum, message, length, zz1203,
   functional programming, FP, data, type, class, description, CSSE, LAllison,
   statistical models, Bayesian, artificial intelligence, AI, machine learning
%X See [IP],
   Also see TR2004/153, JFP c2005, and TR2005/177.

%A E. Makalic
%A L. Allison
%A D. L. Dowe
%T MML inference of single-layer neural networks
%J Proc. of the 3rd IASTED Int. Conf. on Artificial Intelligence and
%W Benalmadena, Spain
%P 636-642
%D 2003
%O TR 2003/142, CSSE, Monash University, Australia OCT 2003
%K conf, IASTED AIA, ANN, artificial neural network, minimum message length, II,
   LAllison, NN, MML, MDL, DLD, Enes, enesm, description, c2003, c200x, c20xx,
   Bayesian, Hessian, TR142, TR 142
%X "The architecture selection problem is of great importance when designing
   neural networks. A network that is too simple does not learn the problem
   sufficiently well. Conversely, a larger than necessary network presumably
   indicates overfitting & provides low generalisation performance. This paper
   presents a novel architecture selection criterion for single hidden layer
   feedforward networks. The optimal network size is determined using a version
   of the Minimum Message Length (MML) inference method. Performance is
   demonstrated on several problems & compared with a Minimum Description
   Length (MDL) based selection criterion." isbn:0889863903; issn:14827913.
   Also see: [TR 2003/142.ps] MML inference of single-layer neural networks,
   and [MML].

%A L. Allison
%T Longest biased interval and longest non-negative sum interval
%J J. Bioinformatics
%V 19
%N 10
%P 1294-1295
%D 2003
%K MolBio, jrnl, sequence analysis, DNA, RNA, zz0703, c2003, c200x, c20xx,
   bias, density, skew, constraints, low information content, substring,
   algorithm, program, locating, maximum, maximal, heaviest, longest, segment,
   segments, intervals, regions, positive sum, nonnegative total, AT CG rich,
   ATrich, CGrich, problem, CpG, LBI, CABIOS, biased interval, LAllison,
   average, score, scores, highest
%X ... an algorithm to find the longest interval having at least a specified
   minimum bias in a sequence of characters (bases, amino acids),
   e.g. 'at least 0.95 (A+T)-rich'.  It is based on an algorithm to find
   the longest interval having a non-negative sum in a sequence of positive
   and negative numbers.  In practice, it runs in linear time; this can be
   guaranteed if the bias is rational. ...
   -- [LBI, inc' code],
      [LBI, inc' code].
      [can analyse long sequences, e.g. whole chromosomes, very quickly.]
   -- [doi:10.1093/bioinformatics/btg135]['05],

%A L. J. Fitzgibbon
%A D. L. Dowe
%A L. Allison
%T Bayesian posterior comprehension via message from Monte Carlo
%J 2nd Hawaii Int. Conf. on Statistics and Related Fields (HICS)
%D 2003
%K conf, HICS, HICS2, HICS2003, statistics, Las Vegas, stochastic, MonteCarlo,
   MCMC, posterior probability distribution, LAllison, c2003, c200x, c20xx
%X [Monte Carlo][5/'03],

%A L. Allison
%T The types of models
%J 2nd Hawaii Int. Conf. on Statistics and Related Fields (HICS)
%D 2003
%K HICS, HICS2, c2003, c200x, c20xx, MML, MMLFP, MMLF P, MML F P, FP, MDL,
   conf, stats, type, model, II, AI, LAllison
%X [abs],
   Also see [Ind. Prog.].

%A L. J. Fitzgibbon
%A L. Allison
%A J. W. Comley
%T Probability model type sufficiency
%J Intelligent Data Engineering and Automated Learning
%I Springer
%V 2690
%P 530-534
%D 2003
%O (4th Int. Conf. on Intelligent Data Engineering and Automated Learning,
   IDEAL-2003, Hong Kong)
%K conf, IDEAL, IDEAL4, IDEAL2003, statistical model, sufficient statistics,
   stats, c2003, c200x, c20xx, LAllison
%X "We investigate the role of sufficient statistics in generalized
   probabilistic data mining & machine learning software frameworks. Some issues
   involved in the specification of a statistical model type are discussed & we
   show that it is beneficial to explicitly include a sufficient statistic &
   functions for its manipulation in the model type's specification. Instances
   of such types can then be used by generalized learning algorithms while
   maintaining optimal learning time complexity. Examples are given for
   problems such as incremental learning & data partitioning problems (e.g.,
   change-point problems, decision trees & mixture models)"
   Hong Kong, 21-23 March 2003, EUR99.0; uk us isbn:354040550X.

%A J. W. Comley
%A L. Allison
%A L. J. Fitzgibbon
%T Flexible decision trees in a general data-mining environment
%J Intelligent Data Engineering and Automated Learning
%I Springer
%V 2690
%P 761-767
%D 2003
%O (4th Int. Conf. on Intelligent Data Engineering and Automated Learning,
   IDEAL-2003, Hong Kong)
%K conf, IDEAL, IDEAL4, IDEAL2003, decision tree, DTree, c2003, c200x, c20xx,
   classification trees, CDMS, LAllison, monash, MML, MDL, AI, II
%X "We describe a new data-mining platform, CDMS, aimed at the streamlined
   development, comparison and application of machine learning tools. We discuss
   its type system, focussing on the treatment of statistical models as
   first-class values. This allows rapid construction of composite models -
   complex models built from simpler ones - such as mixture models,
   Bayesian networks and decision trees. We illustrate this with a flexible
   decision tree tool for CDMS which rather than being limited to discrete
   target attributes, can model any kind of data using arbitrary probability
   -- [Dtrees],
   Hong Kong, 21-23 March 2003, EUR99.0; uk us isbn:354040550X.

%A L. Allison
%T Types and classes of machine learning and data mining
%J 26th Australasian Computer Science Conference (ACSC)
%P 207-215
%D 2003
%O ACS Series Conferences in Research and Practice in
   Information Technology V16;
   Australian Computer Science Communications, V25, #1, 2003.
%K conf, ACSC, ACSC26, ACSC2003, type, class, check, model, models, regression,
   time series, timeSeries, functional programming, FP, MML, MMLF P, MMLFP,
   Haskell, type, class, Haskell98, minimum message length, description,
   semantics, MDL, formal, inductive inference, stats, statistics, AI, II,
   zz0203, MDL, c2003, c200x, c20xx, LAllison
%X [paper],
   uk us isbn:0909925941;  issn:14451336.

%A L. J. Fitzgibbon
%A D. L. Dowe
%A L. Allison
%T Message from Monte Carlo
%R 2002/107
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3800
%D 2002
%K TR 107, TR107, minimum message length, MML, description, MDL, inference, II,
   Markov Chain Monte Carlo, MCMC, MMLD, reversible, jump, approximation,
   parameter estimation, AI, Bayesian, Montecarlo, Las Vegas, Lasvegas,
   model, modelling, c2002, c200x, c20xx, zz1202, LAllison, univariate,
   polynomial regression, importance sampling
%X "... [MML] is used to partition a sample from an importance sampling density
   of the model parameters into coding regions. The regions contain models that
   are similar & hence the method  can be loosely described as model clustering.
   ... The method is made practical by using the MML Boundary Rule & importance
   sampling.  The MML Boundary Rule is a result that applies to MML
   approximations in a common form & which states that the boundary of the
   optimal coding region will be isometric or approximately isometric.  This
   means that we need only consider regions with isometric boundaries.
   Importance sampling is used to efficiently evaluate expectations & integrals
   using Monte Carlo integration, ... The method requires a sample from a
   suitable importance sampling density & a means of approximating the
   Kullback-Leibler distance between any two models.  For general use, we
   suggest that the posterior is suitable for the importance sampling.  In this
   case the Message from Monte Carlo methodology becomes aligned with Bayesian
   posterior sampling & can be considered as a means of introspection into the
   posterior sample.
   We give an example of the method applied to univariate polynomial regression.
   [It] was chosen because it is a useful & intuitive problem for which the
   MML87 method can & already has been applied.  We use the posterior as an
   importance sampling density & the polynomial parameters are sampled using
   Reversible Jump Markov Chain Monte Carlo."
   Also see ICML2002, HICS2003 & [MML];

%A L. Allison
%T Model classes
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3800
%R 2002/125
%D 2002
%K TR 125, TR125, c2002, c200x, c20xx, zz1102, LAllison
%X Also see [ACSC26 '03]
   and [II].

%A L. J. Fitzgibbon
%A D. L. Dowe
%A L. Allison
%T Change-point estimation using new minimum message length approximations
%J Seventh Pacific Rim Int. Conf. on Artificial Intel. (PRICAI-2002)
%W Tokyo
%I SpringerVerlag
%V 2417
%P 244-254
%D 2002
%K conf, PRICAI, PRICAI7, PRICAI02, PRICAI2002, AI, fairly strict, FSMML, MMLA,
   minimum message length, MML, inductive inference, II, description, MDL,
   segmentation, MMLD, LNAI, LAllison, DLDowe, Monash, csse, zz0902, c2002,
   c200x, c20xx
%X Authors at Comp. Sci. and S.W.E., Monash Uni., .au
   [abstract]; uk us isbn:3540440380.

%A L. J. Fitzgibbon
%A D. L. Dowe
%A L. Allison
%T Univariate polynomial inference by Monte Carlo message length approximation
%J Int. Conf. Machine Learning 2002
%W Sydney
%I MorganKaufmann
%P 147-154
%D 2002
%K conf, ICML, ICML2002, ICML02, AI, inductive inference, II, LAllison, DLDowe,
   Minimum Message Length, MML, description, MDL, Las Vegas, MonteCarlo, MMC,
   Markov chain, MCMC, polynomial, fit, fitting, regression, Gibbs sampling,
   FSMML, SRM, univariate polynomial, regression, zz0702, c2002, c200x, c20xx
%X  Message from Monte Carlo (MMC) algorithm ... compared with
   MML87 and structural risk minimization  (SRM).
   Authors: Comp. Sci. and S.W.E., Monash Uni., .au.  Also see TR 2002/107.
   [abstract], issn:10491910.

%A L. Stern
%A L. Allison
%A R. L. Coppel
%A T. I. Dix
%T Discovering patterns in Plasmodium falciparum genomic DNA
%J Molecular and Biochemical Parasitology
%V 118
%N 2
%P 175-186
%D 2001
%K MolBio, LAllison, jrnl, c2001, c200x, c20xx, Plasmodium falciparum,
   minimum message length, description, Markov, MDL, MBP, MML, Malaria, entropy,
   sequence analysis, HMM, PFSA, PFSM, information content, genome, DNA,
   pattern, repeat, repeats, DNA models, AED, approximate repeats model, repeat,
   pattern discovery, knowledge, hidden, GHMM, Bayesian, LStern, TIDix,
%X "A method has been developed for discovering patterns in DNA sequences.
   Loosely based on the well-known Lempel Ziv model for text compression, the
   model detects repeated seqs. in DNA. The repeats can be forward or inverted,
   and they need not be exact. The method is particularly useful for detecting
   distantly related seqs., and for finding patterns in seqs. of biased
   nucleotide composition, where spurious patterns are often observed because
   the bias leads to coincidental nucleotide matches. We show here the utility
   of the method by applying it to genomic seqs. of Plasmodium falciparum. A
   single scan of chromosomes 2 and 3 of P.falciparum, using our method and no
   other a priori information about the seqs., reveals regions of low complexity
   in both telomeric and central regions, long repeats in the subtelomeric
   regions, and shorter repeat areas in dense coding regions. Application of the
   method to a recently sequenced contig of chromosome 10 that has a
   particularly biased base composition detects a long internal repeat more
   readily than does the conventional dot matrix plot. Space requirements are
   linear, so the method can be used on large sequences. The observed repeat
   patterns may be related to large-scale chromosomal organization and control
   of gene expression. The method has general application in detecting patterns
   of potential interest in newly sequenced genomic material."
   -- [paper],
   More on [compression] and

%A L. J. Fitzgibbon
%A L. Allison
%A D. L. Dowe
%T Minimum message length grouping of ordered data
%J Proc. 11th Int. Workshop on Algorithmic Learning Theory (ALT'2000)
%E H. Arimura
%E S. Jain
%E A. Sharma
%P 56-70
%W Sydney
%V 1968
%D 2000
%K conf, minimum message length, MML, segment, minimum description length, MDL,
   ALT9, ALT2000, cut point, points, segmenting, time series, sequence analysis,
   complexity, measurement accuracy, model, LAllison, DLDowe, BIC, Monash, II,
   segmentation, problem, data, level, levels, c2000, c200x, c20xx
%X Given a series of multivariate data, approximate it by a piece-wise
   constant function. How many cut-points are there? What are the means
   and variances of each segment? Where should the cut points be placed?
   The simplest model is a single segment.  The most complex model has
   one segment per data point.  The best model is generally somewhere
   between these extremes.  Only by considering model complexity can
   a reasonable inference be made.
   [reprint.ps], uk us isbn:3540412379,

%A D. R. Powell
%A L. Allison
%A T. I. Dix
%T Fast, optimal alignment of three sequences using linear gap costs
%J J. Theor. Biol.
%V 207
%N 3
%P 325-336
%D 2000
%K jrnl, JTB, MolBio, Biology, multiple, sequence, alignments, similarity,
   affine, linear, cost, gaps, insert, delete, indel, indels, DNA, time, speed,
   fast, string, strings, iterative, phylogenetic, family tree, Ukkonen,
   edit distance, dynamic programming algorithm, DPA, DRPowell, LAllison, TIDix,
   c2000, c200x, c20xx, zz1100, J Theoretical Biology, bioinformatics
%X [...] The obvious dynamic programming algorithm for optimally
   aligning k sequences of length n runs in O(n^k) time. This is
   impractical if k >= 3 and n is of any reasonable length.
   [...] new algorithm [is] guaranteed to find the optimal alignment [...]
   particularly fast when the (three-way) edit distance is small. [...]
   O(n + d^3) on average.
   [paper][11/'00] and code,
   more on [bioinformatics].

%A L. Allison
%A D. Powell
%A T. I. Dix
%T Modelling is more versatile than shuffling
%R 2000/83
%I School of Computer Science and Software Engineering, Monash University,
   Australia 3168
%D 2000
%K MolBio, pair, two, sequence, alignment, PFSA, hidden Markov model, PHMM,
   low medium information content, repeat, repetition, structure, pattern,
   model, shuffle, shuffling, randomize, DNA, permute, tuples, frequencies,
   family, PFSM, HMM, dynamic programming algorithm, DPA, homology, algorithm,
   minimum message length, MML, description, MDL, LAllison, DRPowell, TIDix,
   mAlignment, TR 83, TR83, c2000, c200x, c20xx, bioinformatics
%X It is shown how to incorporate almost any (left to right) model of a
   population of sequences into the alignment DPA.  Doing so is an alternative
   to shuffling/ randomizing (Fitsch; Altschul, Erickson ...) the sequences
   to correct for population biases.  The resulting algorithm gives
   fewer false positives, fewer false negatives, and can (and should)
   change the rank ordering of alignments.
   [also search for: modelling alignment]
   [mon], and

%A L. Allison
%A L. Stern
%A T. Edgoose
%A T. I. Dix
%T Sequence complexity for biological sequence analysis
%J Computers and Chemistry
%O (jrnl later became Computational Biology and Chemistry, c2006)
%V 24
%N 1
%P 43-55
%D 2000
%K jrnl, comp chem, MolBio, c2000, c200x, c20xx, data compression, compress,
   MML, MDL, DNA, mine, approximate repeats, repeat, AED, model, repetitive,
   complexity, high, moderate, low, information content, bioinformatics,
   LAllison, TIDix, computational biology and chemistry, LStern, description,
   minimum message length, bioinformatics, AI, inductive inference,
   data mining, time series, timeSeries
%X "A new statistical model for DNA considers a sequence to be a mixture of
   regions with little structure and regions that are approximate repeats of
   other subsequences, i.e. instances of repeats do not need to match each other
   exactly. Both forward- and reverse-complementary repeats are allowed. The
   model has a small number of parameters which are fitted to the data. In
   general there are many explanations for a given sequence and how to compute
   the total probability of the data given the model is shown. Computer
   algorithms are described for these tasks. The model can be used to compute
   the information content of a sequence, either in total or base by base. This
   amounts to looking at sequences from a data-compression point of view and it
   is argued that this is a good way to tackle intelligent sequence analysis
   in general."
   -- [more],
   Also see [Bioinformatics].

%A L. Allison
%T Generator and search objects in Java
%J J. Res. and Practice in Inf. Tech.
%V 32
%P 3-12
%N 1
%D 2000
%K jrnl, ACS, JRPIT, continuation, c2000, c200x, c20xx, LAllison, object, OOP,
%X [.pdf],
   more on [Java];
   also see TR97/317.

%A T. Edgoose
%A L. Allison
%T MML Markov classification of sequential data
%J Stats. and Comp.
%I Kluwer Academic
%V 9
%N 4
%P 269-278
%D 1999
%K jrnl, stats, computing, minimum message length, MML, description, MDL,
   statistical, inductive inference, clustering, classification, II, AI,
   sequence, data analysis, model, algorithm, mining, series, forwards,
   backwards, LAllison, c1999, c199x, c19xx, Snob, time series, timeSeries
%X [Markov-Models],

%A D. R. Powell
%A L. Allison
%A T. I. Dix
%T A versatile divide and conquer technique for optimal string alignment
%V 70
%N 3
%P 127-139
%D 1999
%K IPL, jrnl, c1999, c199x, c19xx, zz0899, dynamic programming algorithm, DPA,
   MolBio, DRPowell, LAllison, TIDix, bioinformatics, space, complexity,
   strings, Edit Distance, similarity, LCS, Fast, Hirschberg, Ukkonen, Myers,
   time, speed, Linear Space, Check Point, pointing, checkpoint, algorithms
%X  A check-pointing (CP) technique uses O(n) space but is simpler
   than Hirschberg's O(n)-space technique;  H' ('97) attributes
   an O(N**2)-time simple edit-distance CP to Eppstein.  Here, CP
   is applied to more complex cost functions, e.g., linear gap costs, and ...
   to Ukkonen's O(n*d)-time DPA, even including linear gap costs,
   to give  O(n)-space,  O(n.log d + d**2)-average-time,
   effectively O(d**2)-time in many practical situations.
   -- [doi:10.1016/S0020-0190(99)00053-8][6/'04],
   &  [Divide-and-C.].

%A L. Allison
%A D. Powell
%A T. I. Dix
%T Compression and approximate matching
%V 42
%N 1
%P 1-10
%D 1999
%K jrnl, MolBio, COMPJ, Computer Journal, LAllison, DRPowell, TIDix, pair,
   string, strings, sequence, alignment, analysis, algorithm, homology,
   similarity, limits, HMM, MML, MDL, II, normalized, limit, significance test,
   testing, jie, med, DPA, low information content, repeats, repetitive, wei,
   non-random, nonrandom, compressible, bioinformatics, pair, probabilistic,
   information theory, features, complexity, time, fast, speed, shuffling,
   shuffle, randomize, DNA, edit distance, hidden Markov model, HMM, PHMM,
   c19xx, c1999, c199x, modelling, Malignment, context dependent, scoring
%X  A population of sequences is called non-random if there is a statistical
   model and an associated compression algorithm that allows members of the
   population to be compressed, on average.  Any available statistical model
   of a population should be incorporated into algorithms for alignment of
   the sequences and doing so changes the rank-order of possible alignments
   in general.  The model should also be used in deciding if a resulting
   approximate match between two sequences is significant or not.  It is
   shown how to do this for two plausible interpretations involving pairs
   of sequences that might or might not be related.  Efficient alignment
   algorithms are described for quite general statistical models of sequences.
   The new alignment algorithms are more sensitive to what might be termed
   `features' of the sequences.  A natural significance test is shown to be
   rarely fooled by apparent similarities between two sequences that are merely
   typical of all or most members of the population, even unrelated members.
   -- [more],
   Also see [Powell AI2004],
        and [bioinformatics].

%A L. Allison
%A T. Edgoose
%A T. I. Dix
%T Compression of strings with approximate repeats
%J Intell. Sys. in Mol. Biol. '98
%P 8-16
%D 1998
%K conf, MolBio, c1998, c199x, c19xx, LAllison, Monash, ISMB6, ISMB98, ISMB 98,
   DPA, inductive inference, II,  DNA, Chen, Behzadi, F. Le Fessant, protein,
   sequence analysis, data, text, repeat, compression, bioinformatics, Li,
   repetitive, kwong, hidden Markov models, HMM, low information content,
   lines, ALU, sines, ming, self similarity, AED, model, minimum message length,
   MML, description, MDL, compress, algorithm
%X "We describe a model for strings of characters that is loosely based on the
   Lempel Ziv model with the addition that a repeated substring can be an
   approximate match to the original substring; this is close to the situation
   of DNA, for example. ... The model has a small number of parameters and these
   can be estimated from the given string by an expectation-maximization (EM)
   algorithm. Each iteration of the EM algorithm takes O(n^2) time and a few
   iterations are typically sufficient. O(n^2) complexity is impractical for
   strings of more than a few tens of thousands of characters and a faster
   approximation algorithm is also given. The model is further extended to
   include approximate reverse complementary repeats when analyzing DNA
   strings. Tests include the recovery of parameter estimates from known
   sources and applications to real DNA strings."
   -- [more]  ISMB98, 28 June - 1 July 1998,
   [.pdf]   Montreal, Canada; uk us isbn:1577350537.
   Also see [bioinformatics].

%A L. Allison
%T Information-theoretic sequence alignment
%I School of Computer Science and Software Engineering, Monash University
%R 98/14
%D 1998
%K TR14, TR 14, MolBio, LAllison, string, strings, similarity, edit-distance,
   homology, approximate match, matching, DNA, DPA, hidden Markov model, HMM,
   low information content, repeats, repetitive, compressible, MML, MDL,
   data compression, content, sequences, c1998, c199x, c19xx, bioinformatics
%X [TR98/14],
   Also see:  Allison, Powell and Dix, `Compression and Approximate Matching',
     Comp. J. 42(1) pp1-10, 1999,  for a fuller explanation and later results.
   Also see [Bioinformatics].

%A G. Pringle
%A L. Allison
%A D. L. Dowe
%T What is a tall poppy among web pages?
%J Proc. 7th Int. World Wide Web Conference (in Comp. Networks and ISDN Sys.)
%V 30
%N 1-7
%P 369-377
%D 1998
%O Comp. Networks and ISDN Systems (Jrnl)
%I Elsevier
   decision tree, DT, search engine, engines, rating, ranking, Lycos, Infoseek,
   Altavista, Alta vista, page, pages, LAllison, DLDowe, GPringle, Monash,
   c1998, c199x, c19xx, zz0598
%X "Search engines & indices were created to help people find information
   amongst the rapidly increasing # of World Wide Web (WWW) pages. ... treat
   modelling the workings of WWW search engines as an inductive inference
   problem. A training set of data is collected, being pages returned in
   response to typical queries. Decision trees are used as the model class for
   the search engines' selection criteria although ..."
   -- [more],
   Uses inductive inference technology to infer why some web
   pages rank highly on certain Internet Search Engines.
   Conf. held 14-18 April 1998, Brisbane, Australia. uk us isbn:0169755298.

%A D. L. Dowe
%A L. Allison
%A G. Pringle
%T The hunter and the hunted - modelling the relationship between web pages
   and search engines
%J 2nd Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD98)
%P 380-382
%D 1998
%I SpringerVerlag
%V 1394
%K conf, PAKDD2, PAKDD 98, internet, web page, decision tree, engine,
   Dtree, LAllison, GPringle, DLDowe, Monash, c1998, c199x, c19xx
%X uk us isbn:3540643834. Also see  Pringle et al  in  WWW7 pp.369-377 c1998.

%A T. Edgoose
%A L. Allison
%T Minimum message length hidden Markov modelling
%J Data Compression Conf.
%W Snowbird, Utah
%P 169-178
%D 1998
%K conf, DCC, DCC98, model, MML, MDL, HMM, inductive inference, II, Rissanen,
   sequence, model, series, clustering, classification, data analysis, CSci,
   description, complexity, time series, timeSeries, CompSci, LAllison, Monash,
   Snob, c1998, c199x, c19xx, Snowbird, AI
%X "This paper describes a minimum message length (MML) approach to finding the
   most appropriate hidden Markov model (HMM) to describe a given sequence of
   observations. An MML estimate for the expected length of a two-part message
   stating a specific HMM and the observations given this model is presented
   along with an effective search strategy for finding the best number of
   states for the model. The information estimate enables two models with
   different numbers of states to be fairly compared which is necessary if the
   search of this complex model space is to avoid the worst locally optimal
   solutions. The general purpose MML classifier 'Snob' has been extended and
   the new program 'tSnob' is tested on 'synthetic' data and a large 'real
   world' dataset. The MML measure is found to be an improvement on the Bayesian
   information criteria (BIG) and the un-supervised search strategy."
   -- [Markov-M.],

%A T. Edgoose
%A L. Allison
%T Unsupervised Markov classification of sequence data using MML
%J Proc. 21st Australian Comp. Sci. Conf.
%E C. McDonald
%W Perth
%I SpringerVerlag
%P 81-94
%D 1998
%K ACSC, ACSC21, ACSC98, inductive inference, II, time series, model, HMM,
   serial correlation, SNOB, AI, machine learning, hidden, data mining, MDL,
   LAllison, Monash, class, cluster, clustering, time series, timeSeries,
   c1998, c199x, c19xx, zz0198
%X also see [MML],
   uk us isbn:9813083905; order from...[springer].

%A D. R. Powell
%A L. Allison
%A T. I. Dix
%A D. L. Dowe
%T Alignment of low information sequences
%J Australian Computer Science Theory Symposium, CATS '98
%W Perth
%P 215-230
%D 1998
%K conf, MolBio, align, HMM, DPA, DNA, ACSC, CATS, CATS98, LAllison, DLDowe,
   bioinformatics, probability, low information, simple, features,
   TIDix, Monash, c1998, c199x, c19xx, DRPowell
%X "Alignment of two random sequences over a fixed alphabet can be shown to be
   optimally done by a Dynamic Programming Algorithm (DPA). It is normally
   assumed that the sequences are random and incompressible and that one
   sequence is a mutation of the other. However, DNA and many other sequences
   are not always random and unstructured, and the issue arises as how to best
   align compressible sequences.  Assuming our sequences to be non-random and
   to emanate from mutations of a first order Markov model, we note that
   alignment of high information regions is more important than alignment of
   low information regions and arrive at a new alignment method for low
   information sequences which performs better than the standard DPA for data
   generated from mutations of a first order Markov model."
   -- [more], uk us isbn:9813083921,
   (Also see [Bioinformatics].)

%A T. Edgoose
%A L. Allison
%A D. L. Dowe
%T An MML classification of protein structure that knows about angles and
%J Pacific Symposium on Biocomputing '98
%P 585-596
%D 1998
%K conf, MolBio, PSB, PSB3, PSB98, von Mises, vonMises, angle, dihedral, class,
   cluster, clustering, HMM, SNOB, time series, timeSeries, ARC A49602504,
   LAllison, MDL, directional, distribution, bioinformatics, Monash,
   c1998, c199x, c19xx
%I World Scientic
%X SNOB + vonMises circular probability distribution + 1st order Markov model.
   phi-psi pairs give 17 classes and a class seq' correlation matrix.
   [paper.pdf@stanford.edu]['98]; uk us isbn:9810232780.
   von Mises, probability density:
      f(x | mu, kappa) = (1/(2.pi.I0(kappa))).exp(kappa.cos(x-mu))
                 where I0(kappa) is a normalisation constant.

%A D. R. Powell
%A D. L. Dowe
%A L. Allison
%A T. I. Dix
%T Discovering simple DNA sequences by compression
%J Pacific Symposium on Biocomputing '98
%P 597-608
%D 1998
%K conf, MolBio, PSB, PSB3, PSB98, repeat, repetition, data, model, random,
   ARC A49602504, bioinformatics, DRPowell, LAllison, DLDowe, TIDix, Monash,
   zz0198, simple, low information, Milosavljevic, Jurka, c1998, c199x, c19xx,
%I World Scientic
%X "An information-theoretic DNA compression scheme devised by Milosavljevic and
   Jurka(1993) has been used in many places in the literature for both the
   discovery of new genes and the compression of DNA. Their compression method
   applies an encoding of previously occurring runs. They use 5 different
   code-words: four being the DNA bases, A,C,G and T, and the other being a
   pointer to a previously occurring run. They advocate a code-word of length
   log2(5) for each of these and then encoding a run by a code-word of length
   2log2(n), where n is the length of the sequence. This scheme encodes the
   start of the sequence with a code-word of length log2(n) and likewise encodes
   the end of the sequence with a code-word of length log2(n). In this paper,
   we show the above coding scheme to be inefficient in various ways and improve
   upon it so that it can compress DNA. We discuss our implementation of various
   schemes some of which run in linear time."
   -- [more],
   uk us isbn:9810232780.
   Also see [MML page]
   and Milosavljevic '95.

%A K. B. Korb
%A C. Kopp
%A L. Allison
%T Higher education policy in Australia
%J Australian Rationalist
%N 45
%P 16-26
%D 1997
%O `A Statement on Higher Education Policy in Australia',
   TR 97/318, Department of Computer Science, Monash University, 1997
%K jrnl, TR318, TR 318, HER, West Review, University, Universities, funding,
   Government policy, trends, technology, quality, Carlo, LAllison, zz1297,
   c1997, c199x, c19xx
%X [.ps.gz],

%A L. Allison
%T Java continues Prolog-like
%I Department of Computer Science, Monash University
%R 97/317
%D 1997
%K TR317, TR 317, LAllison, c1997, c199x, c19xx
%X see: Allison, JRPIT, c2000.

%A T. C. Edgoose
%A L. Allison
%T Mixture modelling of sequential datasets
%R 96/285
%I Department of Computer Science, Monash University
%D 1996
%K TR285, TR 285, sequence, series, HMM, hidden Markov model, classification,
   clustering, Snob, minimum message length, MML, description, MDL, c1996,
   inductive inference, II, data sets, analysis, LAllison, Monash, c199x, c19xx
%X Data is a sequence of items.  Each item is a number of measurements.
   The program clusters items into classes.  Each class is described
   by distributions (currently finite state, or Normal, or von-Mises)
   on attributes.  The program determines the number and form of each class.
   The class of an item can be influenced by those of its neighbours.
   Partial assignment allows for uncertainty of class membership.
   Tested on artificial and real data including some weather data
   and dihedral angles from protein structures.
   [Also search for:  Snowbird DCC98],
   and see [MML].

%A D. L. Dowe
%A L. Allison
%A T. I. Dix
%A L. Hunter
%A C. S. Wallace
%A T. Edgoose
%T Circular clustering of protein dihedral angles by minimum message length
%J Pacific Symposium on Biocomputing '96
%P 242-255
%D 1996
%I World Scientific
%O TR 95/237, Dept. Computer Science, Monash University, Oct 1995
%K PSB, PSB96, TR 237, TR237, Monash, DLD, CSW, CSWallace, LAllison, MolBio,
   Monash, classification, angle, von Mises, vonMises, protein structure,
   inductive inference, II, MML, MDL, conf, bioinformatics, c1996, c199x, c19xx
%X L. Hunter - NLM, NIH. PSB '96: 3-6 Jan 1996, Hawaii; uk us isbn:9810225784.

%A R. A. Baxter
%A L. Allison
%A K. Korb
%T Regulation of on-line information services (a submission)
%D 1995
%K censor, censorship, regulation, control, internet, inter net, Monash,
   world wide web, LAllison, www, HREF, http, c1995, c199x, c19xx
%X Submission to the Dept. of Communications and the Arts  re  their
   consultation paper on the regulation of online information services
   of 7 July 1995.
   Also to the Senate Enquiry into Regulation of Computer On-Line Services
   (closing date for submissions 29 Sept 1995)
   See also TR224.

%A L. Allison
%T Towards modelling evolution = mutation modulo selection in sequence
%R 95/225
%I Dept. Computer Science, Monash University
%D 1995
%K LAllison, Monash, TR225, TR 225, MolBio, evolution, pressure, selection,
   fit, fitness, family, phylogenetic, evolutionary tree, trees, sequence,
   multiple alignment, zz0795, c1995, c199x, c19xx, bioinformatics
%X [bioinformatics].

%A L. Allison
%A R. A. Baxter
%T Protecting Our Innocents
%R 95/224
%I Dept. Computer Science, Monash University
%D 1995
%K LAllison, Monash, TR224, TR 224, internet, inter net,  world wide web,
   child, children, kids, K12, porn, pornography, violence, politics, sex,
   online, censorship, censor, ethics, ethical, CICA, PICS, surfwatch,
   www, HREF, http, zz0695, topical, c1995, c199x, c19xx
%X Argues that internet authors (www, ftp, ...) should be encouraged,
   NOT required, to describe their works using annotations that are
   objective and machine readable.  (This is not a censorship proposal and
   it cannot be (mis)used as one.)
   [TR 95/224 here]
   Also see: Regulation of on-line information services (a submission) Aug 1995.

%A L. Allison
%T Toward a Denotational Semantics for
   <A HREF="http://www.cs.monash.edu.au/~lloyd/tildeFP/LambdaLog">LFP</A>
   = Logic + Functional Programming, Coded in Lazy ML.
%R 94/204
%D 1994
%I Dept. Computer Science, Monash University
%K LAllison, logic, functional, programming, LP, FP, LFP, FLP, LambdaLog,
   LML, Monash, TR 204, TR204, c1994, c199x, c19xx
%X [LFP],

%A L. Allison
%T Using Hirschberg's algorithm to generate random alignments of strings
%V 51
%N 5
%P 251-255
%D 1994
%K c1994, c199x, c19xx, LAllison, Monash, jrnl, IPL, MolBio, bioinformatics, GS,
   DNA, methods, MML, minimum message length encoding, II, inductive inference,
   Hirschberg, string, sequence, alignment, similarity, homology, approximate,
   match, matching, LCS, LCSS, edit distance, Bayesian, Gibbs sampling, MCMC,
   random, sample, DPA, simulated annealing, SA, dynamic programming algorithm,
   divide and conquer, hidden Markov model, HMM, probability,
   posterior distribution, stochastic
%X  Hirschberg's (CACM '75) recursive divide and conquer technique for
   the dynamic programming technique (LCS, LCSS, Edit Distance) is
   applied to the problem of sampling alignments of two strings
   at RANDOM from the alignments' posterior probability distribution.
   Also see [Bioinformatics].

%A L. Allison
%A C. S. Wallace
%T An information measure for the string to string correction problem with
%J 17th Australian Comp. Sci. Conf.
%P 659-668
%D 1994
%W Christchurch, N. Z.
%K LAllison, CSW, CSWallace, Monash, conf, MolBio, inductive inference, II,
   string, sequence, family, evolutionary, phylogenetic, tree, trees,
   variation, variance, uncertainty, estimate, estimation, parameters, DNA,
   multiple alignment, Gibbs sampling, sample, GS, simulated annealing SA,
   minimum message length MML, Bayesian, temperature, cooling, probabilistic,
   NZ, New Zealand, c1994, c199x, c19xx, ACSC 17, 94, ACSC17, ACSC94,
   bioinformatics, Monash
%O Australian Comp. Sci. Comm., Vol 16,  No 1(C), 1994, isbn:047302313X.
%X It has been shown how to calculate a probability for an alignment.
   Alignments are sampled from their posterior probability distribution.
   This is extended to multiple alignments (of several strings).  Averaging
   over many such alignments gives good estimates of how closely the strings
   are related and in what way.  In addition, sampling in an increasingly
   selective way gives a simulated annealing search for an optimal alignment.
   See also the related paper J. Mol. Evol. (39, pp418-430, 1994),
   "The posterior probability distribution ...", for more results.

%A B. M. Jenkins
%A A. J. Davison
%A L. Allison
%A A. J. Maeder
%T A perimeter based technique for fusion of multi-sourced micrographic images
%J Dicta-93
%W Sydney Australia
%M DEC 8-10
%D 1993
%P 556-563
%K c1993, c199x, c19xx, LAllison, Monash, conf, image processing IP, shape,
   curve, signature, DICTA DICTA93, Monash
%X (Also see [MML].)

%A L. Allison
%A C. S. Wallace
%T The posterior probability distribution of alignments and its application
   to parameter estimation of evolutionary trees and to optimization of
   multiple alignments
%J J. Mol. Evol.
%V 39
%N 4
%P 418-430
%D 1994
%O An earlier version is TR 93/188, Dept. Comp. Sci., Monash U., July '93
%K jrnl, MolBio, JME, c1994, c199x, c19xx, LAllison, CSWallace, CSW, DNA,
   bioinformatics, optimisation, estimate, infer, parameters, algorithm,
   multiple, alignment, data, string, molecular, sequence, homology, Markov,
   family, phylogenetic, tree, trees, edit distance, Monte Carlo method, mcmc,
   simulated annealing, SA, inductive inference, II, sample, speed, Bayesian,
   dynamic programming algorithm, DPA, stochastic, methods, GS, Gibbs sampling,
   minimum message length encoding, MML, chain, minimum description length, MDL,
   transthyretin, chloramphenicol resistance gene, CAT, CATB, CATD, CATP, CATQ,
   CCOLI, ECOLI, algorithmic, mutual information, theory, significance,
   probabilistic, temperature, limits, TR 93/188, TR188
%X  "It is shown how to sample alignments from their posterior probability
   distribution given two strings.  This is extended to sampling alignments of
   more than two strings.  The result is firstly applied to the estimation of
   the edges of a given evolutionary tree over several strings.  Secondly,
   when used in conjunction with simulated annealing, it gives a stochastic
   search method for an optimal multiple alignment."
   -- [paper] and source code,
   (The JME paper is a much expanded and changed version of TR 93/188,

%A A. Finlay
%A L. Allison
%T A correction to the denotational semantics for Prolog of Nicholson and Foo
%V 15
%N 1
%P 206-208
%D 1993
%K TOPLAS, jrnl, note, LAllison, Alan, alanf, Monash, DS, semantic, Prolog,
   logic programming, LP, c1993, c199x, c19xx, Monash
%X [doi:10.1145/151646.151652][2/'04]
   Also see [Logic Programming],
        and [Denotational Semantics].

%A L. Allison
%T Applications of recursively defined data structures
%J Aust. Comp. J.
%V 25
%N 1
%P 14-20
%D 1993
%O 26/6/2022: https://arxiv.org/abs/2206.12795
%K jrnl, ACJ, LAllison, Monash, functional applicative programming, FP,
   lazy, need, circular, program, recursive, recursion, data structure,
   list, lists, search trees, tree, breadth first traversal, BFT, queue, memo,
   Fib, Fibonacci, n-queens, nqueens, n queens, irreducible, good,
   sequences, Axel Thue, c1993, c199x, c19xx, TR 88 119 TR88-119 TR119
%O an early version in TR 88/119 Dept. Computer Science, Monash University '88
%X "A circular program contains a data structure whose definition is
   self-referential or recursive.  The use of such a definition allows
   efficient functional programs to be written and can avoid the creation of
   intermediate data structures that would have to be garbage collected.
   This paper uses circular programs in various ways, to implement
   memo-structures and explicit search-trees to hold solutions to
   constraint-satisfaction problems."
   Examples: search trees, n-queens, irreducible sequences.
   -- [more_1],
   [Also search for: FP circular program].

%A L. Allison
%T A fast algorithm for the optimal alignment of three strings
%J J. Theor. Biol.
%V 164
%N 2
%P 261-269
%D 1993
%O TR 92/168  Dept. Computer Science, Monash University, Oct '92.
%K LAllison, Monash, jrnl, II, JTB, MolBio, bioinformatics, multiple alignment,
   edit distance, Ukkonen, three, string, strings, sequence, sequences,
   dynamic programming algorithm, DPA, TR 92 168 TR92-168 TR168,
   c1993, c199x, c19xx, J Theoretical Biology
%X Given 3 strings, length ~ n, 3-way edit-distance d,
   O(n.d^2) time algorithm worst case, O(n+d^3) typically.
   Tree costs 0/1/2.   ie.   xxx :0;    xxy, xx-, x-- :1;    xyz, xy- :2
   NB. Each internal node of an unrooted binary tree has 3 neighbours.
   [paper] inc' pdf paper and code,
   and more on [bioinformatics].

%A L. Allison
%T Normalisation of affine gap costs used in optimal sequence alignment
%J J. Theor. Biol.
%D 1993
%V 161
%N 2
%P 263-269
%K LAllison, Monash, jrnl, MolBio, JTB, alignment, string, sequence analysis,
   edit distance, homology, gap, gaps, linear, indel, insert delete, Altschul,
   mutual information, similarity, hidden Markov model, HMM, bioinformatics,
   inductive inference, II, c1993, c199x, c19xx, DNA, J Theoretical Biology
%X "It is shown how to normalize the costs of an alignment algorithm that
   employs affine or linear gap costs. The normalized costs are interpreted as
   the -log probabilities of the instructions of a finite-state edit-machine.
   This gives an explicit model relating sequences that can be linked to
   processes of mutation and evolution."
   -- [more],
      [paper] inc' pdf.

%A L. Allison
%T Lazy dynamic programming can be eager
%V 43
%N 4
%P 207-212
%D 1992
%K LAllison, Monash, jrnl, FP, lazy functional programming, Haskell, fast,
   efficient, dynamic programming algorithm, DPA, edit distance, LCS, LCSS,
   MolBio, approximate, similar, string, strings, match, matching, sequence,
   alignment, c1992, c199x, c19xx, bioinformatics
%X  Lazy evaluation in a functional language is exploited to make the simple
   dynamic programming algorithm for the edit-distance problem run quickly
   on similar strings:  being lazy can be fast.
   Runs in O(n*d) time thanks to laziness.
   -- [more],

%A C. N. Yee
%A L. Allison
%T Fast string alignment with linear indel costs
%R TR 92/165
%I Computer Science, Monash University
%D 1992
%K LAllison, MolBio, string alignment, similarity, homology, Ukkonen,
   edit distance, linear indel gap cost, costs, Ukkonen, Monash,
   TR 165, TR92/165, TR165, II, c1992, c199x, c19xx, bioinformatics
%X two strings,  O(n*d) time.
   The constants in the cost function have to be "small" integers.
   [Computing for Molecular Biology].
   [Also search for: Ukkonen].

%A D. L. Dowe
%A J. Oliver
%A L. Allison
%A T. I. Dix
%A C. S. Wallace
%T Learning rules for protein secondary structure prediction
%J Proc. 1992 Department Research Conf.
%I Dept. Computer Science, University of Western Australia
%E C. McDonald
%E J. Rohl
%E R. Owens
%D 1992
%O TR 92/163, Dept. Computer Science, Monash University, JUN '92
%K LAllison, CSW, DLD, Monash, UWA, WA, conf, MolBio, decision tree, trees,
   graph, protein, amino acid, AA, secondary structure, SS, prediction,
   rule, rules, alpha helix, beta strand, extended sheet, coil, turn,
   CSWallace, inductive inference, II, MML, minimum message length, c1992,
   c199x, c19xx, bioinformatics, TR 92 163, TR92-163, TR163
%X [TR92/163.ps]
   Also see [Bioinformatics],
   and TR 92/163.
   [CSci UWA home]['00]; uk us isbn:0864221959.

%A D. L. Dowe
%A J. Oliver
%A T. I. Dix
%A L. Allison
%A C. S. Wallace
%T A decision graph explanation of protein secondary structure prediction
%J 26th Hawaii Int. Conf. Sys. Sci.
%V 1
%P 669-678
%D 1993
%K LAllison, CSW, Monash, conf, MolBio, protein secondary structure prediction,
   conformation, alpha helix, ss, AA, beta sheet extended strand, turn, coil,
   II, inductive inference, decision graph tree, DTree, CSWallace, CSW,
   MML, Minimum message length encoding, description, MDL, Bayesian,
   TR163 163, c1993, c199x, c19xx, bioinformatics, HICSS, HICSS26, HICSS93
%X Oliver and Wallace (IJCAI '91) introduced `decision graphs' -
   a generalisation of decision trees - here applied to protein secondary
   structure prediction.
   [paper (HTML)].
   Also see TR 92/163.

%A C. N. Yee
%A L. Allison
%T Reconstruction of strings past
%J Bioinformatics
%O (was Comp. Appl. BioSci.)
%V 9
%N 1
%P 1-7
%D 1993
%O TR 92/162, Dept. Computer Science, Monash University, May '92
%K LAllison, Monash, jrnl, MML, minimum message length, mdl, J. Bioinformatics,
   encoding, hidden Markov model, HMM, II, inductive inference, probabilistic,
   MolBio, string, sequence, alignment, homology, similarity, edit,
   evolutionary, distance, parameter estimation, r-theory,
   TR TR92-162 TR162 162, c1993, c199x, c19xx, CABIOS, J. Bioinformatics
%X Use of single optimal alignment gives biased estimates of the evolutionary
   "distance" between two strings but the r-theory, average all alignments,
   recovers accurate estimates over a very wide range of similarity.
   -- [more],
   Also see [Bioinformatics].
   [now J. Bioinformatics].

%A L. Allison
%T Some algorithmic attacks on multiple alignment (abstract)
%J Boden Conf.
%W Thredbo, Australia
%D 1993
%K LAllison, Monash, RSBS, ANU, conf, MolBio, MML, II, string,
   sequence, approximate match, matching, three, DPA, bioinformatics
%X also see [Bioinformatics].

%A L. Allison
%T Estimating parameters and evolutionary distances in the inference of
   evolutionary trees (abstract)
%J Robertson Symposium.
%W Australian National University
%D 1993
%K LAllison, Monash, RSBS, ANU, conf, MolBio, MML, inductive inference,
   II, string, sequence, multiple alignment, approximate match,
   matching, phylogenetic, tree, trees, c1993, c199x, c19xx, bioinformatics
%X also see [Bioinformatics].

%A L. Allison
%A C. S. Wallace
%A C. N. Yee
%T Minimum message length encoding, evolutionary trees and multiple alignment
%J 25th Hawaii Int. Conf. on Sys. Sci.
%K LAllison, CSW, Monash, conf, MolBio, minimum message length encoding, MML,
   ML, evolutionary, family, phylogenetic, tree, trees, CSWallace, CSW, human,
   Bayesian, finite state, model, machine, FSM, hidden Markov model, primate,
   HMM, DNA, multiple alignment, inductive inference, II, bioinformatics, chimp,
   HICSS, HICSS25, HICCS92, TR 91 155, TR91-155, TR155, c1992, c199x, c19xx
%V 1
%P 663-674
%D 1992
%O TR 91/155, Dept. Computer Science, Monash University '91
%X "A method of Bayesian inference known as MML encoding is applied to inference
   of an evolutionary tree and to multiple alignment for K >= 2 strings.
   It allows the posterior odds-ratio of two competing hypotheses, for
   example two trees, to be calculated. A tree that is a good hypothesis forms
   the basis  of a short message describing the strings.  The mutation
   process is modelled by finite-state machine.  It is seen that tree
   inference and multiple alignment are intimately connected."
   -- [paper],
   there is an example on the primate globin pseudo-genes.
   (Also see [Bioinformatics].)

%A L. Allison
%A T. I. Dix
%A C. N. Yee
%T Shortest path and closure algorithms for banded matrices
%V 40
%P 317-322
%D 1991
%O TR 89/133, Computer Science, Monash University, Nov 1989
%K LAllison, Monash, jrnl, all pairs, shortest path, paths, closure, algorithm,
   band, banded, matrix, matrices, graph, -ve negative cycle problem,
   constraint, TR89/133, TR 89, 133, TR133, c1991, c199x, c19xx
%X  All pairs shortest paths problem in banded (width b) matrices.
   O(n b^2) for entries in band and for negative cycle problem,
   O(n^2 b) for all pairs shortest paths  where b is band width.

%A S. T. S. Ho
%A L. Allison
%A C. N. Yee
%A T. I. Dix
%T Constraint checking for restriction site mapping
%J 24th Hawaii Int. Conf. on Sys. Sci.
%V ?
%P ???-???
%D 1991
%O TR 89/129 Computer Science, Monash University, JUL '89
%K LAllison, Monash, conf, MolBio, HICSS, HICSS24, HICSS91, bioinformatics,
   restriction site mapping, RSM, map, constraint satisfaction problem CSP,
   separation theory, TR 89/129 89 129 TR89/129 TR129, c1991, c199x, c19xx
%X also see [mapping].

%A L. Allison
%A M. R. Garrett
%A A. J. Maeder
%T Particle shape characterization and comparison using curve signatures
%J 21st Conf. of the Fine Particle Society
%D 1990
%C San Diego
%K LAllison, Monash, conf, time warp, warping, shape, particle, closed, curve,
   signature, alignment, MML, edit distance, FPS, image processing, IP, c1990
   c199x, c19xx
%X (Also see [MML].)

%A S. Ho
%A L. Allison
%A C. N. Yee
%T Restriction site mapping for three or more enzymes
%J Comp. Appl. BioSci. (J. Bioinformatics)
%V 6
%N 3
%P 195-204
%D 1990
%O TR 88/117 Dept. Comp. Sci., Monash University Oct '88
%K LAllison, Monash, jrnl, MolBio, CABIOS, RSM, restriction site map, mapping,
   constraint satisfaction problem, programming, three, separation theory,
   TR 88 117 TR117 TR88/117, c1990, c199x, c19xx, J. Bioinformatics, enzyme
%X "Restriction site mapping requires a generator to put forward possible maps &
   a constraint checker to reject false maps. Ideally these combine to give an
   algorithm which calculates a sound & complete solution set. 3 algorithms for
   generation are presented & compared. Two decompose a multi-enzyme problem
   (3+) into subproblems. The constraint checker is based on separation theory.
   Some insights into the extent of constraint checking involved in & the
   feasibility of more checking for three or more enzymes are discussed. The
   trade-off between comp'n time & the soundness of the soln set is examined."
   [now J. Bioinformatics]
   -- [doi:10.1093/bioinformatics/6.3.195],
   Also see [mapping].

%A L. Allison
%A Du Xiaofeng
%T Relating three strings by minimum message length encoding (abstract)
%P 13
%J International Conference on Genes, Proteins and Computers
%W Chester
%I SERC Daresbury Laboratory
%D 1990
%K LAllison, Monash, conf, MolBio, multiple, three, triple, alignment,
   LCS, LCSS, MML, family, evolutionary phylogenetic tree, bioinformatics,
   inductive inference, II, DNA, c1990, c199x, c19xx
%X also see [Bioinformatics].

%A L. Allison
%A C. S. Wallace
%A C. N. Yee
%T Finite-state models in the alignment of macro-molecules
%J J. Mol. Evol.
%V 35
%N 1
%P 77-89
%D 1992
%K LAllison, jrnl, MolBio, c1992, c199x, c19xx, TR 90/148, macromolecules,
   TR90/148, TR148, 148 inductive inference, II, DNA, bioinformatics, DPA,
   dynamic programming algorithm, mutual information, ML, string, sequence,
   comparison, alignment, minimum message length encoding, MML, FSM, FSA,
   finite state model, analysis, minimum description length, MDL, methods,
   Hidden Markov model, HMM, homology, similarity, LCS, LCSS, significance,
   evolutionary, edit distance, sequence, r-theory, linear, gap, indel, insert,
   delete, Algorithm, Time, Speed, JME, AAAI, Bayes, Bayesian, CSWallace, CSW
%O An extended abstract titled:
      Inductive inference over macro-molecules
      in joint sessions at AAAI Symposium, Stanford MAR 1990
      on  (i) Artificial Intelligence and Molecular Biology, p5-9,
      &  (ii) Theory and Application of Minimal-Length Encoding, p50-54,
   also an early version in Technical Report 90/148,
      Dept. Comp. Sci., Monash U., Australia 3168.
%X  MML encoding is a technique of inductive inference with theoretical and
   practical advantages.  It allows the posterior odds-ratio of two theories
   or hypotheses to be calculated.  Here it is applied to the problem of
   aligning or relating two strings, in particular biological macro-molecules.
   We compare the r-theory, that the strings are related, with the null-theory,
   that they are not related. If they are related the probabilities of the
   various alignments can be calculated.  This is done for the one-, three-
   and five-state models of relation or mutation. These correspond to linear
   and piecewise linear cost functions on runs of indels.  We describe how
   to estimate the parameters of a model.  The validity of a model is itself
   a hypothesis and can be tested objectively.  This is done on real DNA
   and on artificial data. The tests on artificial data indicate limits on
   what can be inferred in various situations.  The tests on real DNA support
   either the three- or the five-state models over the one-state model.
   Finally, a fast, approximate minimum message length string comparison
   algorithm is described.
   -- [doi:10.1007/BF00160262]['07].
      [reprint] and software,
   See  C. S. Wallace  &  D.M Boulton
        An information measure for classification.
        CompJ 11(2) 185-194 Aug '68   (appendix)
        for the derivation of the coding scheme for multi-state data.
   See also (i) Bishop  &  Thompson
           (ii) Thorne,  Kishino  &  Felsenstein, and

%A L. Allison
%A C. S. Wallace
%A C. N. Yee
%T When is a string like a string?
%J Int. Symposium on Artificial Intelligence and Mathematics
%W Ft. Lauderdale, Florida, USA
%D 1990
%K LAllison, CSW, CSWallace, Monash, conf, inductive inference, II, homology,
   alignment, LCS, edit distance, string, sequence, comparison, similarity,
   r-theory, macro-molecule, MolBio, DNA, uncertainty, pattern matching, MML,
   minimum message length encoding, AIM AIM90, Hidden Markov model, HMM,
   c1990, c199x, c19xx, bioinformatics
%X [more],
   also see [TR90/148](html)
        and [TR90/148](.ps).

%A L. Allison
%A C. N. Yee
%T Minimum message length encoding and the comparison of macromolecules
%J Bulletin of Mathematical Biology
%V 52
%N 3
%D 1990
%P 431-453
%O TR 89/126 Computer Science, Monash University, MAY '89
%K LAllison, Monash, jrnl, minimum message length encoding, MML, c1990, c199x,
   c19xx, MDL, ML, inductive inference, II, minimum description length, DNA,
   approximate, alignment, similarity, homology, LCS, LCSS, pattern matching,
   string, sequence, comparison, Bayesian, Hidden Markov model, HMM,
   mutual information, MolBio, bioinformatics, BMB, TR 89/126, 89 126, TR89/126
%X "A method of inductive inference known as minimum message length encoding is
   applied to string comparison in molecular biology. The question of whether or
   not two strings are related and, if so, of how they are related and the
   problem of finding a good theory of string mutation are treated as inductive
   inference problems. The method allows the posterior odds-ratio of two string
   alignments or of two models of string mutation to be computed. The
   connection between models of mutation and existing string alignment
   algorithms is made explicit. A fast minimum message length alignment
   algorithm is also described."

%A C. McDonald
%A L. Allison
%T Denotational semantics of a command interpreter and their implementation
   in standard ML
%V 32
%N 5
%P 422-431
%D 1989
%O TR 88/7, Dept. Computer Science, UWA  '88
%K LAllison, Monash, COMPJ, jrnl, functional programming, FP, c1989, c198x,
   c19xx, shell script, scripting, command language, line, unix,
   operating system, OS, denotational semantics, DS, ML, SML, UWA,
   TR 88/7, TR88/7, 88 7
%X  [doi:10.1093/comjnl/32.5.422]['11],
    [Denotational Semantics].

%A L. Allison
%T Continuations implement generators and streams
%V 33
%N 5
%P 460-465
%D 1990
%O TR 88/112 Dept. Computer Science, Monash University AUG '88
%K LAllison, Monash, jrnl, continuation, combinator, generator, nondeterminism,
   non-determinism, nqueens, n queens, n-queens, source, stream, sink, lazy,
   agent, sieve, TR 88/112, TR88/112, TR112, c1988, c198x, c19xx,
   functional programming, FP
%X  Continuations are used to program non-deterministic generators
   and a variation on deterministic stream functions.
   Examples include the n-queens problem and the sieve of Eratosthenese.
   The continuation style has an imperative flavour but is
   functional and free of side effects.
   Some (recursive) programs can be executed in constant space.
   -- [doi:10.1093/comjnl/33.5.460]['11],

%A L. Allison
%A C. N. Yee
%A M. McGaughey
%T Three-dimensional queens problems
%R TR 89/130
%I Dept. Computer Science, Monash University, Australia
%D 1989
%K LAllison, Monash, n queens, n-queens, queen, nqueens, chess, 3D,
   TR 89 130 TR130 TR89/130, c1989, c198x, c19xx
%X [TR89/130],
   [TR89/130 (HTML)],
   [TR89/130 (.ps)].

%A L. Allison
%A C. Y. Yee
%T Restriction site mapping is in separation theory
%J Comp. Appl. BioSci. (J. Bioinformatics)
%V 4
%N 1
%P 97-101
%D 1988
%K LAllison, Monash, jrnl, CABIOS, plasmid, map, maps, mapping, combinatorial,
   enzyme, restriction site, DNA, phage, RSM, MolBio, separation theory,
   linear, constraint, programming, algorithm, inequality, c1988, c198x, c19xx,
   J. Bioinformatics, NAR, Nucl. Acids Res. special issue
%X Paper examines constraint checking during backtracking generation
   of solutions to the plasmid mapping problem.
   The constraints fall into Vaughn Pratt's separation theory.
   It is necessary and sufficient to check all simple cycles
   in the graph of restriction sites linked by fragments
   from the two single digests and the one double digest (can be generalized
   to more than two enzymes).
   In general, at level 'n' there are at most n new constraints to check.
      As an efficiency matter, only those cycles containing 1, 2 or 3
   indivisible cycles can be checked which allows a very few false maps to
   be generated.  This reduces total constraints/map from n*n to 3*n.
   -- [more],
      [paper (HTML)],
      [paper (.ps)].
   [jrnl is now  J. Bioinformatics].

%A L. Allison
%T A Practical Introduction to Denotational Semantics
%S Cambridge Computer Science Texts
%K LAllison, UWA, text, book, textbook, CUP, denotational semantics, DS,
   programming language, languages, definition, L4REF, c1986, c198x, c19xx
%D 1986
%X [doi:10.1017/CBO9781139171892];
   pb, uk us isbn:0521314232, uk us isbn13:978-0521314237;
   hb, uk us isbn:0521306892, uk us isbn13:978-0521306898;
   reviewed in Computing Reviews, Nov '87 pp.574-5 and Jul '88 pp.349-350
   and in Software P & E 18(5) pp.493-494 May '88 [www]['07].
   (More on [Denotational Semantics].)

%A L. Allison
%T Direct semantics and exceptions define jumps and coroutines
%V 31
%N 6
%P 327-330
%D 1989
%O TR 87/90 Dept Computer Science, Monash University, 1987
%K LAllison, Monash, jrnl, denotational semantics, DS, direct, exception,
   SML, ML, jump, sequencer, goto, coroutine, control, raise, handle, catch,
   throw, IPL, TR 87/90 TR87/90 90 TR90, c1989, c198x, c19xx
%X Direct semantics and continuation semantics are the two main styles of
   denotational semantics. Direct semantics is the simpler style but cannot
   define the semantics of jumps and other sequencers. This paper shows that,
   with the addition of exceptions, direct semantics can define sequencers. In
   contrast to the use of continuation semantics, nothing need be added to the
   semantics of commands unconnected with sequencers. Since exceptions can be
   defined in terms of the lambda-calculus nothing need be added to the
   foundations of semantics.
   See [TR87/90][HTML],
   [Denotational Semantics].

%A L. Allison
%T Circular programs and self-referential structures
%V 19
%N 2
%P 99-109
%D 1989
%O in TR87/91 Dept Computer Science, Monash University, Jan 87
%K jrnl, LAllison, Monash, applicative, functional programming, FP, circular,
   recursive, recursion, lazy, evaluation, need, program, list, lists, queue,
   doubly linked, tree, trees, breadth first traversal, BFT, threaded, queues,
   nub, unique elements, primes, c1989, c198x, c19xx, TR 87/91 TR87/91 TR91 91
%X  A circular program creates a data structure whose computation depends on
   itself or refers to itself.  The technique is used to implement the
   classic data structures circular and double-linked lists, threaded trees
   and queues in a functional programming language.
   -- [more_1],
   [Also search for: FP circular program].

%A L. Allison
%T Some applications of continuations
%V 31
%N 1
%P 9-11
%D 1988
%O in TR87/91 Dept Computer Science, Monash University, Jan 87
%K LAllison, Monash, jrnl, functional programming, FP, recursion, backtrack,
   COMPJ, backtracking, back track, coroutine, applicative, continuation, parse,
   parser, parsing, recursive descent, non-determinism, binary search tree,
   merge, combinators, TR 87/91 TR87/91 TR91 91, c1988, c198x, c19xx
%X [doi:10.1093/comjnl/31.1.9]['11],

%A L. Allison
%A T. I. Dix
%T A bit-string longest-common-subsequence algorithm
%V 23
%D 1986
%P 305-310
%K LAllison, TIDix, Monash, UWA, jrnl, MolBio, LCS, LCSS, c1986, c198x, c19xx,
   fast, algorithm, bits string, bit vector, DPA, dynamic programming algorithm,
   similarity, homology, bioinformatics, IPL, sequence, approximate, match,
   matching, strings, distance, comparison, alignment, practical, speedup,
%X "A longest-common-subsequence algorithm is described which operates in terms
   of bit or bit-string operations. It offers a speedup of the order of the
   word-length on a conventional computer."
   Speedup is ~ wordlength (eg. a factor of 32 or 64),  time is O(n^2/wordlength).
   [more]+source code,

%A A. Finlay
%A L. Allison
%T A prescription for Prolog control features based upon denotational semantics
   and its implementation in standard ML
%R TR 87/97
%I Dept. Computer Science, Monash University
%D 1987
%K LAllison, Monash, CSSE, Prolog, logic programming, LP, SML, ML, TR87/97,
   TR97 TR 97, c1987, c198x, c19xx, denotational semantics, DS
%X [Logic Programming]
   [Denotational Semantics].

%A L. Allison
%T Programming denotational semantics II
%V 28
%N 5
%P 480-486
%D 1985
%K COMPJ, jrnl, LAllison, UWA, denotational, DS, continuation semantics,
   algol, A68, algol-68, algol68, c1985, c198x, c19xx
%X [paper]
   [Denotational Semantics].

%A L. Allison
%T Programming denotational semantics
%V 26
%N 2
%P 164-174
%D 1983
%K COMPJ, jrnl, LAllison, UWA, direct, denotational semantics,
   DS, Pascal, c1983, c198x, c19xx
%X [paper],
   [Denotational Semantics].

%A L. Allison
%T An executable Prolog semantics
%J Algol Bulletin
%N 50
%P 10-18
%D 1983
%K LAllison, UWA, jrnl, denotational semantics, DS, algol, algol-68, algol68,
   logic programming, LP, unification, interpreter, Prolog, c1983, c198x, c19xx
%X [dl.acm.org/doi/10.5555/1061790.1061794] -- AB in ACM digital archive.
   Also see [program],
   and [prolog].

%A L. Allison
%T On syntax directed program editing
%V 13
%P 453-465
%D 1983
%K LAllison, UWA, jrnl, edit, editor, syntax-editor, SED, SDE,
   syntax directed, program, structure based, c1983, c198x, c19xx
%X  .

%A L. Allison
%T Stable-marriages by coroutines
%V 16
%P 61-65
%D 1983
%K IPL, jrnl, c1983, c198x, c19xx, LAllison, UWA, jrnl, stable marriage problem,
   algorithm, match, coroutine, modula, two, modula2, modulaII, bipartite,
   graph, match, matching, preference, coroutine
%X The stable marriage problem is an appealing version of many pairing problems.
   A solution by coroutines is given, based on the recursive algorithm of
   McVitie and Wilson (1971). There are few published algorithms where
   coroutines are really useful but they solve this problem very naturally.
   -- [doi:10.1016/0020-0190(83)90025-X]['06],
   [LA home].

%A L. Allison
%A P. Edmiston
%T A LOGO survey
%J Australian Computer Bulletin
%V 5
%N 9
%P 40
%D 1981
%K LAllison, UWA, jrnl, logo, programming language,
   children school education CAI, c1981, c198x, c19xx

%A L. Allison
%T Generating coset representatives for permutation groups
%J J. of Algorithms
%V 2
%P 227-244
%D 1981
%K LAllison, UWA, jrnl, coset, group, permutation, backtrack, backtracking,
   back track, isomorph rejection, subgroup, generate, representative,
   algorithm, c1981, c198x, c19xx
%X  .
   -- [doi:10.1016/0196-6774(81)90024-9]['10].

%A L. Allison
%T Phrase structures, non-determinism and backtracking
%V 7
%N 3
%P 139-143
%D 1978
%K LAllison, MelbU, jrnl, grammar, parse, parsing, parser,  backtrack,
   backtracking, non-determinism, back track, back tracking,
   LL, LL1, LL(1), c1978, c197x, c19xx
%X  -- [doi:10.1016/0020-0190(78)90077-7]['06],

%A L. Allison
%A A. Lock
%T A wordprocessor for the handicapped
%W Perth
%D 1982
%K LAllison, UWA, conf, ACSC5, ACSC82, c1982, c198x, c19xx

%A L. Allison
%A D. Wilde
%T Phrase structures in Pascal
%J Programming Language Systems
%W Canberra
%P 29-37
%D 1977
%I ANUpress
%K LAllison, MelbU, conf, grammar,  parse, parsing, parser,  backtrack,
   backtracking, back track, c1977, c197x, c19xx, Pascal, LL LL1 LL(1), ACSC,
   ACSC0, structure, system
%X Essentially ACSC0, conf 24-25 Feb 1977,
   volume published in '78, isbn:0708104932.
   Also see [Programming-Languages].

Coding Ockham's Razor, L. Allison, Springer

A Practical Introduction to Denotational Semantics, L. Allison, CUP

free op. sys.
free office suite
~ free photoshop
web browser

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Wednesday, 08-Feb-2023 12:08:35 AEDT.