Comparative Analysis of Long DNA Sequences by Per Element Information Content Using Different Contexts

Trevor I. Dix, David R. Powell, Lloyd Allison, Julie Bernal, Samira Jaeger, Linda Stern

home1 home2
 Bib
 Algorithms
 Bioinfo
 FP
 Logic
 MML
 Prog.Lang
and the
 Book

Bioinformatics
 Compression

BMC Bioinformatics, 8(Suppl.2):S10, May 2007.

Abstract: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition. Whereas compressing a sequence Y in the context of another X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models.

Results: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2.

Conclusions: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast and the saved results are self documenting.

Trevor I. Dix, David R. Powell, Lloyd Allison, Julie Bernal, Samira Jaeger, Linda Stern.
Comparative analysis of long DNA sequences by per element information content using different contexts.
BMC Bioinformatics, 8(Suppl 2):S10, May 2007, [doi:10.1186/1471-2105-8-S2-S10].
Also see:
[DCC '07], [Compression], [Bioinformatics],
and a related [seminar].
Coding Ockham's Razor, L. Allison, Springer

A Practical Introduction to Denotational Semantics, L. Allison, CUP

Linux
 Ubuntu
free op. sys.
OpenOffice
free office suite
The GIMP
~ free photoshop
Firefox
web browser

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Friday, 29-Mar-2024 00:30:14 AEDT.