
details of testing

...mutations may [be] back mutations...
The P(change) parameter simply controls the ratio of change mutations to indel mutations. ...
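
A minimal sketch of how a parameter like P(change) might be applied to a single mutation. It assumes a DNA alphabet and an even split of indels between insertions and deletions; neither is stated in the source, and the names `mutate_once` and `p_change` are hypothetical. It makes no attempt to keep mutants within a population model.

```python
import random

ALPHABET = "ACGT"  # assumed alphabet; the source does not name one


def mutate_once(seq, p_change, rng):
    """Apply one mutation: a change with probability p_change, else an indel."""
    if seq and rng.random() < p_change:
        # Change: overwrite one position with a different character.
        i = rng.randrange(len(seq))
        choices = [c for c in ALPHABET if c != seq[i]]
        return seq[:i] + rng.choice(choices) + seq[i + 1:]
    if seq and rng.random() < 0.5:
        # Deletion (assumed here to be half of all indels).
        i = rng.randrange(len(seq))
        return seq[:i] + seq[i + 1:]
    # Insertion of one random character at a random position.
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(ALPHABET) + seq[i:]
```

Because each call picks its position independently, a later call can undo an earlier one, which is one way the back mutations mentioned above can arise.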

... The artificial data was generated using a number of different population models. In all cases, 10 parent sequences were generated from the population model(s). Each parent sequence was generated to be of the form gen(50±50)++gen(120±30)++gen(100±50), where gen(x±y) means to generate a sequence (section) with length selected uniformly at random from the range [x-y, x+y]. Then from each parent, 25 child sequences were generated, having only the (mutated) middle sequence (section) in common. Of these 25 children, 5 were generated by making 30 mutations, 5 by making 40 mutations, 5 with 50 mutations, 5 with 60 mutations and 5 with 80 mutations. The exact method used for these mutations is interesting [and ensures that the mutants still fit the population model].
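
The layout described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the tests: a uniform random DNA model stands in for the unspecified population model(s), the placeholder `substitute` makes only single-character overwrites, and each child is assumed to get freshly generated flanking sections. The names `gen`, `substitute` and `make_family` are hypothetical.

```python
import random

ALPHABET = "ACGT"  # assumed; stands in for the unspecified population model


def gen(x, y, rng):
    """gen(x±y): random section with length uniform on [x-y, x+y]."""
    n = rng.randint(x - y, x + y)
    return "".join(rng.choice(ALPHABET) for _ in range(n))


def substitute(seq, n_mut, rng):
    """Placeholder mutator: n_mut single-character overwrites (a site may repeat)."""
    s = list(seq)
    for _ in range(n_mut):
        s[rng.randrange(len(s))] = rng.choice(ALPHABET)
    return "".join(s)


def make_family(rng, mutation_counts=(30, 40, 50, 60, 80), per_count=5):
    """Return (parent, children); children share only the mutated middle section."""
    left, middle, right = gen(50, 50, rng), gen(120, 30, rng), gen(100, 50, rng)
    parent = left + middle + right
    children = []
    for n_mut in mutation_counts:
        for _ in range(per_count):
            # Assumption: each child gets freshly generated flanks.
            child = gen(50, 50, rng) + substitute(middle, n_mut, rng) + gen(100, 50, rng)
            children.append(child)
    return parent, children


rng = random.Random(2004)
families = [make_family(rng) for _ in range(10)]
library = [c for _, kids in families for c in kids]
print(len(families), len(library))  # 10 parents, 250 children
```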

The child sequences were considered as the library to be searched, and each parent was used in turn as the query sequence. Thus for each query there were 25 related sequences of differing relatedness and 225 [=9×25] unrelated sequences. This ratio of related to unrelated sequences in the library is high compared to real sequence databases but will suffice for testing purposes. Each parent sequence was compared against every child sequence, making 2500 pair-wise comparisons. Of these 2500 comparisons, 250 are between related sequences, and 2250 between unrelated sequences. ...
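
The counts above can be checked with a short enumeration. This sketch tags each library entry with the index of the parent it was derived from; the variable names are illustrative only.

```python
N_PARENTS = 10
CHILDREN_PER_PARENT = 25

# Library entries are tagged with the index of the parent they came from.
library = [(p, k) for p in range(N_PARENTS) for k in range(CHILDREN_PER_PARENT)]

related = unrelated = 0
for query in range(N_PARENTS):
    for origin, _ in library:
        if origin == query:
            related += 1
        else:
            unrelated += 1

print(related, unrelated, related + unrelated)  # 250 2250 2500
```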

-- D.P., 4 August 2004
