...mutations may [be] back mutations...
As to the P(change) parameter, it simply controls the
ratio of change mutations to indel mutations. ...
...
The artificial data was generated using a number of
different population models.
In all cases, 10 parent sequences were generated from
the population model(s).
Each parent sequence was generated to be of the form
gen(50±50)++gen(120±30)++gen(100±50),
where ++ denotes concatenation and gen(x±y) means to generate a
sequence (section) with length selected uniformly at random
from the range [x-y, x+y].
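
The parent construction can be sketched as follows. This is only an
illustration: the alphabet and the uniform i.i.d. character model are
assumptions standing in for the actual population model(s), which are
not specified here, and the names gen and make_parent are hypothetical.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet; the text does not name one


def gen(x, y, rng):
    """Return a random section whose length is uniform on [x - y, x + y]."""
    length = rng.randint(x - y, x + y)  # randint includes both end points
    # Assumption: uniform i.i.d. characters stand in for the population model.
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def make_parent(rng):
    """Return the (left, middle, right) sections of one parent sequence."""
    return gen(50, 50, rng), gen(120, 30, rng), gen(100, 50, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
left, middle, right = make_parent(rng)
parent = left + middle + right  # gen(50±50)++gen(120±30)++gen(100±50)
```

Keeping the three sections separate makes it easy to reuse only the
middle section when deriving related sequences.
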
Then, from each parent, 25 child sequences
were generated, having only the
(mutated) middle section in common.
Of these 25 children, 5 were generated
by making 30 mutations,
5 by making 40 mutations,
5 with 50 mutations, 5 with
60 mutations and 5 with 80 mutations.
The exact method used for these mutations is interesting
[and ensures that the mutants still fit the population model].
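
A minimal sketch of the mutation schedule, assuming a naive mutation
procedure. It is not the method referred to above: it makes no attempt
to keep mutants within the population model. The alphabet, the value
0.5 for P(change), the equal split between insertions and deletions,
and the names mutate and make_children are all assumptions. Only the
middle section is handled; flanking sections are omitted.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet
MUTATION_COUNTS = [30, 40, 50, 60, 80]  # from the text: 5 children per count


def mutate(seq, n_mutations, p_change, rng):
    """Apply n_mutations naive point mutations to seq."""
    s = list(seq)
    for _ in range(n_mutations):
        if s and rng.random() < p_change:
            # Change: replace one character with a different one. A later
            # mutation may restore the original, i.e. a back mutation.
            i = rng.randrange(len(s))
            s[i] = rng.choice([c for c in ALPHABET if c != s[i]])
        elif s and rng.random() < 0.5:
            # Indel, deletion half (the even split is an assumption).
            del s[rng.randrange(len(s))]
        else:
            # Indel, insertion half (also used when the sequence is empty).
            s.insert(rng.randrange(len(s) + 1), rng.choice(ALPHABET))
    return "".join(s)


def make_children(middle, p_change, rng, per_count=5):
    """Return 25 (mutation_count, mutated_middle) pairs for one parent."""
    return [(n, mutate(middle, n, p_change, rng))
            for n in MUTATION_COUNTS
            for _ in range(per_count)]


rng = random.Random(1)
middle = "ACGT" * 30  # placeholder middle section of length 120
children = make_children(middle, p_change=0.5, rng=rng)
```

Raising p_change shifts the mix toward change mutations and away from
indels, which is the role the P(change) parameter plays in the text.
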
The child sequences were
considered as the library to be searched, and
each parent was used in turn as
the query sequence. Thus for each query there were
25 related sequences of differing relatedness and
225 [=9×25] unrelated sequences.
This ratio of related to
unrelated sequences in the library is high compared to real sequence
databases but will suffice for testing purposes.
Each parent sequence was compared against every
child sequence, making 2500 pairwise comparisons. Of
these 2500 comparisons, 250 are
between related sequences, and
2250 between unrelated sequences. ...
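
The counts above can be checked with a short enumeration. The index
pairs below are placeholders for the sequences themselves, and the
variable names are illustrative only.

```python
N_PARENTS = 10
CHILDREN_PER_PARENT = 25

# The library holds every child, tagged with the index of its parent.
library = [(p, c) for p in range(N_PARENTS) for c in range(CHILDREN_PER_PARENT)]

# Each parent is used in turn as the query against the whole library;
# a comparison is "related" when the child descends from the query.
comparisons = [(q, owner == q) for q in range(N_PARENTS) for owner, _ in library]

related = sum(1 for _, is_related in comparisons if is_related)
unrelated = len(comparisons) - related

print(len(library), len(comparisons), related, unrelated)  # 250 2500 250 2250
```
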
-- D.P., 4 August 2004