All living organisms are based on genetic material, genetic
material that is specific for the organism. This material - that
inexplicably can be regarded as sequences - can thus be a valuable source of
information, both concerning the basic processes of life and the
relationships among different species. During the last twenty years the
ability to `read' this genetic material has increased tremendously. This
development has led to an explosion in available data.
But all this
data is of little use if methods are not available for deducing information
from it. Simply by the sheer amount of data, it is of vital importance that
these methods are automated to a very large degree. This demand has spawned
the field of computational biology, a field that from a computer
science point of view offers novel problems as well as variants over known
problems.
In this dissertation we focus on problems directly related
to the biological sequences. That is, problems concerned with
- detecting patterns in one sequence,
- comparing two or more
sequences,
- inferring structural information about the biomolecule
represented by a given sequence.
We discuss the modelling
aspects involved when solving real world problems, and look at several models
from various areas of
computational biology. This includes examining
the complexity of and developing efficient algorithms for some selected
problems in these models.
The dissertation includes five articles,
four of which have been previously published, on
- finding repetitions in a sequence with restrictions on the distance between
the repetitions in the sequence;
- comparing two coding DNA sequences,
that is, a comparison of DNA sequences where the encoded protein is taken
into account;
- comparing two hidden Markov models. Hidden Markov models
are often used as representatives of families of evolutionary or functionally
related sequences;
- inferring the secondary structure of an RNA
sequence, that is, the set of base pairs in the three dimensional structure,
based on a widely accepted energy model;
- an approximation algorithm for
predicting protein structure in a simple lattice model.
These
articles contain the technical details of the algorithms discussed in the
first part of the dissertation