Computational Biology

All living organisms are based on genetic material, genetic material that is specific for the organism. This material - that inexplicably can be regarded as sequences - can thus be a valuable source of information, both concerning the basic processes of life and the relationships among different species. During the last twenty years the ability to `read' this genetic material has increased tremendously. This development has led to an explosion in available data.

But all this data is of little use if methods are not available for deducing information from it. Simply by the sheer amount of data, it is of vital importance that these methods are automated to a very large degree. This demand has spawned the field of computational biology, a field that from a computer science point of view offers novel problems as well as variants over known problems.

In this dissertation we focus on problems directly related to the biological sequences. That is, problems concerned with

detecting patterns in one sequence,
comparing two or more sequences,
inferring structural information about the biomolecule represented by a given sequence.

We discuss the modelling aspects involved when solving real world problems, and look at several models from various areas of computational biology. This includes examining the complexity of and developing efficient algorithms for some selected problems in these models.

The dissertation includes five articles, four of which have been previously published, on

finding repetitions in a sequence with restrictions on the distance between the repetitions in the sequence;
comparing two coding DNA sequences, that is, a comparison of DNA sequences where the encoded protein is taken into account;
comparing two hidden Markov models. Hidden Markov models are often used as representatives of families of evolutionary or functionally related sequences;
inferring the secondary structure of an RNA sequence, that is, the set of base pairs in the three dimensional structure, based on a widely accepted energy model;
an approximation algorithm for predicting protein structure in a simple lattice model.

These articles contain the technical details of the algorithms discussed in the first part of the dissertation

Computational Biology

Abstract: