
DEVELOPING ALGORITHMS FOR FINDING THE SIMILAR MOTIF IN DNA SEQUENCES

Hoang Kiem 1
Do Phuc 1
Volume & Issue: Vol. 3 No. 7&8 (2000) | Page No.: 5-11 | DOI: 10.32508/stdj.v3i7&8.3573
Published: 2000-08-31


Copyright The Author(s) 2023. This article is published with open access by Vietnam National University, Ho Chi Minh City, Vietnam. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Abstract

Discovering similar subsequences in a set of DNA sequences is an important problem in biotechnology. From the positions of similar subsequences, we can identify features shared by a group of genes with similar function, or locate the positions of point mutations. In this paper, we analyze two problems, repeated and approximate subsequence discovery. We employ a large set discovery algorithm to find repeated subsequences and a genetic algorithm to find approximate subsequences in a set of DNA sequences. In the repeated subsequence discovery algorithm, we treat each repeated subsequence as a large set and employ the Apriori-TiD algorithm (Agrawal, 1994) to discover the maximal large sets. In the approximate subsequence discovery algorithm, we treat each chromosome of the genetic algorithm as a potential solution and use the genetic algorithm to select the right solution. Both proposed algorithms work well on large data sets and therefore meet the demands of large gene data sets. In addition, we propose a heuristic for speeding up solution discovery. We apply the proposed algorithms to the promoter data from the University of California, Irvine, USA, and present the experimental results.
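To make the level-wise idea behind the repeated subsequence search concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "large set" corresponds to a contiguous substring whose support (the number of input sequences containing it) reaches a threshold, and it keeps a set of sequence ids per candidate in the spirit of Apriori-TiD. The names frequent_substrings, maximal, and min_support are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support, max_len=None):
    """Level-wise (Apriori-style) search for contiguous substrings that
    occur in at least `min_support` of the input sequences.

    Assumption: support is counted per sequence, not per occurrence.
    Returns a dict mapping each frequent substring to the set of
    sequence ids that contain it.
    """
    # Level 1: single symbols and the ids of the sequences containing them.
    level = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for symbol in seq:
            level[symbol].add(sid)
    current = {s: ids for s, ids in level.items() if len(ids) >= min_support}

    all_frequent = dict(current)
    k = 1  # length of the substrings held in `current`
    while current and (max_len is None or k < max_len):
        candidates = defaultdict(set)
        for sid, seq in enumerate(sequences):
            for i in range(len(seq) - k):
                cand = seq[i:i + k + 1]
                # Apriori pruning: a length-(k+1) substring can only be
                # frequent if its length-k prefix and suffix are frequent.
                if cand[:-1] in current and cand[1:] in current:
                    candidates[cand].add(sid)
        current = {s: ids for s, ids in candidates.items()
                   if len(ids) >= min_support}
        all_frequent.update(current)
        k += 1
    return all_frequent


def maximal(frequent):
    """Keep only frequent substrings not contained in a longer frequent one."""
    keys = list(frequent)
    return sorted(s for s in keys
                  if not any(s != t and s in t for t in keys))
```

For the toy input ["ACGTAC", "TTACGT", "GACGTT"] with min_support set to 3, the only maximal frequent substring under these assumptions is "ACGT". The pruning step is what keeps the candidate set small on large inputs: a longer substring is only counted if both of its shorter pieces already passed the support threshold.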
