DEVELOPING A MOTIF-BASED CLUSTERING ALGORITHM FOR SUPPORTING QUERIES IN A DATABASE OF DNA SEQUENCES
Abstract
We have developed a system for supporting queries in a database of DNA sequences. The system groups similar DNA sequences into clusters based on frequent motifs or motif phrases. Each cluster is represented by a cluster feature vector of maximal frequent motifs or motif phrases, and a motif tree is built over these cluster features. Similarity search then proceeds in two steps. First, the system searches for the clusters that best match the query pattern. Second, traditional matching techniques (FASTA or BLAST) are used to match the pattern against the small number of DNA sequences in the selected clusters.
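The two-step search can be illustrated with a small sketch. It rests on several simplifying assumptions that are not from the original: motifs are approximated by fixed-length k-mers; a motif counts as "frequent" in a cluster if it occurs in at least `min_support` of that cluster's sequences; clusters are taken as already formed; cluster features are stored as plain sets rather than in a motif tree; and a basic Smith-Waterman local alignment score stands in for FASTA or BLAST. All function names (`kmers`, `cluster_features`, `smith_waterman`, `two_step_search`) and parameter values are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of length-k substrings (assumed stand-in for motifs)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_features(cluster, k, min_support):
    """Motifs that occur in at least min_support sequences of the cluster."""
    counts = Counter()
    for seq in cluster:
        counts.update(kmers(seq, k))  # each sequence counted once per motif
    return {m for m, c in counts.items() if c >= min_support}


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (assumed stand-in for FASTA/BLAST)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def two_step_search(query, clusters, k=3, min_support=2, top_clusters=1):
    q = kmers(query, k)
    feats = [cluster_features(c, k, min_support) for c in clusters]
    # Step 1: rank clusters by how many of their feature motifs occur in the query.
    ranked = sorted(range(len(clusters)),
                    key=lambda i: len(q & feats[i]), reverse=True)
    selected = ranked[:top_clusters]
    # Step 2: align the query only against sequences of the selected clusters.
    hits = [(smith_waterman(query, s), s) for i in selected for s in clusters[i]]
    return sorted(hits, reverse=True)


# Usage with two small, made-up clusters.
clusters = [
    ["ACGTACGTTT", "ACGTACGAAA", "TTACGTACGG"],
    ["GGGCCCTTAA", "GGGCCCAATT", "CCGGGCCCTA"],
]
print(two_step_search("GGGCCCT", clusters))
# → [(14, 'GGGCCCTTAA'), (14, 'CCGGGCCCTA'), (12, 'GGGCCCAATT')]
```

Only the three sequences of the second cluster are aligned, because the first cluster shares no frequent motif with the query. This is the saving the two-step design aims for: the expensive alignment runs on a small candidate set rather than on the whole database.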