Abstract
The volume of electronic documents on the Internet is now enormous, so developing classification algorithms that work effectively on large data sets is an active research direction in text mining. In this paper, we present some results of applying frequent sets and association rules to the document classification problem. We applied these algorithms in two ways: (i) using frequent sets and association rules to generate document feature vectors, and (ii) using association rules to classify documents. For problem (i), the frequent set discovery algorithm was improved to find the frequent terms in the corpus and in individual documents. Natural language processing algorithms were then used for POS tagging and for discovering noun phrases. In addition, association rules were used to build a co-occurrence term graph for a particular context, which supports determining word senses and adjusting the components of the document feature vector that have similar meanings. For problem (ii), association rules are used to generate the classification rules. The proposed system was tested on a data set of abstracts of papers in the IT field.
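To make the idea behind problem (ii) concrete, the following is a minimal sketch of deriving classification rules from frequent term sets. It is not the authors' algorithm: the function names (`frequent_termsets`, `class_rules`, `classify`), the toy documents and labels, and the thresholds are all assumptions made for illustration, and the term sets are enumerated by brute force up to a small size rather than with Apriori-style pruning.

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=2):
    """Count term sets (up to max_size terms) and keep those that occur
    in at least min_support documents. Brute-force enumeration, so it is
    only suitable for small illustrative inputs."""
    counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def class_rules(docs, labels, min_support, min_confidence):
    """Build rules of the form termset -> label, where
    confidence = (docs containing termset with that label) / (docs containing termset)."""
    rules = []
    for termset, support in frequent_termsets(docs, min_support).items():
        by_label = Counter(
            label for terms, label in zip(docs, labels) if termset <= set(terms)
        )
        label, hits = by_label.most_common(1)[0]
        confidence = hits / support
        if confidence >= min_confidence:
            rules.append((termset, label, confidence))
    # Assumed tie-breaking: higher confidence first, then more specific term sets.
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules


def classify(terms, rules, default=None):
    """Return the label of the first rule whose term set is contained in the document."""
    present = set(terms)
    for termset, label, _ in rules:
        if termset <= present:
            return label
    return default


# Hypothetical toy corpus: each document is a list of already-extracted terms.
docs = [
    ["association", "rule", "mining"],
    ["association", "rule", "classification"],
    ["network", "routing", "protocol"],
    ["network", "protocol", "security"],
]
labels = ["data_mining", "data_mining", "networking", "networking"]

rules = class_rules(docs, labels, min_support=2, min_confidence=1.0)
print(classify(["association", "rule", "text"], rules))
print(classify(["network", "protocol"], rules))
```

In this sketch a document is assigned the label of the first matching rule, so the ordering of the rules acts as the conflict-resolution policy; a real system would likely need a more careful strategy and a proper frequent set miner for large corpora.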
Issue: Vol 9 No 2 (2006)
Page No.: 23-32
Published: Feb 28, 2006
Section: Article
DOI: https://doi.org/10.32508/stdj.v9i2.2880