RESEARCH ON APPLICATION OF FREQUENT SETS AND ASSOCIATION RULES TO SEMANTIC VIETNAMESE DOCUMENT CLASSIFICATION

Do Phuc

doi:10.32508/stdj.v9i2.2880

Volume & Issue: Vol. 9 No. 2 (2006) | Page No.: 23-32 | DOI: 10.32508/stdj.v9i2.2880

Published: 2006-02-28

Abstract

The volume of electronic documents on the Internet is now very large, so developing classification algorithms that work effectively on large data sets is an active research direction in text mining. In this paper, we present some results on applying frequent sets and association rules to the document classification problem. We apply these techniques to two problems: (i) using frequent sets and association rules to generate document feature vectors, and (ii) using association rules to classify documents. In problem (i), the frequent set discovery algorithm is improved to find the frequent terms in the corpus and in each document. Natural language processing algorithms are then used for POS tagging and for discovering noun phrases. In addition, association rules are used to build a co-occurrence term graph for a particular context, which supports determining word sense and adjusting the components of the document feature vector that have similar meanings. In problem (ii), association rules are used to generate the classification rules. The proposed system was tested on a data set of abstracts of papers in the IT field.
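The abstract does not reproduce the algorithms themselves, so the following is only a minimal sketch of the general idea behind problem (ii): mine frequent term sets from tokenized documents, then keep rules of the form "term set -> class" whose confidence is high enough. It assumes a plain textbook Apriori search, not the improved frequent set algorithm the paper refers to, and the function names (`frequent_term_sets`, `class_rules`, `classify`), the thresholds, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Return {frozenset(terms): support} for term sets whose support
    (fraction of documents containing all the terms) is >= min_support."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    counts = Counter(t for d in doc_sets for t in d)
    # Level 1: frequent single terms.
    current = {frozenset([t]) for t, c in counts.items() if c / n >= min_support}
    frequent = {s: counts[next(iter(s))] / n for s in current}
    size = 1
    while current and size < max_size:
        size += 1
        # Join frequent sets of the previous level into larger candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = set()
        for c in candidates:
            support = sum(1 for d in doc_sets if c <= d) / n
            if support >= min_support:
                frequent[c] = support
                current.add(c)
    return frequent


def class_rules(docs, labels, min_support, min_confidence):
    """Build rules (term_set, label, confidence), where confidence is the
    fraction of documents containing term_set that carry that label."""
    doc_sets = [set(d) for d in docs]
    rules = []
    for terms in frequent_term_sets(docs, min_support):
        covered = [lab for d, lab in zip(doc_sets, labels) if terms <= d]
        label, hits = Counter(covered).most_common(1)[0]
        confidence = hits / len(covered)
        if confidence >= min_confidence:
            rules.append((terms, label, confidence))
    # Prefer higher confidence, then more specific (longer) antecedents.
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules


def classify(doc, rules, default=None):
    """Return the label of the first rule whose term set occurs in doc."""
    terms = set(doc)
    for antecedent, label, _ in rules:
        if antecedent <= terms:
            return label
    return default


# Toy data (hypothetical): each document is a list of already extracted terms.
docs = [
    ["data", "mining", "rule"],
    ["data", "mining", "cluster"],
    ["network", "protocol", "router"],
    ["network", "protocol", "data"],
]
labels = ["DM", "DM", "NET", "NET"]
rules = class_rules(docs, labels, min_support=0.5, min_confidence=1.0)
print(classify(["mining", "data", "text"], rules))
```

In this sketch the term "data" is frequent but yields no rule, because it appears in both classes and its confidence falls below the threshold. A real system along the lines of the abstract would work on terms and noun phrases produced by POS tagging rather than on raw tokens.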

VNUHCM Journal of Science and Technology Development
