Open Access


Abstract

Today, the volume of electronic documents on the Internet is enormous. Developing classification algorithms that work effectively on large data sets is therefore an active research direction in text mining. In this paper, we present some results of applying frequent sets and association rules to the document classification problem. We apply these techniques to two problems: (i) using frequent sets and association rules to generate document feature vectors, and (ii) using association rules to classify documents. In problem (i), the frequent set discovery algorithm is improved to find the frequent terms in the corpus and in each document. Natural language processing algorithms are then used for POS tagging and for discovering noun phrases. In addition, association rules are used to build a co-occurrence term graph for a particular context, which supports determining word senses and adjusting the components of the document feature vector that have similar meanings. In problem (ii), association rules are used to generate the classification rules. The proposed system was tested on a data set of abstracts of papers in the IT field.
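To make the two building blocks named in the abstract concrete, the following is a minimal sketch of frequent term-set discovery and association-rule generation over a toy corpus. It is a generic level-wise (Apriori-style) search, not the improved algorithm the paper describes, which the abstract does not detail. The function names `frequent_termsets` and `association_rules`, the thresholds, and the representation of a document as a list of terms are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_termsets(docs, min_support):
    """Return {frozenset of terms: number of documents containing all of them}.

    Hypothetical helper: a plain level-wise (Apriori-style) search,
    not the paper's improved frequent set algorithm.
    """
    transactions = [frozenset(d) for d in docs]

    # Level 1: count how many documents contain each single term.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-sets whose union has exactly k terms.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for each rule lhs -> rhs that passes the threshold."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent set is frequent, so the lookup succeeds.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    # Toy corpus (invented for illustration): each document is a list of terms.
    docs = [
        ["data", "mining", "rule"],
        ["data", "mining"],
        ["data", "network"],
        ["network", "protocol"],
    ]
    freq = frequent_termsets(docs, min_support=2)
    for lhs, rhs, conf in association_rules(freq, min_confidence=0.8):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

In a classification setting such as problem (ii), one common variation is to restrict the right-hand side of each rule to a class label, so that a rule reads "terms imply category"; whether the paper does exactly this is not stated in the abstract.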



Article Details

Issue: Vol 9 No 2 (2006)
Page No.: 23-32
Published: Feb 28, 2006
Section: Article
DOI: https://doi.org/10.32508/stdj.v9i2.2880

 Copyright Info

Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

 How to Cite
Phuc, D. (2006). RESEARCH ON APPLICATION OF FREQUENT SETS AND ASSOCIATION RULES TO SEMANTIC VIETNAMESE DOCUMENT CLASSIFICATION. Science and Technology Development Journal, 9(2), 23-32. https://doi.org/10.32508/stdj.v9i2.2880
