VIETNAMESE PART-OF-SPEECH TAGGING BASED ON STYLE OF TEXTS AND PROBABILITY MODEL

Chau Quang Nguyen; Tuoi Thi Phan; Tru Hoang Cao

doi:10.32508/stdj.v9i2.2879

Article



Volume & Issue: Vol. 9 No. 2 (2006) | Page No.: 11-22 | DOI: 10.32508/stdj.v9i2.2879

Published: 2006-02-28

Abstract

Accurate part-of-speech (POS) tagging of words in Vietnamese texts is an important problem. It supports text parsing, helps resolve polysemy, and assists semantic information extraction systems, among other tasks. This paper therefore presents an approach to POS tagging for Vietnamese texts. The method uses a probability model and relies on a lexicon listing the possible POS tags for each word, a manually labelled corpus, and the syntax and context of the texts. In parallel, we built a corpus of 75,000 entries and a lexicon of 80,000 entries to support Vietnamese language processing research and application development.
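To make the combination of a tag lexicon, a hand-labelled corpus, and a probability model concrete, the following is a minimal sketch only. It assumes a bigram hidden Markov model with add-one smoothing and Viterbi decoding restricted to the tags the lexicon allows for each word. The abstract does not specify the model at this level of detail, so this is not the authors' method; the toy corpus, the toy lexicon, the tag labels (`P`, `V`, `N`) and the function name `tag_sentence` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical hand-labelled corpus: sentences as lists of (word, tag) pairs.
CORPUS = [
    [("tôi", "P"), ("đọc", "V"), ("sách", "N")],
    [("tôi", "P"), ("mua", "V"), ("bàn", "N")],
    [("họ", "P"), ("bàn", "V"), ("việc", "N")],
]

# Hypothetical lexicon: the POS tags each word may take ("bàn" is ambiguous).
LEXICON = {
    "tôi": {"P"}, "họ": {"P"}, "đọc": {"V"}, "mua": {"V"},
    "sách": {"N"}, "việc": {"N"}, "bàn": {"N", "V"},
}

TAGS = sorted({t for tags in LEXICON.values() for t in tags})
VOCAB = set(LEXICON)

# Count tag-to-tag transitions and tag-to-word emissions in the corpus.
trans, emit = Counter(), Counter()
prev_total, tag_total = Counter(), Counter()
for sentence in CORPUS:
    prev = "<s>"  # sentence-start marker
    for word, tag in sentence:
        trans[(prev, tag)] += 1
        prev_total[prev] += 1
        emit[(tag, word)] += 1
        tag_total[tag] += 1
        prev = tag


def log_trans(prev, tag):
    """Add-one smoothed log P(tag | previous tag)."""
    return math.log((trans[(prev, tag)] + 1) / (prev_total[prev] + len(TAGS)))


def log_emit(tag, word):
    """Add-one smoothed log P(word | tag)."""
    return math.log((emit[(tag, word)] + 1) / (tag_total[tag] + len(VOCAB)))


def tag_sentence(words):
    """Viterbi decoding over the tags the lexicon permits for each word."""
    best = {"<s>": (0.0, [])}  # last tag -> (log probability, tag path)
    for word in words:
        candidates = LEXICON.get(word, TAGS)  # unknown word: try every tag
        new_best = {}
        for tag in candidates:
            score, path = max(
                ((s + log_trans(p, tag), pth) for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[tag] = (score + log_emit(tag, word), path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

On this toy data, `tag_sentence(["tôi", "bàn", "việc"])` tags the ambiguous word "bàn" as a verb, while `tag_sentence(["họ", "mua", "bàn"])` tags it as a noun, because the preceding tag changes which transition is more probable. Restricting candidates to the lexicon keeps the search small and prevents tags that the word can never take.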

VNUHCM Journal of Science and Technology Development
