Open Access


Abstract

In isolating languages such as Chinese and Vietnamese, word boundaries are not marked by spaces, and a single word may consist of one or more orthographic syllables. Whether to segment text into words before training and translation is therefore a question that needs to be considered. In this paper, we survey the effect of the word boundary factor on the results of Chinese-Vietnamese statistical machine translation (SMT). The experimental results are intended as a basis for improving word segmentation in future research, with the aim of increasing machine translation performance. We ran two experiments, word segmentation (WS) and word un-segmentation (WUS), on corpora of 8,000 and 12,000 sentence pairs. Based on the experimental results, we found that the WS corpus and the WUS corpus each have their own advantages and drawbacks. We propose integrating the advantages of these two methods in SMT.
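To make the WS/WUS distinction concrete, the following is a minimal sketch of how the two corpus variants might be prepared. It is an illustration only: the function names, the toy lexicons, and the use of greedy forward maximum matching are assumptions made for this example and are not the segmentation tools or procedure used in the paper.

```python
# Illustrative sketch only: names, lexicons, and the matching strategy
# are hypothetical and do not reproduce the paper's actual pipeline.

def unsegment_chinese(sentence: str) -> list[str]:
    """WUS for Chinese: every non-space character becomes its own token."""
    return [ch for ch in sentence if not ch.isspace()]


def unsegment_vietnamese(sentence: str) -> list[str]:
    """WUS for Vietnamese: keep the space-separated syllables as tokens."""
    return sentence.split()


def segment(units: list[str], lexicon: set[str],
            max_len: int = 4, joiner: str = "") -> list[str]:
    """WS via greedy forward maximum matching over basic units.

    At each position, try the longest span of units whose joined form
    is in the lexicon; fall back to a single unit if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(units):
        for n in range(min(max_len, len(units) - i), 0, -1):
            candidate = joiner.join(units[i:i + n])
            if n == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Toy lexicons (assumed for the example).
zh_lexicon = {"学生"}
vi_lexicon = {"học_sinh"}

zh_wus = unsegment_chinese("我是学生")
vi_wus = unsegment_vietnamese("tôi là học sinh")

zh_ws = segment(zh_wus, zh_lexicon)
vi_ws = segment(vi_wus, vi_lexicon, joiner="_")

print(" ".join(zh_wus), "|", " ".join(zh_ws))
print(" ".join(vi_wus), "|", " ".join(vi_ws))
```

In this sketch the WUS variant hands the translation system the smallest written units (characters for Chinese, syllables for Vietnamese), while the WS variant groups them into multi-unit words, here joined with an underscore on the Vietnamese side.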



Article Details

Issue: Vol 18 No 2 (2015)
Page No.: 70-78
Published: Jun 30, 2015
Section: Natural Sciences - Research article
DOI: https://doi.org/10.32508/stdj.v18i2.1133

 Copyright Info

Creative Commons License

Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

 How to Cite
Tran, P., & Dinh, D. (2015). Surveying word boundary factor in Chinese-Vietnamese statistical machine translation. Science and Technology Development Journal, 18(2), 70-78. https://doi.org/10.32508/stdj.v18i2.1133
