EXTRACTING AND SUMMARIZING THE CONTENT OF VIETNAMESE WEB PAGES

Do Phuc; Ho Anh Thu

doi:10.32508/stdj.v8i10.3076

Volume & Issue: Vol. 8 No. 10 (2005) | Page No.: 13-22 | DOI: 10.32508/stdj.v8i10.3076

Published: 2005-10-31

Abstract

Document summarization produces an abridged version of a document or a set of documents. The problem has a long history of development; the first work in this research direction was by Luhn in 1958. With the growing volume of information on the Internet, especially Web pages in English and in Vietnamese, summarization techniques that help condense the content of Web pages and documents have attracted the interest of researchers. In this paper, we present the results of building a system that summarizes the content of Vietnamese Web pages by extracting salient sentences from the original document. We use natural language processing techniques such as word segmentation, POS tagging, and compound noun extraction to increase the efficiency of document summarization and to open a path toward semantic text summarization.
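
To make the idea of extracting salient sentences concrete, the following is a minimal sketch of frequency-based sentence extraction, loosely in the spirit of Luhn's early approach. It is not the system described in the paper: the function names, the scoring rule (average word frequency per sentence), and the regex tokenizer are all assumptions chosen for illustration. In particular, the tokenizer stands in for real Vietnamese word segmentation, POS tagging, and compound noun extraction, none of which are implemented here.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Placeholder for Vietnamese word segmentation (assumption):
    # lowercase runs of word characters, one token per syllable.
    return re.findall(r"\w+", sentence.lower())


def summarize(text, max_sentences=2, stopwords=frozenset()):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [[w for w in tokenize(s) if w not in stopwords] for s in sentences]

    # Word frequencies over the whole document.
    freq = Counter(w for sent in tokens for w in sent)

    # Sentence score: average document frequency of its words (hypothetical rule).
    scores = [
        sum(freq[w] for w in sent) / len(sent) if sent else 0.0
        for sent in tokens
    ]

    # Take the top-scoring sentence indices; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

For example, `summarize(page_text, max_sentences=3)` would return up to three sentences from a hypothetical `page_text` string. A real pipeline for Vietnamese would replace `tokenize` with a proper word segmenter, since many Vietnamese words span several whitespace-separated syllables.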

VNUHCM Journal of Science and Technology Development
