
EXTRACTING AND SUMMARIZING THE CONTENT OF VIETNAMESE WEB PAGES

Do Phuc 1
Ho Anh Thu 1
Volume & Issue: Vol. 8 No. 10 (2005) | Page No.: 13-22 | DOI: 10.32508/stdj.v8i10.3076
Published: 2005-10-31


Copyright The Author(s) 2023. This article is published with open access by Vietnam National University, Ho Chi Minh city, Vietnam. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0) which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited. 

Abstract

Document summarization produces an abridged version of a document or a set of documents. The problem has a long history, with the first work in this research direction attributed to Luhn in 1958. As the volume of information on the Internet grows, especially Web pages in English and Vietnamese, techniques for summarizing the content of web pages and documents have attracted the interest of researchers. In this paper, we present the results of building a system that summarizes the content of Vietnamese web pages by extracting salient sentences from the original document. We use natural language processing techniques such as word segmentation, POS tagging, and compound noun extraction to increase the efficiency of document summarization and to open a path toward semantic text summarization.
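
To make the idea of extracting salient sentences concrete, the following is a minimal sketch of extractive summarization. It is not the system described in this paper: the scoring rule (average word frequency per sentence) is a simple heuristic chosen for illustration, and the names `split_sentences`, `tokenize`, and `summarize` are hypothetical. In particular, `tokenize` is only a placeholder for real Vietnamese word segmentation, POS tagging, and compound noun extraction, which this sketch does not attempt.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Placeholder for Vietnamese word segmentation: lowercase runs of word characters.
    return re.findall(r"\w+", sentence.lower())


def summarize(text, max_sentences=2, stopwords=frozenset()):
    """Return up to max_sentences sentences, in original order, ranked by
    the average document-wide frequency of their non-stopword tokens."""
    sentences = split_sentences(text)
    tokens = [[t for t in tokenize(s) if t not in stopwords] for s in sentences]

    # Document-wide term frequencies.
    freq = Counter(t for toks in tokens for t in toks)

    def score(toks):
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score (ties keep document order), then
    # restore document order among the selected sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: score(tokens[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

A call such as `summarize(page_text, max_sentences=3)` would return the three highest-scoring sentences in their original order. Replacing `tokenize` with a proper segmenter, and restricting scoring to nouns or compound nouns, would be one way to move this sketch toward the kind of pipeline the abstract describes.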
