Section: ENGINEERING AND TECHNOLOGY

Improving Vietnamese Fake News Detection based on Contextual Language Model and Handcrafted Features

Khoa Dang Pham 1, *
Dang Van Thin 1
Ngan Luu-Thuy Nguyen 1
  1. University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Correspondence to: Khoa Dang Pham, University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam. Email: 18520930@gm.uit.edu.vn.
Volume & Issue: Vol. 26 No. 2 (2023) | Page No.: 2705-2712 | DOI: 10.32508/stdj.v26i1.3927
Published: 2023-06-30

This article is published with open access by Viet Nam National University, Ho Chi Minh City, Viet Nam. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0) which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Abstract

Introduction: In recent years, the rise of social networks in Vietnam has produced an abundance of information. However, it has also made fake news easier to spread, which does considerable harm to society. Verifying the reliability of news is therefore crucial. This paper presents a hybrid approach that combines a pretrained language model, vELECTRA, with handcrafted features to identify reliable information on Vietnamese social network sites.
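
To make the idea of "handcrafted features" concrete, the sketch below turns a post's engagement counts and timestamp into a small numeric vector and appends it to a text embedding. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`num_like_post`, `num_comment_post`, `num_share_post`, `timestamp_post`), the helper names, the log scaling, and the concatenation-style fusion are all hypothetical choices.

```python
import math
from datetime import datetime, timezone


def _count(value) -> int:
    """Parse an engagement count; missing or non-numeric values become 0."""
    try:
        return max(int(value), 0)
    except (TypeError, ValueError):
        return 0


def handcrafted_features(post: dict) -> list[float]:
    """Map one post's metadata to a small numeric feature vector.

    The field names are hypothetical and chosen for illustration only.
    """
    likes = _count(post.get("num_like_post"))
    comments = _count(post.get("num_comment_post"))
    shares = _count(post.get("num_share_post"))
    total = likes + comments + shares
    # log1p compresses heavy-tailed engagement counts into a smaller range
    engagement = [math.log1p(x) for x in (likes, comments, shares, total)]
    # Unix timestamp in seconds -> calendar fields, interpreted in UTC
    posted = datetime.fromtimestamp(int(post["timestamp_post"]), tz=timezone.utc)
    timing = [float(posted.year), float(posted.month), float(posted.hour)]
    return engagement + timing


def fuse(text_vector: list[float], features: list[float]) -> list[float]:
    """Feature-level fusion: append handcrafted features to a text embedding."""
    return list(text_vector) + list(features)


post = {"num_like_post": 120, "num_comment_post": "15", "num_share_post": None,
        "timestamp_post": 1584489600}
features = handcrafted_features(post)
print(features)
```

In a real model the text vector would come from the language model's pooled output, and the time features would typically be normalized before fusion; both steps are omitted here.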

Methods: This study employs two main approaches: (1) fine-tuning the model on textual data alone, and (2) combining additional metadata with the text to form the model's input representation.
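
The second approach can be sketched by rendering metadata as ordinary text so that a BERT-style model reads the numbers as tokens. The function below is a hypothetical illustration: the field names, the function name, and the English template wording are assumptions (a Vietnamese template may suit a Vietnamese model better), and the paper's exact input format is not reproduced here.

```python
from datetime import datetime, timezone


def _count(value) -> int:
    """Parse an engagement count; missing or non-numeric values become 0."""
    try:
        return max(int(value), 0)
    except (TypeError, ValueError):
        return 0


def build_input_pair(post: dict) -> tuple[str, str]:
    """Return (post text, metadata rendered as text) for a sentence-pair model.

    Field names and the English template wording are hypothetical.
    """
    message = (post.get("post_message") or "").strip()
    posted = datetime.fromtimestamp(int(post["timestamp_post"]), tz=timezone.utc)
    # Numbers are written as ordinary tokens so the language model reads them as text
    meta = (
        f"likes {_count(post.get('num_like_post'))}, "
        f"comments {_count(post.get('num_comment_post'))}, "
        f"shares {_count(post.get('num_share_post'))}, "
        f"posted {posted:%Y-%m-%d}"
    )
    return message, meta


post = {"post_message": "  Tin nóng hôm nay  ", "num_like_post": 120,
        "num_comment_post": 15, "num_share_post": 4, "timestamp_post": 1584489600}
text, meta = build_input_pair(post)
print(text, "|", meta)
```

With a Hugging Face tokenizer, the two strings can be passed as a text pair (for example `tokenizer(text, meta, truncation=True)`), which inserts the model's own separator token between them instead of hard-coding one.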

Results: Using transfer learning and deep learning with meta features to detect fake news in Vietnamese, our method achieves an AUC of 0.9575 on the ReINTEL dataset published by VLSP in 2020. This is slightly better than other fine-tuned BERT methods and is a state-of-the-art result on this dataset.
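
The AUC reported above is the probability that a randomly chosen positive post receives a higher score than a randomly chosen negative one. A minimal sketch computes it from ranks via the Mann-Whitney statistic, AUC = (R+ - n+(n+ + 1)/2) / (n+ n-), where R+ is the sum of the ranks of the positive examples and n+, n- are the class sizes. Which class counts as positive (here label 1) is an assumption; tied scores receive average ranks, so each tie counts as half a win.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The rank formulation avoids building the ROC curve explicitly and gives the same value as integrating it with the trapezoidal rule.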

Conclusion: Based on the results and analysis, we infer that the number of reactions a post receives and the timing of the event it describes are indicative of the news' credibility. Furthermore, we found that BERT can encode numerical values that have been converted into text.
