Section: ENGINEERING AND TECHNOLOGY

Feature extraction semilearning and augmented representation for image captioning in crowd scenes

Khang Tan Tran Minh Nguyen 1, *
  1. University of Information Technology, Viet Nam National University Ho Chi Minh City, Viet Nam
Correspondence to: Khang Tan Tran Minh Nguyen, University of Information Technology, Viet Nam National University Ho Chi Minh City, Viet Nam. Email: khangnttm@uit.edu.vn.
Volume & Issue: Vol. 26 No. 4 (2023) | Page No.: 3128-3138 | DOI: 10.32508/stdj.v26i4.4028
Published: 2023-12-31


Copyright The Author(s) 2023. This article is published with open access by Vietnam National University, Ho Chi Minh City, Vietnam. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Abstract

Image captioning has attracted considerable interest since 2015. The task lies at the intersection of Computer Vision and Natural Language Processing. The problem can be described as follows: given a three-channel RGB image as input, a language model is trained to generate a hypothesis caption that describes the image's context. In this study, we focus on image captioning for images captured in crowd scenes, which is more complicated and challenging. A semilearning feature extraction mechanism is proposed to obtain more valuable high-level feature maps of images, and an augmented approach in the Transformer Encoder is explored to enhance its representation ability. The obtained results are promising and outperform those of other state-of-the-art captioning models on the CrowdCaption dataset.
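The captioning setup described above — image features in, token sequence out — can be sketched as a greedy decoding loop. This is a minimal illustration only: the scorer below is a hypothetical stub standing in for a trained encoder-decoder (e.g., extracted feature maps fed to a Transformer decoder), not the paper's actual model, and the tiny vocabulary is invented for the example.

```python
# Minimal sketch of greedy caption decoding. Assumptions (not from the
# paper): the toy VOCAB, the stub scorer, and the fixed target phrase.
from typing import Callable, List, Sequence

VOCAB = ["<bos>", "<eos>", "a", "crowd", "of", "people"]

def greedy_decode(
    score_next: Callable[[Sequence[float], List[str]], List[float]],
    image_features: Sequence[float],
    max_len: int = 10,
) -> List[str]:
    """Repeatedly pick the highest-scoring next token until <eos>."""
    caption = ["<bos>"]
    for _ in range(max_len):
        scores = score_next(image_features, caption)
        next_token = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        if next_token == "<eos>":
            break
        caption.append(next_token)
    return caption[1:]  # drop <bos>

def stub_scorer(feats: Sequence[float], prefix: List[str]) -> List[float]:
    """Hypothetical stand-in for the learned distribution p(w_t | image, w_<t):
    deterministically walks through one fixed phrase."""
    phrase = ["a", "crowd", "of", "people", "<eos>"]
    target = phrase[min(len(prefix) - 1, len(phrase) - 1)]
    return [1.0 if w == target else 0.0 for w in VOCAB]

print(greedy_decode(stub_scorer, [0.1, 0.2]))  # → ['a', 'crowd', 'of', 'people']
```

In a real system the stub scorer is replaced by the decoder's softmax over the vocabulary, conditioned on the encoder's image feature maps; beam search is often substituted for the greedy argmax.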
