Engineering and Technology - Research article

Multitask learning based on attention and transformer mechanism for event recognition and importance image prediction in photo albums

Viet Hoai Vo 1, *
Viet Quoc Le 1
  1. Computer Vision Department, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Correspondence to: Viet Hoai Vo, Computer Vision Department, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam. Email: vhviet@fit.hcmus.edu.vn.
Volume & Issue: Vol. 29 No. 1 (2026) | Page No.: 3911-3918 | DOI: 10.32508/stdj.v29i.4361



This article is published with open access by Viet Nam National University, Ho Chi Minh City, Viet Nam. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0) which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Abstract

Capturing images has become a simple task thanks to the popularity of cameras and cellphones and the technological advances in them. As the number of pictures taken increases, organizing them while preserving their significance becomes more challenging. Solving this problem requires a system that can identify the type of album, select the important photos to store, and automatically delete the rest. Such a system could also significantly reduce storage requirements and create attractive story videos. In this study, we design a multitask network architecture that simultaneously learns event recognition and image importance, thereby removing the need for event-type information. The approach combines the strengths of convolutional neural networks for image description with an attention and transformer mechanism for album description. It performs both event recognition and image importance prediction, offering a viable and effective solution with faster prediction times for both tasks. Our approach surpasses state-of-the-art methods by 3% on the image importance task and achieves 67.21% accuracy on the event recognition task on the ML-CUFED dataset. The results are evaluated across multiple backbones and parameter settings to demonstrate the generalization of the proposed methodology.
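The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each image has already been reduced to a feature vector (standing in for the CNN backbone output), uses a single self-attention layer in place of the full attention and transformer mechanism, and uses random weights in place of trained ones. The function names (`self_attention`, `multitask_forward`, `init_params`), the mean pooling, and the softmax event head are all hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(dim, num_events, seed=0):
    """Build the hypothetical parameter set for the sketch."""
    rng = random.Random(seed)
    return {
        "wq": random_matrix(dim, dim, rng),
        "wk": random_matrix(dim, dim, rng),
        "wv": random_matrix(dim, dim, rng),
        "w_imp": random_matrix(dim, 1, rng),           # importance head
        "w_evt": random_matrix(dim, num_events, rng),  # event head
    }


def self_attention(features, wq, wk, wv):
    """Single-head scaled dot-product self-attention across the album's images."""
    q, k, v = matmul(features, wq), matmul(features, wk), matmul(features, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def multitask_forward(features, params):
    """Return (event_probs, importance) for one album.

    features: one vector per image, assumed to come from a CNN backbone.
    """
    # Shared album description: every image attends to every other image.
    context = self_attention(features, params["wq"], params["wk"], params["wv"])
    # Task 1: one importance score per image from its contextualised feature.
    importance = [row[0] for row in matmul(context, params["w_imp"])]
    # Task 2: event probabilities from the mean-pooled album representation.
    n = len(context)
    pooled = [[sum(col) / n for col in zip(*context)]]
    event_probs = softmax(matmul(pooled, params["w_evt"])[0])
    return event_probs, importance


if __name__ == "__main__":
    rng = random.Random(1)
    album = [[rng.random() for _ in range(8)] for _ in range(4)]  # 4 images, 8-dim features
    probs, scores = multitask_forward(album, init_params(dim=8, num_events=5))
    print(len(probs), len(scores))
```

Both heads read from the same attended album representation, so one forward pass yields the event prediction and the per-image importance scores together, and the importance head needs no event label as input.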
