Multitask learning based on attention and transformer mechanism for event recognition and importance image prediction in photo albums

Viet Hoai Vo; Viet Quoc Le

doi:10.32508/stdj.v29i.4361

Engineering and Technology - Research article

Viet Hoai Vo ¹,*

Viet Quoc Le ¹

Computer Vision Department, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam

Correspondence to: Viet Hoai Vo, Computer Vision Department, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam. Email: vhviet@fit.hcmus.edu.vn.

Volume & Issue: Vol. 29 No. 1 (2026) | Page No.: 3911-3918 | DOI: 10.32508/stdj.v29i.4361

Abstract

Capturing images has become simple thanks to the popularity of, and technological advances in, cameras and cellphones. As the number of pictures taken grows, organizing them while preserving the significant ones becomes more challenging. Solving this problem requires a system that can identify the type of album, select the important photos to store, and automatically delete the rest. Such a system could also significantly reduce storage requirements and create attractive story videos. In this study, we design a multitask network architecture that learns event recognition and image importance simultaneously, thereby removing the need for event-type information. The approach combines convolutional neural networks for image description with an attention and transformer mechanism for album description to perform both event recognition and image importance prediction, providing a viable and effective solution with faster prediction times for both tasks. Our approach improves on state-of-the-art methods by 3% on the image importance task and achieves 67.21% accuracy on the event recognition task on the ML-CUFED dataset. The method is evaluated with multiple backbones and parameter settings to demonstrate that it generalizes.

Keywords: event recognition, image importance, transformer, multitask learning, attention network, photo album
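
The shared-representation idea in the abstract (per-image features, an album-level description built with attention, and two outputs from one forward pass) can be illustrated with a toy sketch. This is not the authors' architecture: it uses a single attention-pooling step instead of a transformer, random vectors in place of CNN features, and it assumes, purely for illustration, that the attention weights serve as image importance scores. The names `album_forward`, `query`, and `event_weights` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def album_forward(image_features, query, event_weights):
    """Toy forward pass for one album (illustrative assumption, not the paper's model).

    image_features: one feature vector per photo (stand-in for CNN output).
    query:          hypothetical learned attention query vector.
    event_weights:  hypothetical linear head, one weight vector per event class.
    """
    dim = len(query)
    # Scaled dot-product attention score for each photo.
    scores = [dot(f, query) / math.sqrt(dim) for f in image_features]
    # Assumption: attention weights double as per-image importance estimates.
    importance = softmax(scores)
    # Album descriptor: attention-weighted sum of the image features.
    album = [sum(w * f[i] for w, f in zip(importance, image_features))
             for i in range(dim)]
    # Event head: class probabilities from the shared album descriptor.
    event_probs = softmax([dot(album, w) for w in event_weights])
    return event_probs, importance


random.seed(0)
DIM, N_PHOTOS, N_EVENTS = 8, 5, 3  # arbitrary toy sizes
features = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PHOTOS)]
query = [random.gauss(0, 1) for _ in range(DIM)]
heads = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EVENTS)]

event_probs, importance = album_forward(features, query, heads)
predicted_event = max(range(N_EVENTS), key=lambda k: event_probs[k])
ranked_photos = sorted(range(N_PHOTOS), key=lambda i: importance[i], reverse=True)
print(predicted_event, ranked_photos)
```

Both outputs come from the same album descriptor, so the importance ranking needs no event label as input. In a trained multitask model the two heads would be optimized jointly; the loss functions and their weighting are not stated in the abstract and are left out here.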

VNUHCM Journal of Science and Technology Development
