Multitask learning based on attention and transformer mechanism for event recognition and importance image prediction in photo albums
- Computer Vision Department, University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Abstract
Capturing images has become simple thanks to the popularity of, and technological advances in, cameras and cellphones. As the number of pictures taken grows, organizing them while preserving the significant ones becomes more challenging. Solving this problem requires a system that can identify the type of an album, select the important photos to store, and automatically delete the rest. Such a system could also significantly reduce storage requirements and help create attractive story videos. In this study, we design a multitask network architecture that learns event recognition and image importance simultaneously, thereby removing the need for event-type information. The approach combines the strengths of convolutional neural networks for image description with an attention and transformer mechanism for album description, and performs both event recognition and image importance prediction. This provides a viable and effective approach with faster prediction times for both tasks. Our approach surpasses state-of-the-art methods by 3% on the image importance task and achieves 67.21% accuracy on the event recognition task on the ML-CUFED dataset. The results are evaluated with multiple backbones and parameter settings to demonstrate the generalizability of the proposed methodology.
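To make the multitask idea concrete, the following is a minimal illustrative sketch, not the architecture proposed in this paper. It assumes that per-image feature vectors have already been extracted (here they are random stand-ins for CNN features), and it replaces the attention and transformer module with a much simpler single-query attention pooling. All names (`attention_pool`, `multitask_forward`, `query`, `event_weights`, `importance_weights`) and all dimensions are hypothetical. The sketch only shows how one shared album representation can feed both an album-level event head and a per-image importance head.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(image_feats, query):
    """Single-query scaled dot-product attention over the album's images.

    Returns the attention weights (one per image) and the weighted
    average of the image features, used here as the album descriptor.
    """
    d = len(query)
    scores = [dot(f, query) / math.sqrt(d) for f in image_feats]
    weights = softmax(scores)
    album_vec = [
        sum(w * f[i] for w, f in zip(weights, image_feats)) for i in range(d)
    ]
    return weights, album_vec


def multitask_forward(image_feats, query, event_weights, importance_weights):
    """Shared album descriptor feeding two task heads (assumed design)."""
    weights, album_vec = attention_pool(image_feats, query)
    # Head 1: one independent sigmoid score per event type (album level).
    event_probs = [sigmoid(dot(w, album_vec)) for w in event_weights]
    # Head 2: one importance score per image, computed from the image
    # feature plus the album context.
    importance = [
        dot(importance_weights, [a + b for a, b in zip(f, album_vec)])
        for f in image_feats
    ]
    return event_probs, importance, weights


# Random stand-ins for CNN features and learned parameters (assumptions).
rng = random.Random(0)
DIM, N_IMAGES, N_EVENTS = 8, 5, 3
image_feats = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_IMAGES)]
query = [rng.gauss(0, 1) for _ in range(DIM)]
event_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EVENTS)]
importance_weights = [rng.gauss(0, 1) for _ in range(DIM)]

event_probs, importance, attn = multitask_forward(
    image_feats, query, event_weights, importance_weights
)
```

In a trained model the two heads would be optimized jointly with a combined loss, so that the shared album representation serves both tasks; the parameters here are random and the outputs carry no meaning beyond their shapes.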