MULTITASK LEARNING BASED ON ATTENTION AND TRANSFORMER MECHANISM FOR EVENT RECOGNITION AND IMPORTANCE IMAGE PREDICTION IN PHOTO ALBUM
- Computer Vision Department, University of Science, VNU-HCM
Abstract
Cameras and cellphones have made taking photographs effortless, and as personal photo collections grow daily, it becomes increasingly difficult to organize them while preserving the most meaningful pictures. This raises a challenging and attractive problem in computer vision: how to build a system that identifies the type of an album, selects the important photos that must be kept, and automatically discards the rest. Such a system is valuable for reducing storage and for creating appealing story videos. In this work, we design a multitask network architecture that is trained simultaneously for event recognition and image importance prediction, thereby removing the need for event-type information. The architecture combines the strength of convolutional neural networks for image description with attention and transformer mechanisms for album description, yielding a practical and effective approach with faster prediction times for both tasks. Our approach outperforms the state-of-the-art (SOTA) method by 3% on the image importance task and achieves 67.21% accuracy on the event recognition task on the ML-CUFED dataset. We evaluate the approach with multiple backbones and parameter settings to demonstrate the generalization of the proposed methodology.