Volume 1, Issue 2, April 2021

Original Research


An Improved Image Captioning Using Emotions

Nabagata Saha, Y. V. Akhila and P. Radha Krishna

Received in final form on March 25, 2021

Abstract
Image captioning remains a challenging task: generating captions that closely resemble how a human would describe a given image. State-of-the-art models produce factual captions for images containing inanimate objects; however, captioning images of humans using their facial expressions remains largely unexplored. This paper proposes a novel method that realizes this task. The emotion recognized on the human subject in the image is concatenated with the image features and fed to an image captioning model, yielding captions that are more relevant and human-like. A deep learning model recognizes the emotion, and an encoder-decoder network captions the image. A multilevel VGG19 network performs Facial Emotion Recognition by extracting facial features, and Inception V3 (the encoder) extracts the visual features. These features are fed to an attention-tuned Gated Recurrent Unit (the decoder), which produces the caption word by word. The presented approach provides a more realistic captioning of images and can be used to generate natural-sounding video summaries.
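The feature-concatenation and attention steps described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions (64 encoder regions, 2048-d InceptionV3 features, a 7-way emotion vector, a 512-d GRU state) and the additive attention form are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not taken from the paper)
NUM_REGIONS = 64   # spatial regions from the InceptionV3 encoder
VISUAL_DIM = 2048  # InceptionV3 feature depth
EMOTION_DIM = 7    # e.g. a distribution over basic emotions from a VGG19 FER head
HIDDEN_DIM = 512   # GRU decoder hidden-state size

def concat_emotion(visual_feats, emotion_vec):
    """Tile the recognized-emotion vector across regions and append it
    to the visual features, as the abstract's concatenation step describes."""
    tiled = np.tile(emotion_vec, (visual_feats.shape[0], 1))
    return np.concatenate([visual_feats, tiled], axis=1)

def attention_context(features, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention: score each region against the
    decoder state, softmax the scores, and return the weighted context."""
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (NUM_REGIONS,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over regions
    return weights @ features, weights                   # context vector, weights

# Dummy encoder output and a one-hot recognized emotion
visual = rng.standard_normal((NUM_REGIONS, VISUAL_DIM))
emotion = np.eye(EMOTION_DIM)[3]                 # e.g. "happy"
feats = concat_emotion(visual, emotion)          # (64, 2055)

# Randomly initialized attention parameters (learned in a real model)
W_f = rng.standard_normal((feats.shape[1], HIDDEN_DIM)) * 0.01
W_h = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.01
v = rng.standard_normal(HIDDEN_DIM) * 0.01
hidden = np.zeros(HIDDEN_DIM)                    # initial GRU state

context, weights = attention_context(feats, hidden, W_f, W_h, v)
```

At each decoding step, the GRU would consume `context` together with the previous word's embedding to emit the next word, so the emotion signal influences every generated token.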


Keywords
Emotion Discovery, Natural Language Processing, Computer Vision, Facial Expression Recognition, Image Captioning, Encoder-Decoder Network, LSTM Model, VGG Net Model, Feature Concatenation.


Cite This Article
Nabagata Saha, Y. V. Akhila, and P. Radha Krishna, An Improved Image Captioning Using Emotions, J. Innovation Sciences and Sustainable Technologies, 1(2)(2021), 91-118. https://doie.org/10.0608/JISST.2022944590

