Automated Image Description Generation System
An Automated Image Description Generation System is a Text Generation System that converts an image into a text item (a natural language image caption).
- AKA: Image Caption Generation System.
- Context:
- It can solve an Automated Image Description Generation Task by implementing an Automated Image Description Generation Algorithm.
- …
- Example(s):
- Counter-Example(s):
- See: Natural Language Processing System, Natural Language Generation System, Natural Language Understanding System, Natural Language Inference System.
References
2020a
- (Wang et al., 2020) ⇒ Haoran Wang, Yue Zhang, and Xiaosheng Yu (2020). "An Overview of Image Caption Generation Methods". In: Computational Intelligence and Neuroscience, 2020. DOI:10.1155/2020/3062706.
- QUOTE: In recent years, with the rapid development of artificial intelligence, image caption has gradually attracted the attention of many researchers in the field of artificial intelligence and has become an interesting and arduous task. Image caption, automatically generating natural language descriptions according to the content observed in an image, is an important part of scene understanding, which combines the knowledge of computer vision and natural language processing. The application of image caption is extensive and significant, for example, the realization of human-computer interaction.
2020b
- (TensorFlow, 2020) ⇒ https://www.tensorflow.org/tutorials/text/image_captioning Retrieved: 2020-09-18.
- QUOTE: To accomplish this, you'll use an attention-based model, which enables us to see what parts of the image the model focuses on as it generates a caption.
The model architecture is similar to Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
This notebook is an end-to-end example. When you run the notebook, it downloads the MS-COCO dataset, preprocesses and caches a subset of images using Inception V3, trains an encoder-decoder model, and generates captions on new images using the trained model.
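The attention step the tutorial describes can be sketched as follows: a Bahdanau-style additive attention layer that scores each image patch feature against the decoder's current hidden state and returns a weighted context vector. This is a minimal sketch written against the tf.keras API; the layer sizes, tensor shapes, and names are illustrative assumptions, not the tutorial's exact code.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention: scores each image patch feature against the
    decoder's current hidden state and returns the attended context."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects image features
        self.W2 = tf.keras.layers.Dense(units)  # projects decoder state
        self.V = tf.keras.layers.Dense(1)       # collapses to a scalar score

    def call(self, features, hidden):
        # features: (batch, num_patches, feature_dim); hidden: (batch, units)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis)))
        attention_weights = tf.nn.softmax(scores, axis=1)  # over the patches
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights

# Toy shapes: 8 images, 64 patch features of size 2048 (as produced by the
# Inception V3 feature map), and a 512-unit decoder state -- all assumptions.
attention = BahdanauAttention(units=512)
context, weights = attention(tf.random.normal((8, 64, 2048)),
                             tf.random.normal((8, 512)))
print(context.shape, weights.shape)  # (8, 2048) (8, 64, 1)
```

The returned attention weights are what let the model show which parts of the image it focuses on while generating each word of the caption.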
2018
- (Batra et al., 2018) ⇒ Vishwash Batra, Yulan He, and George Vogiatzis (2018). "Neural Caption Generation for News Images". In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). ISBN: 979-10-95546-00-9.
- QUOTE: News image caption generation, however, is different from the typical image captioning task. The input to news image caption generation is both a news article and its accompanying image, as opposed to the traditional image captioning task where the input is only an image.
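As a rough illustration of this two-input setup, the sketch below encodes the image and the accompanying news article separately and fuses them into a single conditioning vector for a caption decoder. It is a generic two-encoder fusion sketch under assumed shapes, layer sizes, and vocabulary size, not Batra et al.'s actual architecture.

```python
import tensorflow as tf

# All names, vocabulary size, and dimensions below are illustrative assumptions.
image_features = tf.keras.Input(shape=(64, 2048))              # CNN patch features
article_tokens = tf.keras.Input(shape=(None,), dtype="int32")  # news article word ids

# Encode each modality separately.
img_enc = tf.keras.layers.Dense(256, activation="relu")(
    tf.keras.layers.GlobalAveragePooling1D()(image_features))
txt_enc = tf.keras.layers.GRU(256)(
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128)(article_tokens))

# Fuse both encodings into a single conditioning vector that a caption
# decoder could use as its initial state.
fused = tf.keras.layers.Concatenate()([img_enc, txt_enc])
decoder_init = tf.keras.layers.Dense(256, activation="tanh")(fused)

encoder = tf.keras.Model([image_features, article_tokens], decoder_init)
encoder.summary()
```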
2016
- (Yang et al., 2016) ⇒ Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, and William W. Cohen (2016). "Review Networks for Caption Generation". In: Advances in Neural Information Processing Systems 29 (NIPS 2016).
- QUOTE: We present a novel architecture, the review network, to improve the encoder-decoder learning framework. The review network performs multiple review steps with attention on the encoder hidden states, and computes a set of thought vectors that summarize the global information in the input. We empirically show consistent improvement over conventional encoder-decoders on the tasks of image captioning and source code captioning.
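The sketch below illustrates the review-step idea in plain NumPy with random, untrained weights: each step attends over all encoder hidden states and folds the attended context into a recurrent state, which is emitted as a thought vector. It is a toy rendering of the mechanism under assumed dimensions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def review_steps(encoder_states, num_steps=3, state_dim=64):
    """Toy review module: each step attends over all encoder hidden states,
    folds the attended context into a recurrent state, and emits that state
    as a 'thought vector' summarizing global information in the input."""
    n, d = encoder_states.shape
    W_rec = rng.standard_normal((state_dim, state_dim)) * 0.1  # recurrence
    W_ctx = rng.standard_normal((d, state_dim)) * 0.1          # context -> state
    W_qry = rng.standard_normal((state_dim, d)) * 0.1          # state -> query
    f = np.zeros(state_dim)
    thoughts = []
    for _ in range(num_steps):
        query = f @ W_qry                        # (d,) attention query
        alpha = softmax(encoder_states @ query)  # weights over encoder states
        context = alpha @ encoder_states         # (d,) attended summary
        f = np.tanh(f @ W_rec + context @ W_ctx)
        thoughts.append(f)
    return np.stack(thoughts)                    # (num_steps, state_dim)

# Example: 10 encoder states of dimension 32 -> 3 thought vectors.
thoughts = review_steps(rng.standard_normal((10, 32)))
print(thoughts.shape)  # (3, 64)
```

In the paper, the decoder then attends over these thought vectors instead of (or in addition to) the raw encoder states.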
2015a
- (Xu et al., 2015) ⇒ Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio (2015). "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention". In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Volume 37.
- QUOTE: Automatically generating captions for an image is a task close to the heart of scene understanding — one of the primary goals of computer vision. Not only must caption generation models be able to solve the computer vision challenges of determining what objects are in an image, but they must also be powerful enough to capture and express their relationships in natural language. For this reason, caption generation has long been seen as a difficult problem. It amounts to mimicking the remarkable human ability to compress huge amounts of salient visual information into descriptive language and is thus an important challenge for machine learning and AI research.
2015b
- (Chen & Zitnick, 2015) ⇒ Xinlei Chen, and C. Lawrence Zitnick (2015). "Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).
- QUOTE: Image captions describe both the objects in the image and their relationships. An area of future work is to examine the sequential exploration of an image and how it relates to image descriptions. Many words correspond to spatial relations that our current model has difficulty in detecting. As demonstrated by the recent paper of Karpathy et al. (2014) better feature localization in the image can greatly improve the performance of retrieval tasks and similar improvement might be seen in the description generation task (...)
2014
- (Karpathy et al., 2014) ⇒ Andrej Karpathy, Armand Joulin, and Li F. Fei-Fei (2014). "Deep Fragment Embeddings for Bidirectional Image Sentence Mapping". In: Advances in Neural Information Processing Systems 27 (NIPS 2014).
- QUOTE: There is significant value in the ability to associate natural language descriptions with images. Describing the contents of images is useful for automated image captioning and conversely, the ability to retrieve images based on natural language queries has immediate image search applications. In particular, in this work we are interested in training a model on a set of images and their associated natural language descriptions such that we can later rank a fixed set of withheld sentences given an image query, and vice versa.
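The bidirectional ranking setup in this quote can be sketched as follows, assuming images and sentences have already been embedded into a shared vector space. The cosine-similarity scoring here is a simplifying assumption (Karpathy et al. use a structured fragment-level alignment score), but the ranking interface is the same: rank all withheld sentences for an image query, and all images for a sentence query.

```python
import numpy as np

def l2_normalize(X):
    """Normalize each row to unit length so dot products become cosines."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def rank_bidirectional(image_embs, sentence_embs):
    """Given image and sentence embeddings in a shared space, rank all
    sentences for each image and all images for each sentence."""
    sims = l2_normalize(image_embs) @ l2_normalize(sentence_embs).T
    sentences_per_image = np.argsort(-sims, axis=1)    # row i: best-first sentences
    images_per_sentence = np.argsort(-sims.T, axis=1)  # row j: best-first images
    return sentences_per_image, images_per_sentence

# Toy data: 5 images and 5 withheld sentences in an assumed 128-d shared space.
rng = np.random.default_rng(1)
imgs, sents = rng.standard_normal((5, 128)), rng.standard_normal((5, 128))
s_rank, i_rank = rank_bidirectional(imgs, sents)
print(s_rank[0])  # best-to-worst sentence indices for image 0
```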