VQA stands for visual question answering: given an image, the system answers natural-language questions about it.
For example, suppose we would like to ask a few questions about this image:
Some questions that we can ask are -
- What is the colour of the person’s coat?
- What is the elephant doing?
- How many people are wearing green coats?
- What is the colour of the person’s hat?
to which the system would answer -
Interesting, isn’t it? But wait, there’s more!
We can also get an idea of which parts of the image the system 'looks at' to answer a question. The scientific term for this is attention.
As an example, for the question "What is the colour of the person's coat?", we would expect the system to look at the person and their clothes. Let's see what the model is actually looking at:
Bingo! Just as we expected, the system is looking at the person and their coat to see what colour it is!
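To make the idea of attention a little more concrete, here is a minimal numpy sketch of dot-product attention over image regions. This is only an illustration of the general technique, not the architecture behind the demo; the function and variable names, the toy feature vectors, and the scoring scheme are all made up for this example.

```python
import numpy as np

def attention_map(region_features, question_vector):
    """Score each image region against the question, then softmax the scores.

    region_features: (num_regions, feature_dim) array of image-region features.
    question_vector: (feature_dim,) encoding of the question.
    Returns one attention weight per region, summing to 1.
    """
    scores = region_features @ question_vector   # one relevance score per region
    scores = scores - scores.max()               # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over regions
    return weights

# Toy example: 4 image regions with 3-d features (hypothetical values).
regions = np.array([[0.1, 0.9, 0.0],
                    [0.8, 0.1, 0.1],
                    [0.0, 0.2, 0.7],
                    [0.3, 0.3, 0.3]])
question = np.array([1.0, 0.0, 0.0])  # a question that "cares about" dimension 0

weights = attention_map(regions, question)
attended = weights @ regions  # attention-weighted summary used to answer
```

Visualising `weights` over the corresponding image regions is exactly the kind of heatmap shown above: regions with higher weights are the ones the model is 'looking at'.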
If that has piqued your interest, I suggest you go to www.apoorvesinghal.com/vqa and play around with the demo yourself. It's a lot of fun!
Code for the demo - https://github.com/apugoneappu/ask_me_anything
Kindly let me know if you liked (or disliked) the demo; it helps me improve :)