Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Visual question answering (VQA) is one of those tasks that seem almost miraculous when models perform well at it. Past work on VQA, however, has shown that even simple models (Zhou et al., 2015; Jabri et al., ECCV 2016) perform surprisingly well, sometimes outperforming more complex models. Following this trend, and because no single model can yet reliably ground concepts and answer questions about images, the current state of the art is a thoughtful collection of tips and tricks.

