STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

The authors construct STAIR Captions, a large-scale Japanese image caption dataset built on images from MS-COCO. STAIR Captions consists of 820,310 Japanese captions for 164,062 images, or five captions per image. A neural network trained on STAIR Captions generates more natural and higher-quality Japanese captions than a pipeline that first generates English captions and then translates them into Japanese with machine translation.


