

Anthology ID: P18-1239
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2566–2576
DOI: 10.18653/v1/P18-1239
Bibkey: hewitt-etal-2018-learning
Cite (ACL): John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, and Chris Callison-Burch.

Abstract
We conduct the most comprehensive study to date into translating words via images. To facilitate research on the task, we introduce a large-scale multilingual corpus of images, each labeled with the word it represents. Past datasets have been limited to only a few high-resource languages and unrealistically easy translation settings. In contrast, we have collected by far the largest available dataset for this task, with images for approximately 10,000 words in each of 100 languages. We run experiments on a dozen high-resource languages and 20 low-resource languages, demonstrating the effect of word concreteness and part of speech on translation quality. We find that while image features work best for concrete nouns, they are sometimes effective on other parts of speech. To improve image-based translation, we introduce a novel method of predicting word concreteness from images, which improves on a previous state-of-the-art unsupervised technique. This allows us to predict when image-based translation may be effective, enabling consistent improvements to a state-of-the-art text-based word translation system. Our code and the Massively Multilingual Image Dataset (MMID) are available at.
