Keynote: A Vision on (Simultaneous) Multimodal Machine Translation

Published in NLP Beyond Text, https://sites.google.com/view/nlpbt-2020, 2020

Humans perceive the world and interact with it in multimodal ways. Language understanding and generation are no exception. However, current natural language processing methods often rely solely on text to produce their hypotheses. In this talk, I will present recent work that aims to bring visual context to machine translation, along with a qualitative assessment of the model's capability to leverage this information. We show that while visual context helps, the model can be lazy.