CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning