From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

shannon.cs.illinois.edu