Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.