
As more and more research shows, NLP systems are extremely vulnerable to out-of-domain distribution, “human-level” performance on test sets cannot be reproduced on real-world problems. Language is hard and the hard part is sparse in datasets, neural models have never show good result on small real-……