I had such a blast at ACL 2020 this week! So much cool work, and lots of very interesting discussions both in the chat and in the Zoom Q&A sessions!
Here’s a pick of 3 of my highlights (they are extremely biased towards what I’m currently interested in):
(1) Inherent Disagreements in Human Textual Inferences
by Ellie Pavlick, Tom Kwiatkowski
Natural Language Inference (sometimes referred to as textual entailment) has become fundamental in evaluating language understanding and semantics. The central question of this paper is “what should we use as ground truth labels for textual inference?” The authors show that the apparent “annotation noise” often results from a multi-modality among the annotators’ labels. They discuss the implication of this uncertainty and argue for a refined evaluation that better captures the diversity of human judgments.
(2) Unsupervised Domain Clusters in Pretrained Language Models
by Roee Aharoni, Yoav Goldberg
The authors propose a “data-driven” approach to define what a domain is in NLP and to select in-domain data. They show that large pre-trained language models are able to capture these domains in an unsupervised way and leverage this insight to select in-domain data to train neural machine translation models.
(3) Syntactic Data Augmentation Increases Robustness to Inference Heuristics
by Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen
Natural Language Inference models fine-tuned on top of models like BERT show high accuracy on standard test datasets but fail on challenge sets. The authors propose a simple syntactic data augmentation procedure that augments the standard training set with up to roughly a thousand extra examples. Results show great improvement (and generalization) from just exposing the model to these controlled syntactic examples, supporting the hypothesis that BERT contains knowledge that simply needs to be “activated”. Failure cases (like the passive) support the idea that there is also knowledge pre-trained BERT is not aware of.
How about you? Did any work change your perspective?
Hi @VictorSanh, thanks so much for your list. The amount of content at the conference is overwhelming, so I had not seen these papers at all.
In paper (3), syntactic augmentation is very interesting since
(a) Augmentation is very successful in Computer Vision (CV), but in NLP it is much less obvious how to do it, and it may be more sensitive to the downstream task (it is more robust in CV)
(b) In Section 3 of the paper, the authors state that the augmented examples are noisy:
“We did not attempt to ensure the naturalness of the generated examples; e.g., in the INVERSION transformation, The carriage made a lot of noise was transformed into A lot of noise made the carriage. In addition, the labels of the augmentation dataset were somewhat noisy; e.g., we assumed that INVERSION changed the correct label from entailment to neutral, but this is not necessarily the case (if The buyer met the seller, it is likely that The seller met the buyer). As we show below, this noise did not hurt accuracy on MNLI.”
This is very interesting to me (in CV it’s often intuitively clear which augmentations are noiseless and which are noisy), so I assume the ratio of noisy examples is small, since too much noise should degrade overall performance…
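To make the INVERSION idea concrete, here is a minimal sketch. It assumes the sentence has already been split into subject, verb, and object phrases by hand; as far as I understand, the paper works from parsed MNLI sentences instead. The `NLIExample` class and the `invert` helper are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str


def capitalize_first(text: str) -> str:
    """Upper-case only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def invert(subject: str, verb: str, obj: str) -> NLIExample:
    """Build a premise/hypothesis pair by swapping subject and object.

    Inputs are assumed to be lower-cased phrases (a simplification).
    """
    premise = f"{capitalize_first(subject)} {verb} {obj}."
    hypothesis = f"{capitalize_first(obj)} {verb} {subject}."
    # Label assumption taken from the quoted passage: inversion -> neutral.
    # As the authors note, this is noisy (e.g. "met" is symmetric).
    return NLIExample(premise, hypothesis, "neutral")


example = invert("the carriage", "made", "a lot of noise")
print(example.premise)     # The carriage made a lot of noise.
print(example.hypothesis)  # A lot of noise made the carriage.
print(example.label)       # neutral
```

Calling `invert("the buyer", "met", "the seller")` produces exactly the kind of pair where the assumed neutral label is questionable.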
Further, in CV, we also have soft-labels augmentation like MixUp and CutMix, so maybe this similar area in NLP also has more potential.
(On Kaggle we also tried our own unpublished NLP augmentations along similar lines, e.g. in the recent Jigsaw toxic comment classification competition, where each example is a paragraph of comment text. We can concatenate two paragraphs, with the label rule Toxic + Neutral = Toxic, or randomly shuffle the sentences within a paragraph on the fly, since the degree of toxicity should be invariant under this operation.)
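A rough sketch of these two augmentations, assuming binary labels (1 = toxic, 0 = neutral) and a naive punctuation-based sentence splitter. The function names are made up for illustration; this is not the code used in the competition.

```python
import random
import re


def combine_comments(text_a, label_a, text_b, label_b):
    """Concatenate two comments; the result is toxic if either part is toxic."""
    return f"{text_a} {text_b}", max(label_a, label_b)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def shuffle_sentences(text, rng):
    """Shuffle sentence order; the label is assumed to stay the same."""
    sentences = split_sentences(text)
    rng.shuffle(sentences)
    return " ".join(sentences)


rng = random.Random(0)  # seeded so the augmentation is reproducible
text, label = combine_comments(
    "You are an idiot.", 1,
    "Thanks for the edit. See you soon!", 0,
)
print(label)                         # 1, i.e. Toxic + Neutral = Toxic
print(shuffle_sentences(text, rng))  # same sentences, possibly reordered
```

In a training loop, the shuffling would typically be applied on the fly each time an example is drawn, so the model sees a different ordering every epoch.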
That’s very interesting!
I agree, automatic data augmentation is still somewhat mysterious to me in NLP since it is far less controllable than in vision. It seems fine to me that the resulting examples are extremely noisy (I have seen work in vision where images are perturbed so much that the original label becomes quite ambiguous). There might be a balance to find: you want the model to learn through the noise but also not to be over-confident when you have ambiguous examples…
Do you have guidelines you can share on data augmentation in NLP? In which cases does it work? Why does it work? Or a survey?
Hi Victor, I haven’t seen any guidelines on NLP augmentation so far.
I just want to point out two augmentation libraries that may be useful.
As you may know, in vision we have a lot of augmentation libraries, but one which really stands out is
albumentations, due to its speed and variety. (It is the de facto choice for Kaggle competitors.)
Recently, a creative Kaggler applied the basic Albumentations transform class to the NLP task of Jigsaw’s multilingual toxic comment classification (with a HuggingFace model, of course): https://www.kaggle.com/shonenkov/nlp-albumentations
I believe we can extend this class in the future.
Another library worth mentioning is
nlpaug ( https://github.com/makcedward/nlpaug ), where we can augment with simpler ideas like synonym / antonym word swapping, with word suggestions coming from NLTK and BERT.
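To illustrate the synonym-swapping idea, here is a minimal pure-Python sketch. It deliberately does not use nlpaug's API: the tiny `SYNONYMS` table is a made-up stand-in for the suggestions that WordNet or a language model would provide, and `synonym_swap` is a hypothetical helper.

```python
import random

# Hypothetical toy synonym table; a real setup would query WordNet or a model.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_swap(text, rng, prob=0.5):
    """Replace each word that has known synonyms with probability `prob`."""
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


rng = random.Random(0)  # seeded for reproducibility
print(synonym_swap("a quick and happy movie", rng, prob=1.0))
```

Note that this sketch splits on whitespace only and ignores punctuation and word sense, which is one reason such swaps can introduce label noise.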
BTW, is your team also attending ICML this week?
Interesting! Thanks for the pointer, I’ll definitely check this out!
No, unfortunately, no one in the team is at ICML this week.