ACL 2020 - Some personal highlights - Victor

Hi @VictorSanh, thanks so much for your list. Since the conference is overwhelming in the amount of content, I had not seen these papers at all.

In paper (3), the syntactic augmentation is very interesting because:
(a) Augmentation is very successful in Computer Vision (CV), but in NLP it is much less obvious how to do it, and it may be more sensitive to the downstream task (in CV it tends to be more robust)
(b) In Section 3 of the paper, the authors state that the augmented examples are noisy:

We did not attempt to ensure the naturalness of the generated examples; e.g., in the INVERSION transformation, The carriage made a lot of noise was transformed into A lot of noise made the carriage. In addition, the labels of the augmentation dataset were somewhat noisy; e.g., we assumed that INVERSION changed the correct label from entailment to neutral, but this is not necessarily the case (if The buyer met the seller, it is likely that The seller met the buyer). As we show below, this noise did not hurt accuracy on MNLI.
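To make the quoted INVERSION transformation concrete, here is a minimal sketch of how such an augmented NLI example could be generated. This is not the authors' implementation: the `invert_subject_object` helper, the explicit `verb` argument, and the entailment-to-neutral relabeling rule are my own simplifying assumptions (a real system would use a syntactic parser).

```python
def invert_subject_object(sentence: str, verb: str) -> str:
    """Naively swap the spans before and after `verb`.
    A real implementation would use a parser; this string split
    is only an illustration of the INVERSION idea."""
    subject, obj = [part.strip() for part in sentence.split(verb, 1)]
    return f"{obj.capitalize()} {verb} {subject.lower()}"

def augment_nli_example(premise: str, hypothesis: str, label: str, verb: str):
    """Create one augmented (premise, hypothesis, label) triple.
    Following the paper's heuristic, inverting the hypothesis is assumed
    to turn 'entailment' into 'neutral' (which, as the quote notes, is
    itself a noisy assumption)."""
    new_hypothesis = invert_subject_object(hypothesis, verb)
    new_label = "neutral" if label == "entailment" else label
    return premise, new_hypothesis, new_label

# The example sentence from the quoted passage:
print(invert_subject_object("The carriage made a lot of noise", "made"))
# -> "A lot of noise made the carriage"
```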

This label noise aspect is very interesting to me (in CV it is often intuitively clear which augmentations are noiseless and which are noisy), so I assume the proportion of noisy labels is small, since too much label noise should degrade the overall performance.

Further, in CV we also have soft-label augmentations like MixUp and CutMix, so a similar direction in NLP may also have potential.
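For reference, a minimal sketch of the MixUp idea: interpolate a pair of inputs and their one-hot labels to get a soft-labelled example. In NLP this would typically be applied to sentence embeddings or hidden states rather than raw tokens; the shapes and `alpha` value below are just illustrative assumptions.

```python
import numpy as np

def mixup(x1: np.ndarray, y1: np.ndarray,
          x2: np.ndarray, y2: np.ndarray,
          alpha: float = 0.4):
    """Standard MixUp: convex combination of two inputs and their
    one-hot labels, with the mixing weight drawn from Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2   # soft label
    return x_mix, y_mix

# Toy usage with 4-dim "embeddings" and 2-class one-hot labels
x_a, y_a = np.random.randn(4), np.array([1.0, 0.0])
x_b, y_b = np.random.randn(4), np.array([0.0, 1.0])
x_new, y_new = mixup(x_a, y_a, x_b, y_b)
```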

(On Kaggle we also tried our own (unpublished) NLP augmentations along similar lines.

For example, in the recent Jigsaw toxic comment classification competition, where each example is a paragraph of comment text, we can concatenate two paragraphs (with the label rule Toxic + Neutral = Toxic), or dynamically shuffle the sentences within a paragraph at random, since the degree of toxicity should be invariant under this operation; a rough sketch is below.)
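A rough sketch of those two augmentations, assuming binary labels (1 = toxic) and a naive '.'-based sentence splitter in place of a proper tokenizer; function names are hypothetical:

```python
import random

def concat_augment(toxic_text: str, neutral_text: str):
    """Toxic + Neutral = Toxic: appending a neutral comment to a toxic
    one should not remove its toxicity, so the combined label stays 1."""
    parts = [toxic_text, neutral_text]
    random.shuffle(parts)                 # the order should not matter
    return " ".join(parts), 1             # label 1 = toxic

def shuffle_sentences(text: str):
    """Shuffle the sentences of a comment; the degree of toxicity is
    assumed to be invariant under reordering."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."
```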