I had a great time at ACL this week. There were so many great papers, and I’m still going through them. Here’s a summary of just a few that I wanted to highlight. I’d love to get thoughts and retorts from anyone reading!
“To Test Machine Comprehension, Start by Defining Comprehension”
by Jesse Dunietz, Gregory Burnham, Akash Bharadwaj, Owen Rambow, Jennifer Chu-Carroll, and David Ferrucci
Like most great ideas, the framework presented here is simple – seemingly obvious, even. The authors take a close look at Machine Reading Comprehension (MRC) and argue that current evaluation metrics don’t inspire enough confidence in a system’s comprehension of the relevant information in a passage for us to trust it in any real-world setting. Rather than simply making questions harder, they argue, we should explicitly define so-called “Templates of Understanding” (ToUs) that measure the different dimensions of comprehension within a particular context. For example, for stories, they lay out a ToU that probes the spatial, temporal, causal, and motivational dimensions of the narrative.
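To make the idea a bit more concrete, here is a minimal sketch of how a ToU might be represented in code. The four dimension names reflect my reading of the paper’s story ToU; the concrete question templates and the instantiate helper are hypothetical placeholders of my own, not taken from the paper.

```python
# A minimal sketch of a "Template of Understanding" (ToU) as a data structure.
# Dimension names reflect my reading of the paper's story ToU; the question
# templates themselves are hypothetical placeholders, not the authors' wording.
STORY_TOU = {
    "spatial":      "Where is {entity} at this point in the story?",
    "temporal":     "When does {event} happen, relative to {other_event}?",
    "causal":       "What causes {event}?",
    "motivational": "Why does {agent} {action}?",
}

def instantiate(tou: dict, dimension: str, **slots) -> str:
    """Fill in one dimension's question template for a specific passage."""
    return tou[dimension].format(**slots)

print(instantiate(STORY_TOU, "motivational",
                  agent="the narrator", action="leave the party"))
# -> Why does the narrator leave the party?
```

The appeal of the framework is that the evaluation questions are derived from an explicit specification of what comprehension should cover, rather than from whatever happens to make questions hard.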
The authors do a great job thinking with clarity and simplicity about how we should approach evaluating MRC systems.
“Intermediate-Task Transfer Learning with Pretrained Language Models”
by Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, and Samuel R. Bowman
Recently, the pre-train/fine-tune paradigm has become ubiquitous. This paper explores whether we can take advantage of labeled data in an intermediate training step: fine-tuning on a labeled task after pre-training but before fine-tuning on the target task. The authors do a really extensive analysis of which kinds of datasets are useful for intermediate training and which downstream tasks they have a positive (or negative) effect on.
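To make the recipe concrete, here is a minimal sketch of the three-step pipeline the paper studies: start from a pretrained encoder, fine-tune it on a labeled intermediate task, then fine-tune it again on the target task. This is not the authors’ code; the toy Encoder, finetune, and toy_loader below are hypothetical stand-ins for a real pretrained language model and real datasets.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class Encoder(nn.Module):
    """Toy stand-in for a pretrained language-model encoder."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):
        outputs, _ = self.rnn(self.embed(x))
        return outputs.mean(dim=1)  # mean-pool over the sequence

def finetune(encoder, head, loader, epochs=1, lr=1e-3):
    """Fine-tune the shared encoder plus a task-specific head on one labeled task."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(head(encoder(x)), y)
            loss.backward()
            opt.step()

def toy_loader(num_classes, n=32, seq_len=16):
    """Random token IDs and labels standing in for a real labeled dataset."""
    x = torch.randint(0, 1000, (n, seq_len))
    y = torch.randint(0, num_classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=8)

encoder = Encoder()                                  # step 1: stand-in for a pretrained encoder
finetune(encoder, nn.Linear(64, 3), toy_loader(3))   # step 2: intermediate labeled task
finetune(encoder, nn.Linear(64, 2), toy_loader(2))   # step 3: target task, reusing the same encoder
```

The key point is simply that the same encoder weights carry over from the intermediate task to the target task; any positive (or negative) transfer the paper measures comes from that shared representation.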
A really interesting insight for me is that commonsense tasks never seem to have a negative effect: they either help on the downstream task or don’t have much of an effect at all. I wonder if this is because the commonsense data used here is labeled, or if we could build some kind of unsupervised commonsense objective into the pre-training procedure that would work just as well.
“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”
by Emily M. Bender and Alexander Koller
This paper is not focused on any one method or technique; rather, it makes a general and pretty bold argument: meaning cannot be learned from form alone. In other words, just giving a model access to a whole bunch of text will never be enough for it to learn anything meaningful about the real world.
Whether you buy their argument or not, I found it to be an intellectually stimulating presentation. I suspect the hyperintelligent octopus argument will be one that sticks around for a long time.
I also appreciated their word of caution about the language we use when communicating about a model’s capabilities. At the very end of the presentation, Alexander warned,
“As a community, let’s remember that we’re scientists and not marketing people. Let’s be a little bit careful when we use terms like understanding, meaning, and comprehension.”