Papers
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
EMNLP • 2021
Feed-forward layers constitute two-thirds of a transformer model’s parameters, yet their role in the network remains underexplored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates…

Value-aware Approximate Attention
Ankit Gupta, Jonathan Berant
EMNLP • 2021
Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the…

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models
Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant
EMNLP • 2021
The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given an input, a target head is the head that is selected for…

Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization
Inbar Oren, Jonathan Herzig, Jonathan Berant
EMNLP • 2021
Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to new compositions/structures that have not been observed during…

COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant
EMNLP • 2021
While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually…

Question Decomposition with Dependency Graphs
Matan Hasson, Jonathan Berant
AKBC • 2021
QDMR is a meaning representation for complex questions, which decomposes questions into a sequence of atomic steps. While state-of-the-art QDMR parsers use the common sequence-to-sequence (seq2seq) approach, a QDMR structure fundamentally describes labeled…

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant
TACL • 2021
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question answering (QA) benchmark where the required reasoning steps…

Few-Shot Question Answering by Pretraining Span Selection
Ori Ram, Yuval Kirstain, Jonathan Berant, A. Globerson, Omer Levy
ACL • 2021
In a number of question answering (QA) benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training…

Neural Extractive Search
Shaul Ravfogel, Hillel Taub-Tabib, Yoav Goldberg
ACL • Demo Track • 2021
Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called “extractive search”, in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search…

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Ori Yoran, Alon Talmor, Jonathan Berant
arXiv • 2021
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at…