Papers

  • Transformer Feed-Forward Layers Are Key-Value Memories

    Mor Geva, R. Schuster, Jonathan Berant, Omer Levy • EMNLP 2021
    Feed-forward layers constitute two-thirds of a transformer model’s parameters, yet their role in the network remains underexplored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates…
  • Value-aware Approximate Attention

    Ankit Gupta, Jonathan Berant • EMNLP 2021
    Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the…
  • What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

    Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant • EMNLP 2021
    The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given an input, a target head is the head that is selected for…
  • Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization

    Inbar Oren, Jonathan Herzig, Jonathan Berant • EMNLP 2021
    Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to new compositions/structures that have not been observed during…
  • COVR: A test-bed for Visually Grounded Compositional Generalization with real images

    Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant • EMNLP 2021
    While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually…
  • Question Decomposition with Dependency Graphs

    Matan Hasson, Jonathan Berant • AKBC 2021
    QDMR is a meaning representation for complex questions, which decomposes questions into a sequence of atomic steps. While state-of-the-art QDMR parsers use the common sequence-to-sequence (seq2seq) approach, a QDMR structure fundamentally describes labeled…
  • Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

    Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant • TACL 2021
    A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question answering (QA) benchmark where the required reasoning steps…
  • Few-Shot Question Answering by Pretraining Span Selection

    Ori Ram, Yuval Kirstain, Jonathan Berant, A. Globerson, Omer Levy • ACL 2021
    In a number of question answering (QA) benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training…
  • Neural Extractive Search

    Shaul Ravfogel, Hillel Taub-Tabib, Yoav Goldberg • ACL Demo Track 2021
    Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called “extractive search”, in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search…
  • Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

    Ori Yoran, Alon Talmor, Jonathan Berant • arXiv 2021
    Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at…