Papers

  • QuASE: Question-Answer Driven Sentence Encoding.

    Hangfeng He, Qiang Ning, Dan Roth. ACL 2020. Question-answering (QA) data often encodes essential information in many facets. This paper studies a natural question: Can we get supervision from QA data for other tasks (typically, non-QA ones)? For example, can we use QAMR (Michael et al., 2017) to…
  • Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models

    Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, James W. Pennebaker. ACL 2020. We investigate the use of NLP as a measure of the cognitive processes involved in storytelling, contrasting imagination and recollection of events. To facilitate this, we collect and release HIPPOCORPUS, a dataset of 7,000 stories about imagined and recalled…
  • Social Bias Frames: Reasoning about Social and Power Implications of Language

    Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi. ACL 2020.
    WeCNLP Best Paper
    Language has the power to reinforce stereotypes and project social biases onto others. At the core of the challenge is that it is rarely what is stated explicitly, but rather the implied meanings, that frame people's judgments about others. For example, given a…
  • The Right Tool for the Job: Matching Model and Instance Complexities

    Roy Schwartz, Gabi Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith. ACL 2020. As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual representation fine-tuning…
  • Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

    Ben Bogin, Sanjay Subramanian, Matt Gardner, Jonathan Berant. TACL 2020. Answering questions that involve multi-step reasoning requires decomposing them and using the answers of intermediate steps to reach the final answer. However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition…
  • Contextual Word Representations: Putting Words into Computers

    Noah A. Smith. CACM 2020. This article aims to tell the story of how we put words into computers. It is part of the story of the field of natural language processing (NLP), a branch of artificial intelligence. It targets a wide audience with a basic understanding of computer…
  • On Consequentialism and Fairness

    Dallas Card, Noah A. Smith. Frontiers in AI Journal, 2020. Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage "fair" outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. Among the ethical perspectives that…
  • Explain like I am a Scientist: The Linguistic Barriers of Entry to r/science

    Tal August, Dallas Card, Gary Hsieh, Noah A. Smith, Katharina Reinecke. CHI 2020. As an online community for discussing research findings, r/science has the potential to contribute to science outreach and communication with a broad audience. Yet previous work suggests that most of the active contributors on r/science are science-educated…
  • Longformer: The Long-Document Transformer

    Iz Beltagy, Matthew E. Peters, Arman Cohan. arXiv 2020. Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly…
  • Evaluating NLP Models via Contrast Sets

    M. Gardner, Y. Artzi, V. Basmova, J. Berant, B. Bogin, S. Chen, P. Dasigi, D. Dua, Y. Elazar, A. Gottumukkala, N. Gupta, H. Hajishirzi, G. Ilharco, D. Khashabi, K. Lin, J. Liu, N. Liu, P. Mulcaire, Q. Ning, S. Singh, N. Smith, S. Subramanian, R. Tsarfaty, E. Wallace, et al. arXiv 2020. Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on…