A Dataset of Incomplete Information Reading Comprehension Questions

AllenNLP • 2020
IIRC is a crowdsourced dataset of information-seeking questions that require models to identify and then retrieve necessary information missing from the original context. Each original context is a paragraph from English Wikipedia that comes with a set of links to other Wikipedia pages. Answering a question requires finding the appropriate links to follow and retrieving, from the linked pages, the relevant information that the original context lacks.
License: CC BY

Linked Articles

Clicking Download provides a link to the training and development sets of the latest version of the dataset. Each context paragraph contains links to other Wikipedia articles. You can download those linked articles as a single JSON file here.
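
As a rough illustration of how the linked articles might be used alongside a context paragraph, here is a minimal Python sketch. It assumes the linked-articles JSON file is a single object mapping article titles to article text, and that link targets can be matched to titles case-insensitively. Neither assumption is confirmed on this page, so check the downloaded file before relying on it. The function names `load_linked_articles` and `retrieve_linked_text`, and the toy data, are hypothetical.

```python
import json
from typing import Dict, List, Optional


def load_linked_articles(path: str) -> Dict[str, str]:
    """Load the linked-articles file.

    Assumption: the file is one JSON object mapping title -> article text.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def retrieve_linked_text(
    links: List[str], articles: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Return the text of each linked article, or None if it is not found."""
    # Assumption: titles are matched case-insensitively.
    normalized = {title.lower(): text for title, text in articles.items()}
    return {link: normalized.get(link.lower()) for link in links}


# Toy stand-ins for the real data (hypothetical content).
toy_articles = {"Example City": "Example City was founded in 1850."}
toy_links = ["example city", "Missing Page"]

result = retrieve_linked_text(toy_links, toy_articles)
for link, text in result.items():
    print(link, "->", text)
```

With the real data, `load_linked_articles` would be called on the path of the downloaded JSON file, and its result passed in place of `toy_articles`. A model would then still need to decide which of the retrieved articles actually contains the missing information.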

Test set

Once you are ready to evaluate your finalized model on the test set, click here to download the test split from the latest version of the data.

Authors

James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot and Pradeep Dasigi