OTA Core Collection

 

Datasets and texts collected from a variety of people and projects since 1976. This collection excludes the 'Legacy' and 'Text Creation Partnership' items in the Oxford Text Archive; its contents are thought to be of reasonable quality and usefulness.

Recent Submissions

  • corpus
    Oxford Text Archive Core Collection
    corpus
    Author(s):
    Description:
    A corpus of 450 novels in German, French and English, published between 1770 and 1930, in plain text format. It is designed for use in teaching and research.
     This item contains 2 files (101.01 MB).
     
    Publicly Available

  • lexicalConceptualResource
    Oxford Text Archive Core Collection
    lexicalConceptualResource
    Date of publication:
    2024
    Description:
    Open English WordNet is a lexical network of the English language grouping words into synsets and linking them according to relationships such as hypernymy, antonymy and meronymy. It is intended to be used in natural ...
     This item contains 3 files (310.92 MB).
     
    Publicly Available

  • Linguistic corpora
    Oxford Text Archive Core Collection
    Linguistic corpora
    Author(s):
    Description:
    A corpus of literary texts based on Harold Bloom’s The Western Canon: The Books and School of the Ages (1994), created in order to conduct exploratory research in Culturomics and Corpus Stylistics. There are 805 texts ...
     This item contains 2 files (158.99 MB).
     
    Publicly Available

  • corpus
    Oxford Text Archive Core Collection
    corpus
    Description:
    The Corpus of Late Modern English Texts (CLMET) is a corpus of roughly 35 million words of British English from 1710–1920, grouped into three 70-year periods. The history, versions and specifics of corpus composition can ...
     This item contains 5 files (689.81 MB).
     
    Publicly Available

  • corpus
    Oxford Text Archive Core Collection
    corpus
    Date of publication:
    1881-1922
    Author(s):
    Description:
    The Corpus of English Novels (CEN), compiled by Hendrik De Smet, has been designed to allow tracking of short-term language change and comparing usage across individual authors. It consists entirely of novels, written by ...
     This item contains 2 files (54.16 MB).
     
    Publicly Available
