
G. A. Henty Corpus

 
dc.contributor.author Wynne, Martin
dc.contributor.author Henty, G. A. (George Alfred), 1832-1902
dc.date.accessioned 2024-12-11T09:41:21Z
dc.date.available 2024-12-11T09:41:21Z
dc.date.issued 2024-12-11
dc.identifier ota:4003
dc.identifier.uri http://hdl.handle.net/20.500.14106/4003
dc.description A corpus of the novels of G. A. Henty (1832-1902) in plain text format, made available for literary and linguistic research and for natural language processing. The texts were downloaded from Project Gutenberg and then cleaned, with minimal metadata added. The corpus is made available in three formats.
dc.language.iso eng
dc.publisher University of Oxford
dc.relation.ispartof Learning and teaching materials
dc.relation.ispartof Learning and teaching resources
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject Linguistic corpus
dc.subject Learning and teaching resources
dc.subject Learning and teaching materials
dc.subject.lcsh Novels -- Great Britain -- 19th century
dc.subject.lcsh Fiction -- Great Britain -- 19th century
dc.title G. A. Henty Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Literary and Linguistic Data Service
contact.person Martin Wynne martin.wynne@ling-phil.ox.ac.uk University of Oxford
size.info 10995088 tokens
size.info 104 files
files.size 65729627
files.count 5
otaterms.date.range 1800-1899

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

 Files for this item

 Download all local files for this item (62.68 MB)

Name: henty_titles.tsv
Size: 5.65 KB
Format: text/tab-separated-values
Description: List of titles of the books in the corpus

Name: henty_urls.txt
Size: 4.33 KB
Format: Text file
Description: List of the URLs for the files downloaded from Project Gutenberg
File preview:
https://www.gutenberg.org/ebooks/author/1032?sort_order=downloads
https://www.gutenberg.org/ebooks/author/1032?sort_order=release_date
https://www.gutenberg.org/ebooks/54091
https://www.gutenberg.org/ebooks/56255
https://www.gutenberg.org/ebooks/59529
https://www.gutenberg.org/ebooks/7346
https://www.gutenberg.org/ebooks/22224
https://www.gutenberg.org/ebooks/7060
https://www.gutenberg.org/ebooks/20729
https://www.gutenberg.org/ebooks/7037
https://www.gutenberg.org/ebooks/7006
https://www.gutenberg.org/ebooks/19070
https://www.gutenberg.org/ebooks/13354
https://www.gutenberg.org/ebooks/7318
https://www.gutenberg.org/ebooks/28357
https://www.gutenberg.org/ebooks/6953
https://www.gutenberg.org/ebooks/8679
https://www.gutenberg.org/ebooks/6952
https://www.gutenberg.org/ebooks/19398
https://www.gutenberg.org/ebooks/8576
https://www.gutenberg.org/ebooks/28857
https://www.gutenberg.org/ebooks/29756
https://www.gutenberg.org/ebooks/28190
https://www.gutenberg.org/ebooks/8155
https://www.guten . . .

Name: Henty_doctags.zip
Size: 20.89 MB
Format: application/zip
Description: Henty Corpus files with simple doc tags containing file identifiers and titles (for uploading to Sketch Engine)
File preview:
  • ForSketchEngine
    • 8651.txt-1 B
    • 21614.txt-1 B
    • 22060.txt-1 B
    • 19070.txt-1 B
    • 36359.txt-1 B
    • 19714.txt-1 B
    • 7070.txt-1 B
    • 4932.txt-1 B
    • 21788.txt-1 B
    • 21986.txt-1 B
    • 43067.txt-1 B
    • 6953.txt-1 B
    • 55779.txt-1 B
    • 8859.txt-1 B
    • 7318.txt-1 B
    • 7037.txt-1 B
    • 28190.txt-1 B
    • 8155.txt-1 B
    • 45573.txt-1 B
    • 34886.txt-1 B
    • 7334.txt-1 B
    • 45617.txt-1 B
    • 20092.txt-1 B
    • 56143.txt-1 B
    • 39470.txt-1 B
    • 5075.txt-1 B
    • 33939.txt-1 B
    • 11565.txt-1 B
    • 3785.txt-1 B
    • 4931.txt-1 B
    • 6952.txt-1 B
    • 2805.txt-1 B
    • 18357.txt-1 B
    • 19398.txt-1 B
    • 7229.txt-1 B
    • 22224.txt-1 B
    • 8732.txt-1 B
    • 26090.txt-1 B
    • 30457.txt-1 B
    • 38764.txt-1 B
    • 20091.txt-1 B
    • 18868.txt-1 B
    • 11058.txt-1 B
    • 36236.txt-1 B
    • 19206.txt-1 B
    • 3674.txt-1 B
    • 4792.txt-1 B
    • 11609.txt-1 B
    • 35266.txt-1 B
    • 33619.txt-1 B
    • 9613.txt-1 B
    • 20031.txt-1 B
    • 20207.txt-1 B
    • 59529.txt-1 B
    • 21242.txt-1 B
    • 21979.txt-1 B
    • 6472.txt-1 B
    • 8670.txt-1 B
    • 8576.txt-1 B
    • 18356.txt-1 B
    • 20729.txt-1 B
    • 17546.txt-1 B
    • 48297.txt-1 B
    • 53717.txt-1 B
    • 18813.txt-1 B
    • 12308.txt-1 B
    • 24244.txt-1 B
    • 17766.txt-1 B
    • 30143.txt-1 B
    • 20641.txt-1 B
    • 17436.txt-1 B
    • 39374.txt-1 B
    • 49229.txt-1 B
    • 35265.txt-1 B
    • 42276.txt-1 B
    • 54091.txt-1 B
    • 13354.txt-1 B
    • 53859.txt-1 B
    • 14313.txt-1 B
    • 7870.txt-1 B
    • 17403.txt-1 B
    • 25993.txt-1 B
    • 7061.txt-1 B
    • 32934.txt-1 B
    • 36103.txt-1 B
    • 5128.txt-1 B
    • 56767.txt-1 B
    • 35012.txt-1 B
    • 31128.txt-1 B
    • 18349.txt-1 B
    • 39616.txt-1 B
    • 29756.txt-1 B
    • 28357.txt-1 B
    • 7006.txt-1 B
    • 7831.txt-1 B
    • 47008.txt-1 B
    • 18833.txt-1 B
    • 56255.txt-1 B
    • 7071.txt-1 B
    • 7060.txt-1 B
    • 28857.txt-1 B
    • 8745.txt-1 B
    • 36975.txt-1 B
    • 7346.txt-1 B

Name: Henty_xml.zip
Size: 20.89 MB
Format: application/zip
Description: Henty Corpus with XML headers containing minimal metadata (but no structural markup)
File preview:
  • xmlcorpus
    • 4931.xml-1 B
    • 6952.xml-1 B
    • 2805.xml-1 B
    • 18357.xml-1 B
    • 19398.xml-1 B
    • 7229.xml-1 B
    • 22224.xml-1 B
    • 8732.xml-1 B
    • 26090.xml-1 B
    • 30457.xml-1 B
    • 38764.xml-1 B
    • 20091.xml-1 B
    • 18868.xml-1 B
    • 36236.xml-1 B
    • 11058.xml-1 B
    • 19206.xml-1 B
    • 3674.xml-1 B
    • 11609.xml-1 B
    • 4792.xml-1 B
    • 35266.xml-1 B
    • 33619.xml-1 B
    • 9613.xml-1 B
    • 20031.xml-1 B
    • 20207.xml-1 B
    • 59529.xml-1 B
    • 21242.xml-1 B
    • 21979.xml-1 B
    • 6472.xml-1 B
    • 8670.xml-1 B
    • 8576.xml-1 B
    • 18356.xml-1 B
    • 20729.xml-1 B
    • 17546.xml-1 B
    • 48297.xml-1 B
    • 53717.xml-1 B
    • 12308.xml-1 B
    • 18813.xml-1 B
    • 24244.xml-1 B
    • 17766.xml-1 B
    • 30143.xml-1 B
    • 20641.xml-1 B
    • 39374.xml-1 B
    • 17436.xml-1 B
    • 35265.xml-1 B
    • 49229.xml-1 B
    • 42276.xml-1 B
    • 54091.xml-1 B
    • 13354.xml-1 B
    • 53859.xml-1 B
    • 14313.xml-1 B
    • 7870.xml-1 B
    • 17403.xml-1 B
    • 25993.xml-1 B
    • 7061.xml-1 B
    • 32934.xml-1 B
    • 36103.xml-1 B
    • 5128.xml-1 B
    • 56767.xml-1 B
    • 35012.xml-1 B
    • 31128.xml-1 B
    • 18349.xml-1 B
    • 39616.xml-1 B
    • 29756.xml-1 B
    • 28357.xml-1 B
    • 7006.xml-1 B
    • 7831.xml-1 B
    • 47008.xml-1 B
    • 18833.xml-1 B
    • 56255.xml-1 B
    • 7071.xml-1 B
    • 7060.xml-1 B
    • 28857.xml-1 B
    • 8745.xml-1 B
    • 36975.xml-1 B
    • 7346.xml-1 B
    • 8651.xml-1 B
    • 21614.xml-1 B
    • 22060.xml-1 B
    • 19070.xml-1 B
    • 36359.xml-1 B
    • 19714.xml-1 B
    • 7070.xml-1 B
    • 4932.xml-1 B
    • 21986.xml-1 B
    • 43067.xml-1 B
    • 21788.xml-1 B
    • 6953.xml-1 B
    • 55779.xml-1 B
    • 8859.xml-1 B
    • 7318.xml-1 B
    • 7037.xml-1 B
    • 28190.xml-1 B
    • 8155.xml-1 B
    • 45573.xml-1 B
    • 34886.xml-1 B
    • 7334.xml-1 B
    • 45617.xml-1 B
    • 20092.xml-1 B
    • 56143.xml-1 B
    • 39470.xml-1 B
    • 5075.xml-1 B
    • 33939.xml-1 B
    • 11565.xml-1 B
    • 3785.xml-1 B

Name: Henty_plaintext.zip
Size: 20.89 MB
Format: application/zip
Description: Plain text version of the Henty Corpus with no metadata or tagging
File preview:
  • plaintextcorpus
    • 8651.txt-1 B
    • 21614.txt-1 B
    • 22060.txt-1 B
    • 19070.txt-1 B
    • 36359.txt-1 B
    • 19714.txt-1 B
    • 7070.txt-1 B
    • 4932.txt-1 B
    • 21788.txt-1 B
    • 21986.txt-1 B
    • 43067.txt-1 B
    • 6953.txt-1 B
    • 55779.txt-1 B
    • 8859.txt-1 B
    • 7318.txt-1 B
    • 7037.txt-1 B
    • 28190.txt-1 B
    • 8155.txt-1 B
    • 45573.txt-1 B
    • 34886.txt-1 B
    • 7334.txt-1 B
    • 45617.txt-1 B
    • 20092.txt-1 B
    • 56143.txt-1 B
    • 39470.txt-1 B
    • 5075.txt-1 B
    • 33939.txt-1 B
    • 11565.txt-1 B
    • 3785.txt-1 B
    • 4931.txt-1 B
    • 6952.txt-1 B
    • 2805.txt-1 B
    • 18357.txt-1 B
    • 19398.txt-1 B
    • 7229.txt-1 B
    • 22224.txt-1 B
    • 8732.txt-1 B
    • 26090.txt-1 B
    • 30457.txt-1 B
    • 38764.txt-1 B
    • 20091.txt-1 B
    • 18868.txt-1 B
    • 11058.txt-1 B
    • 36236.txt-1 B
    • 19206.txt-1 B
    • 3674.txt-1 B
    • 4792.txt-1 B
    • 11609.txt-1 B
    • 35266.txt-1 B
    • 33619.txt-1 B
    • 9613.txt-1 B
    • 20031.txt-1 B
    • 20207.txt-1 B
    • 59529.txt-1 B
    • 21242.txt-1 B
    • 21979.txt-1 B
    • 6472.txt-1 B
    • 8670.txt-1 B
    • 8576.txt-1 B
    • 18356.txt-1 B
    • 20729.txt-1 B
    • 17546.txt-1 B
    • 48297.txt-1 B
    • 53717.txt-1 B
    • 18813.txt-1 B
    • 12308.txt-1 B
    • 24244.txt-1 B
    • 17766.txt-1 B
    • 30143.txt-1 B
    • 20641.txt-1 B
    • 17436.txt-1 B
    • 39374.txt-1 B
    • 49229.txt-1 B
    • 35265.txt-1 B
    • 42276.txt-1 B
    • 54091.txt-1 B
    • 13354.txt-1 B
    • 53859.txt-1 B
    • 14313.txt-1 B
    • 7870.txt-1 B
    • 17403.txt-1 B
    • 25993.txt-1 B
    • 7061.txt-1 B
    • 32934.txt-1 B
    • 36103.txt-1 B
    • 5128.txt-1 B
    • 56767.txt-1 B
    • 35012.txt-1 B
    • 31128.txt-1 B
    • 18349.txt-1 B
    • 39616.txt-1 B
    • 29756.txt-1 B
    • 28357.txt-1 B
    • 7006.txt-1 B
    • 7831.txt-1 B
    • 47008.txt-1 B
    • 18833.txt-1 B
    • 56255.txt-1 B
    • 7071.txt-1 B
    • 7060.txt-1 B
    • 28857.txt-1 B
    • 8745.txt-1 B
    • 36975.txt-1 B
    • 7346.txt-1 B
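
For readers who want to check the plain text download against the size.info figures, the following is a minimal sketch rather than the method used to produce them. It assumes Henty_plaintext.zip has been saved to the working directory, that the texts sit under the plaintextcorpus folder shown in the preview, and that they decode as UTF-8. The function name count_tokens and the whitespace tokenisation are illustrative assumptions, so the total need not match the token count reported in this record.

```python
import zipfile
from pathlib import Path


def count_tokens(archive, folder="plaintextcorpus/"):
    """Return {member name: whitespace-token count} for .txt members under folder.

    Assumptions (not stated in the record): UTF-8 text and whitespace
    tokenisation; undecodable bytes are replaced rather than raising.
    """
    counts = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.startswith(folder) and name.endswith(".txt"):
                text = zf.read(name).decode("utf-8", errors="replace")
                counts[name] = len(text.split())
    return counts


# Hypothetical local path; the summary is printed only if the archive is present.
archive_path = Path("Henty_plaintext.zip")
if archive_path.exists():
    per_file = count_tokens(archive_path)
    print(len(per_file), "files,", sum(per_file.values()), "whitespace tokens")
```

Reading members directly from the archive avoids unpacking the texts to disk. The same function can be pointed at Henty_doctags.zip by passing folder="ForSketchEngine/", although the doc tags would then be counted as tokens as well.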
