G. A. Henty Corpus
dc.contributor.author | Wynne, Martin |
dc.contributor.author | Henty, G. A. (George Alfred), 1832-1902 |
dc.date.accessioned | 2024-12-11T09:41:21Z |
dc.date.available | 2024-12-11T09:41:21Z |
dc.date.issued | 2024-12-11 |
dc.identifier | ota:4003 |
dc.identifier.uri | http://hdl.handle.net/20.500.14106/4003 |
dc.description | A corpus of the novels of G. A. Henty (1832-1902) in plain text format, made available for literary and linguistic research and for natural language processing. The texts were downloaded from Project Gutenberg, and then cleaned, with minimal metadata added. The corpus is made available in three formats: plain text with no metadata or tagging, text with simple doc tags (for uploading to Sketch Engine), and text with XML headers carrying minimal metadata. |
dc.language.iso | eng |
dc.publisher | University of Oxford |
dc.relation.ispartof | Learning and teaching materials |
dc.relation.ispartof | Learning and teaching resources |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | Linguistic corpus |
dc.subject | Learning and teaching resources |
dc.subject | Learning and teaching materials |
dc.subject.lcsh | Novels -- Great Britain -- 19th century |
dc.subject.lcsh | Fiction -- Great Britain -- 19th century |
dc.title | G. A. Henty Corpus |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | false |
hasMetadata | false |
has.files | yes |
branding | Literary and Linguistic Data Service |
contact.person | Martin Wynne martin.wynne@ling-phil.ox.ac.uk University of Oxford |
size.info | 10995088 tokens |
size.info | 104 files |
files.size | 65729627 |
files.count | 5 |
otaterms.date.range | 1800-1899 |
This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Files for this item
Download all local files for this item (62.68 MB)
- Name: henty_titles.tsv
- Size: 5.65 KB
- Format: text/tab-separated-values
- Description: List of titles of the books in the corpus
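A minimal sketch of how a titles file like this could be loaded for lookup by file identifier. The column layout is an assumption, not something the record states: the sketch supposes two tab-separated columns (identifier, then title) and no header row. `load_titles` is a hypothetical helper, and the sample rows use placeholder titles rather than real data.

```python
import csv
import io

def load_titles(handle):
    """Map the first tab-separated column to the second (ASSUMED layout: id, title)."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank or malformed rows that have fewer than two columns.
    return {row[0]: row[1] for row in reader if len(row) >= 2}

# Placeholder rows standing in for henty_titles.tsv; the titles are not real data.
sample = io.StringIO("7346\tExample Title A\n54091\tExample Title B\n")
titles = load_titles(sample)
print(titles["7346"])  # prints Example Title A
```

For the real file, the `io.StringIO` object would be replaced by `open("henty_titles.tsv", encoding="utf-8", newline="")`, after checking the actual columns by hand.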
- Name: henty_urls.txt
- Size: 4.33 KB
- Format: Text file
- Description: List of the URLs for the files downloaded from Project Gutenberg
https://www.gutenberg.org/ebooks/author/1032?sort_order=downloads
https://www.gutenberg.org/ebooks/author/1032?sort_order=release_date
https://www.gutenberg.org/ebooks/54091
https://www.gutenberg.org/ebooks/56255
https://www.gutenberg.org/ebooks/59529
https://www.gutenberg.org/ebooks/7346
https://www.gutenberg.org/ebooks/22224
https://www.gutenberg.org/ebooks/7060
https://www.gutenberg.org/ebooks/20729
https://www.gutenberg.org/ebooks/7037
https://www.gutenberg.org/ebooks/7006
https://www.gutenberg.org/ebooks/19070
https://www.gutenberg.org/ebooks/13354
https://www.gutenberg.org/ebooks/7318
https://www.gutenberg.org/ebooks/28357
https://www.gutenberg.org/ebooks/6953
https://www.gutenberg.org/ebooks/8679
https://www.gutenberg.org/ebooks/6952
https://www.gutenberg.org/ebooks/19398
https://www.gutenberg.org/ebooks/8576
https://www.gutenberg.org/ebooks/28857
https://www.gutenberg.org/ebooks/29756
https://www.gutenberg.org/ebooks/28190
https://www.gutenberg.org/ebooks/8155
https://www.guten . . .
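The per-book URLs in this list end in a numeric Project Gutenberg ebook ID, and the corpus files appear to be named after the same numbers (for example `54091.txt`). The sketch below shows one way to pull those IDs out of the URL list. The helper name `ebook_ids` is hypothetical, and the correspondence between URL IDs and file names is inferred from the listing rather than stated in the record.

```python
import re

# Matches per-book URLs only; the author listing pages (…/ebooks/author/…) do not match.
EBOOK_URL = re.compile(r"^https://www\.gutenberg\.org/ebooks/(\d+)$")

def ebook_ids(urls):
    """Return the numeric ebook IDs found in an iterable of URL strings, in order."""
    ids = []
    for url in urls:
        match = EBOOK_URL.match(url.strip())
        if match:
            ids.append(match.group(1))
    return ids

sample = [
    "https://www.gutenberg.org/ebooks/author/1032?sort_order=downloads",
    "https://www.gutenberg.org/ebooks/54091",
    "https://www.gutenberg.org/ebooks/7346",
]
print([f"{i}.txt" for i in ebook_ids(sample)])  # prints ['54091.txt', '7346.txt']
```

Applied to the full file, `ebook_ids(open("henty_urls.txt", encoding="utf-8"))` would give the ID list to compare against the file names in the archives.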
- Name: Henty_doctags.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Henty Corpus files with simple doc tags with file identifiers and titles (for uploading to Sketch Engine)
- ForSketchEngine
- 8651.txt-1 B
- 21614.txt-1 B
- 22060.txt-1 B
- 19070.txt-1 B
- 36359.txt-1 B
- 19714.txt-1 B
- 7070.txt-1 B
- 4932.txt-1 B
- 21788.txt-1 B
- 21986.txt-1 B
- 43067.txt-1 B
- 6953.txt-1 B
- 55779.txt-1 B
- 8859.txt-1 B
- 7318.txt-1 B
- 7037.txt-1 B
- 28190.txt-1 B
- 8155.txt-1 B
- 45573.txt-1 B
- 34886.txt-1 B
- 7334.txt-1 B
- 45617.txt-1 B
- 20092.txt-1 B
- 56143.txt-1 B
- 39470.txt-1 B
- 5075.txt-1 B
- 33939.txt-1 B
- 11565.txt-1 B
- 3785.txt-1 B
- 4931.txt-1 B
- 6952.txt-1 B
- 2805.txt-1 B
- 18357.txt-1 B
- 19398.txt-1 B
- 7229.txt-1 B
- 22224.txt-1 B
- 8732.txt-1 B
- 26090.txt-1 B
- 30457.txt-1 B
- 38764.txt-1 B
- 20091.txt-1 B
- 18868.txt-1 B
- 11058.txt-1 B
- 36236.txt-1 B
- 19206.txt-1 B
- 3674.txt-1 B
- 4792.txt-1 B
- 11609.txt-1 B
- 35266.txt-1 B
- 33619.txt-1 B
- 9613.txt-1 B
- 20031.txt-1 B
- 20207.txt-1 B
- 59529.txt-1 B
- 21242.txt-1 B
- 21979.txt-1 B
- 6472.txt-1 B
- 8670.txt-1 B
- 8576.txt-1 B
- 18356.txt-1 B
- 20729.txt-1 B
- 17546.txt-1 B
- 48297.txt-1 B
- 53717.txt-1 B
- 18813.txt-1 B
- 12308.txt-1 B
- 24244.txt-1 B
- 17766.txt-1 B
- 30143.txt-1 B
- 20641.txt-1 B
- 17436.txt-1 B
- 39374.txt-1 B
- 49229.txt-1 B
- 35265.txt-1 B
- 42276.txt-1 B
- 54091.txt-1 B
- 13354.txt-1 B
- 53859.txt-1 B
- 14313.txt-1 B
- 7870.txt-1 B
- 17403.txt-1 B
- 25993.txt-1 B
- 7061.txt-1 B
- 32934.txt-1 B
- 36103.txt-1 B
- 5128.txt-1 B
- 56767.txt-1 B
- 35012.txt-1 B
- 31128.txt-1 B
- 18349.txt-1 B
- 39616.txt-1 B
- 29756.txt-1 B
- 28357.txt-1 B
- 7006.txt-1 B
- 7831.txt-1 B
- 47008.txt-1 B
- 18833.txt-1 B
- 56255.txt-1 B
- 7071.txt-1 B
- 7060.txt-1 B
- 28857.txt-1 B
- 8745.txt-1 B
- 36975.txt-1 B
- 7346.txt-1 B
- Name: Henty_xml.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Henty corpus with XML headers with minimal metadata (but no structural markup)
- xmlcorpus
- 4931.xml-1 B
- 6952.xml-1 B
- 2805.xml-1 B
- 18357.xml-1 B
- 19398.xml-1 B
- 7229.xml-1 B
- 22224.xml-1 B
- 8732.xml-1 B
- 26090.xml-1 B
- 30457.xml-1 B
- 38764.xml-1 B
- 20091.xml-1 B
- 18868.xml-1 B
- 36236.xml-1 B
- 11058.xml-1 B
- 19206.xml-1 B
- 3674.xml-1 B
- 11609.xml-1 B
- 4792.xml-1 B
- 35266.xml-1 B
- 33619.xml-1 B
- 9613.xml-1 B
- 20031.xml-1 B
- 20207.xml-1 B
- 59529.xml-1 B
- 21242.xml-1 B
- 21979.xml-1 B
- 6472.xml-1 B
- 8670.xml-1 B
- 8576.xml-1 B
- 18356.xml-1 B
- 20729.xml-1 B
- 17546.xml-1 B
- 48297.xml-1 B
- 53717.xml-1 B
- 12308.xml-1 B
- 18813.xml-1 B
- 24244.xml-1 B
- 17766.xml-1 B
- 30143.xml-1 B
- 20641.xml-1 B
- 39374.xml-1 B
- 17436.xml-1 B
- 35265.xml-1 B
- 49229.xml-1 B
- 42276.xml-1 B
- 54091.xml-1 B
- 13354.xml-1 B
- 53859.xml-1 B
- 14313.xml-1 B
- 7870.xml-1 B
- 17403.xml-1 B
- 25993.xml-1 B
- 7061.xml-1 B
- 32934.xml-1 B
- 36103.xml-1 B
- 5128.xml-1 B
- 56767.xml-1 B
- 35012.xml-1 B
- 31128.xml-1 B
- 18349.xml-1 B
- 39616.xml-1 B
- 29756.xml-1 B
- 28357.xml-1 B
- 7006.xml-1 B
- 7831.xml-1 B
- 47008.xml-1 B
- 18833.xml-1 B
- 56255.xml-1 B
- 7071.xml-1 B
- 7060.xml-1 B
- 28857.xml-1 B
- 8745.xml-1 B
- 36975.xml-1 B
- 7346.xml-1 B
- 8651.xml-1 B
- 21614.xml-1 B
- 22060.xml-1 B
- 19070.xml-1 B
- 36359.xml-1 B
- 19714.xml-1 B
- 7070.xml-1 B
- 4932.xml-1 B
- 21986.xml-1 B
- 43067.xml-1 B
- 21788.xml-1 B
- 6953.xml-1 B
- 55779.xml-1 B
- 8859.xml-1 B
- 7318.xml-1 B
- 7037.xml-1 B
- 28190.xml-1 B
- 8155.xml-1 B
- 45573.xml-1 B
- 34886.xml-1 B
- 7334.xml-1 B
- 45617.xml-1 B
- 20092.xml-1 B
- 56143.xml-1 B
- 39470.xml-1 B
- 5075.xml-1 B
- 33939.xml-1 B
- 11565.xml-1 B
- 3785.xml-1 B
- Name: Henty_plaintext.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Plain text version of the Henty corpus with no metadata or tagging
- plaintextcorpus
- 8651.txt-1 B
- 21614.txt-1 B
- 22060.txt-1 B
- 19070.txt-1 B
- 36359.txt-1 B
- 19714.txt-1 B
- 7070.txt-1 B
- 4932.txt-1 B
- 21788.txt-1 B
- 21986.txt-1 B
- 43067.txt-1 B
- 6953.txt-1 B
- 55779.txt-1 B
- 8859.txt-1 B
- 7318.txt-1 B
- 7037.txt-1 B
- 28190.txt-1 B
- 8155.txt-1 B
- 45573.txt-1 B
- 34886.txt-1 B
- 7334.txt-1 B
- 45617.txt-1 B
- 20092.txt-1 B
- 56143.txt-1 B
- 39470.txt-1 B
- 5075.txt-1 B
- 33939.txt-1 B
- 11565.txt-1 B
- 3785.txt-1 B
- 4931.txt-1 B
- 6952.txt-1 B
- 2805.txt-1 B
- 18357.txt-1 B
- 19398.txt-1 B
- 7229.txt-1 B
- 22224.txt-1 B
- 8732.txt-1 B
- 26090.txt-1 B
- 30457.txt-1 B
- 38764.txt-1 B
- 20091.txt-1 B
- 18868.txt-1 B
- 11058.txt-1 B
- 36236.txt-1 B
- 19206.txt-1 B
- 3674.txt-1 B
- 4792.txt-1 B
- 11609.txt-1 B
- 35266.txt-1 B
- 33619.txt-1 B
- 9613.txt-1 B
- 20031.txt-1 B
- 20207.txt-1 B
- 59529.txt-1 B
- 21242.txt-1 B
- 21979.txt-1 B
- 6472.txt-1 B
- 8670.txt-1 B
- 8576.txt-1 B
- 18356.txt-1 B
- 20729.txt-1 B
- 17546.txt-1 B
- 48297.txt-1 B
- 53717.txt-1 B
- 18813.txt-1 B
- 12308.txt-1 B
- 24244.txt-1 B
- 17766.txt-1 B
- 30143.txt-1 B
- 20641.txt-1 B
- 17436.txt-1 B
- 39374.txt-1 B
- 49229.txt-1 B
- 35265.txt-1 B
- 42276.txt-1 B
- 54091.txt-1 B
- 13354.txt-1 B
- 53859.txt-1 B
- 14313.txt-1 B
- 7870.txt-1 B
- 17403.txt-1 B
- 25993.txt-1 B
- 7061.txt-1 B
- 32934.txt-1 B
- 36103.txt-1 B
- 5128.txt-1 B
- 56767.txt-1 B
- 35012.txt-1 B
- 31128.txt-1 B
- 18349.txt-1 B
- 39616.txt-1 B
- 29756.txt-1 B
- 28357.txt-1 B
- 7006.txt-1 B
- 7831.txt-1 B
- 47008.txt-1 B
- 18833.txt-1 B
- 56255.txt-1 B
- 7071.txt-1 B
- 7060.txt-1 B
- 28857.txt-1 B
- 8745.txt-1 B
- 36975.txt-1 B
- 7346.txt-1 B
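As a rough check on a downloaded archive, the following sketch counts the text files in a zip and the whitespace-separated tokens they contain. It is an illustration under stated assumptions: `corpus_stats` is a hypothetical helper, UTF-8 encoding is assumed, and splitting on whitespace is a simplification, so the result need not match the token count given in `size.info`, which may have been produced with a different tokeniser.

```python
import io
import zipfile

def corpus_stats(archive, suffix=".txt"):
    """Return (file count, whitespace token count) for matching files in a zip archive."""
    n_files = 0
    n_tokens = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Skip directory entries and files with a different extension.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            text = zf.read(info).decode("utf-8", errors="replace")
            n_files += 1
            n_tokens += len(text.split())
    return n_files, n_tokens

# Demo on a tiny in-memory archive with placeholder content.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plaintextcorpus/1.txt", "one two three")
    zf.writestr("plaintextcorpus/2.txt", "four five")
    zf.writestr("plaintextcorpus/notes.md", "ignored words here")
stats = corpus_stats(buf)
print(stats)  # prints (2, 5)
```

For the real data the call would be `corpus_stats("Henty_plaintext.zip")`, or `corpus_stats("Henty_xml.zip", suffix=".xml")` for the XML version; in the latter case the header markup would be counted as tokens too.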