This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Files for this item (62.68 MB in total)
- Name: henty_titles.tsv
- Size: 5.65 KB
- Format: text/tab-separated-values
- Description: List of titles of the books in the corpus
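The title list can be loaded with Python's standard csv module. This is a minimal sketch, assuming the file is UTF-8 encoded. Because its column layout is not described on this page, the hypothetical helpers parse_titles and read_titles return each row as a plain list of fields rather than assuming column names.

```python
import csv
from typing import Iterable


def parse_titles(lines: Iterable[str]) -> list[list[str]]:
    """Split tab-separated lines into lists of fields, skipping blank lines.

    QUOTE_NONE keeps quotation marks inside titles as literal characters
    instead of treating them as CSV quoting.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def read_titles(path: str) -> list[list[str]]:
    """Read a TSV file such as henty_titles.tsv (UTF-8 encoding assumed)."""
    with open(path, encoding="utf-8", newline="") as f:
        return parse_titles(f)
```

For example, read_titles("henty_titles.tsv") returns one list of fields per non-empty line of the file.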
- Name: henty_urls.txt
- Size: 4.33 KB
- Format: Text file
- Description: List of the URLs for the files downloaded from Project Gutenberg
https://www.gutenberg.org/ebooks/author/1032?sort_order=downloads https://www.gutenberg.org/ebooks/author/1032?sort_order=release_date https://www.gutenberg.org/ebooks/54091 https://www.gutenberg.org/ebooks/56255 https://www.gutenberg.org/ebooks/59529 https://www.gutenberg.org/ebooks/7346 https://www.gutenberg.org/ebooks/22224 https://www.gutenberg.org/ebooks/7060 https://www.gutenberg.org/ebooks/20729 https://www.gutenberg.org/ebooks/7037 https://www.gutenberg.org/ebooks/7006 https://www.gutenberg.org/ebooks/19070 https://www.gutenberg.org/ebooks/13354 https://www.gutenberg.org/ebooks/7318 https://www.gutenberg.org/ebooks/28357 https://www.gutenberg.org/ebooks/6953 https://www.gutenberg.org/ebooks/8679 https://www.gutenberg.org/ebooks/6952 https://www.gutenberg.org/ebooks/19398 https://www.gutenberg.org/ebooks/8576 https://www.gutenberg.org/ebooks/28857 https://www.gutenberg.org/ebooks/29756 https://www.gutenberg.org/ebooks/28190 https://www.gutenberg.org/ebooks/8155 https://www.guten . . .
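The file names in the archives below appear to correspond to the Project Gutenberg ebook numbers in these URLs (for example, 54091 and 7346 appear in both). A minimal sketch for pulling those numbers out of the URL list, assuming the URLs are separated by whitespace as in the preview above; author listing pages are skipped, and the helper name ebook_ids is hypothetical.

```python
import re

# Individual book pages look like https://www.gutenberg.org/ebooks/54091;
# author listing pages (/ebooks/author/...) do not match this pattern.
EBOOK_URL = re.compile(r"https?://www\.gutenberg\.org/ebooks/(\d+)")


def ebook_ids(text: str) -> list[str]:
    """Return the ebook numbers found in a whitespace-separated list of URLs,
    in their original order and without duplicates."""
    seen: set[str] = set()
    ids: list[str] = []
    for token in text.split():
        match = EBOOK_URL.fullmatch(token)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids
```

Passing the contents of henty_urls.txt to ebook_ids gives a list of identifiers that can be compared with the file names inside each archive.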
- Name: Henty_doctags.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Henty corpus files with simple <doc> tags giving each file's identifier and title (for uploading to Sketch Engine)
- Folder: ForSketchEngine
- 8651.txt
- 21614.txt
- 22060.txt
- 19070.txt
- 36359.txt
- 19714.txt
- 7070.txt
- 4932.txt
- 21788.txt
- 21986.txt
- 43067.txt
- 6953.txt
- 55779.txt
- 8859.txt
- 7318.txt
- 7037.txt
- 28190.txt
- 8155.txt
- 45573.txt
- 34886.txt
- 7334.txt
- 45617.txt
- 20092.txt
- 56143.txt
- 39470.txt
- 5075.txt
- 33939.txt
- 11565.txt
- 3785.txt
- 4931.txt
- 6952.txt
- 2805.txt
- 18357.txt
- 19398.txt
- 7229.txt
- 22224.txt
- 8732.txt
- 26090.txt
- 30457.txt
- 38764.txt
- 20091.txt
- 18868.txt
- 11058.txt
- 36236.txt
- 19206.txt
- 3674.txt
- 4792.txt
- 11609.txt
- 35266.txt
- 33619.txt
- 9613.txt
- 20031.txt
- 20207.txt
- 59529.txt
- 21242.txt
- 21979.txt
- 6472.txt
- 8670.txt
- 8576.txt
- 18356.txt
- 20729.txt
- 17546.txt
- 48297.txt
- 53717.txt
- 18813.txt
- 12308.txt
- 24244.txt
- 17766.txt
- 30143.txt
- 20641.txt
- 17436.txt
- 39374.txt
- 49229.txt
- 35265.txt
- 42276.txt
- 54091.txt
- 13354.txt
- 53859.txt
- 14313.txt
- 7870.txt
- 17403.txt
- 25993.txt
- 7061.txt
- 32934.txt
- 36103.txt
- 5128.txt
- 56767.txt
- 35012.txt
- 31128.txt
- 18349.txt
- 39616.txt
- 29756.txt
- 28357.txt
- 7006.txt
- 7831.txt
- 47008.txt
- 18833.txt
- 56255.txt
- 7071.txt
- 7060.txt
- 28857.txt
- 8745.txt
- 36975.txt
- 7346.txt
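A sketch of how a plain-text book could be wrapped in a Sketch Engine document structure. The exact markup used in Henty_doctags.zip is not shown on this page, so the attribute names id and title, and the helper name to_doc, are assumptions for illustration; compare with an actual file from the archive before relying on them. Escaping the body text is a precaution so that a stray < or & in a book cannot be mistaken for markup.

```python
from xml.sax.saxutils import escape, quoteattr


def to_doc(ebook_id: str, title: str, text: str) -> str:
    """Wrap one book's text in a <doc> element carrying its identifier and title.

    The attribute names "id" and "title" are assumed for illustration.
    """
    # quoteattr escapes &, <, > and adds a suitable pair of quotes.
    header = f"<doc id={quoteattr(ebook_id)} title={quoteattr(title)}>"
    # Escape the body so that literal "<" or "&" characters are not read as markup.
    return f"{header}\n{escape(text.strip())}\n</doc>\n"
```

Writing to_doc(...) for each book into its own .txt file would give one document per file, matching the one-file-per-book layout listed above.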
- Name: Henty_xml.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Henty corpus with XML headers containing minimal metadata (but no structural markup)
- Folder: xmlcorpus
- 4931.xml
- 6952.xml
- 2805.xml
- 18357.xml
- 19398.xml
- 7229.xml
- 22224.xml
- 8732.xml
- 26090.xml
- 30457.xml
- 38764.xml
- 20091.xml
- 18868.xml
- 36236.xml
- 11058.xml
- 19206.xml
- 3674.xml
- 11609.xml
- 4792.xml
- 35266.xml
- 33619.xml
- 9613.xml
- 20031.xml
- 20207.xml
- 59529.xml
- 21242.xml
- 21979.xml
- 6472.xml
- 8670.xml
- 8576.xml
- 18356.xml
- 20729.xml
- 17546.xml
- 48297.xml
- 53717.xml
- 12308.xml
- 18813.xml
- 24244.xml
- 17766.xml
- 30143.xml
- 20641.xml
- 39374.xml
- 17436.xml
- 35265.xml
- 49229.xml
- 42276.xml
- 54091.xml
- 13354.xml
- 53859.xml
- 14313.xml
- 7870.xml
- 17403.xml
- 25993.xml
- 7061.xml
- 32934.xml
- 36103.xml
- 5128.xml
- 56767.xml
- 35012.xml
- 31128.xml
- 18349.xml
- 39616.xml
- 29756.xml
- 28357.xml
- 7006.xml
- 7831.xml
- 47008.xml
- 18833.xml
- 56255.xml
- 7071.xml
- 7060.xml
- 28857.xml
- 8745.xml
- 36975.xml
- 7346.xml
- 8651.xml
- 21614.xml
- 22060.xml
- 19070.xml
- 36359.xml
- 19714.xml
- 7070.xml
- 4932.xml
- 21986.xml
- 43067.xml
- 21788.xml
- 6953.xml
- 55779.xml
- 8859.xml
- 7318.xml
- 7037.xml
- 28190.xml
- 8155.xml
- 45573.xml
- 34886.xml
- 7334.xml
- 45617.xml
- 20092.xml
- 56143.xml
- 39470.xml
- 5075.xml
- 33939.xml
- 11565.xml
- 3785.xml
- Name: Henty_plaintext.zip
- Size: 20.89 MB
- Format: application/zip
- Description: Plain-text version of the Henty corpus with no metadata or tagging
- Folder: plaintextcorpus
- 8651.txt
- 21614.txt
- 22060.txt
- 19070.txt
- 36359.txt
- 19714.txt
- 7070.txt
- 4932.txt
- 21788.txt
- 21986.txt
- 43067.txt
- 6953.txt
- 55779.txt
- 8859.txt
- 7318.txt
- 7037.txt
- 28190.txt
- 8155.txt
- 45573.txt
- 34886.txt
- 7334.txt
- 45617.txt
- 20092.txt
- 56143.txt
- 39470.txt
- 5075.txt
- 33939.txt
- 11565.txt
- 3785.txt
- 4931.txt
- 6952.txt
- 2805.txt
- 18357.txt
- 19398.txt
- 7229.txt
- 22224.txt
- 8732.txt
- 26090.txt
- 30457.txt
- 38764.txt
- 20091.txt
- 18868.txt
- 11058.txt
- 36236.txt
- 19206.txt
- 3674.txt
- 4792.txt
- 11609.txt
- 35266.txt
- 33619.txt
- 9613.txt
- 20031.txt
- 20207.txt
- 59529.txt
- 21242.txt
- 21979.txt
- 6472.txt
- 8670.txt
- 8576.txt
- 18356.txt
- 20729.txt
- 17546.txt
- 48297.txt
- 53717.txt
- 18813.txt
- 12308.txt
- 24244.txt
- 17766.txt
- 30143.txt
- 20641.txt
- 17436.txt
- 39374.txt
- 49229.txt
- 35265.txt
- 42276.txt
- 54091.txt
- 13354.txt
- 53859.txt
- 14313.txt
- 7870.txt
- 17403.txt
- 25993.txt
- 7061.txt
- 32934.txt
- 36103.txt
- 5128.txt
- 56767.txt
- 35012.txt
- 31128.txt
- 18349.txt
- 39616.txt
- 29756.txt
- 28357.txt
- 7006.txt
- 7831.txt
- 47008.txt
- 18833.txt
- 56255.txt
- 7071.txt
- 7060.txt
- 28857.txt
- 8745.txt
- 36975.txt
- 7346.txt
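To work with the plain-text corpus without unpacking it, Python's zipfile module can read the members directly. A minimal sketch, assuming the members sit in the plaintextcorpus folder listed above and are UTF-8 encoded (a leading byte-order mark, if present, is dropped). The helper names read_corpus and word_counts are hypothetical, and the whitespace token count is only a rough size measure, not the tokenization a tool such as Sketch Engine would apply.

```python
import zipfile
from pathlib import PurePosixPath


def read_corpus(zf: zipfile.ZipFile, folder: str = "plaintextcorpus") -> dict[str, str]:
    """Map each book's identifier (its file name without .txt) to its text."""
    texts: dict[str, str] = {}
    for name in zf.namelist():
        path = PurePosixPath(name)
        # Skip directory entries and anything outside the corpus folder.
        if path.parent.name != folder or path.suffix != ".txt":
            continue
        # utf-8-sig decodes plain UTF-8 and also strips a leading BOM.
        texts[path.stem] = zf.read(name).decode("utf-8-sig")
    return texts


def word_counts(zf: zipfile.ZipFile, folder: str = "plaintextcorpus") -> dict[str, int]:
    """Count whitespace-separated tokens per book as a rough size measure."""
    return {book: len(text.split()) for book, text in read_corpus(zf, folder).items()}
```

Opening Henty_plaintext.zip with zipfile.ZipFile inside a with statement and passing it to word_counts gives a dictionary keyed by the file identifiers listed above.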