Serbo-Croatian text corpus

 
dc.contributor Moerk, Henning
dc.contributor.author Collections, corpora etc
dc.date.accessioned 2018-07-27
dc.date.accessioned 2022-08-19T14:54:47Z
dc.date.available 2022-08-19T14:54:47Z
dc.date.created unknown
dc.date.issued 1992-07-28
dc.identifier ota:1700
dc.identifier.uri http://hdl.handle.net/20.500.14106/1700
dc.description.abstract Modern Yugoslav fiction
dc.format.extent Text data A unspecified offline
dc.format.medium Digital bitstream
dc.language English
dc.language.iso eng
dc.publisher University of Oxford
dc.relation.ispartof Oxford Text Archive Core Collection
dc.rights Distributed by the University of Oxford under an Attribution-NonCommercial-NoDerivatives 3.0 International License.
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/
dc.rights.label PUB
dc.title Serbo-Croatian text corpus
dc.type Text
has.files yes
branding Oxford Text Archive
files.size 8727
files.count 2

This item is Publicly Available and licensed under: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

 Files for this item

 Download all local files for this item (8.52 KB)

Name: yucorpdoc-1700.txt
Size: 2.86 KB
Format: Text file
Description: Version of the work in plain text format
File preview:
YU-CORPUS  (Serbo-Croatian text corpus)

  This is a text corpus consisting of approximately 700 000 words of Serbo-
Croatian. The texts are taken from modern (i.e. primarily post-World War II)
Yugoslav fiction and all Serbo-Croatian-speaking areas are represented:
Serbia, Croatia, Montenegro, and Bosnia-Hercegovina.

  The corpus was compiled by scanning books of fairly high printing quality
(one of the parameters of text selection, I must admit). My equipment was a
Macintosh computer with 4 Mb of RAM and a (French) OCR program called
AutoREAD.

  Each file consists of prose work(s) by an author who can be identified
by the file name. All text files are zipped and must thus be transferred in
binary mode and unzipped before use. The files are of approximately equal
size, namely about 300 000 bytes/50 000 words.

  The texts are (when unzipped) pure ASCII (8 bits) texts. They are all
in the Latin alphabet - even when the book was printed in Cyrillic.
I use the texts with Nota Bene's wor . . .
										
Name: yu-index-1700.txt
Size: 5.67 KB
Format: Text file
Description: Version of the work in plain text format
File preview:
-------------------------------------------
YU-CORPUS index  (yu-corp.txt):
-------------------------------------------
June, 1992:  4242310 bytes
              728952 words
-----------------------------------------
BOZOVIC.ZIP

Sasa Bozovic:
Ratne ljubavi, Beograd 1985, s. 5-110
Tebi, moja Dolores, Beograd 1984, s. 9-95

Bytes:      293917
Entries:       919
Keywords:    11437
Words:       51189
-----------------------------------------
ISAKOV.ZIP

Antonije Isakovic:
Tren 2 - Kazivanja Ceperku
Beograd (prosveta) 1983
s. 7-214; 224-241

Bytes:      305703
Entries:      2654
Keywords:    13553
Words:       52496
-----------------------------------------
KAPOR.ZIP

Momo Kapor: Una
Zagreb (Znanje) 1983
s. 5-100

Momo Kapor: Zoe
Zagreb (Znanje) 1984
s. 5-141

Politika, 16. jul 1989, s.9: 011
Momo Kapor:
"Bre"

Politika, 28, 29, and 30 November: 011
Momo Kapor:
"Na dan tvog rodjenja"

Bytes:     292722
Entries:     1740
Keywords:   13966
Words:      47587
----------------------------- . . .
										
