CORPUS OF BRAZILIAN MEDIA PORTUGUESE
====================================

Information file
Version of October 10, 1994

This is a collection of news stories from Brazilian newspapers, magazines
and television, as they were distributed on the Internet.

Most files are organized by the date of publication or broadcast and are
named 's' + day + month + year (two digits each). Thus s110194.txt contains
news from the 11th of January, 1994. When the day digits are zeros, the file
covers several successive days, as in s000894.txt, which contains several
individual files from the month of August 1994.

Files have been archived (with tar) and compressed (with gzip) so as to fit
on 1.4MB diskettes:

    file         tar (bytes)    tar + gzip (bytes)
    disk1.taz      3,333,120             1,203,004
    disk2.taz      3,221,504             1,170,502

Because the files contain email headers with many words that are not
Portuguese, only a rough estimate of the size of the text collection is
possible. Each disk is estimated to contain about 400,000 to 450,000 usable
words.

Accentuation is not represented consistently. Only final acute accents
appear with any frequency, written as a single inverted comma. Spelling is
also sometimes incorrect.

Disclaimer
==========

Since the source material for this collection has been distributed to the
public on the Internet, the depositor cannot be held responsible for
infringements of copyright law. The responsibility for clearing copyright
permissions rests with the users of this material and not with the
depositor. Nor can the depositor guarantee the authenticity of the texts.
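
Example: decoding file names
============================

The file naming convention described in the information section can be
decoded mechanically. The following is only a minimal sketch in Python, not
part of the corpus distribution. The names parse_name and StoryFile are
hypothetical, and the sketch assumes that every two-digit year belongs to
the 1900s, which holds for the 1994 examples given but is not stated for the
collection as a whole.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class StoryFile(NamedTuple):
    """Date information recovered from a corpus file name (hypothetical type)."""
    year: int
    month: int
    day: Optional[int]  # None when the file covers several successive days


# 's' followed by two digits each for day, month and year, then '.txt'
_NAME = re.compile(r"^s(\d{2})(\d{2})(\d{2})\.txt$")


def parse_name(filename: str) -> StoryFile:
    """Decode a name such as 's110194.txt' into year, month and day."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a corpus file name: {filename!r}")
    day, month, two_digit_year = (int(g) for g in match.groups())
    year = 1900 + two_digit_year  # assumption: all files date from the 1900s
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {filename!r}")
    if day == 0:
        # Day digits of zero mark a file spanning several days of the month.
        return StoryFile(year, month, None)
    date(year, month, day)  # raises ValueError if the day is impossible
    return StoryFile(year, month, day)
```

Under these assumptions, parse_name("s110194.txt") yields the 11th of
January 1994, and parse_name("s000894.txt") yields August 1994 with no
specific day.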