
Arabic Speech Corpus

 
dc.contributor Nawar Halabi University of Southampton
dc.contributor.author Nawar Halabi
dc.date.accessioned 2018-07-27
dc.date.accessioned 2022-08-19T15:57:01Z
dc.date.available 2022-08-19T15:57:01Z
dc.date.created 2015
dc.date.issued 2016-06-09
dc.identifier ota:2561
dc.identifier.uri http://hdl.handle.net/20.500.14106/2561
dc.description.abstract The resource is a speech corpus comprising digital audio files, text transcripts, and files containing time stamps of the phoneme boundaries. There are 1813 .wav files containing spoken utterances, 1813 .lab files containing text utterances, and 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. These files can be opened using Praat software. The file phonetic-transcript.txt has the form "[wav_filename]" "[Phoneme Sequence]" on every line. The file orthographic-transcript.txt has the form "[wav_filename]" "[Orthographic Transcript]" on every line. The orthography is in Buckwalter format, which is easier to handle in software that does not read Arabic script and can easily be converted back to Arabic.
dc.format.extent CollectionSound 5,444 files: ca. 1.3 GB
dc.format.medium Digital bitstream
dc.language Arabic
dc.language.iso ara
dc.publisher University of Oxford
dc.relation.ispartof Oxford Text Archive Core Collection
dc.rights Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.rights.label PUB
dc.subject.lcsh Linguistics
dc.subject.lcsh Linguistic analysis (Linguistics)
dc.subject.lcsh Speech--Synthesis
dc.subject.other Linguistic corpora
dc.subject.other Speech--Research
dc.title Arabic Speech Corpus
dc.type CollectionSound
has.files yes
branding Oxford Text Archive
files.size 2064444
files.count 3
relation.uri https://downloads.it.ox.ac.uk/ota-public/audio/2561.zip
otaterms.date.range 2000-present
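
The abstract states that the Buckwalter orthography can be converted back to Arabic script. A minimal Python sketch of that conversion follows, assuming the standard Buckwalter table. The names `buckwalter_to_arabic` and `CORPUS_VARIANTS` are hypothetical. The preview of orthographic-transcript.txt suggests that this corpus writes `TH`, `SH`, and `^` where standard Buckwalter uses `*`, `$`, and `v`; that reading is an assumption drawn from the preview, not a documented specification, and it would misread a genuine `T` followed directly by `H`.

```python
# Standard Buckwalter transliteration: ASCII symbol -> Arabic code point.
BUCKWALTER = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062A", "v": "\u062B", "j": "\u062C",
    "H": "\u062D", "x": "\u062E", "d": "\u062F", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063A", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064A", "F": "\u064B", "N": "\u064C", "K": "\u064D",
    "a": "\u064E", "u": "\u064F", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}

# ASSUMPTION: corpus-specific spellings inferred from the file preview,
# rewritten to their standard Buckwalter symbols before the table lookup.
CORPUS_VARIANTS = {"TH": "*", "SH": "$", "^": "v"}


def buckwalter_to_arabic(text, variants=CORPUS_VARIANTS):
    """Convert a Buckwalter-transliterated string to Arabic script.

    Characters without a table entry (spaces, digits) pass through unchanged.
    """
    for variant, standard in variants.items():
        text = text.replace(variant, standard)
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)
```

Passing `variants={}` applies the standard table only. Note that a token such as `sil`, which appears inside the previewed transcripts, would also be transliterated letter by letter unless it is filtered out beforehand.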

This item is
Publicly Available
and licensed under:
Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

 Files for this item

Name 2561.zip
Format unknown
Note This file is hosted on an external server
URI https://downloads.it.ox.ac.uk/ota-public/audio/2561.zip

 Download all local files for this item (1.97 MB)

Name arabic-speech-corpus-report.pdf
Size 1.4 MB
Format PDF
Name orthographic-transcript.txt
Size 221.52 KB
Format Text file
Description Version of the work in plain text format
File Preview
"ARA NORM  0002.wav" "waraj~aHa Alt~aqoriyru Al~aTHiy >aEad~ahu maEohadu >aboHaA^i haDabapi Alt~ibiti fiy Alo>akaAdiymiy~api AlS~iyniy~api liloEuluwmi sil >ano tasotamir~a darajaAtu AloHaraArapi wamusotawayaAtu Alr~uTuwbapi fiy Alo<irotifaAEi TawaAla haTHaA Aloqarono"
"ARA NORM  0003.wav" "mim~aA qado yu&ad~iy <ilaY taraAjuEi masaAHaAti Alo>anohaAri Alj~aliydiy~api waAnotiSHaAri Alt~aSaH~uri"
"ARA NORM  0004.wav" "waTHakara Alt~aqoriyru >ana taraAjuEa masaAHapi Alojaliydi yumokinu >ay~uxil~a bimuEad~alaAti <imodaAdaAti AlomiyaAhi sil liEadadK mino >anohaAri |soyaA Alr~a}iysiy~api Al~atiy tamobuEu mina AlohaDabapi"
"ARA NORM  0005.wav" "bayonahaA nahoraA yaluw wayaAnogotsiy sil fiy AlS~iyno"
"ARA NORM  0006.wav" "wafiy AlSH~awoTi Al^~aAniy AsotaEaAda baAriysu saAno jiyoramAnu musotawaAhu waHaq~aqa Alt~aEaAdula Eano Tariyqi AboraAhiymuwfiytoSH sil mino tasodiydapK AsotaEoSato EalaY AloHaAriso"
"ARA NORM  0007.wav" "yatama^~alu Alo<ibodaAEu Alofan~iy~u waAloHaDaAriy~u Al~aTHiy yakoSH . . .
										
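
Each line of the transcript files pairs a quoted .wav filename with a quoted transcript, as the preview above shows. A minimal Python sketch for reading that two-field form follows. The function name `parse_transcript_lines` is hypothetical, and the sketch assumes that neither field contains a double quote, which holds for the previewed lines but is not confirmed for the whole file.

```python
import re
from typing import Iterable, Iterator, Tuple

# Two double-quoted fields per line: "<wav filename>" "<transcript>".
LINE_PATTERN = re.compile(r'^"([^"]*)"\s+"([^"]*)"\s*$')


def parse_transcript_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (wav_filename, transcript) pairs, skipping blank lines."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in the two-field form")
        yield match.group(1), match.group(2)
```

With a downloaded copy of the file, `dict(parse_transcript_lines(open("orthographic-transcript.txt", encoding="utf-8")))` would give a filename-to-transcript mapping; the UTF-8 encoding is an assumption, although the previewed content is plain ASCII.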
Name phonetic-transcipt.txt
Size 365.03 KB
Format Text file
Description Version of the work in plain text format
File Preview
"ARA NORM  0002.wav" "sil w a r a' jj A H a tt A q r ii0' r u0 ll a * i0 < a E a' dd a h u0 m a' E h a d u0 < a b H aa' ^ i0 h A D A' b a t i0 tt i1' b t i0 f i0 l < a k aa d ii0 m ii0' y a t i0 SS II0 n ii0' y a t i0 l u0 l E u0 l uu0' m i0 sil < a' n t a s t a m i0' rr a d a r a j aa' t u0 l H a r aa' r a t i0 w a m u0 s t a w a y aa' t u0 rr U0 T UU0' b a t i0 f i0 l Ah i0 r t i0 f aa' E i0 T A' w A l a h aa' * a l q A' r n sil"
"ARA NORM  0003.wav" "sil m i0' mm aa q A' d y u0 < a' dd ii0 Ah i0 l aa t a r aa' j u0 E i0 m a s aa H aa' t i0 l < a n h aa' r i0 jj a l ii0 d ii0' y a t i0 w a n t i0 $ aa' r i0 tt A S A' HH u0 r i0 sil"
"ARA NORM  0004.wav" "sil w a * a' k a r a tt A q r ii0' r u0 Ah a n a t a r aa' j u0 E A m a s aa' H a t i0 l j a l ii0' d i0 y u0' m k i0 n u0 Ah a yy u0 x I0' ll a b i0 m u0 E a dd a l aa' t i0 < i0 m d aa d aa' t i0 l m i0 y aa' h i0 sil l i0 E a' d a d i1 m i0' n < A' n h A r i0 < aa' s y a rr a < ii0 s ii0' y a t i0 ll a t ii0 t a' m b a E u0 m i0 . . .
										
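
The phoneme sequences in the preview above are symbols separated by spaces. A minimal Python sketch for splitting a sequence into symbols and tallying a symbol inventory follows. The helper names `phoneme_tokens` and `symbol_counts` are hypothetical, and the sketch deliberately does not interpret individual symbols (for example `sil` or a trailing `'`), because this record does not define them.

```python
from collections import Counter
from typing import Iterable, List


def phoneme_tokens(sequence: str) -> List[str]:
    """Split a phoneme sequence into its whitespace-separated symbols."""
    return sequence.split()


def symbol_counts(sequences: Iterable[str]) -> Counter:
    """Count how often each symbol occurs across several sequences."""
    counts: Counter = Counter()
    for sequence in sequences:
        counts.update(phoneme_tokens(sequence))
    return counts
```

Feeding `symbol_counts` the second field of every line in the phonetic transcript would list the distinct symbols used in the corpus, which is a reasonable first check before aligning them with the .TextGrid labels.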
