Arabic Speech Corpus
dc.contributor | Nawar Halabi University of Southampton |
dc.contributor.author | Nawar Halabi |
dc.date.accessioned | 2018-07-27 |
dc.date.accessioned | 2022-08-19T15:57:01Z |
dc.date.available | 2022-08-19T15:57:01Z |
dc.date.created | 2015 |
dc.date.issued | 2016-06-09 |
dc.identifier | ota:2561 |
dc.identifier.uri | http://hdl.handle.net/20.500.14106/2561 |
dc.description.abstract | The resource is a speech corpus comprising digital audio files, text transcripts, and files containing time stamps of the phoneme boundaries. There are 1813 .wav files containing spoken utterances, 1813 .lab files containing text utterances, and 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. These files can be opened using Praat software. The file phonetic-transcript.txt has the form "[wav_filename]" "[Phoneme Sequence]" on every line. The file orthographic-transcript.txt has the form "[wav_filename]" "[Orthographic Transcript]" on every line. Orthography is in Buckwalter format, which is easier to handle in software that does not read Arabic script and can easily be converted back to Arabic. |
dc.format.extent | CollectionSound 5,444 files: ca. 1.3 GB |
dc.format.medium | Digital bitstream |
dc.language | Arabic |
dc.language.iso | ara |
dc.publisher | University of Oxford |
dc.relation.ispartof | Oxford Text Archive Core Collection |
dc.rights | Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
dc.rights.label | PUB |
dc.subject.lcsh | Linguistics |
dc.subject.lcsh | Linguistic analysis (Linguistics) |
dc.subject.lcsh | Speech--Synthesis |
dc.subject.other | Linguistic corpora |
dc.subject.other | Speech--Research |
dc.title | Arabic Speech Corpus |
dc.type | CollectionSound |
has.files | yes |
branding | Oxford Text Archive |
files.size | 2064444 |
files.count | 3 |
relation.uri | https://downloads.it.ox.ac.uk/ota-public/audio/2561.zip |
otaterms.date.range | 2000-present |
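The abstract describes each transcript line as a quoted file name followed by a quoted transcript, with the orthography in Buckwalter format. The following is a minimal sketch of how such lines might be read and converted back to Arabic script. The function names `parse_transcript` and `buckwalter_to_arabic` and the sample file names are hypothetical, and the mapping is the standard Buckwalter table, which is an assumption: the corpus may deviate from it. The previewed transcripts contain the symbol `^`, which is not in the standard table, and the token `sil`, which this sketch does not treat specially.

```python
import re

# Assumed line shape, taken from the abstract: "<wav filename>" "<transcript>"
PAIR = re.compile(r'"([^"]*)"\s+"([^"]*)"')

# Standard Buckwalter table (assumption: the corpus may use a variant of it).
BUCKWALTER_TO_ARABIC = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062A", "v": "\u062B", "j": "\u062C",
    "H": "\u062D", "x": "\u062E", "d": "\u062F", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063A", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064A", "F": "\u064B", "N": "\u064C", "K": "\u064D",
    "a": "\u064E", "u": "\u064F", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}


def parse_transcript(text):
    """Return {wav_filename: transcript} for every quoted pair found in text."""
    return dict(PAIR.findall(text))


def buckwalter_to_arabic(text):
    """Map each character through the table; unknown characters pass through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Hypothetical two-line sample in the format the abstract describes.
    sample = '"example 0001.wav" "fiy"\n"example 0002.wav" "mino"\n'
    for name, body in parse_transcript(sample).items():
        print(name, buckwalter_to_arabic(body))
```

Because the pattern matches quoted pairs rather than whole lines, the same function also handles text in which several entries have been run together on one line. Characters outside the table are left unchanged, so any corpus-specific symbols remain visible for separate handling.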
This item is publicly available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Files for this item
- Name
- 2561.zip
- Format
- unknown
- Description
- Note
- This file is hosted on an external server
- URI
- https://downloads.it.ox.ac.uk/ota-public/audio/2561.zip
- Name
- orthographic-transcript.txt
- Size
- 221.52 KB
- Format
- Text file
- Description
- Version of the work in plain text format
"ARA NORM 0002.wav" "waraj~aHa Alt~aqoriyru Al~aTHiy >aEad~ahu maEohadu >aboHaA^i haDabapi Alt~ibiti fiy Alo>akaAdiymiy~api AlS~iyniy~api liloEuluwmi sil >ano tasotamir~a darajaAtu AloHaraArapi wamusotawayaAtu Alr~uTuwbapi fiy Alo<irotifaAEi TawaAla haTHaA Aloqarono"
"ARA NORM 0003.wav" "mim~aA qado yu&ad~iy <ilaY taraAjuEi masaAHaAti Alo>anohaAri Alj~aliydiy~api waAnotiSHaAri Alt~aSaH~uri"
"ARA NORM 0004.wav" "waTHakara Alt~aqoriyru >ana taraAjuEa masaAHapi Alojaliydi yumokinu >ay~uxil~a bimuEad~alaAti <imodaAdaAti AlomiyaAhi sil liEadadK mino >anohaAri |soyaA Alr~a}iysiy~api Al~atiy tamobuEu mina AlohaDabapi"
"ARA NORM 0005.wav" "bayonahaA nahoraA yaluw wayaAnogotsiy sil fiy AlS~iyno"
"ARA NORM 0006.wav" "wafiy AlSH~awoTi Al^~aAniy AsotaEaAda baAriysu saAno jiyoramAnu musotawaAhu waHaq~aqa Alt~aEaAdula Eano Tariyqi AboraAhiymuwfiytoSH sil mino tasodiydapK AsotaEoSato EalaY AloHaAriso"
"ARA NORM 0007.wav" "yatama^~alu Alo<ibodaAEu Alofan~iy~u waAloHaDaAriy~u Al~aTHiy yakoSH . . .
- Name
- phonetic-transcipt.txt
- Size
- 365.03 KB
- Format
- Text file
- Description
- Version of the work in plain text format
"ARA NORM 0002.wav" "sil w a r a' jj A H a tt A q r ii0' r u0 ll a * i0 < a E a' dd a h u0 m a' E h a d u0 < a b H aa' ^ i0 h A D A' b a t i0 tt i1' b t i0 f i0 l < a k aa d ii0 m ii0' y a t i0 SS II0 n ii0' y a t i0 l u0 l E u0 l uu0' m i0 sil < a' n t a s t a m i0' rr a d a r a j aa' t u0 l H a r aa' r a t i0 w a m u0 s t a w a y aa' t u0 rr U0 T UU0' b a t i0 f i0 l Ah i0 r t i0 f aa' E i0 T A' w A l a h aa' * a l q A' r n sil"
"ARA NORM 0003.wav" "sil m i0' mm aa q A' d y u0 < a' dd ii0 Ah i0 l aa t a r aa' j u0 E i0 m a s aa H aa' t i0 l < a n h aa' r i0 jj a l ii0 d ii0' y a t i0 w a n t i0 $ aa' r i0 tt A S A' HH u0 r i0 sil"
"ARA NORM 0004.wav" "sil w a * a' k a r a tt A q r ii0' r u0 Ah a n a t a r aa' j u0 E A m a s aa' H a t i0 l j a l ii0' d i0 y u0' m k i0 n u0 Ah a yy u0 x I0' ll a b i0 m u0 E a dd a l aa' t i0 < i0 m d aa d aa' t i0 l m i0 y aa' h i0 sil l i0 E a' d a d i1 m i0' n < A' n h A r i0 < aa' s y a rr a < ii0 s ii0' y a t i0 ll a t ii0 t a' m b a E u0 m i0 . . .