Frequently Asked Questions
- What is the LLDS?
- What is the relationship to the Oxford Text Archive?
- What is the repository?
- Where is the LLDS?
- When did the LLDS start?
- Is the LLDS part of the University of Oxford?
- Is the LLDS part of the Oxford University Press?
- Is the LLDS part of the Bodleian Libraries?
- Is the LLDS part of CLARIN?
- What are the different dates that I can use to search for texts?
- How do I download a file rather than opening it in my browser?
- Why can’t I log in?
- What submissions do we accept?
- Do I need to create an account to download and/or make a submission?
- I see an error logging in.
- Why should I submit my data into your repository?
- What is the PID (handle) good for?
- What is the procedure for depositing and archiving data?
- What if I want or need to update the archived data?
- What if I want to withdraw the resources in the future? Can I delete the data?
- How should I cite resources?
- How safe is my data if I store it with you?
- What licence should I pick for my data?
- Where can I find more information about supported licences?
- How do I get the most out of my searches?
What is the LLDS?
The Literary and Linguistic Data Service (LLDS) is a repository of digital literary and linguistic resources for research and teaching. The LLDS was created at and remains a part of the University of Oxford. We also offer advice to resource creators about best practice for creating digital resources, and to users of digital resources. The LLDS is funded by the Arts and Humanities Research Council as a national repository for literary and linguistic resources as part of the Infrastructure of Digital Arts and Humanities programme, and as the coordinating node for CLARIN-UK.
What is the relationship to the Oxford Text Archive?
The Oxford Text Archive (OTA) was for many years based in Oxford University Computing Services, which was later renamed IT Services, and it moved to the Bodleian Libraries in 2016. In 2021 the Bodleian Libraries decided that it would no longer collect new resources, and that it didn't want to be part of national and European infrastructures. The LLDS was set up to continue these activities, as well as continuing to deliver and add to OTA collections.
What is the “repository”?
It is like a library for digital research data, as well as for some other types of literary and linguistic data. It’s an open, online location where people can search for texts and easily download them. It’s a place where texts can be stored safely and shared with others. The goals are to make it easier to find and use the texts, and to make sure that they remain available and usable for a long time into the future.
Where is the LLDS?
The LLDS is part of the Faculty of Linguistics, Philology and Phonetics at the University of Oxford, and the staff responsible for it can be found at 41 Wellington Square in the centre of Oxford. In 2025, they plan to move to the new Stephen A. Schwarzman Centre for the Humanities building.
When did the LLDS start?
The OTA was founded by Lou Burnard and Susan Hockey in 1976. We celebrated our 30th birthday with a number of events in 2006, and our 40th in 2016. In 2021 the Literary and Linguistic Data Service was set up as a new home for the OTA collections, to continue the mission of providing a home for the deposit of new resources from the community, and as the CLARIN repository for the UK.
Is the LLDS part of the University of Oxford?
Yes.
Is the LLDS part of the Oxford University Press?
No.
Is the LLDS part of the Bodleian Libraries?
No.
Is the LLDS part of CLARIN?
Yes. The OTA was one of the key centres that originally planned and started CLARIN, the European Research Infrastructure Consortium for language resources and technologies. The LLDS collections can be found via the Virtual Language Observatory, and the LLDS is involved in a number of initiatives to share resources via CLARIN. The LLDS is a registered CLARIN C Centre, and the repository is based on the CLARIN DSpace platform.
What are the different dates that I can use to search for texts?
Date of publication
This represents, as far as possible, the date when the content of the resource was created. So, for digitized versions of printed works, this is usually the date of original publication. The value of this element is taken from the “dc.date.created” element, which can be seen in the “full item record” view.
Date of digitization
The date of the creation of the digital resource. If it is born digital, then this is the same as the date of publication, but more often it will be a later date. For legacy resources, where we don’t have sufficient information about exactly when a resource was created, the date when it was deposited in the LLDS is used in this field, and its value should be interpreted as "created no later than". The value of this element is taken from the “dc.date.issued” element, which can be seen in the “full item record” view.
Date range
This is based on the date of publication, and gives a coarse-grained date range for the resource, for the purposes of grouping resources into larger periods of time. Where the date of publication of a resource is a range, the earlier date is used to assign a date range value. The value of this element is taken from the “otaterms.date.range” element, which can be seen in the “full item record” view.
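As a rough illustration of the rule that the earlier date of a range decides the date range value, here is a minimal Python sketch. The 100-year period width and the function names are assumptions made for this example only; the actual period boundaries used by the repository are not stated here.

```python
import re


def earliest_year(date_created: str) -> int:
    """Return the earliest four-digit year found in a date string.

    A single year ("1611") and a range ("1590-1600") are both handled;
    for a range, the earlier year is the one that counts.
    """
    years = [int(y) for y in re.findall(r"\d{4}", date_created)]
    if not years:
        raise ValueError(f"no four-digit year in {date_created!r}")
    return min(years)


def date_range(date_created: str, span: int = 100) -> str:
    """Assign a coarse period label.

    ASSUMPTION: periods are fixed blocks of `span` years; the real
    repository may use different boundaries.
    """
    year = earliest_year(date_created)
    start = year - year % span
    return f"{start}-{start + span - 1}"


print(date_range("1590-1600"))
print(date_range("1611"))
```

With these assumed 100-year periods, a resource dated 1590-1600 is grouped by 1590 rather than by 1600.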
How do I download a file rather than opening it in my browser?
Different browsers will behave differently when it comes to downloading or opening different file types. To make sure that a file is downloaded to your machine, right-click on the “Download file” button and select the option to save the file. Or, if you click on the darker blue “Download all local files for this item” button, then that should always download them all in a zip file.
Why can’t I log in?
This may be a problem for some users attempting to use some resources in the LLDS. Resources marked as being for “Academic use” are only accessible to bona fide members of a university: users must log in via their institution to demonstrate their credentials. (The restrictions are imposed by those who created and deposited the resources in question.)
Access will be granted to log-ins from institutions in countries currently signed up to the CLARIN Federation, or to institutions that have signed up to eduGAIN. If your institution doesn’t show up in the list, you’ll need to ask someone (probably in your institutional library) to register your institution.
What submissions do we accept?
We accept high-quality digital texts and related resources: full text digital editions, corpora, lexicons, etc.
When uploading language resources, please use formats that we recommend for data submission.
Do I need to create an account to download or make a submission?
- Download without restriction: data with a licence allowing for free sharing can be downloaded without restriction; just read the licence and download. This applies to all data with Creative Commons licensing and tools with open source licences.
- Download with licence restrictions: to download datasets that require you to sign a licence, you need to log in. If you are from the academic world in Europe, you probably don’t need a new account; just click "Login" and search for your academic institution. To sign in, you can use any account with an Identity Provider that is a member of the eduGAIN federation. If you don’t have an academic account that works with us, please contact the LLDS Help Desk.
I see an error message when I try to log in
If your institution is eligible for access but you have trouble logging in, please contact the LLDS Help Desk.
Occasionally (usually when you are the first one logging in using your home institution) you might see an error stating:
- The authentication was successful; however, your identity provider did not provide either your email, eppn nor targeted id.
This means your home institution did not send the LLDS enough data about you to operate our service (probably to protect your personal data). We only require an email address to provide access, which they should provide as we follow the GÉANT Data Protection Code of Conduct.
If you have accounts with multiple providers, and you log in with a different one each time, you might see an error stating:
- Your email is already associated with a different user.
Please try to use the same provider each time. If that is not possible, contact the LLDS Help Desk to request a change of your default address.
Why should I submit my data into your repository?
- It is free and safe.
- We respect your licence. We encourage open data, and believe it benefits not only users, but also the data providers.
- We also accept restricted access data, in which case we can require users to sign a licence, if that is what you need, before allowing data downloading.
- Deposited data is highly visible — for example via Google, VLO, DataCite, OLAC, Data Citation Index, arXive.org — giving you maximal exposure for your work.
- The data is easy to cite. We provide ready-to-use one-click citations in BibTeX, RIS, and other popular reference formats. All the citations include permanent links created from persistent identifiers: we use handles for PIDs, and these PIDs are future-proof.
What is the PID (handle) good for?
It is a special permanent URL. It provides a permanent link that will resolve correctly even if in some distant future the data is moved: therefore, the PID should always be used as the URL of choice in citations.
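As a small illustration of using a handle as a citable URL, the Python sketch below turns a handle into a link. It assumes the public Handle.Net proxy at hdl.handle.net as the resolver; the function name and the handle value are hypothetical and do not refer to a real item.

```python
from urllib.parse import quote

# ASSUMPTION: handles are resolved through the public Handle.Net proxy.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Build a permanent link from a handle such as 'prefix/suffix'."""
    handle = handle.strip()
    # Accept a few common ways of writing the same handle.
    for lead in ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/"):
        if handle.startswith(lead):
            handle = handle[len(lead):]
            break
    if "/" not in handle:
        raise ValueError("a handle has the form prefix/suffix")
    # Percent-encode anything unusual but keep the separating slash.
    return HANDLE_PROXY + quote(handle, safe="/")


# Hypothetical handle, for illustration only.
print(handle_to_url("hdl:20.500.99999/1234"))
```

The resulting link is the one to put in a citation, rather than the address shown in the browser while viewing the item.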
How should I cite resources?
See our citation policies.
How safe is my data, if I store it with the LLDS?
We constantly review our data preservation policies to ensure that all data are preserved for the long term. As well as the live copy:
- all data in the repository have an on-site backup copy;
- all data in the repository have another off-site copy.
What licence should I pick for my data/tool?
We encourage using a free and open licence. A representative selection of free licences (including Creative Commons licences appropriate for datasets) is available during submission.
Where can I find more information about supported licences?
See our list of currently available licences. However, do not hesitate to contact us if you need a specific licence that is not on the list. (Licences can be accompanied by various additional requirements.) The set of licences is currently under review (in 2022).
How do I get the most out of my searches?
The search engine is SOLR, which uses “OR” as the default operator on multiple terms in a search (for more on SOLR syntax, see the SOLR documentation).
If you are not satisfied with the results of your searches, you might wish to go beyond plain-text searches. You can search only in certain fields, use negation, boost the score (emphasis) of some parts of the query, and more.
Examples of search queries
When you search on the two terms “national” and “corpus”, SOLR inserts an implicit “OR” between them, whereas Google inserts an implicit “AND”:
- national corpus
- SOLR returns items containing “national” or “corpus” (or both) in any text field; Google would return only results containing both terms.
- dc.title:B?C && -dc.title:corpus
- Returns all items whose title matches “B?C”, where “?” stands for any single character (e.g. BNC), and whose title does not contain “corpus”
- dc.title:"National Corpus"
- Use double quotes (") for exact matches and multiword expressions
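If you build queries like these often, the Python sketch below shows one way to assemble the query strings from small helpers. The helper names are hypothetical; the field name dc.title comes from the examples above, and “AND” is used in place of “&&”, which means the same thing in SOLR query syntax.

```python
def field_term(field: str, term: str) -> str:
    """Restrict a single term (wildcards such as ? allowed) to one field."""
    return f"{field}:{term}"


def field_phrase(field: str, phrase: str) -> str:
    """Exact or multiword match: wrap the phrase in double quotes."""
    escaped = phrase.replace('"', '\\"')
    return f'{field}:"{escaped}"'


def negate(clause: str) -> str:
    """Exclude items that match the clause."""
    return f"-{clause}"


def all_of(*clauses: str) -> str:
    """Require every clause, overriding the default OR between terms."""
    return " AND ".join(clauses)


queries = {
    # Default behaviour: either term may match.
    "either_term": "national corpus",
    # Both terms must match.
    "both_terms": all_of("national", "corpus"),
    # Title matches B?C but does not contain "corpus".
    "wildcard_without_corpus": all_of(
        field_term("dc.title", "B?C"),
        negate(field_term("dc.title", "corpus")),
    ),
    # Exact phrase in the title.
    "exact_phrase": field_phrase("dc.title", "National Corpus"),
}

for name, query in queries.items():
    print(f"{name}: {query}")
```

Each resulting string can be pasted into the search box as it stands.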