Creating & Documenting Electronic Texts

 

6.3 : Documentation and Metadata

6.3.1 The Dublin Core Element Set and the Arts and Humanities Data Service

"The Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of web resources, it has also attracted the attention of formal resource description communities such as museums and libraries" [Dublin Core Metadata home page - http://purl.oclc.org/metadata/dublin_core/]

By the mid-1990s, large-scale web users, document creators and information providers had recognized the pressing need to introduce some kind of basic cataloguing scheme for documenting resources on the web. The scheme needed to be accessible enough to be adopted and implemented by typical web content creators who had little or no formal cataloguing training. The set of metadata elements needed to be simpler than those used in traditional library cataloguing, yet offer information systems greater precision than the crude indexing methods already employed by unreliable search engines and web crawlers.

The Dublin Core Metadata Element Set grew out of a series of meetings and workshops comprising experts from the library world, the networking and digital library research community, and content specialists. The basic objectives of the Dublin Core initiative included:

- to produce a core set of descriptive elements capable of describing or identifying the majority of resources available on the internet. Unlike a traditional library, where the main focus is on cataloguing published textual materials, the internet contains a vast range of material in a variety of formats, including non-textual material such as images and video, most of which has not been 'published' in any formal way.

- to make this scheme intelligible enough that it could be easily utilized by cataloguers but still retain enough content that it functioned effectively as a catalogue record.

- to encourage the adoption of the scheme on an international level by ensuring that it provided the best format for documenting digital objects on the web.

The Dublin Core element set provides a straightforward framework for documenting features of a work such as who created the work, what its content is and what languages it contains, where and from whom it is available and in what formats, and whether it is derived from a printed source. At a basic level the element set uses commonly understood terms and semantics which are intelligible to most disciplines and information systems communities. The descriptive terms were chosen to be generic enough to be understood by a document author, but can also be extended to provide full and precise cataloguing information. For example, textual authors, painters, photographers and writers of software programs can all be considered 'creators' in a broad sense.

Two main principles apply when creating a Dublin Core record: all elements are optional, and all elements are repeatable. Therefore, if a work is the result of numerous contributors, it is simple to record the details of each member (name, contact details, etc.) as well as their specific contribution (author, editor, photographer, etc.) by repeating the appropriate element. These basic details can be extended by the use of qualifiers such as scheme, type and language on the elements. The scheme qualifier identifies a recognized coding or cataloguing scheme used in a Dublin Core element, for example when a document employs an established cataloguing scheme such as the Library of Congress subject headings. The use of the scheme qualifier provides a mechanism to introduce a degree of standardization and consistency to the format. The type qualifier refines more precisely the content of a single element. For example, the author element is often used several times for the same individual, and the type qualifier can be employed to differentiate details such as the author's postal address, email address, telephone number, etc. The language qualifier simply identifies the language of the element value.
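The two principles and the three qualifiers can be pictured as a small data structure. The following is only a sketch in Python: the class name DCElement, the helper values_of and all sample values are hypothetical and were invented for illustration, and the subject value has not been checked against the Library of Congress subject headings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DCElement:
    """One occurrence of a Dublin Core element plus its optional qualifiers."""
    name: str                     # element name, e.g. "creator"
    value: str                    # the element content
    scheme: Optional[str] = None  # recognized coding scheme, e.g. "LCSH"
    type: Optional[str] = None    # refinement of the element, e.g. "editor"
    lang: Optional[str] = None    # language of the value, e.g. "en"


# A record is simply a list of elements: none is required (all are
# optional) and any element may appear more than once (all are repeatable).
record: List[DCElement] = [
    DCElement("title", "King Lear"),
    DCElement("creator", "Shakespeare, William"),
    DCElement("contributor", "Editor, A. N.", type="editor"),         # hypothetical person
    DCElement("contributor", "Transcriber, A.", type="transcriber"),  # hypothetical person
    DCElement("subject", "Tragedy", scheme="LCSH", lang="en"),        # illustrative value only
]


def values_of(rec: List[DCElement], name: str) -> List[str]:
    """Return every value recorded for one element, in record order."""
    return [e.value for e in rec if e.name == name]


print(values_of(record, "contributor"))  # repeated element: two values
print(values_of(record, "publisher"))    # omitted element: empty list
```

Because the record is a plain list, repeating an element needs no special handling, and leaving one out simply yields no values for it.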

Implementing the Dublin Core

The Dublin Core element set was designed for documenting web resources, and it is easily integrated into web pages using the HTML <META> tag, inserted between the <HEAD>...</HEAD> tags and before the <BODY> of the work. No specialist tools more sophisticated than an average word processor are required to produce the content of a Dublin Core record. However, a number of labour-saving devices are available, notably the DC-dot generator available from the UKOLN web site [http://www.ukoln.ac.uk/metadata/dcdot/]. DC-dot will automatically generate the <META> tags for any web site, and these can be easily edited and extended further.
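As a rough illustration of what such embedded metadata might look like, the Python sketch below renders a few elements as HTML META tags suitable for pasting into the HEAD of a page. It is not the DC-dot tool and does not reproduce its output. The function name dc_meta_tags is hypothetical, and the conventions used here (a "DC." prefix on NAME, the type qualifier appended to the element name, and SCHEME and LANG written as attributes) are assumptions about one common way of writing Dublin Core in HTML.

```python
from html import escape


def dc_meta_tags(elements):
    """Render (name, value, qualifiers) triples as HTML META tags.

    `qualifiers` is a dict that may contain "type", "scheme" and "lang".
    The tag layout is an assumed convention, not an official syntax.
    """
    tags = []
    for name, value, qualifiers in elements:
        full_name = "DC." + name
        if "type" in qualifiers:
            # Assumption: the type qualifier refines the element name.
            full_name += "." + qualifiers["type"]
        attrs = [f'NAME="{escape(full_name)}"']
        if "scheme" in qualifiers:
            attrs.append(f'SCHEME="{escape(qualifiers["scheme"])}"')
        if "lang" in qualifiers:
            attrs.append(f'LANG="{escape(qualifiers["lang"])}"')
        # escape() protects quotes and ampersands inside attribute values.
        attrs.append(f'CONTENT="{escape(value)}"')
        tags.append("<META " + " ".join(attrs) + ">")
    return "\n".join(tags)


# Sample values are invented for illustration.
example = [
    ("title", "King Lear", {}),
    ("creator", "Shakespeare, William", {}),
    ("subject", "Tragedy", {"scheme": "LCSH", "lang": "en"}),
]
print(dc_meta_tags(example))
```

The generated lines are ordinary text, so they can be edited by hand afterwards in any word processor or text editor.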

Conclusions and further reading

The Dublin Core element scheme offers enormous potential as a usable standard cataloguing procedure for digital resources on the web. The core set of elements is broad and encompassing enough to be of use to novice web authors and skilled cataloguers alike. However, its success will ultimately depend on its wide-scale adoption by the internet community as a whole. It is also crucial that the rules of the scheme be implemented in an intelligent and systematic way. To fulfil this objective, more has to be done to refine and stabilize the element set. The provision of simple Dublin Core generating tools, which demonstrate the benefits of including metadata, must become more prevalent.

The Arts and Humanities Data Service (AHDS), in association with the UK Office for Library and Information Networking (UKOLN), has produced a publication which outlines in more detail the best practices involved in using Dublin Core, as well as giving many practical examples: "Discovering Online Resources across the Humanities: A practical implementation of the Dublin Core" (ISBN 0-9516856-4-3). This is also freely available from the AHDS web site [http://ahds.ac.uk/].

A practical illustration of how the Dublin Core element set can be implemented to perform searches for individual items across disparate collections is the AHDS Gateway [http://ahds.ac.uk:8080/ahds_live/]. The AHDS Gateway is, in reality, an integrated catalogue of the holdings of the five individual Service Providers which make up the AHDS. Although the Service Providers are separated geographically, each provides Dublin Core records describing its holdings, so users can very simply search across the complete holdings of the AHDS from a single access point.
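The idea of an integrated catalogue can be sketched in a few lines of Python. This is not how the AHDS Gateway is implemented, which this guide does not describe. The provider names, records and the functions build_catalogue and search are all hypothetical, and a record is assumed to be a dictionary mapping an element name to a list of values.

```python
def build_catalogue(collections):
    """Merge per-provider record lists into a single list.

    Each merged entry remembers which provider supplied it.
    """
    return [
        {"provider": provider, "record": record}
        for provider, records in collections.items()
        for record in records
    ]


def search(catalogue, element, term):
    """Return entries whose `element` has a value containing `term`."""
    term = term.lower()
    return [
        entry
        for entry in catalogue
        if any(term in value.lower() for value in entry["record"].get(element, []))
    ]


# Two invented providers with invented holdings.
collections = {
    "provider_a": [{"title": ["King Lear"], "creator": ["Shakespeare, William"]}],
    "provider_b": [
        {"title": ["Photographs of Lear productions"], "type": ["image"]},
        {"title": ["Census tables"], "type": ["dataset"]},
    ],
}

catalogue = build_catalogue(collections)
for hit in search(catalogue, "title", "lear"):
    print(hit["provider"], hit["record"]["title"][0])
```

Because every provider describes its holdings with the same element names, one query over the merged list reaches all collections without knowing how each provider stores its data.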

The Dublin Core Elements

The official definitions of the Dublin Core metadata element set can be found at: http://purl.oclc.org/metadata/dublin_core_elements

Element Descriptions

1. Title

Label: TITLE

The name given to the resource by the CREATOR or PUBLISHER. Where possible, standard authority files should be consulted when entering the content of this element. For example, the Library of Congress or British Library title lists can be used, but always remember to indicate the source using the 'scheme' qualifier.

2. Author or Creator

Label: CREATOR

The person or organization primarily responsible for creating the intellectual content of the resource. For example, authors in the case of written documents, or artists, photographers, or illustrators in the case of visual resources. Note that this element does not refer to the person who is responsible for digitizing a work; that person belongs in the CONTRIBUTOR element. So in the case of a machine-readable version of King Lear held by the OTA, the CREATOR remains William Shakespeare, and not the person who transcribed it into digital form. Again, standard authority files should be consulted for the content of this element.

3. Subject and Keywords

Label: SUBJECT

The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

4. Description

Label: DESCRIPTION

A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

5. Publisher

Label: PUBLISHER

The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

6. Other Contributor

Label: CONTRIBUTOR

A person or organization not specified in a CREATOR element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a CREATOR element (for example, editor, transcriber, and illustrator).

7. Date

Label: DATE

The date the resource was made available in its present form. Recommended best practice is an 8-digit number in the form YYYY-MM-DD as defined in http://www.w3.org/TR/NOTE-datetime, a profile of ISO 8601. In this scheme, the date element 1994-11-05 corresponds to November 5, 1994. Many other schemes are possible, but if used, they should be identified in an unambiguous manner.

8. Resource Type

Label: TYPE

The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. For the sake of interoperability, TYPE should be selected from an enumerated list that is under development in the workshop series at the time of publication of this document. See http://sunsite.berkeley.edu/Metadata/types.html for current thinking on the application of this element.

9. Format

Label: FORMAT

The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource. For the sake of interoperability, FORMAT should be selected from an enumerated list that is under development in the workshop series at the time of publication of this document.

10. Resource Identifier

Label: IDENTIFIER

A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, would also be candidates for this element in the case of off-line resources.

11. Source

Label: SOURCE

A string or number used to uniquely identify the work from which this resource was derived, if applicable. For example, a PDF version of a novel might have a SOURCE element containing an ISBN number for the physical book from which the PDF version was derived.

12. Language

Label: LANGUAGE

Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with RFC 1766. See: http://ds.internic.net/rfc/rfc1766.txt

13. Relation

Label: RELATION

The relationship of this resource to other resources. The intent of this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. Formal specification of RELATION is currently under development. Users and developers should understand that use of this element is currently considered to be experimental.

14. Coverage

Label: COVERAGE

The spatial and/or temporal characteristics of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element is currently considered to be experimental.

15. Rights Management

Label: RIGHTS

A link to a copyright notice, to a rights-management statement, or to a service that would provide information about terms of access to the resource. Formal specification of RIGHTS is currently under development. Users and developers should understand that use of this element is currently considered to be experimental.
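The fifteen labels and the recommended DATE form lend themselves to a simple automated check. The Python sketch below is one possible approach under stated assumptions: a record is taken to be a list of (label, value) pairs, the function name check_record is hypothetical, and only the recommended YYYY-MM-DD form is tested, even though the DATE definition allows other schemes when they are clearly identified.

```python
import re
from datetime import date

# The fifteen element labels listed in this section.
DC_LABELS = {
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "CONTRIBUTOR", "DATE", "TYPE", "FORMAT", "IDENTIFIER",
    "SOURCE", "LANGUAGE", "RELATION", "COVERAGE", "RIGHTS",
}


def is_recommended_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True


def check_record(record):
    """Return a list of problems found in (label, value) pairs."""
    problems = []
    for label, value in record:
        if label not in DC_LABELS:
            problems.append(f"unknown element: {label}")
        elif label == "DATE" and not is_recommended_date(value):
            problems.append(f"DATE not in YYYY-MM-DD form: {value}")
    return problems


print(check_record([("TITLE", "King Lear"), ("DATE", "1994-11-05")]))
print(check_record([("AUTHOR", "Shakespeare, William"), ("DATE", "05/11/1994")]))
```

An empty list means no problems were found. Since all elements are optional and repeatable, the check deliberately reports neither missing nor repeated elements.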

©
The right of xxxx to be identified as the Authors of this Work has been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or part of any of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service. Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.
Arts and Humanities Data Service
 