In order for others to be able to understand the data, it is also important to clearly document and describe the data, its origin and organisation.
Documentation makes research data comprehensible and reusable, both for oneself and for others. Its aim is to describe stored information or documents so that they can be found and understood. Metadata are an important element of documentation.
Metadata are structured, uniform and machine-readable descriptions of data or an object; in short, they are data about data. The following types of metadata are commonly distinguished:
- Technical metadata (e.g. size and resolution of an image file)
- Descriptive metadata (e.g. title, author, keywords)
- Administrative metadata (e.g. rights, licences, publication date)
- Relational metadata (e.g. references to other datasets or to the related publication)
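One way to keep metadata structured and machine-readable is to store it in a format such as JSON. The minimal sketch below groups fields by the four types listed above. The class name, field names and values are hypothetical placeholders for a single image file; they are not taken from any particular metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record that groups metadata by the four types listed above."""

    technical: dict = field(default_factory=dict)       # e.g. file size, resolution, format
    descriptive: dict = field(default_factory=dict)     # e.g. title, author, keywords
    administrative: dict = field(default_factory=dict)  # e.g. licence, rights, publication date
    relational: dict = field(default_factory=dict)      # e.g. parent dataset, related publication

    def to_json(self) -> str:
        """Serialise the record as JSON, a machine-readable format."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Illustrative placeholder values for a single image file
record = MetadataRecord(
    technical={"size_bytes": 2483112, "resolution_px": [4000, 3000], "format": "image/tiff"},
    descriptive={"title": "Leaf surface, sample 17", "author": "Doe, Jane",
                 "keywords": ["botany", "microscopy"]},
    administrative={"licence": "CC BY 4.0", "publication_date": "2024-05-01"},
    relational={"is_part_of": "leaf-survey-2024", "related_publication": "<DOI of the article>"},
)

print(record.to_json())
```

In practice, the field names would come from the metadata standard used in the respective discipline rather than being chosen freely.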
Documentation and metadata should be recorded continuously throughout the research process, not only at its end; doing so is part of good scientific practice. Especially in large and/or long-term projects, it is advisable to set out in an internal project standard how data are named, filed and annotated, and to agree on a uniform, ideally standardised vocabulary. This includes meaningful, uniform file names and the relevant information within the individual documents. There are various methods and tools for documenting research data, such as codebooks, electronic lab notebooks, scientific record keeping, readme files and indexing. Which method is most appropriate depends on the discipline, and each project can choose a suitable approach for itself.
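A file-naming rule from an internal project standard can be checked automatically. The sketch below assumes a hypothetical convention, `<project>_<YYYYMMDD>_<sample>_v<NN>.<ext>` in lower case, and shows how file names can be both generated and parsed back into their documented parts; the convention, function names and example values are illustrative only.

```python
import re
from datetime import date

# Hypothetical project convention: <project>_<YYYYMMDD>_<sample>_v<NN>.<ext>
# Lower-case letters, digits and hyphens only; underscores separate the fields.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<date>\d{8})_(?P<sample>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_filename(project: str, day: date, sample: str, version: int, ext: str) -> str:
    """Build a file name that follows the convention, or raise ValueError."""
    name = f"{project}_{day:%Y%m%d}_{sample}_v{version:02d}.{ext}"
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"file name does not follow the project convention: {name}")
    return name


def parse_filename(name: str) -> dict:
    """Split a conforming file name back into its documented parts."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"file name does not follow the project convention: {name}")
    return match.groupdict()


name = make_filename("soilx", date(2024, 3, 5), "plot-07", 2, "csv")
print(name)  # soilx_20240305_plot-07_v02.csv
print(parse_filename(name))
```

Because the fields are separated by underscores and no field may contain one, every conforming name can be split unambiguously. The same pattern could be used to flag non-conforming files in a project folder.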
When publishing and archiving research data, care must be taken that the data are adequately documented and that the documentation is accessible as well. Only then can the data be found and used meaningfully. The documentation makes the data identifiable and states who generated them, with which methods and in which context, how they were processed, and under which conditions others may access and reuse them.
To ensure that similar data are described as uniformly as possible in both content and structure, metadata standards specify which elements are used to describe the data and how. Because metadata from different sources then share the same elements, a common standard allows them to be linked and processed together.
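The benefit of a shared standard can be illustrated with a simplified crosswalk: records from two sources with different field names are renamed to common element names and can then be queried together. The target names used here (`title`, `creator`, `date`, `subject`) are elements of the Dublin Core element set; the two source schemas, their field names and the records are hypothetical, and real Dublin Core records would typically hold plain string values.

```python
# Target element names are taken from the Dublin Core element set;
# the two source schemas and their field names are hypothetical.
CROSSWALKS = {
    "lab_a": {"Name": "title", "PI": "creator", "Created": "date", "Tags": "subject"},
    "repo_b": {"dataset_title": "title", "author": "creator", "year": "date", "keywords": "subject"},
}


def to_common(record: dict, source: str) -> dict:
    """Rename a record's fields to the shared element names; unmapped fields are dropped."""
    mapping = CROSSWALKS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


from_lab_a = {"Name": "Soil cores 2023", "PI": "Doe, Jane", "Created": "2023-09-14",
              "Tags": ["soil"], "Freezer": "F2"}
from_repo_b = {"dataset_title": "Nitrogen in arable soils", "author": "Roe, Richard",
               "year": "2021", "keywords": ["soil", "nitrogen"]}

combined = [to_common(from_lab_a, "lab_a"), to_common(from_repo_b, "repo_b")]

# Once both records use the same element names, they can be queried together
soil_titles = [rec["title"] for rec in combined if "soil" in rec.get("subject", [])]
print(soil_titles)  # ['Soil cores 2023', 'Nitrogen in arable soils']
```

If both sources had used the standard from the start, no crosswalk would be needed; the sketch only shows why agreeing on common elements makes joint processing possible.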
Many research communities have developed their own metadata standards that meet the needs of their disciplines.
The links below lead to further information and to lists of metadata standards for all disciplines and resource types.
Controlled vocabularies prescribe the use of predefined, authorised terms and are used to describe the content of research data by keywords (indexing). By establishing a one-to-one correspondence (bijection) between concepts and authorised terms, they resolve the problems posed by homographs, synonyms and polysemes. This ensures consistency and reduces the ambiguity of natural language, in which the same concept can be given different names and the same name can denote different concepts.
Controlled vocabularies thus increase the precision of a search: irrelevant hits (false positives) in a free-text search are often caused by the inherent ambiguity of natural language. Recall also improves because, unlike with uncontrolled natural-language keywords, there is no need to search additionally for synonyms of a term. Furthermore, if different projects in a research discipline use the same vocabularies, their data become interoperable.
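The mechanism described above can be sketched in a few lines: every accepted entry term points to exactly one authorised term, datasets are indexed only under authorised terms, and queries are translated the same way, so any synonym finds the same datasets. The mini-vocabulary and dataset IDs below are hypothetical; published vocabularies are often encoded in SKOS, where a concept has one preferred label (`prefLabel`) and any number of alternative labels (`altLabel`).

```python
# Hypothetical mini-vocabulary: each concept has exactly one authorised term,
# and every accepted entry term (synonym, abbreviation, scientific name) points to it.
ENTRY_TERMS = {
    "maize": "maize",
    "corn": "maize",
    "zea mays": "maize",
    "sea surface temperature": "sea surface temperature",
    "sst": "sea surface temperature",
}


def authorised(term: str) -> str:
    """Map an entry term to its authorised term; reject terms outside the vocabulary."""
    key = term.strip().lower()
    if key not in ENTRY_TERMS:
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    return ENTRY_TERMS[key]


def build_index(datasets: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index datasets under authorised terms only (an inverted index)."""
    index: dict[str, set[str]] = {}
    for dataset_id, keywords in datasets.items():
        for keyword in keywords:
            index.setdefault(authorised(keyword), set()).add(dataset_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Translate the query to its authorised term, so any synonym finds the same datasets."""
    return index.get(authorised(query), set())


index = build_index({
    "ds1": ["corn", "SST"],
    "ds2": ["Maize"],
    "ds3": ["sea surface temperature"],
})
print(sorted(search(index, "Zea mays")))  # ['ds1', 'ds2']
```

Rejecting terms that are not in the vocabulary, rather than silently accepting them, is what keeps the index consistent; a real system might instead suggest the closest authorised term.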
An overview of freely available vocabularies, both interdisciplinary and subject-specific, can be found at bartoc.org.
- Graduate Institute of Geneva: Documenting Data
- UCI: Guide to Writing a "readme" File
- Janet Clark: Scientific Record Keeping (Slides)
- NIH: Guide for Keeping Laboratory Records
- DCC: What are Metadata Standards?
- DCC: Metadata standards by disciplines
- RDA: Metadata standards by discipline
- DCC: Metadata standards by resource type
- BARTOC: Basic Register of Thesauri, Ontologies and Classifications