At the beginning of a project, it is highly recommended to choose a logical and consistent file organisation that allows you and others to easily locate, access and use your data, avoids duplication, and ensures that your data can be backed up. The following tips may help you to develop your own organisation system.
- Organise your data logically and store it in a hierarchical folder structure.
- Separate active and completed work and delete any unused temporary files.
- Make sure that the folder structure is not too nested, otherwise this will lead to long and complicated file paths.
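As a rough illustration of the last tip, the following Python sketch flags files that sit too deep in a folder hierarchy. The threshold of four folder levels, the function name `too_deep` and the example paths are assumptions made for this sketch only; choose a limit that suits your project.

```python
from pathlib import PurePosixPath

MAX_DEPTH = 4  # assumed threshold for this sketch, not a fixed rule


def too_deep(paths, max_depth=MAX_DEPTH):
    """Return the paths whose file lies below more than max_depth folders."""
    flagged = []
    for p in paths:
        # parent.parts holds the folder components only, without the file name
        folders = PurePosixPath(p).parent.parts
        if len(folders) > max_depth:
            flagged.append(p)
    return flagged


# Hypothetical example paths
paths = [
    "project/data/raw/survey.csv",
    "project/data/raw/2024/03/05/batch1/part2/survey.csv",
]
print(too_deep(paths))
```

Only the second path is reported, because its file lies eight folders deep.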
Defining a naming convention for folders and files is highly recommended. Consistent file names that are meaningful to you and your colleagues make files easy to find.
It is helpful to include the following elements:
- Project (abbreviation)
- Author (whole name or initials)
- Description of the content
- Date (YYYYMMDD) / Version
- Research Team / Department
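One possible way to apply such a convention consistently is to generate file names with a small helper instead of typing them by hand. The Python sketch below combines a subset of the elements listed above. The order of the elements, the underscore separator, the two-digit version number and the function name `build_file_name` are assumptions of this sketch, not a prescribed standard.

```python
from datetime import date


def build_file_name(project, author, description, day, version, extension):
    """Join the naming elements with underscores, always in the same order."""
    # Replace whitespace in the description with hyphens so the name
    # contains no spaces (an assumption of this sketch).
    slug = "-".join(description.lower().split())
    stamp = day.strftime("%Y%m%d")  # date in YYYYMMDD form
    return f"{project}_{author}_{slug}_{stamp}_v{version:02d}.{extension}"


# Hypothetical example values
name = build_file_name("RDM", "JD", "Survey results", date(2024, 3, 5), 2, "csv")
print(name)  # RDM_JD_survey-results_20240305_v02.csv
```

Because the date is written as YYYYMMDD and the version is zero-padded, files sorted alphabetically also appear in chronological order.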
Make sure that the terms are as short and clear as possible and understandable to outsiders. Please keep in mind that not all characters are allowed in folder and file names. We recommend that you do not exceed a path length of 260 characters and use only the following characters:
Careful selection of file formats can ensure that your files are easily accessible and interoperable and can still be used after many years. This may be particularly important in long-term research projects involving many people, or where staff could change during the research process.
Choosing a suitable format also considerably facilitates later archiving and reuse of the data by third parties.
Aspects of a suitable format:
- No licence required (readable by open-source software)
- Readable by many software products
- No encryption or DRM
- Established in the community
- Openly accessible documentation
In order for others to be able to understand the data, it is also important to clearly document and describe the data, its origin and organisation.
Documentation makes research data comprehensible and reusable, both for yourself and for others. The aim of documentation is to make stored information or documents findable and comprehensible by describing them. An important element of documentation is metadata.
Metadata are structured, uniform and machine-readable descriptions of data or an object; in short, they are data about data. The following types of metadata can be distinguished:
- Technical metadata (e.g. size and resolution of an image file)
- Descriptive metadata (title, author, keywords)
- Administrative metadata (rights, licences, publication date)
- Relational metadata (reference to other datasets or the related publication)
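To make the four types more concrete, the following Python sketch groups them in a single machine-readable record and serialises it as JSON. All field names and values are hypothetical placeholders chosen for illustration; the structure does not follow any particular metadata standard.

```python
import json

# Hypothetical metadata record for one image file, grouped by metadata type
record = {
    "technical": {"format": "image/tiff", "size_bytes": 2048576, "resolution_dpi": 300},
    "descriptive": {
        "title": "Example image",
        "author": "Jane Doe",
        "keywords": ["example", "illustration"],
    },
    "administrative": {"licence": "CC-BY-4.0", "publication_date": "2024-03-05"},
    "relational": {"related_publication": "doi:10.0000/example"},
}

# Serialise the record so that it can be stored next to the data file
serialised = json.dumps(record, indent=2)
print(serialised)
```

In practice, the field names would be taken from the metadata standard used in your discipline rather than invented for each project.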
Documentation and metadata should be recorded continuously during the research process; this is also part of good scientific practice. Especially in large and/or long-term projects, it is advisable to define an internal project standard for how data are named, filed and annotated, and to agree on a uniform and ideally standardised vocabulary. This also includes meaningful and uniform naming of files and of relevant information in the individual documents. There are various methods and tools for documenting research data: codebooks, electronic lab notebooks, scientific record keeping, readme files, indexing, etc. Which method is most appropriate depends on the discipline, and each project can choose an adequate approach for itself.
When publishing and archiving research data, care must be taken to ensure that the data are adequately documented and that the documentation is also accessible. Only in this way can the data be found and used in a meaningful way. The documentation makes the data identifiable and states by whom, with which methods and in which context the data were generated, how they were processed and under which conditions they can be accessed and reused by others.
To ensure that similar data are described as uniformly as possible in terms of both content and structure, there are standards that govern how data are described by metadata. A metadata standard allows metadata from different sources to be linked and processed together.
Depending on the research community, there are individual metadata standards that meet the needs of the respective subjects.
Below you will find links to further information and lists of metadata standards for all disciplines and resource types.
Controlled vocabularies prescribe the use of predefined, authorised terms and are used to describe the content of research data by keywords (indexing). The problems of homographs, synonyms and polysemes are solved by a one-to-one mapping (bijection) between concepts and authorised terms. This provides consistency and reduces the ambiguity that occurs in natural language, where the same concept can be given different names (or vice versa).
Controlled vocabularies thus increase the accuracy of searching compared with free-text search, since irrelevant elements (false positives) in a search list are often caused by the inherent ambiguity of natural language. Recall is also improved because, unlike with natural-language indexing, there is no need to search for other terms that may be synonyms of the search term. Furthermore, if the same vocabularies are used in different projects of a research discipline, this enables interoperability.
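The principle can be sketched in a few lines of Python: every accepted variant is mapped to exactly one authorised term, and keywords outside the vocabulary are rejected instead of being stored as free text. The vocabulary entries and the function name `normalise` are hypothetical and serve only as an illustration.

```python
# Hypothetical vocabulary: each variant maps to exactly one authorised term
VOCABULARY = {
    "car": "automobile",
    "motorcar": "automobile",
    "automobile": "automobile",
    "lorry": "truck",
    "truck": "truck",
}


def normalise(keywords):
    """Map keywords to authorised terms and collect unknown keywords."""
    authorised, unknown = [], []
    for keyword in keywords:
        term = VOCABULARY.get(keyword.strip().lower())
        if term is None:
            unknown.append(keyword)   # not part of the vocabulary
        elif term not in authorised:
            authorised.append(term)   # keep each authorised term once
    return authorised, unknown


terms, rejected = normalise(["Car", "motorcar", "Lorry", "bike"])
print(terms)     # ['automobile', 'truck']
print(rejected)  # ['bike']
```

A search for "automobile" now also finds datasets whose author originally wrote "car" or "motorcar", which is the gain in recall described above.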
An overview of freely available vocabularies, both interdisciplinary and subject-specific, can be found at bartoc.org.