This is a collection of the most frequently used research data management-related terms used at the University of Basel. Some definitions were retrieved or adapted from the CODATA RDM Terminology overview and from the forschungsdaten.info glossary.


Accessibility

Ensuring that research data can be retrieved, used, and understood by authorized users under well-defined conditions. This includes proper documentation, metadata, adherence to FAIR principles (Findable, Accessible, Interoperable, Reusable), and compliance with legal or ethical restrictions.

Anonymized data

Data that have been stripped of all personally identifiable information, making re-identification impossible.

Archive

A repository used for the long-term, secure storage of data, particularly for historical or completed research, ensuring its availability for future reference or replication while maintaining its integrity. Archives are often a key part of long-term preservation strategies. An archive is curated and may not be publicly available.

Authentication

Access to certain data, systems, or services must be restricted; this access control is regulated via authentication. The accessing person can be uniquely identified using various features: IP address, login and password, a security feature (key file, biometric feature, hardware token), or a combination of these (two-factor authentication). This requires functioning user administration/identity management (IDM) in which credentials can be stored and managed. An alternative is a so-called single sign-on procedure, such as Shibboleth, where one login gives a person access to several services. Authentication must be distinguished from authorization, in which the authenticated person is granted specific rights on the system.

Backup

A backup (or backup copy) is a copy of data used to restore the original data in the event of data loss. The 3-2-1 backup strategy states that three copies of the data should be maintained (one original, two copies), that the copies should be stored on two different types of storage media, and that one copy should be kept offsite (i.e. separate from your primary data and on-site backups).
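
As an illustration, the 3-2-1 pattern can be sketched in Python. The function name and directory arguments below are hypothetical placeholders; in practice the offsite copy would go to a remote system or cloud storage rather than a local folder.

```python
import shutil
from pathlib import Path

def backup_321(original: str, second_medium_dir: str, offsite_dir: str) -> list:
    """Keep three copies of a file: the original, one copy on a second
    storage medium, and one copy offsite (simplified local sketch)."""
    copies = []
    for target_dir in (second_medium_dir, offsite_dir):
        Path(target_dir).mkdir(parents=True, exist_ok=True)
        # copy2 preserves file metadata such as timestamps
        copies.append(shutil.copy2(original, target_dir))
    return copies  # paths of the two backup copies
```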

Best practice

Refers to a tried and tested method for carrying out a work process. It is a technique or methodology that has been proven through experience and research to be reliable in achieving a desired result. A commitment to use best practice in all areas is a commitment to use all available knowledge and technology to ensure successful implementation.

CARE principles

Set of principles for Indigenous data governance. CARE stands for Collective benefit, Authority to control, Responsibility and Ethics. These principles complement the existing FAIR principles.

Change log

Document, spreadsheet, or digital tool that tracks the progress of each change in a dataset, code or other research object.

Checksum

Alphanumeric signature (similar to a fingerprint) calculated from a digital object's content and structure using a mathematical algorithm. The algorithm will always produce the same checksum unless any change, no matter how small, is made to the file. Comparing checksums over time facilitates the management of integrity and authenticity of digital content.
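
The comparison described above can be sketched in Python with the standard hashlib module; the function name and chunk size are illustrative.

```python
import hashlib

def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Compute a checksum of a file's content, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # read in chunks so large files do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recomputing the checksum later and comparing it with the stored value reveals whether the file has changed in any way.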

Creative Commons license

Creative Commons licenses give everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law. From the reuser’s perspective, a Creative Commons license on a copyrighted work allows them to reuse the work under certain conditions. There are six different license types that allow varying degrees of reuse. (For more information on the individual CC licenses see: https://creativecommons.org/share-your-work/cclicenses/). Materials released under CC0 (aka CC Zero) can be reused without restrictions.

Data citation

Process of citing a dataset in a similar manner to other research outputs. Ideally, the dataset is a standalone output that appears in a data repository, data paper or project website, and has a Persistent Identifier. It is possible to cite a dataset that is not publicly available and/or does not have a PID. Most current referencing systems provide a format for citing datasets.

Data curation

Managed process throughout the data life cycle, by which data/data collections are cleansed, documented, standardized, formatted and inter-related. This includes versioning data, or forming a new collection from several data sources, annotating with metadata, adding codes to raw data (e.g., classifying a galaxy image with a galaxy type such as “spiral”). Higher levels of curation involve maintaining links with annotation and with other published materials. The goal of curation is to manage and promote the use of data from its point of creation to ensure it is fit for contemporary purpose and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Special forms of curation may be available in data repositories. The data curation process itself must be documented as part of curation. Thus curation and provenance are highly related.

Data journal

In addition to the traditional scientific journals for articles describing and interpreting research results, there are also data journals, which publish articles that only describe data but do not interpret them. Such articles describe, for example, datasets that are particularly significant, comprehensive or complex. These data descriptions undergo a peer review process similar to that of research articles in traditional specialist journals and therefore meet a high quality standard. The data described in the article are only linked from the article and are published separately, ideally in a data repository.

Data Management Plan (DMP)

A DMP serves as the foundation for effective research data management. It outlines the entire life cycle of research data and is a living document, i.e. one that is used throughout a project and updated regularly. The DMP details how data will be generated and collected, documented, stored, published, and archived; in other words, how data will be handled both during the project and after the project has ended. A key component of a DMP is the description of research data following the FAIR principles.

Data mining

Analyzing multivariate datasets using pattern recognition or other knowledge discovery techniques to identify previously unknown and potentially meaningful data content, relationships, classifications or trends.

Data processing

Any handling of data, irrespective of the means and procedures used, in particular the collection, storage, keeping, use, modification, disclosure, archiving, deletion or destruction of data. In the context of personal data, it is defined in Art. 3 (5), IDG BS: https://www.gesetzessammlung.bs.ch/app/de/texts_of_law/153.260.

Data protection

Data protection refers to technical and organizational measures to prevent the misuse of personal data and to protect the fundamental rights of natural persons (regulated in the Informations- und Datenschutzgesetz (IDG) des Kantons Basel-Stadt). In research, personal data is generated particularly in medical and social science studies. Encryption and storage in specially secured locations are essential here.

Data readability

Refers to the ability to process the information on a computer system or device other than the one that initially created the digital information or on which it is currently stored. Typically, non-readability involves some aspect of an older storage device (a tape or disk) that makes it physically incompatible with existing equipment. This hardware obsolescence occurs when storage devices and media used today become incompatible with those developed in the future.

Data steward

At the University of Basel, data stewards are people who offer subject-specific support for research data management. They are the first point of contact for researchers in their respective subject area. In addition to providing direct advice and support, they also serve as a link to other bodies that offer general support in the area of RDM. They ensure that subject-specific needs are known throughout the Research Data Management Network at the University of Basel and that transferable solutions can be developed. The data stewards offer advice and support, but it is not their duty to manage the research data in the research projects. Data stewards act as multipliers and ambassadors for RDM at the University of Basel. They inform their organizational units about relevant developments, services and requirements in the field of RDM and support them in establishing best practices and building up knowledge.

Data transfer

The movement of research data between systems, institutions, or collaborators, ensuring integrity, security, and compliance with regulations. Methods include encrypted file transfers, secure repositories, and controlled access platforms.

Dataset

A structured collection of related data (points), often organized in tables, spreadsheets, or databases. It can include raw or processed data and may contain qualitative, quantitative, or mixed data types, used for analysis and research.

Deletion (of data)

The process of permanently removing research data from storage systems to prevent recovery. This can include soft deletion (data marked for removal but still recoverable) or hard deletion (irreversible erasure), following institutional policies, legal requirements, and ethical considerations. Secure deletion methods may be required for sensitive data.

Digital Object Identifier (DOI)

Type of digital Persistent Identifier (PID) issued by the International DOI Foundation. This permanent digital identifier is associated with an object that permits the object to be referenced reliably even if its location and metadata undergo change over time.

Documentation

Enables the comprehensibility and reuse of research data for oneself and for others. The aim of documentation is to make stored information or documents findable and comprehensible by describing them. A widespread and simple way to document data is to add a readme file to a dataset.

Electronic Lab Notebook (ELN)

An application that in its simplest form supports digital notetaking in the lab or in the field. ELNs replace traditional paper lab notebooks by offering advantages such as searchability, accessibility, structured data capture and documentation, collaboration tools, compliance features, and integration with lab instruments, databases, and Laboratory Information Management Systems (LIMS).

End of project

The formal conclusion of a research project, marked by final data analysis, reporting, publication, and, where applicable, archiving or sharing of research data according to data management and preservation guidelines. End of project is a broad term. Examples include when the funding from the research sponsor expires and a project is formally completed; or the project ends once all publications and doctoral theses have been released.

FAIR principles


Set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable.

  • Findable: Data and metadata that can be located by humans and machines. They should be assigned a globally unique persistent identifier, be described with rich metadata, and ideally registered in a searchable catalogue or index.
  • Accessible: Data and metadata can be accessed or retrieved once they have been discovered. This includes cases where access to the data is limited, such as when user requests need to be authenticated and authorized.
  • Interoperable: Data and metadata that can be integrated with other data/metadata and can interoperate with applications or workflows for analysis, storage, and processing.
  • Reusable: Data that can be used to replicate research findings and/or can be analyzed in settings outside of the original context in which they were produced or collected. The reusability of research data can depend on its format, licensing, and the richness of the relevant metadata.

File format (also called data format or file type)

Is created when a file is saved and contains information about the structure of the data contained in the file, its purpose and affiliation. Application programs can use the information available in the file format to interpret the data and make the content available. With so-called proprietary formats, the files can only be opened, edited and saved with the corresponding application, utility or system programs (e.g. .doc/.docx, .xls/.xlsx). Open formats (e.g. .html, .jpg, .mp3, .gif), on the other hand, make it possible to open and edit the file with software from different manufacturers. File formats can be actively changed by conversion when saving, but this can result in data loss. In the scientific field, particular attention should be paid to compatibility, suitability for long-term archiving and loss-free conversion to alternative formats.

Health-related personal data

Information concerning the health or disease of a specific or identifiable person, including genetic data.

Integrity

The accuracy, consistency, and trustworthiness of research data throughout its life cycle. Ensuring data integrity involves preventing unauthorized changes, errors, or corruption, and maintaining data quality through secure storage, regular checks, and validation methods.

License

A license is a contractually agreed right of use. The rights holder thereby authorizes their contractual partner to use a work in various ways (e.g. to copy, store or make it digitally accessible). Rights holders often charge a license fee for this. In addition to such commercial licenses, free licenses such as Creative Commons licenses are also available. These licenses allow the work to be used free of charge.

Logging

Refers to the systematic recording of events and activities during the data transfer process. This includes capturing details such as timestamps, source and destination addresses, file names, transfer status, and any errors encountered. Effective logging ensures transparency, aids in troubleshooting, and supports auditing by providing a detailed trail of actions taken during data transfers.
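
A minimal sketch of such transfer logging, using Python's standard logging and shutil modules (the function name and log format are illustrative; a production setup would typically write to a persistent log file):

```python
import logging
import shutil
from pathlib import Path

# Illustrative configuration; a FileHandler would give a persistent audit trail.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("transfer")

def transfer_file(src: str, dst: str) -> bool:
    """Copy a file and log timestamp, paths, size, and outcome."""
    try:
        shutil.copy2(src, dst)
        log.info("transferred %s -> %s (%d bytes)", src, dst, Path(dst).stat().st_size)
        return True
    except OSError as exc:
        log.error("transfer failed %s -> %s: %s", src, dst, exc)
        return False
```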

Long-term storage / preservation

The process of maintaining data for extended periods (often decades or more), ensuring its integrity, accessibility, and usability. Preservation strategies aim for indefinite storage, often 100 years or more, using stable formats and secure infrastructures to safeguard the data for future generations.

Metadata standard

High-level, shared representation of the metadata elements related to a dataset, collection, or other digital object. May also provide an XML schema describing the format in which the elements should be stored. Typically, a standard XML format is defined using XML Schema or a document type definition (DTD). Standards are typically ratified by national or international standards bodies.

Metadata

Structured, uniform, and machine-readable descriptions of data; metadata are therefore data about data. Metadata help with the organization, discovery, and interpretation of data. Examples include author, creation date, file format, keywords, and data origin.
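
As an illustration, a minimal machine-readable metadata record with the example fields above might look like this (all values are hypothetical):

```python
import json

# Hypothetical metadata record: data about data, not the data themselves.
record = {
    "title": "Example survey dataset",
    "author": "Jane Doe",
    "created": "2024-05-01",
    "file_format": "text/csv",
    "keywords": ["survey", "example"],
    "origin": "Collected via online questionnaire",
}

print(json.dumps(record, indent=2))  # serialize for exchange or deposit
```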

Mid-term storage / deep storage

Storage of data for a period ranging from a few months to several years, typically for completed projects. Data are kept for reasons of research integrity or for reuse. They are optimized for retrieval when needed, but not intended for indefinite retention.

Open Data

Open data refers to data that may be used, disseminated and reused by third parties for any purpose (e.g. for information, evaluation or even commercial reuse). The idea behind open data is that free reuse creates greater transparency and more collaboration. Open data and FAIR data are not the same. Data can be open but not FAIR if it lacks interoperability or clear licensing. Likewise, data can be FAIR but not open due to access restrictions. Ideally, data should be both.

Open Researcher and Contributor Identifier (ORCID)

Unique identifier for researchers, which is publisher-independent and can be used by researchers for their scientific output both permanently and independently of institutions. Disambiguates researchers and allows researchers to connect their ID with additional professional information including affiliations, grants, and publications.

Open Science

Scientific knowledge that is openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community.

Permission & access control

Mechanisms that regulate who can view, modify, or share research data. This includes role-based access, authentication, and authorization to ensure data security and compliance with legal or ethical requirements.

Persistent Identifier (PID)

Long-lasting digital reference to an object that gives information about that object regardless of what happens to that object. Developed to address link rot, a persistent identifier can be resolved to provide an appropriate representation of an object whether that object changes its online location or goes offline.

Personal data

Any information referring to a definite or definable natural person. (Art. 3 (3), IDG BS: https://www.gesetzessammlung.bs.ch/app/de/texts_of_law/153.260). (See also “Sensitive Personal Data”).

Processed data

Data that has been cleaned, transformed, or analyzed to derive meaning or insights.

Project

A structured research endeavor with defined objectives, methodology, and timeline, involving data collection, analysis, and dissemination of findings.

Pseudonymized data

Data where identifying details are replaced with pseudonyms, allowing re-identification with additional information.
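
One common pseudonymization technique replaces a direct identifier with a random token while keeping a separate key table that allows re-identification. A minimal sketch in Python (field names are hypothetical; the key table must be stored securely and separately from the data):

```python
import secrets

def pseudonymize(records: list, key_field: str) -> tuple:
    """Replace a direct identifier with a random pseudonym; return the
    pseudonymized records and the key table needed for re-identification."""
    key_table = {}  # original identifier -> pseudonym (store separately!)
    out = []
    for rec in records:
        original = rec[key_field]
        if original not in key_table:
            key_table[original] = secrets.token_hex(8)
        new_rec = dict(rec)                  # leave the input untouched
        new_rec[key_field] = key_table[original]
        out.append(new_rec)
    return out, key_table
```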

Raw data

Unprocessed, original data collected from experiments, surveys, or observations.

Readme file

A common way to document the contents and structure of a folder and/or a dataset so that a researcher can locate the information they need. It provides a clear and concise description of all relevant details about data collection, processing, and analysis.

Repository

Physical or digital storage location that can house, preserve, manage, and provide access to many types of digital and physical materials in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the physical and digital material to be authentic, reliable, accessible and usable on a continuing basis. There are specialized repositories for publishing research data.

Research data

Any data collected, observed, or generated during research, serving as evidence for findings. The material can vary greatly depending on the discipline and research area and is created through various methods such as measurements, experiments, surveys, interviews, digitization and much more. Types of data (examples):

  • Qualitative data: Non-numerical data, such as interviews, texts, or images, used to explore concepts and meanings.
  • Quantitative data: Numerical data, often analyzed statistically, including measurements, survey results, and computational outputs.
  • Code: Scripts, algorithms, or software used for data processing, analysis, or modeling in research.

Research Data Life Cycle

Entire period of time that research data exists. This life cycle describes the flow of research data starting from planning, collecting, processing, analyzing, preserving, sharing and finally reusing the research data. Research data often have a longer lifespan than the research project.

Research Data Management

Research data management practices cover the entire life cycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to the long-term preservation of data deliverables after the research investigation has concluded. Specific activities and issues that fall within the category of data management include: file formats; data documentation; metadata creation; data quality control/assurance; data access and security; data storage; data archiving/preservation; and data sharing and reuse.

Retention period

The length of time research data are kept before they are deleted. The retention period is often determined by legal, ethical, or institutional guidelines and can vary based on the type of data and its intended use.

Scientific Integrity (also called Research Integrity)

Scientific integrity is the adherence to professional practices, honesty, fairness, accountability, respect towards colleagues and responsibility towards society. The University of Basel expects its researchers to conduct research in line with the standards and principles concerning scientific and personal integrity as defined in the Code of Conduct and Integrity Regulations.

Sensitive data

Sensitive data refers to information that requires special protection due to legal, contractual, or critical confidentiality obligations. The term is ambiguous and not legally defined. In some cases, it is used synonymously with special categories of personal data, while in others, it extends beyond personal data to include business information, trade secrets, research data under embargo, classified information, or government secrets.

Sensitive personal data / special personal data (“besondere Personendaten”)


In accordance with Art. 3 (4), IDG BS:

  1. Personal data whose processing poses a particular danger of violating a fundamental right, in particular information about
    • religious, ideological, political or trade union opinions;
    • health, genetic background, personal privacy, sexual life, sexual orientation or ethnic origin;
    • social security measures;
    • administrative or criminal prosecutions and sanctions;
    • physical, physiological or behavioral characteristics of a natural person, obtained by using special technical procedures which enable or confirm the unique identification of that person (biometric data);
  2. compilations of information that enable the evaluation of crucial aspects of the personality of a natural person (personality profile)

Versioning

A method of marking work states during data processing to make any type of change traceable. A pre-defined, easy-to-understand versioning scheme (e.g. version 1.3 or version 2.1.4) should be used for this purpose. Data can be versioned either manually or using versioning software such as git. Versioning should be used during the research process itself, for example to identify different working versions of data, and also for subsequent changes to previously published research datasets, so that later users can cite the correct version of a research dataset.
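
A scheme such as version 2.1.4 can also be bumped programmatically. The sketch below is illustrative and assumes a hypothetical three-part MAJOR.MINOR.PATCH scheme:

```python
def bump(version: str, part: str) -> str:
    """Increment one component of a MAJOR.MINOR.PATCH version string,
    resetting the components to its right."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```

For example, bump("2.1.4", "minor") yields "2.2.0".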