Data organisation

Folder

At the beginning of a project, it is highly recommended to choose a logical and consistent file organisation that allows you and others to easily locate, access and use your data, avoids duplication, and ensures that your data can be backed up. The following tips may help you to develop your own organisation system. For further information, follow the accompanying links.

  • Organize your data logically and store it in a hierarchical folder structure.
  • Separate active and completed work and delete any unused temporary files.
  • Make sure that the folder structure is not too deeply nested, since deep nesting leads to long and complicated file paths.
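
To check how deeply an existing project folder is nested, a small script can help. The following is a minimal sketch in Python, not an official tool: the function name `folder_report` is hypothetical, and it simply reports the deepest nesting level and the longest relative path found below a given root folder.

```python
import os
from pathlib import Path


def folder_report(root):
    """Return (deepest nesting level, longest relative path length) below root."""
    root = Path(root)
    max_depth = 0
    max_length = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Path of the entry relative to the project root
            relative = (Path(dirpath) / name).relative_to(root)
            max_depth = max(max_depth, len(relative.parts))
            max_length = max(max_length, len(str(relative)))
    return max_depth, max_length


if __name__ == "__main__":
    depth, length = folder_report(".")
    print(f"Deepest nesting level: {depth}")
    print(f"Longest relative path: {length} characters")
```

What counts as "too nested" depends on the project, so the script only reports the numbers and leaves the judgement to you.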

Defining a naming convention for folders and files is highly recommended. Consistent names that are meaningful to you and your colleagues make it easy to find a file.

It is helpful to include the following elements in the name:

  • Project (abbreviation)
  • Author (whole name or initials)
  • Description of the content
  • Date (YYYYMMDD) / Version
  • Research Team / Department

Make sure that the terms are as short and clear as possible and understandable to outsiders. Please keep in mind that not all characters are allowed in folder and file names. We recommend that you do not exceed a path length of 260 characters and use only the following characters:
(ABCDEFGHIJKLMNOPabcdefghijklmnop0123456789-_)

For more information see: https://wiki.biozentrum.unibas.ch/x/SQG3Aw (only for unibas members).
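
The naming recommendations above can be sketched in code. The following Python example is only an illustration under stated assumptions: the function names, the order of the elements, the underscore separator and the `v02` version style are hypothetical choices, not a prescribed standard. It also assumes that the recommended character set is meant to cover all unaccented Latin letters, digits, the hyphen and the underscore, and that a dot is used only before the file extension.

```python
import re
from datetime import date

# Assumption: unaccented letters, digits, hyphen and underscore are allowed.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")
MAX_PATH_LENGTH = 260  # recommended upper limit for the full path


def build_file_name(project, author, description, day, version, extension):
    """Join the name elements with underscores after checking their characters."""
    parts = [project, author, description, day.strftime("%Y%m%d"), f"v{version:02d}"]
    for part in parts + [extension]:
        if not ALLOWED.fullmatch(part):
            raise ValueError(f"disallowed characters in {part!r}")
    return "_".join(parts) + "." + extension


def path_is_short_enough(path):
    """Check the full path against the recommended maximum length."""
    return len(str(path)) <= MAX_PATH_LENGTH


if __name__ == "__main__":
    name = build_file_name("RDM", "JD", "growth-curves", date(2024, 3, 5), 2, "csv")
    print(name)  # RDM_JD_growth-curves_20240305_v02.csv
```

Putting the date in YYYYMMDD form means that files sort chronologically when listed by name.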

Careful selection of file formats can ensure that your files are easily accessible and interoperable and can still be used after many years. This may be particularly important in long-term research projects involving many people, or where staff could change during the research process.
Choosing a suitable format also makes later archiving and reuse of the data by third parties considerably easier.

Aspects of a suitable format:

  • No licence required (readable with open-source software)
  • Readable by many software products
  • No encryption or DRM
  • Established in the community
  • Openly accessible documentation

Documentation is the process of making information usable for further use. Its aim is to make it possible to find (permanently) stored information or documents. The structured information about an object that documentation produces is called metadata.

Metadata describes how the data originated and in what context it exists. It is recommended to define an internal project standard for how the data will be annotated. Metadata pertaining to findability is required if the data is to be published in a repository (see point 4.3). In some cases, a further evaluation of the data may be useful for use in the project or by other researchers and for information retrieval. The content can be described in the form of keywords, abstracts, etc.
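
One simple way to apply an internal project standard is to store the metadata in a small text file next to each data file. The following Python sketch is an illustration only: the field names in `REQUIRED_FIELDS`, the `.meta.json` suffix and the function name `write_sidecar` are hypothetical and would need to be replaced by whatever your project or target repository actually requires.

```python
import json
from pathlib import Path

# Hypothetical project standard: every data file needs these fields.
REQUIRED_FIELDS = ("title", "creator", "created", "keywords", "abstract")


def write_sidecar(data_file, metadata):
    """Write the metadata as JSON next to the data file and return the new path."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    data_file = Path(data_file)
    # Example: results.csv gets the sidecar file results.csv.meta.json
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Refusing to write incomplete records keeps the annotation consistent across the project, and JSON is an open, widely readable format, in line with the file format recommendations above.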

Helpful links for documenting datasets: