Research Data Management

This guide brings together services and resources to help you manage and document your research outputs and data at key stages of the research cycle.

FAIR Data

It's easy to aspire to making your data FAIR - Findable, Accessible, Interoperable, Reusable - but it's more important to know how that can be achieved. Below we'll share some examples of how your data could be more FAIR:

Findable

  1. Storage: We're sure you are already aware of how important it is to securely store and back up your data. In doing so, you avoid losing information due to accidents, human error, or technological obsolescence.

  2. Persistent Identifiers: A persistent or unique identifier is something that can be used to link directly to your project or your research profile. Examples of persistent or unique identifiers include: a personal ORCID ID for a researcher, a DOI for a journal article, a URL for a website, an ROR for a research institution.
  3. Rich metadata: Providing as much descriptive information as possible about the data in question; everything from the date of creation, to the date of publication, to the language, location, and size of the data. The more detail you provide, the better. If your research data is complex, consider including a README document which further explains the data - its content, structure, and the logic behind its naming and organisation. 
  4. Indexed data repositories: Uploading or depositing your research data into a repository for open access. Repositories are indexed, which improves findability because your data is categorised by subject and by other specifications. Your research data could be deposited in an institutional repository such as TARA, for example, where you can add subject tags. Other research repositories include Zenodo, Figshare, and Slideshare.
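
Before adding persistent identifiers to your metadata, it can help to check that they are well formed. The following is a minimal Python sketch, not an official tool: the function names are our own, the ORCID check uses the ISO 7064 MOD 11-2 checksum that ORCID iDs carry in their final character, and the DOI helper simply prefixes the public doi.org resolver. The sample identifiers are illustrative values only.

```python
import re


def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string looks like an ORCID iD and its checksum matches."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link that resolves through doi.org."""
    return "https://doi.org/" + doi.strip()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD with a matching checksum
    print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
    print(doi_to_url("10.1000/182"))
```

A check like this only confirms the format. It does not confirm that the identifier belongs to the person or item you intend to cite.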

Accessible

  1. Open, free data: Making sure your data is freely available and as open as possible. The general rule of thumb is that your work should be “as open as possible, as closed as necessary”. This can mean not applying any paywall, not signing a restrictive copyright transfer agreement that would limit access to the work, and applying a Creative Commons copyright license that you feel is appropriate for the data.
  2. Authentication, where necessary: Where the data cannot, or should not, be made fully open access, authentication and authorisation should be in place. “Data can be sensitive due to privacy concerns, national security or commercial interests. When it’s not able to be open, there should be clarity and transparency around the conditions governing access and reuse” (Australian Research Data Commons, 2022).
  3. Metadata is always available: Making metadata (data about the data) available, even if you cannot provide full access to the research data itself. This allows others to access as much of the research process as possible. For example, uploading an abstract, sharing a sample, uploading a document with embargoed (delayed) access options, or simply providing the metadata alone.

Interoperable

  1. Vocabularies: Using language suitable for describing the data in question. Different types of research data are best described using different terms. Controlled vocabularies and community standards are recommended when describing an object or piece of data. In a repository like TARA, for example, Library of Congress (LoC) subject headings are used to describe items. LoC subject headings are an example of a controlled vocabulary and are used as a standard across various library catalogues.
  2. Human and Machine-readable language: Making sure your project can be understood by humans and computers by adhering to documentation standards. XML (.xml) is recommended as an alternative to Microsoft Word (.doc), for example, because Word documents are notoriously difficult for machines to process and interpret.
  3. Linked Metadata: Making sure that related metadata is referenced and easily accessed so that you are providing a fuller picture of where the data came from, and how it came to be produced. For example, providing DOIs in your reference list at the end of a research article or report.

Reusable

  1. Descriptive, detailed metadata: In order for your data to be easily reused, it has to be properly described. For example, if you provide incomplete information in a reference, you are potentially creating a barrier to more people accessing that data. This is another instance in which persistent identifiers (such as DOIs and ORCID iDs) and usage licenses are essential, as they facilitate the process of finding and reusing the information. Be sure to consider these in your data management plan.
  2. Usage license: You ultimately have control over the use of your research data. Creative Commons licenses are the most commonly used way to communicate how you do or don’t want your research data to be used. It is recommended to choose options that make your research data as open as possible.
  3. Community Standards: When in doubt, always follow best practices in your area of research. Use standard vocabularies and accessible platforms, and adhere to research standards such as appropriate documentation and referencing.

Metadata - Describing your data

Metadata describes the who, what, when, where, why, and how of your data in the context of your research and should provide enough information so that users know what can and cannot be done with your data.

Describing your data

Metadata can include content such as:

  • contact information,
  • geographic locations,
  • abbreviations or codes used in the dataset,
  • survey tool details,
  • provenance,
  • version information,
  • and much more.

You may need to describe several facets of your data, including:

  • overall bibliographic information about the dataset (e.g. title, author, related publications)
  • types of files used (e.g. csv, txt, png)
  • key descriptive information about the experiment (e.g. sampling methods, software used for analysis, any processing or transformations performed)

Commonly used data formats may be available in your field that help capture and structure relevant metadata. When possible, structure your metadata using an appropriate, agreed-upon metadata standard format (see below for examples and guidelines). When no appropriate metadata standard exists, you may consider composing a "readme" style metadata document, as described in this guide.
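
As an illustration of a "readme" style metadata document, here is a minimal Python sketch that assembles one from a dictionary. The field names (title, author, contact and so on), the function name build_readme, and all sample values are hypothetical choices made for this example, not a required template. Adapt them to whatever your discipline or repository expects.

```python
def build_readme(meta: dict) -> str:
    """Assemble a plain-text README from a dictionary of descriptive metadata."""
    lines = [
        f"README for dataset: {meta['title']}",
        "",
        f"Author: {meta['author']}",
        f"Contact: {meta['contact']}",
        f"Date created: {meta['created']}",
        f"Version: {meta['version']}",
        f"License: {meta['license']}",
        "",
        "Files:",
    ]
    for name, description in meta["files"].items():
        lines.append(f"  {name}: {description}")
    lines.append("")
    lines.append("Abbreviations and codes:")
    for code, meaning in meta["codes"].items():
        lines.append(f"  {code} = {meaning}")
    lines.append("")
    lines.append(f"Methods: {meta['methods']}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
example = {
    "title": "Example survey dataset",
    "author": "A. Researcher",
    "contact": "a.researcher@example.org",
    "created": "2024-01-15",
    "version": "1.0",
    "license": "CC BY 4.0",
    "files": {
        "responses.csv": "one row per survey response",
        "codebook.txt": "definitions of every column",
    },
    "codes": {"NA": "question not answered"},
    "methods": "Online questionnaire; responses cleaned with a spreadsheet tool.",
}

if __name__ == "__main__":
    # Write the README next to the data files it describes.
    with open("README.txt", "w", encoding="utf-8") as handle:
        handle.write(build_readme(example))
```

Keeping the metadata in a dictionary means the same information could later be exported to a formal standard without retyping it.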

Metadata formats and standards

Specific disciplines, repositories or data centres may guide or even dictate the content and format of metadata, possibly using a formal standard. Some standards describe general information such as bibliographic metadata, others describe specific data types or are designed for specific disciplines. Some examples of metadata standards are listed below.

To find an appropriate metadata standard for your discipline, try one of these guides:

Examples of different metadata standards:

  • Dublin Core - domain-agnostic, basic and widely used metadata standard (used by the institutional repository, TARA)
  • DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
  • EML (Ecological Metadata Language) - specific to ecology disciplines
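
To give a sense of what a structured, machine-readable record can look like, here is a minimal Python sketch that writes a few Dublin Core elements as XML using the standard library. The Dublin Core element names and namespace are real, but the wrapper element name "metadata", the function name, and the sample values are assumptions made for this example. A repository such as TARA will have its own required fields and export format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements (title, creator, subject, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialise a mapping of Dublin Core element names to lists of values as XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    record = dublin_core_record({
        "title": ["Example survey dataset"],
        "creator": ["A. Researcher"],
        "subject": ["Surveys", "Research data management"],
        "date": ["2024-01-15"],
        "language": ["en"],
        "rights": ["CC BY 4.0"],
    })
    print(record)
```

Each value is stored in a list because Dublin Core elements may be repeated, for example when a dataset has several subjects or creators.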

Source: Cornell University, licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).