
Research Data Management

This guide brings together services and resources to help you manage and document your research outputs and data at key stages of the research cycle.

Standard and Recommended File Formats

Below are examples of the types of data you might generate during your research, and how best to store them for longevity, usability, openness, and FAIRness (findability, accessibility, interoperability, and reusability).

Documentation, Scripts and other Textual Data

  • Recommended: Rich Text Format (.rtf).
  • Recommended: XML (.xml).
  • Recommended: OpenDocument Text (.odt).
  • PDF/A (Adobe) or PDF (.pdf).
  • Hypertext Markup Language (HTML) (.html).
  • R Markdown files (.rmd), accompanied by an HTML version.
  • Plain text data (.txt).
  • Longstanding proprietary formats: Microsoft Word (.doc/.docx), Excel (.xls/.xlsx).

Images

  • Preferable: TIFF version 6 uncompressed (.tif).
  • JPEG (.jpeg, .jpg), but only if the image was created in this format. JPEG is a "lossy" file format: it reduces file size by discarding some of the data deemed inessential, so converting an image to JPEG produces a lower-quality copy.
  • TIFF (other versions) (.tif, .tiff).
  • Adobe Portable Document Format (PDF/A, PDF) (.pdf).
  • Standard applicable RAW image format (.raw).
  • Photoshop files (.psd).
  • BMP (.bmp) but only if created in this format.
  • PNG (.png) but only if created in this format.

Audio

  • Recommended: Free Lossless Audio Codec (FLAC) (.flac).
  • MPEG-1 Audio Layer 3 (.mp3) if original created in this format.
  • Audio Interchange File Format (.aif).
  • Waveform Audio Format (.wav).

Video

  • MPEG-4 (.mp4).
  • OGG video (.ogv, .ogg).
  • Motion JPEG 2000 (.mj2).
  • MOV (.mov).
  • Windows Media Video (WMV) (.wmv).
  • WebM (.webm).
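
As a rough aid for checking a project folder against the lists above, the sketch below sorts file names into three groups by extension. The names PREFERRED, LISTED, and classify are hypothetical, and the grouping is only one reading of this guide. An extension alone cannot tell you whether a .tif file is uncompressed TIFF version 6, or whether a JPEG was originally created in that format.

```python
from pathlib import Path

# Extensions this guide explicitly marks "Recommended" or "Preferable".
# Assumption: .tif is counted here, although only uncompressed TIFF v6 is preferred.
PREFERRED = {".rtf", ".xml", ".odt", ".tif", ".flac"}

# Other extensions the guide lists, some only under conditions
# (for example "only if created in this format").
LISTED = {
    ".pdf", ".html", ".rmd", ".txt", ".doc", ".docx", ".xls", ".xlsx",
    ".jpeg", ".jpg", ".tiff", ".raw", ".psd", ".bmp", ".png",
    ".mp3", ".aif", ".wav",
    ".mp4", ".ogv", ".ogg", ".mj2", ".mov", ".wmv", ".webm",
}


def classify(filename: str) -> str:
    """Return 'preferred', 'listed', or 'not listed' for a file name."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the extension
    if ext in PREFERRED:
        return "preferred"
    if ext in LISTED:
        return "listed"
    return "not listed"


for name in ["interview.FLAC", "scan.jpg", "notes.xyz"]:
    print(name, "->", classify(name))
```

A result of "listed" or "not listed" is a prompt to look more closely at the file, not a verdict on it.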

 

You may notice that some of the most widely used file formats are not necessarily what we'd recommend as your format of choice.

Ultimately, the file formats you choose will depend on the specifics of your research project. XML (.xml) is recommended as an alternative to Microsoft Word (.doc) because Word documents are notoriously difficult for machines to process and interpret; in other words, they are not interoperable. Similarly, for audio files, FLAC is recommended over the more common MP3 format because FLAC is lossless: compressing the audio discards no data. For images, uncompressed TIFF is preferable to more common formats such as JPEG for the same reason: it keeps all of the image data, which makes it more reliable for long-term use.
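
The difference between lossless and lossy storage can be illustrated with a small Python sketch. It is a toy example, not how FLAC, MP3, or JPEG actually work. The sample values are made up, zlib stands in for a lossless codec, and dropping the low bits of each value stands in for a lossy one.

```python
import zlib

# Hypothetical 8-bit sample values standing in for audio or pixel data.
original = bytes([12, 13, 200, 201, 77, 78, 79, 255])

# Lossless: compress, then decompress. Every byte comes back unchanged.
restored = zlib.decompress(zlib.compress(original))

# Lossy (toy illustration only): discard the low 4 bits of each value.
# The discarded detail cannot be recovered from the stored result.
quantized = bytes(b & 0xF0 for b in original)

print(restored == original)   # True
print(quantized == original)  # False
```

Once the lossy version is the only copy you hold, the original values are gone, which is why the lossless formats are preferred for long-term storage.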

You can use multiple formats during a project. For example, data contained in PDF format could be supplemented with content stored in XML.
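
As one possible illustration, a PDF could be accompanied by a small XML file that describes it in machine-readable form. The sketch below uses Python's standard xml.etree.ElementTree module. The function name build_sidecar, the element names (record, file, title, creator), and the sample values are hypothetical and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET


def build_sidecar(pdf_name: str, title: str, creator: str) -> str:
    """Build a small XML description of a PDF and return it as a string."""
    root = ET.Element("record")
    ET.SubElement(root, "file").text = pdf_name
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


xml_text = build_sidecar("report.pdf", "Field survey report", "A. Researcher")
print(xml_text)
```

In practice, you would usually adopt an established metadata schema for your discipline instead of inventing element names.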

 

For more recommendations, see:

Recommended formats — UK Data Service

Lossy Compression: Everything You Need to Know | Adobe

File Format Selection - Research Data Management - Research Guides at New York University (nyu.edu)