Research Data Management

Why does data format matter?

Digital formats offer valuable capabilities for storing and manipulating data, but they carry inherent vulnerabilities that threaten long-term accessibility and usability. Format obsolescence, the process by which digital file formats become unusable as technology advances and support lapses, is one of the most significant risks to digital preservation. It can occur through several mechanisms: software updates that drop support for older formats, hardware that can no longer read certain storage media, or the disappearance of the technical documentation needed to interpret the data. The challenges extend beyond purely technical considerations, as historical cases show. A well-known example is the data from the Viking Mars missions of the 1970s, some of which became almost inaccessible because of obsolete storage formats and hardware (Layne et al., 2012). The incident showed how the cutting-edge technology of one era can become a preservation liability in another.

Research data preservation faces significant obstacles in maintaining access to decades of experimental data. Proprietary formats often depend on specific software versions or platforms, which creates vendor lock-in and preservation risks. Moreover, the growing volume and complexity of research data demand formats that can handle large-scale datasets efficiently while maintaining their integrity and preserving essential metadata. These challenges underscore the importance of strategic format selection in research data management.

The selection of appropriate data formats thus plays a crucial role in the long-term preservation and accessibility of research data. While specific requirements may vary by discipline and institution, adherence to the guidelines below promotes interoperability and helps ensure continued access to research outputs. This document outlines key considerations and recommendations for format selection in research data preservation.

Main Principles

When selecting data formats for long-term preservation, the scientific team (including researchers, data managers and development specialists) should prioritise formats that maximise data reusability and long-term accessibility. Key attributes for success include:

1. Openness and Documentation

2. Format Characteristics

3. Technical Sustainability

Recommended formats by data type

The choice of format depends on the type of data, as presented below. For a comprehensive overview of technical considerations and best practices, including file formats and standards, refer to the Digital Preservation Handbook from the Digital Preservation Coalition.

**Textual Data**

**Tabular Data**

**Images**

**Audio**

**Video**
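Before choosing target formats, it can help to see which of the data types above a dataset actually contains. The following is a minimal sketch of such an inventory, not a recommended tool. The extension-to-type mapping and the function names (`classify`, `inventory`, `inventory_directory`) are illustrative assumptions rather than anything prescribed by this guide, and the mapping says nothing about whether a given format is suitable for preservation.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to the data types listed above.
# The extensions are illustrative assumptions, not a list taken from this guide,
# and inclusion here does not imply a format is recommended for preservation.
DATA_TYPE_BY_EXTENSION = {
    ".txt": "Textual Data", ".pdf": "Textual Data", ".docx": "Textual Data",
    ".csv": "Tabular Data", ".xlsx": "Tabular Data",
    ".tif": "Images", ".tiff": "Images", ".png": "Images", ".jpg": "Images",
    ".wav": "Audio", ".flac": "Audio", ".mp3": "Audio",
    ".mp4": "Video", ".mkv": "Video", ".avi": "Video",
}


def classify(filename: str) -> str:
    """Return the data type for a file name, based only on its extension."""
    suffix = Path(filename).suffix.lower()
    return DATA_TYPE_BY_EXTENSION.get(suffix, "Unclassified")


def inventory(filenames) -> Counter:
    """Count how many of the given file names fall under each data type."""
    return Counter(classify(name) for name in filenames)


def inventory_directory(root: str) -> Counter:
    """Walk a directory tree and count its files per data type."""
    return inventory(p.name for p in Path(root).rglob("*") if p.is_file())
```

Calling `inventory_directory("path/to/dataset")` on a project folder returns a count per data type. Files that land in "Unclassified" are the ones whose format would need a closer look, since an extension alone does not confirm what a file contains.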