Data curation ensures that datasets are complete, well-described, and in a format and structure that best facilitates long-term access, discovery, and reuse.
Funders increasingly emphasize curation and quality assurance as criteria for choosing a repository, and included "expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata" in the 2022 Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research report.
Data curators collaborate with researchers to align data with the FAIR Principles, making the data more Findable, Accessible, Interoperable, and Reusable.
The curation process involves a review of a researcher’s data and documentation to ensure the data are as complete, understandable, and accessible as possible. These reviews do not judge the core scientific analysis, methodologies, or conclusions behind the data. Instead, their purpose is to ensure that metadata are complete and that the data are usable and discoverable. The checklist below can guide data curation, whether you are reviewing your own work or that of your colleagues or students.
Check files/code and read documentation (risk mitigation, file inventory, appraisal/selection)
Understand the data (or try to), if not… (run files/code, QA/QC issues, review readme or other metadata)
Request missing information or changes (tracking provenance of any changes and why)
Augment metadata for findability (DOIs, metadata standards, discoverability)
Transform file formats for reuse (recommend file formats for longer term reuse and preservation)
Evaluate for FAIRness (usage licenses, links to related research, accessibility)
Document all curation activities throughout the process
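The file inventory mentioned in the first checklist step can be partly automated. The following is a minimal Python sketch, not a prescribed curation workflow: the function names `sha256_of`, `build_inventory`, and `has_readme`, the fields recorded, and the choice of SHA-256 checksums are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: str) -> list[dict]:
    """List every file under root with its relative path, extension, size, and checksum."""
    root_path = Path(root)
    inventory = []
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            inventory.append({
                "path": path.relative_to(root_path).as_posix(),
                "extension": path.suffix.lower(),  # normalized for format review
                "size_bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return inventory


def has_readme(inventory: list[dict]) -> bool:
    """Report whether any file name starts with 'readme' (case-insensitive)."""
    return any(
        Path(item["path"]).name.lower().startswith("readme")
        for item in inventory
    )
```

Running `build_inventory` on a dataset folder before and after curation and comparing the checksums shows which files changed, which can feed into the record of curation activities. The extension field gives a quick view of which file formats may need transforming for reuse.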