New open-access COVID-19 dataset supports reproducible clinical research
The Biostatistics Unit at the Germans Trias i Pujol Research Institute (IGTP) has taken a significant step toward open science by publishing the DIVINE study database in the Nature Portfolio journal Scientific Data. This resource provides the scientific community with access to comprehensive clinical information from 5,813 patients who were hospitalized with COVID-19 across five hospitals in the southern metropolitan area of Barcelona between March 2020 and August 2021.
A Collaborative Foundation for Future Research
The database captures a critical period of the pandemic, spanning four distinct waves. It includes detailed records on patient clinical characteristics, identified risk factors, administered treatments, and final hospital outcomes. By making the anonymized data available via an R package on CRAN, a GitHub repository, and a Zenodo record, the researchers have made the information traceable, reproducible, and readily accessible to researchers worldwide.
This project emerged from a collaborative effort that began during the initial wave of the pandemic. It brought together clinicians, researchers, and biostatisticians from various institutions, including the Bellvitge University Hospital, the Bellvitge Biomedical Research Institute (IDIBELL), the Universitat Politècnica de Catalunya, and several other healthcare centres and academic groups across Catalonia.
The Significance of Open Data
For the scientific community, the publication of the DIVINE cohort is more than a data release: it provides a tool for validating predictive models and for studying the long-term sequelae of the virus. According to Cristian Tebé, head of the Biostatistics Unit at IGTP, sharing the data is an ethical commitment to society. Because the analyses can be reproduced, the release helps avoid redundant studies and accelerates the growth of medical knowledge.

Looking Ahead: The Potential for New Discoveries
With the database now publicly available, researchers may use the DIVINE cohort to refine existing predictive models for patient stratification and clinical outcomes. The resource could also serve as a teaching tool for students of epidemiology and biostatistics. As analysis of the data continues, more studies may examine the long-term health impacts of COVID-19, potentially informing more targeted treatments and improved hospital management strategies for future public health challenges.

Frequently Asked Questions
What kind of information is included in the DIVINE database?
The database contains clinical information from 5,813 patients hospitalized with COVID-19, including clinical characteristics, risk factors, treatments received, and hospital outcomes collected during hospitalization and follow-up.

How can researchers access this data?
The data is published as an R package on CRAN, with an associated GitHub repository and a Zenodo record to facilitate access and reuse.

Which institutions were involved in the development of this cohort?
The project involved a broad collaboration including IGTP, Bellvitge University Hospital, IDIBELL, Universitat Politècnica de Catalunya, Universitat de Barcelona, Consorci Sanitari Integral, Consorci Sanitari Alt Penedès Garraf, Viladecans Hospital, Parc Sanitari Sant Joan de Déu, and the CIBER in Infectious Diseases.