Committee on Data of the International Science Council
nonprofit · Paris, France
Research output, citation impact, and the most-cited recent papers from Committee on Data of the International Science Council. Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Committee on Data of the International Science Council
International audience
The COVID-19 pandemic has spurred the use of AI and data science (DS) innovations in data collection and aggregation. Extensive data on many aspects of COVID-19 have been collected and used to optimize the public health response to the pandemic and to manage the recovery of patients in Sub-Saharan Africa. However, there is no standard mechanism for collecting, documenting and disseminating COVID-19 related data or metadata, which makes their use and reuse a challenge. INSPIRE utilizes the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), implemented in the cloud as a Platform as a Service (PaaS), for COVID-19 data. The INSPIRE PaaS for COVID-19 data leverages the cloud gateway for both individual research organizations and data networks. Individual research institutions may choose to use the PaaS to access the FAIR data management, data analysis and data sharing capabilities which come with the OMOP CDM. Network data hubs may be interested in harmonizing data across localities using the CDM, conditioned by the data ownership and data sharing agreements available under OMOP's federated model. The INSPIRE platform for evaluation of COVID-19 Harmonized data (PEACH) harmonizes data from Kenya and Malawi. Ensuring that data sharing platforms remain trusted digital spaces that protect human rights and foster citizens' participation is vital in an era of information overload from the internet. The channel for sharing data between localities is included in the PaaS and is based on data sharing agreements provided by the data producer. This allows the data producers to retain control over how their data are used, which can be further protected through the use of the federated CDM. Federated regional OMOP CDMs are based on the PaaS instances and analysis workbenches in INSPIRE-PEACH, with harmonized analysis powered by the AI technologies in OMOP.
These AI technologies can be used to discover and evaluate pathways that COVID-19 cohorts take through public health interventions and treatments. By using both data mapping and terminology mapping, we construct extract, transform, load (ETL) pipelines that populate the data and/or metadata elements of the CDM, making the hub both a central model and a distributed model.
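The combination of data mapping and terminology mapping described in this abstract can be illustrated with a minimal Python sketch. It is not the INSPIRE-PEACH pipeline: the source field names, the local diagnosis codes and the `transform` helper are hypothetical, and only a subset of the OMOP `person` and `condition_occurrence` fields is populated. The gender concept IDs 8507 and 8532 are the standard OMOP identifiers for male and female, and 0 is the OMOP convention for an unmapped value; the condition concept ID is a placeholder.

```python
from datetime import date

# Terminology mapping: source codes -> OMOP standard concept IDs.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}  # OMOP standard gender concepts
# Placeholder ID for illustration only; a real ETL would look this up
# in the OMOP vocabulary tables.
CONDITION_CONCEPTS = {"LOCAL-COVID-POS": 9_000_001}
NO_MATCHING_CONCEPT = 0  # OMOP convention for unmapped source values


def map_concept(source_code, vocabulary):
    """Return the standard concept ID, or 0 when the code is unmapped."""
    return vocabulary.get(source_code, NO_MATCHING_CONCEPT)


def transform(source_rows):
    """Data mapping: reshape flat source rows into person and
    condition_occurrence records (a subset of the CDM fields)."""
    persons = {}
    conditions = []
    for row in source_rows:
        person_id = int(row["patient_id"])
        persons[person_id] = {
            "person_id": person_id,
            "gender_concept_id": map_concept(row["sex"], GENDER_CONCEPTS),
            "year_of_birth": int(row["birth_year"]),
        }
        conditions.append({
            "person_id": person_id,
            "condition_concept_id": map_concept(
                row["diagnosis_code"], CONDITION_CONCEPTS
            ),
            "condition_start_date": date.fromisoformat(row["diagnosis_date"]),
            # Keep the original code so unmapped values remain traceable.
            "condition_source_value": row["diagnosis_code"],
        })
    return list(persons.values()), conditions


# Hypothetical source extract, invented for this example.
SOURCE_ROWS = [
    {"patient_id": "1", "sex": "F", "birth_year": "1984",
     "diagnosis_code": "LOCAL-COVID-POS", "diagnosis_date": "2021-03-02"},
    {"patient_id": "2", "sex": "M", "birth_year": "1990",
     "diagnosis_code": "LOCAL-UNKNOWN", "diagnosis_date": "2021-04-11"},
]
persons, conditions = transform(SOURCE_ROWS)
```

Keeping the source value next to the mapped concept ID means a code that has no standard equivalent yet is stored with concept ID 0 rather than dropped, so it can be remapped later.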
Introduction: Population health data integration remains a critical challenge in low- and middle-income countries (LMIC), hindering the generation of actionable insights to inform policy and decision-making. This paper proposes a pan-African, Findable, Accessible, Interoperable, and Reusable (FAIR) research architecture and infrastructure named the INSPIRE datahub. This cloud-based Platform-as-a-Service (PaaS) and on-premises setup aims to enhance the discovery, integration, and analysis of clinical, population-based surveys, and other health data sources. Methods: The INSPIRE datahub, part of the Implementation Network for Sharing Population Information from Research Entities (INSPIRE), employs the Observational Health Data Sciences and Informatics (OHDSI) open-source stack of tools and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to harmonise data from African longitudinal population studies. Operating on Microsoft Azure and Amazon Web Services cloud platforms, and on on-premises servers, the architecture offers adaptability and scalability for other cloud providers and technology infrastructure. The OHDSI-based tools enable a comprehensive suite of services for data pipeline development, profiling, mapping, extraction, transformation, loading, documentation, anonymization, and analysis. Results: The INSPIRE datahub's "On-ramp" services facilitate the integration of data and metadata from diverse sources into the OMOP CDM. The datahub supports the implementation of OMOP CDM across data producers, harmonizing source data semantically with standard vocabularies and structurally conforming to OMOP table structures. Leveraging OHDSI tools, the datahub performs quality assessment and analysis of the transformed data. It ensures FAIR data by establishing metadata flows, capturing provenance throughout the ETL processes, and providing accessible metadata for potential users. 
The ETL provenance is documented in a machine- and human-readable Implementation Guide (IG), enhancing transparency and usability. Conclusion: The pan-African INSPIRE datahub presents a scalable and systematic solution for integrating health data in LMICs. By adhering to FAIR principles and leveraging established standards like OMOP CDM, this architecture addresses the current gap in generating evidence to support policy and decision-making for improving the well-being of LMIC populations. The federated research network provisions allow data producers to maintain control over their data, fostering collaboration while respecting data privacy and security concerns. A use-case demonstrated the pipeline using OHDSI and other open-source tools.
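As a rough illustration of capturing provenance throughout an ETL process, the sketch below wraps each transformation step and appends a machine-readable record to a log. The record layout, the `run_step` helper and the two example steps are assumptions made for this example; they do not reproduce the INSPIRE Implementation Guide format or any OHDSI tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def checksum(rows):
    """Stable SHA-256 digest of a list of JSON-serialisable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(name, func, rows, log):
    """Run one ETL step and append a provenance record for it."""
    result = func(rows)
    log.append({
        "step": name,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "input_rows": len(rows),
        "output_rows": len(result),
        "input_sha256": checksum(rows),
        "output_sha256": checksum(result),
    })
    return result


def drop_incomplete(rows):
    """Hypothetical cleaning step: drop rows without a birth year."""
    return [r for r in rows if r.get("birth_year") is not None]


def rename_fields(rows):
    """Hypothetical structural step: rename source fields to CDM names."""
    return [
        {"person_id": r["patient_id"], "year_of_birth": r["birth_year"]}
        for r in rows
    ]


# Invented input rows for the example.
source = [
    {"patient_id": 1, "birth_year": 1984},
    {"patient_id": 2, "birth_year": None},
]
provenance_log = []
cleaned = run_step("drop_incomplete", drop_incomplete, source, provenance_log)
loaded = run_step("rename_fields", rename_fields, cleaned, provenance_log)
```

Because each record stores the checksum of its input and output, consecutive steps can be chained: the output digest of one step should equal the input digest of the next.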
Background: Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets. Methods: The "INSPIRE" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves. Results: Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research. Conclusion: The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.
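A staging database with a snowflake schema of the kind this abstract describes might look like the sketch below, which uses Python's built-in `sqlite3` module. The table and column names, and the sample rows, are hypothetical and are not taken from the INSPIRE staging database. The point is only to show how a central fact table with normalised dimensions can record a change of instrument and protocol between data collection waves.

```python
import sqlite3

# Fact table `observation` with two dimension chains:
#   wave -> study  and  variable -> instrument
SCHEMA = """
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE wave (
    wave_id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(study_id),
    wave_number INTEGER NOT NULL,
    protocol_version TEXT NOT NULL
);
CREATE TABLE instrument (
    instrument_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE variable (
    variable_id INTEGER PRIMARY KEY,
    instrument_id INTEGER NOT NULL REFERENCES instrument(instrument_id),
    name TEXT NOT NULL
);
CREATE TABLE observation (
    participant_id INTEGER NOT NULL,
    wave_id INTEGER NOT NULL REFERENCES wave(wave_id),
    variable_id INTEGER NOT NULL REFERENCES variable(variable_id),
    value TEXT
);
"""


def build_staging_db():
    """Create an in-memory staging database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO study VALUES (1, 'Example cohort')")
    conn.executemany(
        "INSERT INTO wave VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "v1.0"), (2, 1, 2, "v2.0")],
    )
    conn.executemany(
        "INSERT INTO instrument VALUES (?, ?)",
        [(1, "depression_scale_a"), (2, "depression_scale_b")],
    )
    conn.executemany(
        "INSERT INTO variable VALUES (?, ?, ?)",
        [(1, 1, "total_score"), (2, 2, "total_score")],
    )
    # The same participant is assessed with a different instrument per wave.
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [(101, 1, 1, "12"), (101, 2, 2, "7")],
    )
    conn.commit()
    return conn


def instruments_by_wave(conn):
    """List which instrument and protocol version each wave used."""
    query = """
        SELECT DISTINCT w.wave_number, w.protocol_version, i.name
        FROM observation AS o
        JOIN wave AS w ON w.wave_id = o.wave_id
        JOIN variable AS v ON v.variable_id = o.variable_id
        JOIN instrument AS i ON i.instrument_id = v.instrument_id
        ORDER BY w.wave_number, i.name
    """
    return conn.execute(query).fetchall()
```

Calling `instruments_by_wave(build_staging_db())` returns one row per wave and instrument, which is the kind of study-level description an ETL script would need before mapping each instrument's variables into the OMOP CDM.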
The Global Open Science Cloud has the potential to advance the way scientific data and resources are shared and accessed, and how global collaboration happens. However, addressing the challenges associated with its creation and ensuring inclusivity, interoperability, data privacy, and sustainability are crucial for its success. The collaborative efforts of stakeholders from different disciplines, regions, and sectors will be essential in realising the vision of a truly global and open science platform. The achievements of GOSC so far, including successful collaborations, funded projects, and the development of a common reference framework, demonstrate its potential and progress towards its goals.
To cite this document: Barzman M. (Coord.), Gerphagnon M. (Coord.), Mora O. (Coord.), Aubin-Houzelstein G., Bénard A., Martin C., Baron G.L., Bouchet F., Dibie-Barthélémy J., Gibrat J.F., Hodson S., Lhoste E., Moulier-Boutang Y., Perrot S., Phung F., Pichot C., Siné M., Venin T. 2019. Transition numérique et pratiques de recherche et d’enseignement supérieur en agronomie, environnement, alimentation et sciences vétérinaires à l’horizon 2040 [Digital transition and research and higher-education practices in agronomy, environment, food and veterinary sciences to 2040]. INRA, France, 161 pages.
The second webinar in the 2022 CODATA-DDI Alliance webinar series took place on Monday 13 June 2022 on the topic Vive les Métadonnées! (DDI basics in French). This workshop introduced the standards and products of the DDI Alliance for data collection, management, and dissemination, as an excellent way of meeting the demands for FAIR data in modern research.
DDI provides a platform-independent, fine-grained approach to metadata and data documentation for data which supports social, behavioural, and economic research, as well as public health and official statistics. Increasingly, it is looking to optimize metadata not only within these domains, but also for integrated data sets drawn from across the scientific data spectrum. The standards and models are at both technical and conceptual levels. This workshop gave an overview of the standards and products, and an introduction to their use. Although DDI is used internationally, this was the first workshop conducted by the DDI Alliance in the French language. Presenters included Alina Danciu (Sciences Po, Centre de Données Socio-Politiques – CDSP), Christophe Dzikowski (Institut national de la statistique et des études économiques – INSEE), Simon Hodson (CODATA), Hilde Orten (SIKT) and Nicolas Sauger (Sciences Po, CDSP).
The first webinar in the 2022 CODATA-DDI Alliance webinar series took place on 28 March 2022, examining how DDI products work together with other standards. Standards promote interoperability and transparency across agencies, and reduce development time. Systems used for the collection, management, production, and dissemination of data regularly employ several different standards for modelling and transmitting their data and metadata. DDI is often a core part of this suite, but it is rarely the only one. The way in which DDI aligns and interacts with other standards has evolved over time: from a static, archival ‘XML codebook’, it has become a metadata resource for helping to drive systems and provide the basis for reuse and data harmonisation. Looking forward, DDI promises to support FAIR data in significant ways, as part of an integrated suite of standards. This discussion looked at the different DDI metadata standards and how they relate both to each other and to other models and standards in common use in applications today, including schema.org, DCAT, and Dublin Core. Speakers included Alina Danciu (Sciences Po), Christophe Dzikowski (INSEE) and Arofan Gregory (CODATA). Questions and answers were moderated by Adrian Dusa (University of Bucharest).
The second webinar of the new DDI Alliance / CODATA Training Webinar series 2021 took place on 18 June 2021, with 75 participants. The webinar, "Implementing FAIR: What DDI Can Do for You!", featured speakers Simon Hodson (CODATA) and Arofan Gregory (CODATA/DDI Alliance), with Alina Danciu (Sciences Po) hosting a sustained Q&A session. Presentations covered an overview of the FAIR principles, an overview of the DDI suite of tools, and how these can help institutions and infrastructures in making and keeping research data FAIR. This webinar and discussion were aimed at audiences both inside and outside the DDI community. The session highlighted where metadata is needed in terms of the FAIR principles and explained how different DDI products can help to support the implementation of FAIR.
About DDI
DDI Alliance: https://ddialliance.org/
Current products: https://ddialliance.org/products/overview-of-current-products
About DDI-CDI
Introduction: https://ddi-alliance.atlassian.net/wiki/download/attachments/860815393/Part_1_DDI-CDI_Intro_PR_1.pdf
Public review page: https://ddi-alliance.atlassian.net/wiki/x/IQBPMw
Complete download package: https://ddi-alliance.bitbucket.io/DDI-CDI/DDI-CDI_Public_Review_1.zip
Announcement at DDI Alliance website: https://ddialliance.org/announcement/public-review-ddi-cross-domain-integration-ddi-cdi
Data quality is a topic which is often discussed and attracts a lot of interest, but the way in which it is approached varies widely. This, our fourth webinar of the 2021 series, took place on 18 November 2021 and examined different approaches in the use of metadata for describing data quality from the perspective of data producers in official statistics and in the scientific and research domains, and how DDI fits into this picture. Some data providers will attempt to assert their data quality as a function of certification for the services they provide, and the richness of their metadata. In the official statistics world, additional descriptive metadata is provided according to agreed quality frameworks. DDI supports both approaches, and the details of each are explored in this webinar. Presenters were Arofan Gregory (Chair, CDI Working Group, DDI Alliance / CODATA) and Kaia Kulla (Statistics Estonia). The session was introduced by Laura Molloy (CODATA), and the question and answer session was moderated by Alina Danciu (Sciences Po).
About DDI
DDI Alliance: https://ddialliance.org/
Current products: https://ddialliance.org/products/overview-of-current-products
About DDI-CDI
Introduction: https://ddi-alliance.atlassian.net/wiki/download/attachments/860815393/Part_1_DDI-CDI_Intro_PR_1.pdf
Public review page: https://ddi-alliance.atlassian.net/wiki/x/IQBPMw
Complete download package: https://ddi-alliance.bitbucket.io/DDI-CDI/DDI-CDI_Public_Review_1.zip
Announcement at DDI Alliance website: https://ddialliance.org/announcement/public-review-ddi-cross-domain-integration-ddi-cdi