Data Sharing


Summary

Data sharing in science refers to the practice of making research data (measurements, observations, etc.) and its associated metadata (information about the data) publicly available for others to access and use. This practice promotes transparency, accelerates scientific discovery, and fosters collaboration by allowing researchers to build upon each other’s work.

About

Source: Gemini AI Overview

Benefits of Data Sharing

  • Accelerated Discovery
    Sharing data enables researchers to combine datasets, perform meta-analyses, and explore research questions from different perspectives, leading to faster scientific breakthroughs.

  • Increased Reproducibility
    Data sharing allows other researchers to verify and validate published findings, enhancing the reliability and trustworthiness of scientific results.

  • Reduced Redundancy
    Sharing data prevents duplication of research efforts and allows researchers to focus on new and innovative research questions.

  • Collaboration and Knowledge Sharing
    Data sharing fosters collaboration among researchers, facilitating the exchange of knowledge and expertise.

  • Training and Education
    Shared data can be used for educational purposes, training future scientists and researchers.

Methods of Data Sharing

  • Public Repositories
    Platforms like Zenodo, Figshare, and institutional repositories provide secure and long-term storage for research data. 

  • Supplemental Materials
    Data can be shared as supplemental material accompanying published articles. 

  • Data Papers
    Publications dedicated to describing and sharing a specific dataset. 

  • Centralized and Federated Approaches
    Different approaches exist for sharing data, including central repositories and federated networks where data is stored locally but accessible through a central hub. 

Challenges

Sharing research data is crucial for fostering collaboration, verifying findings, and accelerating scientific advancements. However, several interconnected barriers hinder effective data sharing in science.

Initial Source for content: Gemini AI Overview 7/21/2025

1. Lack of incentives

  • Limited recognition
    The current academic reward system primarily emphasizes publications, not data sharing, potentially leading researchers to prioritize publication over making data readily available and reusable.

  • Fear of being scooped
    Researchers may worry that sharing their data might allow others to publish on it before they have fully capitalized on their own work, notes the Center for Data Innovation.

  • Time and resource constraints
    Preparing data for sharing, documenting it properly, and navigating repositories and sharing platforms takes time and effort, for which researchers may not be adequately compensated.

2. Technical complexities and interoperability

  • Lack of standardization
    Inconsistent formats, vocabularies, and metadata descriptions make it difficult to integrate and combine data from different studies and sources.

  • Data quality issues
    Problems like incomplete, inaccurate, or inconsistent data can hinder reusability and introduce errors in downstream analyses, according to FirstEigen.

  • Technical expertise and infrastructure
    Researchers and institutions may lack the necessary technical expertise, tools, or infrastructure to manage, store, and share data effectively.

3. Ethical and legal concerns

  • Privacy and confidentiality
    Safeguarding sensitive participant data, especially in fields like healthcare, requires robust de-identification techniques and careful consideration of re-identification risks, says Falcon Scientific Editing.

  • Informed consent
    Obtaining truly informed consent for future data sharing, particularly for diverse and potentially unknown future uses, can be challenging.

  • Data ownership and intellectual property
    Ambiguity regarding data ownership and intellectual property rights can create friction and hesitation in sharing data.
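
The re-identification risk mentioned under privacy and confidentiality can be made concrete with a k-anonymity check: count how many records share each combination of quasi-identifiers and look at the smallest group. The following is a minimal sketch, not a complete de-identification procedure. The field names, the sample records, and the helper `smallest_group_size` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for every quasi-identifier. Returns 0 for no records."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical records, already generalized into age bands and truncated postcodes.
records = [
    {"age_band": "30-39", "postcode": "021**", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "021**", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode": "021**", "diagnosis": "asthma"},
]

k = smallest_group_size(records, ["age_band", "postcode"])
print(k)  # 1, because the single 40-49 record forms a group of its own
```

A result of 1 means at least one participant is unique on those fields, so the data would likely need further generalization or suppression before release. k-anonymity is only one of several possible measures and does not by itself guarantee privacy.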

4. Cultural and social barriers

  • Lack of trust
    Concerns about data misuse, misinterpretation by others, or even perceived “adversarial science” can erode trust and discourage data sharing.

  • Organizational and cultural silos
    Data sharing can be hindered by internal barriers within institutions and by a lack of a collaborative culture that values data sharing, according to Atlan.

  • Difficulties in navigating repositories and platforms
    Finding the appropriate repository, understanding licensing agreements, and navigating data access processes can be daunting, especially for early career researchers.
     

Innovations

Addressing data sharing challenges in science involves a multifaceted approach, including improved data management practices, technological advancements, and policy changes. Key innovations focus on enhancing data discoverability, ensuring data quality, and fostering a culture of data sharing within the research community.

Initial Source for content: Gemini AI Overview 7/21/2025

1. Enhanced Data Discovery and Accessibility

  • Data Portals and Clearinghouses
    Creating centralized platforms where researchers can find, access, and reuse data from various sources. 

  • Metadata Standards
    Developing and implementing standardized ways to describe and catalog data, making it easier to find and understand. 

  • Persistent Identifiers
    Assigning unique, persistent identifiers to datasets, ensuring they can be reliably cited and accessed over time. 
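
As an illustration of how a persistent identifier is used in practice, the sketch below normalizes a DOI string and turns it into a resolver URL on doi.org. It assumes a deliberately loose pattern for DOI syntax. The function name `doi_to_url` and the sample identifier are hypothetical and do not refer to a real dataset.

```python
import re

# Loose pattern: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Strip common prefixes from a DOI string and return its doi.org URL."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Hypothetical identifier, used only to show the shape of the output.
print(doi_to_url("doi:10.5072/example-dataset"))
# https://doi.org/10.5072/example-dataset
```

Because the identifier resolves through a central service rather than pointing at a specific server, a citation can stay valid even if the repository later moves the files.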

2. Ensuring Data Quality and Interoperability

  • Data Validation and Cleaning
    Implementing tools and processes to identify and correct errors in datasets, ensuring data quality. 

  • Standardized Data Formats
    Promoting the use of common data formats and structures to facilitate data integration and analysis across different studies and systems. 

  • Data Harmonization
    Developing methods to combine and compare data from different sources that may use different formats or scales. 
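
Validation and harmonization can be sketched together in a few lines. The example below assumes two hypothetical sources that report temperature in different units: it converts every reading to degrees Celsius and sets aside rows that cannot be parsed or that fall outside a plausibility range. The field names, the range, and the functions `to_celsius` and `harmonize` are illustrative assumptions, not part of any standard.

```python
# Assumed plausibility bounds for near-surface air temperature, in Celsius.
PLAUSIBLE_RANGE_C = (-90.0, 60.0)


def to_celsius(value, unit):
    """Convert a temperature in Celsius, Fahrenheit, or Kelvin to Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown unit: {unit!r}")


def harmonize(rows):
    """Return (clean, rejected): harmonized records plus rows with a reason."""
    clean, rejected = [], []
    for row in rows:
        try:
            site = row["site"]
            celsius = to_celsius(float(row["temp"]), row["unit"])
        except (KeyError, TypeError, ValueError) as error:
            rejected.append((row, f"unusable row: {error}"))
            continue
        if not PLAUSIBLE_RANGE_C[0] <= celsius <= PLAUSIBLE_RANGE_C[1]:
            rejected.append((row, "temperature outside plausible range"))
            continue
        clean.append({"site": site, "temp_c": round(celsius, 2)})
    return clean, rejected


# Hypothetical rows from two sources with different conventions.
rows = [
    {"site": "A", "temp": "68", "unit": "F"},
    {"site": "B", "temp": 293.15, "unit": "K"},
    {"site": "C", "temp": "n/a", "unit": "C"},
    {"site": "D", "temp": 500, "unit": "C"},
]

clean, rejected = harmonize(rows)
print(clean)          # sites A and B, both 20.0 degrees Celsius
print(len(rejected))  # 2
```

Keeping the rejected rows together with a reason, rather than silently dropping them, lets the data provider see what was excluded and why.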

3. Fostering a Culture of Data Sharing

  • Incentives for Data Sharing
    Designing funding policies, journal requirements, and career advancement structures that reward researchers for sharing their data. 

  • Open Science Practices
    Promoting the principles of open science, which emphasize transparency, reproducibility, and data sharing. 

  • Collaborative Data Platforms
    Creating online spaces where researchers can collaborate on data analysis and interpretation. 

4. Addressing Ethical and Legal Considerations

  • Informed Consent
    Ensuring that research participants are fully informed about how their data will be used and shared. 

  • Data Security and Privacy
    Implementing robust security measures to protect sensitive data and comply with privacy regulations. 

  • Indigenous Data Sovereignty
    Developing approaches that respect the rights and cultural practices of Indigenous communities regarding their data. 

5. Technological Innovations

  • Federated Data Systems
    Allowing researchers to analyze data across multiple institutions without physically moving the data. 

  • Blockchain Technology
    Exploring the use of blockchain for secure and transparent data sharing and provenance tracking. 

  • Artificial Intelligence (AI) and Machine Learning
    Utilizing AI to automate data processing, analysis, and discovery. 
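
The idea behind federated data systems can be illustrated with a pooled mean: each site computes a small summary locally, and only those summaries travel to the coordinator. This is a minimal sketch under simplifying assumptions (a single numeric variable, honest sites, no network layer). The function names and the sample values are hypothetical.

```python
def local_summary(values):
    """Run at each site: return aggregate statistics only, never raw rows."""
    return {"n": len(values), "total": sum(values)}


def pooled_mean(summaries):
    """Run at the coordinator: combine per-site summaries into one mean."""
    n = sum(summary["n"] for summary in summaries)
    if n == 0:
        raise ValueError("no observations in any summary")
    return sum(summary["total"] for summary in summaries) / n


# Hypothetical measurements that stay at their own institutions.
site_a = [4.0, 6.0]
site_b = [10.0]

summaries = [local_summary(site_a), local_summary(site_b)]
print(pooled_mean(summaries))  # the same value as the mean of all three readings
```

Aggregates from very small sites can still reveal individual values, so real deployments typically add safeguards such as minimum group sizes or other privacy-enhancing techniques.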

 

Projects

The scientific community is actively engaged in developing innovative solutions to the challenges of data sharing. These efforts encompass the development of new platforms and technologies, the adoption of best practices, and the fostering of a more open and collaborative scientific culture. The goal is to maximize the value of research data, accelerate scientific discovery, and ensure that research is reproducible, robust, and impactful.

Initial Source for content: Gemini AI Overview 7/21/2025

1. FAIR data principles

  • Focus
    A set of guiding principles (Findable, Accessible, Interoperable, and Reusable) designed to maximize the discoverability and reusability of research data for both humans and computers.

  • Key aspects
    • Findable
      Assigning globally unique and persistent identifiers to data and metadata, describing data with rich metadata, and registering data in searchable resources.

    • Accessible
      Ensuring data and metadata are retrievable using standardized protocols, with mechanisms for authentication and authorization where necessary, and making metadata accessible even when data is no longer available.

    • Interoperable
      Using formal, accessible, shared, and broadly applicable languages and vocabularies for knowledge representation, and ensuring metadata includes qualified references to other data.

    • Reusable
      Describing data richly with accurate and relevant attributes, including a clear and accessible data usage license and provenance information, and adhering to domain-relevant community standards.

  • Significance
    Enables faster time-to-insight, improves data return on investment (ROI), supports AI and multi-modal analytics, ensures reproducibility and traceability, and fosters team collaboration.

  • Challenges
    Fragmented data systems and formats, lack of standardized metadata or ontologies, high cost and time investment to transform legacy data, cultural resistance, and infrastructure not built for multi-modal data.
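
A rough sense of what the four principles ask of a metadata record can be given with a simple completeness check. The sketch below is not an official FAIR assessment: the field names, their mapping to the principles, and the sample record are assumptions made for illustration.

```python
# Hypothetical metadata fields, each mapped to the principle it supports.
REQUIRED_FIELDS = {
    "identifier": "Findable: globally unique, persistent identifier",
    "title": "Findable: descriptive metadata",
    "access_url": "Accessible: retrievable over a standard protocol",
    "format": "Interoperable: shared, documented format",
    "license": "Reusable: clear usage license",
    "provenance": "Reusable: how the data was produced",
}


def missing_fair_fields(metadata):
    """List the required fields that are absent or empty in a metadata record."""
    return [
        f"{field} ({reason})"
        for field, reason in REQUIRED_FIELDS.items()
        if not metadata.get(field)
    ]


# Hypothetical record with no license recorded.
record = {
    "identifier": "doi:10.5072/example-dataset",
    "title": "Example stream temperature readings",
    "access_url": "https://repository.example.org/datasets/42",
    "format": "text/csv",
    "license": "",
    "provenance": "Logged hourly by field sensors, cleaned by script",
}

for problem in missing_fair_fields(record):
    print(problem)  # license (Reusable: clear usage license)
```

A check like this only confirms that fields are present. Whether the identifier actually resolves or the license is appropriate still requires human or repository-level review.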
     

2. Open Science and Open Data Initiatives

  • Focus
    Promoting transparency, accessibility, and collaboration in research through open sharing of data, methodologies, and findings.

  • Key projects
    • NASA Open Science Data Initiatives
      Maximizing scientific results from NASA-funded research data and encouraging community participation by aligning with FAIR principles.

    • NIH Data Management and Sharing (DMS) Policy
      Requiring a data management and sharing plan for NIH-funded research that generates scientific data, letting investigators choose appropriate sharing methods, and encouraging the use of established repositories.

    • Open Science Framework (OSF)
      An open-source platform from the Center for Open Science that supports open collaboration and data sharing in scientific research, serving both as a generalist repository and as a tool for managing research projects.

    • Open Access Journals and Preprint Repositories
      Increasing discoverability, reach, and impact of research by making publications and data freely available.

  • Significance
    Accelerates scientific discovery, enhances reproducibility, fosters collaboration, increases research visibility and impact, and facilitates innovation and knowledge transfer.

3. Data sharing platforms and technologies

  • Focus
    Developing platforms and tools to facilitate data sharing, archiving, and analysis.

  • Key examples
    • General-purpose repositories
      Zenodo, Figshare, Dryad, Harvard Dataverse, Mendeley Data, UCLA Dataverse.

    • Domain-specific repositories
      GEO, cBioPortal, PANGAEA, PubChem, GenBank, and clinical-trial repositories such as Vivli.

    • Open-source tools
      gobbli, SMART, Harness-Vue for large dataset management.

    • Data sharing platforms
      Synapse and AD Knowledge Portal (Alzheimer’s Disease) providing controlled access to diverse biomedical data.

    • Innovative technologies
      • Privacy-Enhancing Technologies (PETs)
        Protecting sensitive data by enabling privacy-preserving data analysis and sharing.

      • Blockchain technology
        Potentially enhancing data transparency, traceability, and security in data sharing.

      • Cloud computing
        Facilitating easier and more scalable data sharing and management.

      • AI and Machine Learning
        Supporting automated data preprocessing, advanced predictive modeling, natural language processing for text analysis, automated feature engineering, and improved data visualization.

      • Persistent Identifiers (PIDs)
        Ensuring long-term traceability and citability for datasets and researchers (e.g., DOIs for datasets, ORCID iDs for researchers).
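
The traceability that blockchain-based approaches aim for rests on a simpler building block: a log in which every entry includes a hash of the previous one, so later tampering becomes detectable. The sketch below shows only that hash-chaining idea using Python's standard library. It is not a blockchain (there is no distribution or consensus), and the function names and sample events are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, previous_hash):
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(
        {"event": event, "previous_hash": previous_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append a provenance event that is chained to the current last entry."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "event": event,
        "previous_hash": previous_hash,
        "hash": _digest(event, previous_hash),
    }
    log.append(entry)
    return entry


def verify(log):
    """Return True if every entry still matches its recorded hash chain."""
    previous_hash = GENESIS_HASH
    for entry in log:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != _digest(entry["event"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True


# Hypothetical provenance events for one dataset.
log = []
append_entry(log, {"action": "collected", "by": "lab-a"})
append_entry(log, {"action": "cleaned", "by": "lab-b"})
print(verify(log))  # True

log[0]["event"]["by"] = "someone-else"  # simulate tampering
print(verify(log))  # False
```

On its own, such a log only detects changes if a trusted copy of the latest hash is kept elsewhere. Distributed ledgers are one proposed way of providing that shared reference.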

4. Addressing challenges and promoting best practices

  • Focus
    Overcoming technical, ethical, and sociological barriers to effective data sharing.

  • Key strategies
    • Developing funding models and incentives
      Addressing financial challenges associated with data management and ensuring researchers are rewarded for sharing their data.

    • Improving data quality and standardization
      Establishing rigorous data collection and reporting standards, developing common data models and elements, and using AI-based tools for metadata translation.

    • Strengthening legal and ethical frameworks
      Implementing advanced anonymization techniques, using dynamic consent methods, and ensuring clear communication of data-sharing rules and compliance with privacy regulations (e.g., HIPAA).

    • Promoting training and education
      Providing researchers with the necessary skills and knowledge for effective data management, curation, and sharing.

    • Cultivating a culture of openness and collaboration
      Encouraging researchers to embrace open science principles and fostering interdisciplinary and international collaborations.
       
