

CancerX Data Sprint: Harnessing the power of comprehensive real-world datasets for advancements in oncology


Smit Patel

On December 23, 1971, President Richard Nixon signed the National Cancer Act into law, making cancer a reportable disease. In 1989, the Commission on Cancer established the National Cancer Data Base (NCDB), which captures approximately 70% of all new cancer diagnoses in the U.S., and, in 1992, federal legislation further mandated that all cancer cases be reported to state cancer registries. 

While the NCDB is a vast repository, collecting over 1 million cancer case reports from more than 1,430 hospitals annually, its data often lack the granularity and timeliness needed for cutting-edge, patient-centric oncology care. Today, the data from a single patient's tumor in a clinical trial can add up to one terabyte, the equivalent of 130,000 books. More than 13 million electronic medical records exist for cancer patients in the U.S. alone. The challenge is not just gathering data. It is maximizing the clinical utility of this data, establishing uniform standards, and transforming it into actionable insights that can revolutionize oncology.

A data-driven oncology revolution

Unprecedented advances in oncology, fueled by scientific and technological breakthroughs, are proving difficult to integrate into clinical practice. Timely, widely applicable, evidence-based datasets from real-world settings are urgently needed to bridge the gap between clinical research and everyday care, and to provide insights that directly improve oncology care.

Announced at HLTH, the CancerX Data Sprint demonstration project launched an 80-day sprint with the CancerX member community to develop a comprehensive real-world oncology dataset, focusing on data elements with high clinical utility that are technically feasible to abstract. The project demonstrates the potential of data innovation to produce impactful real-world data (RWD) and comprehensive evidence sets. These complement the Center for Medicare and Medicaid Innovation's (CMMI) Enhancing Oncology Model (EOM), the Office of the National Coordinator for Health Information Technology's (ONC) United States Core Data for Interoperability (USCDI+) – Oncology extension, and other data initiatives across government agencies and industry sectors.

150+ members, 80-day sprint, 15 new data elements

The CancerX community worked together to improve the quality, accessibility, and effectiveness of comprehensive real-world cancer datasets, with the aim of strengthening cancer data standards and supporting oncology data initiatives. Participants identified crucial research questions reflecting the complexities of cancer care and proposed new clinical data elements. They discussed these elements in detail through workshops and completed surveys to determine which ones were most clinically useful and technically feasible. The objective is to support pilot implementations that enrich cancer care research and policy with innovative, data-driven solutions.

CancerX Data Sprint approach

Step 1: Identify the high-value research questions that may be asked of the standardized RWD/E generated by participants through EOM.
Step 2: Supplement the existing data elements planned for collection through EOM with the additional data elements needed to optimize the RWD/E for these scientific inquiries.
Step 3: Support piloting the implementation of these optimized data elements in coordination with partners to enhance the clinical utility of real-world datasets.

Participants raised a wide variety of research questions that highlight the intricacies of cancer treatment and the need for detailed yet optimized RWD sets. These questions span many aspects of cancer care. They include treatment sequencing, risk factors for early progression, data specificity, and the efficacy of receptor-targeted therapies. They also cover the impact of biomarker and genomic testing on treatment decisions, the use of imaging data, and the relationship between treatment methods and patient outcomes. Taken together, they call for comprehensive datasets that deepen understanding of disease progression, treatment responses, and patient experiences. Such datasets would also need to address clinical significance, data completeness, and the effects of new therapies and testing on healthcare equity and quality of life.

High-quality RWD rests on meaningful and valid datasets, and whether a dataset is meaningful and valid depends on the specific question being asked. RWD could help us better understand the effectiveness and harms of therapies in the real world.

A path to comprehensive real-world data sets

The participants unanimously acknowledged the significant value of a comprehensive oncology dataset, which offers a broader and deeper view of cancer care than traditional datasets can provide. They see such a dataset as highly beneficial for shaping clinical practice, advancing research, guiding treatment decisions, and improving patient care pathways. As a result, it can play a crucial role in the evolution of oncology.

For example, breast cancer research can draw on this rich dataset to deepen our understanding of disease progression and treatment outcomes. Researchers can analyze specific elements such as the ICD-10-CM Diagnosis Code, Initial Date of Diagnosis, and Recurrence or Relapse Clinical Status together with treatment and staging information. Doing so lets them identify patterns in disease recurrence and evaluate the efficacy of different treatment approaches.
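The Python sketch below makes this breast cancer example more concrete. It assumes each patient's data elements have already been abstracted into a flat record. The field names (`icd10cm_code`, `initial_diagnosis_date`, `recurrence_date`, and so on), the simplified stage and treatment labels, and the helper functions are hypothetical illustrations, not the sprint's actual data element specification. The sketch keeps only records in ICD-10-CM category C50 (malignant neoplasm of breast). For each stage and treatment group, it counts recurrences and computes the median time from initial diagnosis to recurrence.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical field names loosely mirroring the data elements named above.
    patient_id: str
    icd10cm_code: str                # ICD-10-CM Diagnosis Code, e.g. "C50.911"
    initial_diagnosis_date: date     # Initial Date of Diagnosis
    stage: str                       # simplified staging group, e.g. "II"
    treatment: str                   # simplified treatment label
    recurrence_date: Optional[date]  # None if no recurrence or relapse is recorded


def is_breast_cancer(code: str) -> bool:
    # ICD-10-CM category C50 covers malignant neoplasms of the breast.
    return code.strip().upper().startswith("C50")


def recurrence_summary(
    records: Iterable[PatientRecord],
) -> Dict[Tuple[str, str], dict]:
    """Group breast cancer records by (stage, treatment) and summarize recurrence."""
    groups = defaultdict(list)
    for r in records:
        if is_breast_cancer(r.icd10cm_code):
            groups[(r.stage, r.treatment)].append(r)

    summary = {}
    for key, group in groups.items():
        # Days from initial diagnosis to recorded recurrence, for patients who recurred.
        days = [
            (r.recurrence_date - r.initial_diagnosis_date).days
            for r in group
            if r.recurrence_date is not None
        ]
        summary[key] = {
            "patients": len(group),
            "recurrences": len(days),
            "recurrence_rate": len(days) / len(group),
            "median_days_to_recurrence": median(days) if days else None,
        }
    return summary


# Synthetic, made-up records for illustration only.
records = [
    PatientRecord("p1", "C50.911", date(2020, 1, 10), "II", "surgery+chemo", date(2021, 1, 9)),
    PatientRecord("p2", "C50.412", date(2020, 3, 1), "II", "surgery+chemo", None),
    PatientRecord("p3", "C34.90", date(2020, 5, 5), "III", "chemo", date(2020, 12, 1)),  # lung, excluded
]
print(recurrence_summary(records))
```

With these three synthetic records, the two C50 records fall into one group with one recurrence at 365 days, and the lung cancer (C34) record is excluded. A crude recurrence proportion like this ignores differences in follow-up time. A real analysis would typically use time-to-event methods, such as Kaplan-Meier estimates, that account for censored patients.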

Participants expect to use such a dataset anywhere from daily to a few times per year. Usage depends on the institution, its research priorities, and the quantity and quality of the data. It also depends on intended uses such as clinical trial identification and clinical decision-making. This feedback indicated strong interest in a dataset that is both extensive and adaptable, capable of supporting a wide range of research inquiries that can ultimately advance patient care and research capabilities in oncology.

The clinical data elements deemed clinically useful and technically feasible to abstract are available here. Findings also emphasized a critical need to expand the oncology dataset's scope and improve its rigor and quality. Key areas include healthcare equity, accessibility of treatments, financial implications of care, patient-reported outcomes, and wider health considerations. Collecting more in-depth data on diagnosis, treatment types, outcomes, and disease progression aims to improve understanding of treatment efficacy and to tailor patient care more precisely.

Envisioning a new era: Shaping the future of oncology through data-driven innovation

The findings from the CancerX Data Sprint have been shared with government agencies for a data-focused roundtable discussion hosted at The White House. These insights are informing a pilot effort announced at the 2023 ONC Annual Meeting. The pilot aims to identify and disseminate the technologies and best practices that will fast-track data-driven breakthroughs in cancer research, care, and policy. Sign up to receive updates on this work.

By focusing on enhancing oncology care coordination, improving care quality, and promoting interoperable data elements, this effort supports a shared goal of healthcare data excellence. It promises not only significant progress but also the potential for novel cancer breakthroughs, while advancing the ambitious goals of the Cancer Moonshot.

Are you ready to leverage digital innovation in the fight against cancer? Join CancerX today and be a part of this transformative journey!
