The opportunities and shortcomings of using big data and national databases for sarcoma research

Heather G. Lyu, Adil H. Haider, Adam B. Landman, Chandrajit P. Raut

Research output: Contribution to journal; Review article; peer-review

30 Citations (Scopus)

Abstract

The rarity and heterogeneity of sarcomas make performing appropriately powered studies challenging and magnify the significance of large databases in sarcoma research. Established large tumor registries and population-based databases have become increasingly relevant for answering clinical questions regarding sarcoma incidence, treatment patterns, and outcomes. However, the validity of large databases has been questioned and scrutinized because of the inaccuracy and wide variability of coding practices and the absence of clinically relevant variables. In addition, the utilization of large databases for the study of rare cancers such as sarcoma may be particularly challenging because of the known limitations of administrative data and poor overall data quality. Currently, there are several large national cancer databases, including the Surveillance, Epidemiology, and End Results database, the National Cancer Data Base of the American College of Surgeons and the American Cancer Society, and the National Program of Cancer Registries of the Centers for Disease Control and Prevention. These databases are often used for sarcoma research, but they are limited by their dependence on administrative or billing data, the lack of agreement between chart abstractors on diagnosis codes, and the use of preexisting documented hospital diagnosis codes for tumor registries, which lead to a significant underestimation of sarcomas in large data sets. Current and future initiatives to improve databases and big data applications for sarcoma research include increasing the utilization of sarcoma-specific registries and encouraging national initiatives to expand on real-world, evidence-based data sets.

Original language: English
Pages (from-to): 2926-2934
Number of pages: 9
Journal: Cancer
Volume: 125
Issue number: 17
Publication status: Published - Sept 2019
Externally published: Yes

Keywords

  • National Cancer Data Base (NCDB)
  • Surveillance, Epidemiology, and End Results (SEER)
  • big data
  • database
  • sarcoma
