Building Balanced CT Datasets for Oncology Research

GetDATA Team · 1 min read

Why balance matters more in oncology

Tumour findings are rare relative to normal anatomy, and among the positive cases, lesion size, stage and location are themselves unevenly distributed. A CT dataset that mirrors raw clinical prevalence tends to produce a model that under-detects exactly the cases that matter most.

Designing the cohort

  • Contrast phase recorded (non-contrast, arterial, venous, delayed), since lesions enhance differently across phases.
  • Voxel-level segmentation or RECIST measurements rather than study-level labels alone.
  • Scanner and reconstruction diversity (kernel, slice thickness, kVp) so the model survives domain shift.
  • Deliberate over-sampling of small and early-stage lesions.
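
One way to sketch the over-sampling step is with inverse-frequency weights per stratum. The example below is a minimal illustration under stated assumptions: the record fields (lesion_mm, stage), the 10 mm cutoff and the function names are all hypothetical and do not reflect a GetDATA schema or a clinical standard.

```python
from collections import Counter
import random

# Assumed example threshold for a "small" lesion; not a clinical standard.
SMALL_CUTOFF_MM = 10.0


def stratum(case):
    """Group a case by (size bin, stage) so rare groups can be up-weighted."""
    size = "small" if case["lesion_mm"] < SMALL_CUTOFF_MM else "large"
    return (size, case["stage"])


def sampling_weights(cases):
    """Give each case a weight of 1 / (number of cases in its stratum)."""
    counts = Counter(stratum(c) for c in cases)
    return [1.0 / counts[stratum(c)] for c in cases]


def oversample(cases, k, seed=0):
    """Draw k cases with replacement, using the inverse-frequency weights."""
    rng = random.Random(seed)
    return rng.choices(cases, weights=sampling_weights(cases), k=k)


# Hypothetical cohort: one small early-stage lesion, three large late-stage ones.
cases = [
    {"id": "c1", "lesion_mm": 4.0, "stage": "I"},
    {"id": "c2", "lesion_mm": 32.0, "stage": "III"},
    {"id": "c3", "lesion_mm": 41.0, "stage": "III"},
    {"id": "c4", "lesion_mm": 27.0, "stage": "III"},
]

weights = sampling_weights(cases)
batch = oversample(cases, k=8)
```

With these weights every stratum carries the same total probability, so the single small stage I case is expected in roughly half of the draws rather than a quarter. Sampling with replacement repeats rare cases, which is usually paired with augmentation to limit overfitting on them.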

Curation and validation

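Document acquisition parameters and label provenance for every study, and validate on data from a site that contributed nothing to training. Radiomics pipelines in particular are sensitive to reconstruction settings, so a feature computed under one kernel or slice thickness may not be comparable under another.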
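
The two checks in this section can be sketched in a few lines: flag studies with undocumented parameters, then split by site so the validation site never appears in training. This is a rough illustration only; the field names, the required-field list and the sample records are assumptions, not a GetDATA or DICOM schema.

```python
# Hypothetical per-study metadata fields that should always be documented.
REQUIRED_FIELDS = (
    "site",
    "contrast_phase",
    "kernel",
    "slice_thickness_mm",
    "kvp",
    "label_source",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def split_by_site(records, held_out_sites):
    """Split records so that held-out sites appear only in the test set."""
    held_out = set(held_out_sites)
    train = [r for r in records if r["site"] not in held_out]
    test = [r for r in records if r["site"] in held_out]
    return train, test


# Illustrative records; values are made up for the example.
records = [
    {"id": "s1", "site": "A", "contrast_phase": "arterial", "kernel": "soft",
     "slice_thickness_mm": 1.0, "kvp": 120, "label_source": "radiologist_1"},
    {"id": "s2", "site": "B", "contrast_phase": "venous", "kernel": "sharp",
     "slice_thickness_mm": 5.0, "kvp": 100, "label_source": "radiologist_2"},
    {"id": "s3", "site": "C", "contrast_phase": "venous", "kernel": "soft",
     "slice_thickness_mm": 2.5, "kvp": None, "label_source": "radiologist_1"},
]

# Map each incomplete study to the fields it is missing.
incomplete = {r["id"]: missing_fields(r) for r in records if missing_fields(r)}

# Hold out site C entirely for validation.
train, test = split_by_site(records, held_out_sites=["C"])
```

Splitting on site rather than on individual studies keeps scanner and protocol quirks from leaking between training and validation, which is the point of a held-out site.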

Request a targeted CT cohort

On GetDATA you can specify body region, contrast phase, annotation type and minimum case counts, and verified providers fulfil the request with compliant, quality-scored CT data.
