Building Balanced CT Datasets for Oncology Research
GetDATA Team · 1 min read
Why balance matters more in oncology
Tumour findings are rare relative to normal anatomy, and among the positive cases lesion size, stage and location are themselves unevenly distributed. A model trained on a CT dataset that mirrors raw clinical prevalence tends to under-detect exactly the cases that matter most: small, early-stage lesions.
Designing the cohort
- Contrast phase recorded (non-contrast, arterial, venous, delayed) — lesions enhance differently across phases.
- Voxel-level segmentation or RECIST measurements rather than study-level labels alone.
- Scanner and reconstruction diversity (kernel, slice thickness, kVp) so the model survives domain shift.
- Deliberate over-sampling of small and early-stage lesions.
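As a minimal sketch of the over-sampling point above, the snippet below assigns each case a sampling weight so that every stratum (normal, small lesion, large lesion, split by stage) gets the same total probability mass. The field names `lesion_mm` and `stage`, the 10 mm threshold and the toy cases are assumptions made for illustration, not clinical guidance or a GetDATA schema.

```python
from collections import Counter

SMALL_LESION_MM = 10  # assumed cut-off for a "small" lesion, illustrative only


def stratum(case):
    """Bucket a case by lesion size and stage (hypothetical field names)."""
    if case["lesion_mm"] is None:
        return ("normal", None)
    size = "small" if case["lesion_mm"] < SMALL_LESION_MM else "large"
    return (size, case["stage"])


def sampling_weights(cases):
    """Return one weight per case; each stratum receives equal total mass."""
    counts = Counter(stratum(c) for c in cases)
    n_strata = len(counts)
    return [1.0 / (n_strata * counts[stratum(c)]) for c in cases]


# Toy cohort: 6 normal studies, 3 large stage III lesions, 1 small stage I lesion.
cases = (
    [{"lesion_mm": None, "stage": None} for _ in range(6)]
    + [{"lesion_mm": 25, "stage": "III"} for _ in range(3)]
    + [{"lesion_mm": 6, "stage": "I"}]
)
weights = sampling_weights(cases)
```

Passing `weights` to `random.choices(cases, weights=weights, k=...)` draws training batches in which the single small lesion appears about as often as the whole normal group. Over-sampling by repetition does not add new information, so it complements rather than replaces sourcing more small and early-stage cases.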
Curation and validation
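Document acquisition parameters and label provenance for every study, and validate on data from a site that was excluded from training entirely. Radiomics pipelines in particular are sensitive to reconstruction settings, so undocumented differences in kernel or slice thickness can shift extracted features.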
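One way to set up the held-out-site validation described above is sketched below. It holds out whole sites, refuses to proceed if a patient appears on both sides of the split, and tallies reconstruction settings so the split can be documented. The record fields (`patient_id`, `site`, `kernel`, `slice_mm`) and the sample values are hypothetical.

```python
from collections import Counter


def split_by_site(cases, held_out_sites):
    """Split cases so that held-out sites never contribute to training."""
    held = set(held_out_sites)
    train = [c for c in cases if c["site"] not in held]
    test = [c for c in cases if c["site"] in held]
    # Guard against leakage: a patient scanned at two sites must not straddle the split.
    overlap = {c["patient_id"] for c in train} & {c["patient_id"] for c in test}
    if overlap:
        raise ValueError(f"patients in both splits: {sorted(overlap)}")
    return train, test


def acquisition_summary(cases):
    """Count cases per (kernel, slice thickness) pair for documentation."""
    return Counter((c["kernel"], c["slice_mm"]) for c in cases)


# Hypothetical records; a real pipeline would read these values from DICOM headers.
cases = [
    {"patient_id": "p1", "site": "A", "kernel": "soft", "slice_mm": 1.0},
    {"patient_id": "p2", "site": "A", "kernel": "sharp", "slice_mm": 5.0},
    {"patient_id": "p3", "site": "B", "kernel": "soft", "slice_mm": 1.0},
    {"patient_id": "p4", "site": "C", "kernel": "sharp", "slice_mm": 2.5},
]
train, test = split_by_site(cases, held_out_sites=["C"])
train_settings = acquisition_summary(train)
test_settings = acquisition_summary(test)
```

Comparing `train_settings` with `test_settings` shows whether the held-out site also introduces reconstruction settings the model never saw, which is the situation in which domain shift is most likely to appear.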
Request a targeted CT cohort
On GetDATA you can specify body region, contrast phase, annotation type and minimum case counts, and verified providers fulfil the request with compliant, quality-scored CT data.