5,000 chest CT scans with COVID-19 and viral pneumonia ground-glass opacity segmentation for AI triage research
Status: Open
Overview
This request seeks a large, diverse, multi-site chest CT dataset to support research into AI-assisted diagnosis and severity scoring of COVID-19 pneumonia and other viral lower respiratory tract infections. The hallmark finding of interest is ground-glass opacity (GGO): hazy areas of increased attenuation that do not obscure the underlying bronchovascular structures, typically measuring approximately –600 to –200 HU in infected regions, compared with about –850 HU for normal lung parenchyma. Consolidation, crazy-paving pattern, and subpleural distribution are additional features to be captured.
Imaging requirements are axial thin-section CT (slice thickness 0.625–1.5 mm) with lung window reconstruction (WL –600 HU, WW 1500 HU); both non-contrast and low-dose protocols are accepted.
Each volume must carry at least one of the following annotation tiers: (a) lobe-level GGO and consolidation segmentation masks in NIfTI format, (b) whole-lung segmentation mask, (c) per-scan severity score (CT severity index 0–25 or equivalent), or (d) RT-PCR confirmed diagnosis label (COVID-19 positive, influenza, other viral, bacterial, non-infectious). We strongly prefer scans with all four annotation tiers but will accept partial annotation with appropriate metadata flags.
De-identification must comply with GDPR Recital 26 for European institutions and HIPAA Safe Harbor for US contributors. Longitudinal series from the same patient (admission, day 5, discharge) are highly valuable and should be pseudonymized with a consistent patient key so that temporal progression can be modeled. JSON sidecars should include acquisition date relative to symptom onset, vaccination status if available, ICU admission flag, and oxygen saturation at time of scan.
Volumetric DICOM series delivered as complete studies with all reconstructed series (lung kernel, soft-tissue kernel) are preferred; NIfTI-converted volumes are also acceptable.
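As an illustration of the annotation tiers and sidecar fields described above, the following is a minimal sketch of a sidecar check. The request does not define a schema, so every key name here (`annotation_tiers`, `days_since_symptom_onset`, `spo2_percent`, and so on) and every example value is an assumption made for illustration only.

```python
import json

# Hypothetical tier keys; the request names tiers (a)-(d) but not a schema.
ANNOTATION_TIERS = (
    "ggo_consolidation_mask",  # (a) lobe-level GGO/consolidation masks (NIfTI)
    "lung_mask",               # (b) whole-lung segmentation mask
    "severity_score",          # (c) per-scan severity score
    "rt_pcr_label",            # (d) RT-PCR confirmed diagnosis label
)

# Hypothetical names for the sidecar fields the request lists.
# vaccination_status is "if available", so it is not required here.
REQUIRED_FIELDS = ("days_since_symptom_onset", "icu_admission", "spo2_percent")


def check_sidecar(sidecar: dict) -> list[str]:
    """Return a list of problems; an empty list means the sidecar passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in sidecar:
            problems.append(f"missing field: {field}")
    # At least one annotation tier must be flagged as present.
    tiers = sidecar.get("annotation_tiers", {})
    if not any(tiers.get(tier) for tier in ANNOTATION_TIERS):
        problems.append("no annotation tier present")
    return problems


# Illustrative sidecar with made-up values.
example = json.loads("""
{
  "patient_key": "P0001",
  "days_since_symptom_onset": 5,
  "vaccination_status": "unknown",
  "icu_admission": false,
  "spo2_percent": 93,
  "annotation_tiers": {"lung_mask": true, "severity_score": true}
}
""")

print(check_sidecar(example))  # prints []
```

Flagging each tier separately, rather than storing a single "annotated" boolean, is one way to provide the metadata flags requested for partially annotated scans.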
Tube voltage is typically 100–120 kVp for standard chest CT; low-dose screening protocols at 80 kVp are acceptable provided noise characteristics are documented. Scanner diversity is essential: contributions from GE, Siemens, Philips, and Canon sites are all welcome, and geographic diversity spanning Europe, North America, and Asia is prioritized to capture population-level variation in disease presentation.
Annotation inter-rater agreement for GGO percentage (intraclass correlation coefficient ≥ 0.85) must be reported. QA exclusion criteria include scans with more than 20% motion-corrupted slices, incomplete lung coverage, or absence of a confirmed microbiological diagnosis.
Models trained on this dataset are intended to support real-time triage scoring integrated into PACS worklist systems, enabling prioritization of deteriorating patients in high-volume pandemic or endemic disease scenarios. Data will not be used for any commercial purpose beyond the stated AI research scope, and results will be published with appropriate attribution to contributing institutions.
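The QA exclusion criteria above could be applied per scan along the following lines. This is a rough sketch under stated assumptions: the `ScanQA` record and its field names are hypothetical, and how motion-corrupted slices or lung coverage are actually detected is outside its scope.

```python
from dataclasses import dataclass

MAX_MOTION_PERCENT = 20  # scans with more than 20% motion-corrupted slices are excluded


@dataclass
class ScanQA:
    # Hypothetical per-scan QA record; all field names are assumptions.
    total_slices: int
    motion_corrupted_slices: int
    full_lung_coverage: bool
    diagnosis_confirmed: bool


def exclusion_reasons(scan: ScanQA) -> list[str]:
    """Return the QA exclusion reasons for a scan; an empty list means it is kept."""
    if scan.total_slices <= 0:
        return ["no slices"]
    reasons = []
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    if scan.motion_corrupted_slices * 100 > MAX_MOTION_PERCENT * scan.total_slices:
        reasons.append("motion")
    if not scan.full_lung_coverage:
        reasons.append("incomplete lung coverage")
    if not scan.diagnosis_confirmed:
        reasons.append("no confirmed microbiological diagnosis")
    return reasons


# A scan with exactly 20% corrupted slices is kept; the criterion is "more than 20%".
print(exclusion_reasons(ScanQA(300, 60, True, True)))
print(exclusion_reasons(ScanQA(300, 61, False, True)))
```

Returning the full list of reasons, rather than a single pass/fail flag, makes it easier to report why a contributed scan was rejected.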
Data Specifications
| Field | Value |
|---|---|
| Category | Medical imaging |
| Required quantity | 5,000 scans |
| Data types | Medical imaging, CT, Chest, DICOM, JSON, NIfTI |
| Budget | USD 320,000.00 |
| Deadline | 2027-06-02 |
Use Cases
- Training and validating medical imaging AI/ML models
- Benchmarking medical imaging detection and segmentation algorithms
- Building de-identified medical imaging research datasets for academic studies
- Augmenting existing medical imaging datasets to reduce class imbalance