2,000 chest CT volumes (contrast-enhanced and non-contrast) with lung-nodule 3D segmentation masks (LIDC-IDRI style)

Open

Overview

We are developing a deep-learning pipeline for automated pulmonary nodule detection, characterization, and malignancy risk stratification, and we need a large, well-annotated chest CT dataset to train and validate our models.

Specifically, we need volumetric, thin-slice chest CT series acquired with standard clinical protocols (slice thickness 0.625–1.25 mm). Each series should be provided as both a lung-kernel reconstruction and a soft-tissue-kernel reconstruction, reviewed at lung window [WL −600 HU, WW 1500 HU] and soft-tissue window [WL 40 HU, WW 400 HU], respectively. Both contrast-enhanced and non-contrast acquisitions are acceptable, and we prefer a mix to improve model generalizability.

Each series must be accompanied by radiologist-confirmed 3D segmentation masks delineating all nodules ≥4 mm in longest diameter, including subsolid and ground-glass opacity (GGO) nodules. Annotation should follow LIDC-IDRI conventions: at least two independent radiologist reads per scan, a consensus or majority-vote mask, and per-nodule attributes (subtlety, calcification, spiculation, and malignancy suspicion) scored on the LIDC-IDRI rating scales, with malignancy suspicion on a 1–5 Likert scale.

DICOM series must be fully de-identified per HIPAA Safe Harbor and DICOM PS3.15 guidelines, with all burned-in patient text removed from the pixel data. We strongly prefer NIfTI-converted volumes with JSON sidecar files containing the nodule attributes, for ease of ingestion into our training infrastructure. Raw DICOM with accompanying NIfTI masks is also acceptable.

Scans should span a diverse patient population (age, sex, and, where feasible, smoking history). They should include both benign nodules confirmed by at least 2 years of stable follow-up and biopsy-confirmed malignant nodules, so that the label distribution is clinically representative. Primary use cases include training a 3D U-Net nodule segmentation model, a false-positive reduction classifier, and a volumetric growth-rate tracker intended for integration into a lung-cancer screening workflow compliant with Lung-RADS v2022.
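The windowing and consensus-mask requirements above map directly onto simple array operations. The following is a minimal NumPy sketch under our own assumptions, not tooling that contributors are required to use. `stored_to_hu`, `apply_window`, and `majority_vote_mask` are hypothetical helper names. The rescale slope and intercept are assumed to come from each slice's DICOM header, and "majority vote" is interpreted here as a strict majority of readers.

```python
import numpy as np

# Display windows from the specification, as (level, width) in HU.
LUNG_WINDOW = (-600.0, 1500.0)
SOFT_TISSUE_WINDOW = (40.0, 400.0)


def stored_to_hu(pixels: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Convert stored DICOM pixel values to HU (assumes RescaleSlope/Intercept)."""
    return pixels.astype(np.float32) * slope + intercept


def apply_window(hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Linearly map HU to [0, 1]; values outside the window are clipped."""
    low = level - width / 2.0
    high = level + width / 2.0
    return np.clip((hu - low) / (high - low), 0.0, 1.0)


def majority_vote_mask(reader_masks: np.ndarray) -> np.ndarray:
    """Strict-majority consensus over readers (hypothetical helper).

    reader_masks has shape (n_readers, z, y, x). A voxel is foreground when
    more than half of the readers marked it, so with two readers this is the
    intersection of the two masks.
    """
    n_readers = reader_masks.shape[0]
    votes = reader_masks.astype(np.int32).sum(axis=0)
    return votes * 2 > n_readers
```

With the lung window, −1350 HU maps to 0.0, −600 HU to 0.5, and 150 HU or higher to 1.0. Because a strict majority of two readers reduces to the intersection, a looser rule such as "at least half of the readers" would yield the union of the masks instead. Which rule applies is a choice each site would need to document.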
Secondary use cases include automated RECIST 1.1 longest-diameter measurement and multiplanar reconstruction (MPR) visualization research.

Tube voltage should be documented in the DICOM metadata. Standard protocols at 120 kVp and low-dose acquisitions at 80–100 kVp are both acceptable. For screening-protocol cases, effective dose should be below 3 mSv. We request a balanced mix of GE, Siemens, Philips, and Canon scanners to reduce scanner-specific bias. Each contributing site must report inter-rater agreement metrics (Dice coefficient for the segmentation masks and Cohen's kappa for malignancy ratings).

Before submission, the contributing site's radiologist should exclude scans with motion artifact, severe streak artifact from metallic implants, or incomplete coverage from the lung apices through the posterior costophrenic angles. Data will be processed on air-gapped GPU clusters and will not be redistributed. Each contributing site must submit IRB waiver or equivalent ethics approval documentation.
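Sites may find it useful to check the quantitative requirements above (per-site agreement metrics, tube voltage, and screening dose limit) with a short script. The sketch below is a hypothetical helper, not something supplied with this request, and it makes several assumptions:

- Dice is averaged over all reader pairs.
- Kappa is computed for two readers' 1–5 malignancy ratings, with quadratic weighting offered as an option, since the request does not specify a weighting.
- Effective dose is estimated as E ≈ k × DLP, with k = 0.014 mSv/(mGy·cm), a commonly cited adult-chest coefficient.
- The DLP value is taken from the site's dose report, such as a DICOM Radiation Dose Structured Report.
- The tube-voltage rule is read as "exactly 120 kVp, or 80–100 kVp". In pydicom, the (0018,0060) value is available as `ds.KVP`.

```python
from itertools import combinations

import numpy as np

# Assumed adult-chest DLP-to-effective-dose coefficient, in mSv per mGy*cm.
CHEST_K_FACTOR = 0.014


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def mean_pairwise_dice(reader_masks: np.ndarray) -> float:
    """Average Dice over all reader pairs; reader_masks has shape (n_readers, ...)."""
    pairs = combinations(range(reader_masks.shape[0]), 2)
    return float(np.mean([dice(reader_masks[i], reader_masks[j]) for i, j in pairs]))


def cohen_kappa(r1, r2, n_categories: int = 5, quadratic: bool = False) -> float:
    """Cohen's kappa for two readers' ordinal ratings coded 1..n_categories.

    quadratic=True gives quadratic-weighted kappa; otherwise unweighted kappa.
    """
    r1 = np.asarray(r1, dtype=int) - 1
    r2 = np.asarray(r2, dtype=int) - 1
    observed = np.zeros((n_categories, n_categories))
    np.add.at(observed, (r1, r2), 1.0)
    observed /= observed.sum()
    # Joint distribution expected if the two readers rated independently.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    if quadratic:
        weights = (i - j) ** 2 / (n_categories - 1) ** 2
    else:
        weights = (i != j).astype(float)
    # Disagreement-weight form: kappa = 1 - sum(w * O) / sum(w * E).
    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())


def estimate_effective_dose(dlp_mgy_cm: float, k: float = CHEST_K_FACTOR) -> float:
    """Effective dose estimate E = k * DLP, in mSv (k is an assumption)."""
    return k * dlp_mgy_cm


def is_acceptable_kvp(kvp: float) -> bool:
    """True for 120 kVp (standard) or 80-100 kVp (low dose), per our reading."""
    return kvp == 120 or 80 <= kvp <= 100
```

For example, a DLP of 200 mGy·cm gives an estimate of about 2.8 mSv, just under the 3 mSv screening limit. The chosen k factor directly scales every estimate, so the coefficient a site uses should be reported alongside its dose figures.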


Progress

0 / 2000 scans (0%)

Data Specifications

Category: Medical imaging
Required quantity: 2000
Data types: Medical imaging, CT, Chest, DICOM, JSON, NIfTI
Budget: USD 180,000.00
Deadline: 2026-11-29

Use Cases

  • Training and validating medical imaging AI/ML models
  • Benchmarking medical imaging detection and segmentation algorithms
  • Building de-identified medical imaging research datasets for academic studies
  • Augmenting existing medical imaging datasets to reduce class imbalance