Pneumothorax Detection Dataset with Pixel-Level Segmentation Masks — 20,000 PA/AP Radiographs
Status: Open
Overview
Our medical imaging company is building a real-time pneumothorax detection and severity-grading tool for use in emergency radiology workflows. We require a curated dataset of chest radiographs containing confirmed pneumothorax cases alongside age- and sex-matched normal controls, with pixel-level segmentation annotations delineating the pleural air collection.

Technical specifications: both posteroanterior (PA) and anteroposterior (AP) projections are required, as pneumothorax is frequently captured on portable AP studies in intensive care and emergency settings. Images must be delivered in DICOM format at native acquisition resolution (minimum 1800 x 1800 pixels, 12-bit depth), with header metadata retained after de-identification. At least 40% of images must be pneumothorax-positive. For each positive case, a binary segmentation mask (PNG, aligned pixel-for-pixel with the source image) must be provided indicating the pneumothorax region. JSON sidecar files must record laterality (left/right), the estimated pneumothorax size category (small: less than 15% of the lung field; moderate: 15–30%; large: greater than 30%), and whether a chest tube or other thoracic support device is visible.

Labeling quality requirements: segmentation masks must be drawn or reviewed by board-certified radiologists, or by trained radiologic technologists working under radiologist supervision. Inter-annotator agreement metrics (Dice coefficient) should be reported if available. Cases with tension pneumothorax or bilateral pneumothorax should be flagged separately, as these represent clinically distinct high-priority findings. Lateral view images, if available for the same encounter, are highly desirable as supplementary data for model generalization.

Acquisition and QA criteria: portable AP radiographs acquired at the bedside in ICU or emergency settings are especially valuable because they represent the real-world distribution where pneumothorax detection tools will be deployed. Images from both digital radiography (DR) and computed radiography (CR) systems are acceptable. Acquisition parameters, including kVp, mAs, and SID, should be retained in the DICOM headers after de-identification. Exclusion criteria are severely rotated images (more than 10 degrees of tilt), images with significant subcutaneous emphysema obscuring the lung outline, and any radiograph where pixel-level de-identification of burned-in text has altered the lung field region. Post-thoracotomy and post-lobectomy cases may be included but must be flagged, as their pleural architecture differs from typical presentations.

De-identification and compliance: all DICOM PHI must be removed per the HIPAA Safe Harbor standard or the DICOM Basic Application Level Confidentiality Profile. For European contributing sites, GDPR Article 9 special-category health data provisions apply; a data processing agreement (DPA) and institutional review board approval or an ethics committee waiver must be confirmed prior to transfer. Any burned-in pixel annotations (patient ID, laterality overlays) must be removed before delivery.

This dataset will support both model training and regulatory submission as part of a 510(k) or CE-MDR pathway. Accordingly, provenance documentation (acquisition site, scanner manufacturer and model, and annotation date) is required per image. A data use agreement will be executed with each contributing institution before data transfer.
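As an illustration of how the sidecar and agreement requirements above might be checked on receipt, here is a minimal Python sketch. It rests on assumptions: the request does not define a sidecar schema, so the field names (`laterality`, `size_category`, `support_device_visible`) and all function names are hypothetical, and masks are represented as nested lists of 0/1 values rather than decoded PNG files. Scoring two empty masks as perfect agreement is a convention chosen here, not something the request specifies.

```python
# Hypothetical sidecar schema: the request lists the required information
# but does not name the fields, so these identifiers are assumptions.
REQUIRED_FIELDS = {"laterality", "size_category", "support_device_visible"}
LATERALITIES = {"left", "right"}
SIZE_CATEGORIES = {"small", "moderate", "large"}


def size_category(percent_of_lung_field: float) -> str:
    """Map an estimated size (% of lung field) to the request's categories.

    Thresholds follow the request: small < 15, moderate 15-30, large > 30.
    """
    if not 0 <= percent_of_lung_field <= 100:
        raise ValueError("percentage must be within 0-100")
    if percent_of_lung_field < 15:
        return "small"
    if percent_of_lung_field <= 30:
        return "moderate"
    return "large"


def validate_sidecar(record: dict) -> list:
    """Return a list of problems found in one sidecar record (empty if none)."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    problems = [f"missing field: {name}" for name in missing]
    if "laterality" in record and record["laterality"] not in LATERALITIES:
        problems.append("laterality must be 'left' or 'right'")
    if "size_category" in record and record["size_category"] not in SIZE_CATEGORIES:
        problems.append("size_category must be small, moderate, or large")
    if "support_device_visible" in record and not isinstance(
        record["support_device_visible"], bool
    ):
        problems.append("support_device_visible must be true or false")
    return problems


def dice(mask_a, mask_b) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values with identical dimensions.
    """
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have identical dimensions")
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    overlap = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2 * overlap / total
```

In practice the masks would be decoded from the delivered PNG files and compared against the source image dimensions before scoring; that step is omitted to keep the sketch free of third-party dependencies.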
Data Specifications
| Field | Value |
|---|---|
| Category | Medical imaging |
| Required quantity | 20,000 images |
| Data types | Medical imaging, X-ray, Chest, DICOM, JSON, PNG / JPG |
| Budget | EUR 120,000.00 |
| Deadline | 2027-01-28 |
Use Cases
- Training and validating medical imaging AI/ML models
- Benchmarking medical imaging detection and segmentation algorithms
- Building de-identified medical imaging research datasets for academic studies
- Augmenting existing medical imaging datasets to reduce class imbalance