Cardiomegaly and Cardiothoracic Ratio Measurement Dataset — 30,000 Annotated PA Chest Radiographs
Overview
Our cardiovascular research consortium is developing an automated cardiac silhouette measurement tool that computes the cardiothoracic ratio (CTR) from posteroanterior (PA) chest radiographs, enabling scalable screening for cardiomegaly in primary care and resource-limited settings. We require a large cohort of PA radiographs with expert annotations of cardiac and thoracic dimensions.

Technical specifications: Only PA projections are acceptable for this request, because CTR measurement is standardized for PA acquisition geometry. Anteroposterior (AP) views systematically magnify the cardiac silhouette, owing to the shorter source-to-image distance and the larger heart-to-detector distance, and are excluded from this dataset. Images must be provided in DICOM format at full acquisition resolution (minimum 2048x2048 pixels, 12-bit), with DICOM header data preserved after de-identification.

JSON annotations per image must include:
- maximum transverse cardiac diameter, in pixels and millimeters
- maximum transverse thoracic diameter, measured at the level of the right hemidiaphragm
- computed CTR value
- image-level cardiomegaly label (CTR greater than 0.5 is labeled cardiomegaly, per established radiological convention)

Cases with pericardial effusion, marked scoliosis, or significant rotation artifacts should be flagged, as these conditions affect CTR reliability and require exclusion from the primary training set.

Additional clinical label requirements: Datasets should record co-existing findings relevant to cardiac pathology, including pulmonary vascular congestion, pleural effusion, and interstitial edema. Patient demographic metadata (age decade, biological sex) and known clinical diagnosis (heart failure, dilated cardiomyopathy, hypertensive heart disease, valvular disease) should be included where available, encoded in a manner that does not create a re-identification risk. The expected case mix is approximately 30% cardiomegaly-positive (CTR greater than 0.5) and 70% normal or borderline.
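The per-image annotation fields described above can be illustrated with a minimal Python sketch. The JSON key names, the `build_annotation` helper, and the example measurements are hypothetical and not part of the request; only the CTR definition (cardiac diameter divided by thoracic diameter), the 0.5 threshold, and the flagging rule come from the text. The pixel spacing is assumed to be available from the DICOM header.

```python
import json

CTR_THRESHOLD = 0.5  # from the request: CTR greater than 0.5 is labeled cardiomegaly


def build_annotation(cardiac_px, thoracic_px, pixel_spacing_mm, flags=()):
    """Return one per-image annotation record as a plain dict.

    pixel_spacing_mm is the horizontal pixel size in millimeters (assumed to
    be read from the DICOM header). All key names here are hypothetical.
    """
    if cardiac_px <= 0 or thoracic_px <= 0 or pixel_spacing_mm <= 0:
        raise ValueError("diameters and pixel spacing must be positive")
    ctr = cardiac_px / thoracic_px
    return {
        "cardiac_diameter_px": cardiac_px,
        "cardiac_diameter_mm": round(cardiac_px * pixel_spacing_mm, 1),
        "thoracic_diameter_px": thoracic_px,
        "thoracic_diameter_mm": round(thoracic_px * pixel_spacing_mm, 1),
        "ctr": round(ctr, 3),
        "cardiomegaly": ctr > CTR_THRESHOLD,
        # Reliability flags such as pericardial effusion, marked scoliosis,
        # or rotation; any flag keeps the case out of the primary training set.
        "reliability_flags": sorted(flags),
        "exclude_from_primary_training": bool(flags),
    }


# Illustrative values only, not real measurements.
record = build_annotation(1100, 2000, 0.15, flags={"marked_scoliosis"})
print(json.dumps(record, indent=2))
```

Because the label uses a strict inequality, a case with CTR exactly 0.5 is not labeled cardiomegaly in this sketch; contributors may want to confirm how borderline values should be handled.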
Acquisition and QA standards: Images must demonstrate adequate inspiration (at least 8–9 posterior ribs visible above the right hemidiaphragm), because inadequate inspiration artificially enlarges the apparent cardiac silhouette and inflates CTR. Images must be acquired at 100–125 kVp with standard PA positioning at a source-to-image distance (SID) of 1.8–2.0 meters. Images showing rotational artifact, where the spinous processes are not equidistant from the medial clavicular heads, should be flagged or excluded. Representation from at least four distinct digital radiography (DR) or computed radiography (CR) platforms (e.g., Philips DigitalDiagnost, Siemens Ysio, GE Definium, Canon CXDI series) is requested to ensure that the trained model generalizes across vendors. Radiographs acquired over a span of at least five years are desirable to capture the evolution of acquisition protocols.

De-identification and compliance: All 18 identifier categories defined by the HIPAA Safe Harbor method must be removed from DICOM headers. Annotations burned into the pixel data during acquisition, such as patient name banners, institutional logos, or laterality markers, must be confirmed absent or removed prior to delivery. European contributing institutions must provide a data processing agreement and records of processing activities (Article 30 GDPR).

Use cases include training regression and classification models for CTR estimation, validating model predictions against echocardiographic ground truth to assess correlation with left ventricular ejection fraction, and generating normative CTR reference curves stratified by age and sex. This data will support a peer-reviewed publication and an open-source model release under a permissive research license. Institutional authorship acknowledgment will be offered to all contributing hospitals.
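The header-level acquisition criteria above (PA projection, minimum matrix size and bit depth, kVp range, SID range) lend themselves to an automated pre-check before delivery. The following is a minimal sketch, assuming the relevant header values have already been read into a plain dict keyed by standard DICOM keywords (`ViewPosition`, `Rows`, `Columns`, `BitsStored`, `KVP`, `DistanceSourceToDetector`). The function name, the message strings, and the choice to treat a missing value as a failure are assumptions, not part of the request. Inspiration and rotation still require image review and are not covered.

```python
def acquisition_issues(header):
    """List the reasons an image fails the stated header-level criteria.

    `header` is a plain dict keyed by DICOM keywords with numeric values
    already parsed; reading the DICOM file itself is out of scope here.
    Missing values are treated as failures (an assumption of this sketch).
    """
    issues = []
    if header.get("ViewPosition") != "PA":
        issues.append("projection is not PA")
    if header.get("Rows", 0) < 2048 or header.get("Columns", 0) < 2048:
        issues.append("matrix smaller than 2048x2048")
    if header.get("BitsStored", 0) < 12:
        issues.append("bit depth below 12")
    kvp = header.get("KVP")
    if kvp is None or not 100 <= kvp <= 125:
        issues.append("kVp outside 100-125")
    sid_mm = header.get("DistanceSourceToDetector")  # DICOM records this in mm
    if sid_mm is None or not 1800 <= sid_mm <= 2000:
        issues.append("SID outside 1.8-2.0 m")
    return issues


# Illustrative header values only.
good = {"ViewPosition": "PA", "Rows": 3000, "Columns": 3000,
        "BitsStored": 12, "KVP": 120, "DistanceSourceToDetector": 1830}
bedside_ap = dict(good, ViewPosition="AP", DistanceSourceToDetector=1000)

print(acquisition_issues(good))
print(acquisition_issues(bedside_ap))
```

Returning a list of reasons rather than a single pass/fail value makes it easier to decide per image whether to flag or exclude it.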
Data Specifications
| Field | Value |
|---|---|
| Category | Medical imaging |
| Required quantity | 30,000 images |
| Data types | Medical imaging, X-ray, Chest, DICOM, JSON |
| Budget | USD 60,000.00 |
| Deadline | 2026-10-30 |
Use Cases
- Training and validating medical imaging AI/ML models
- Benchmarking medical imaging detection and segmentation algorithms
- Building de-identified medical imaging research datasets for academic studies
- Augmenting existing medical imaging datasets to reduce class imbalance