Chest X-ray Datasets — CXR & Chest Radiograph Data
Chest X-ray (CXR) datasets are among the most widely used medical imaging resources in clinical AI. They consist of two-dimensional radiographs of the thorax, acquired to evaluate the lungs, heart, mediastinum, pleura, and bony structures. Because chest radiography is inexpensive, fast, and ubiquitous, these datasets are central to training models for triage, disease detection, and report generation. A typical chest radiograph dataset includes frontal posteroanterior (PA) and anteroposterior (AP) projections plus lateral views, stored as DICOM with windowing, pixel-spacing, view-position, and acquisition-device metadata, and sometimes exported as PNG or JPEG for model training.
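As a rough illustration of the per-image metadata described above, the sketch below models one record, groups its view position into frontal or lateral, and applies a simplified linear window to a pixel value. The class name `CxrImageRecord`, its field names, the accepted view codes, and the example values are assumptions made for illustration; this is not a GetDATA schema or a DICOM library API, and the windowing formula is a simplification of the one in the DICOM standard.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed view codes for this sketch; real archives also contain blank or
# site-specific view-position values that need manual mapping.
FRONTAL_VIEWS = {"PA", "AP"}
LATERAL_VIEWS = {"LL", "RL", "LATERAL"}


@dataclass(frozen=True)
class CxrImageRecord:
    """Hypothetical per-image metadata record for a chest radiograph."""
    image_id: str
    view_position: str                                # e.g. DICOM ViewPosition
    pixel_spacing_mm: Optional[Tuple[float, float]]   # (row, column), if known
    window_center: float                              # e.g. DICOM WindowCenter
    window_width: float                               # e.g. DICOM WindowWidth
    manufacturer: str                                 # acquisition device vendor

    def projection_group(self) -> str:
        """Return 'frontal', 'lateral', or 'unknown' for this image."""
        view = self.view_position.strip().upper()
        if view in FRONTAL_VIEWS:
            return "frontal"
        if view in LATERAL_VIEWS:
            return "lateral"
        return "unknown"


def apply_window(value: float, center: float, width: float) -> float:
    """Simplified linear window: map a stored pixel value into [0, 1]."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = center - width / 2.0
    scaled = (value - low) / width
    return min(1.0, max(0.0, scaled))


# Example record with made-up values.
record = CxrImageRecord(
    image_id="img-001",
    view_position="pa",
    pixel_spacing_mm=(0.139, 0.139),
    window_center=2048.0,
    window_width=4096.0,
    manufacturer="ExampleVendor",
)
mid_gray = apply_window(2048.0, record.window_center, record.window_width)
```

Keeping the view position and window settings next to each image makes it possible to filter by projection and to reproduce the intended display range after export to PNG or JPEG, where this metadata is otherwise lost.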
Clinically meaningful CXR datasets are labeled for a broad spectrum of findings: pneumonia and other consolidations, pulmonary edema, pleural effusion, pneumothorax, atelectasis, cardiomegaly, pulmonary nodules and masses, fibrosis, emphysema, rib and clavicle fractures, and support-device placement (endotracheal tubes, central lines, and pacemakers), as well as normal studies. Labels may be derived from radiology reports using natural-language processing or assigned by board-certified radiologists. The strongest datasets add bounding boxes or pixel-level segmentation masks that localize each finding, rather than relying on image-level tags alone. High-value cohorts balance disease prevalence, document the provenance and uncertainty of each label, and pair images with linked radiology reports for multimodal learning.
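One way to keep label provenance and uncertainty explicit is to store them on every label rather than collapsing everything to a binary tag. The sketch below is a minimal example under that assumption; the names `LabelState`, `FindingLabel`, and `prevalence`, the source strings, and the sample labels are all hypothetical. It counts only positive labels toward prevalence, which is one possible policy for uncertain labels among several.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class LabelState(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class FindingLabel:
    """Hypothetical label record with provenance and optional localization."""
    image_id: str
    finding: str
    state: LabelState
    source: str                                            # "report_nlp" or "radiologist"
    box_xywh: Optional[Tuple[int, int, int, int]] = None   # pixel box, if localized


def prevalence(labels: Iterable[FindingLabel], n_images: int) -> Dict[str, float]:
    """Fraction of images with a POSITIVE label, per finding.

    Uncertain labels are not counted as positive (an assumed policy).
    Each (image, finding) pair is counted at most once.
    """
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    positives: Counter = Counter()
    seen = set()
    for label in labels:
        key = (label.image_id, label.finding)
        if label.state is LabelState.POSITIVE and key not in seen:
            seen.add(key)
            positives[label.finding] += 1
    return {finding: count / n_images for finding, count in positives.items()}


# Made-up labels for a cohort of four images.
labels = [
    FindingLabel("img-1", "pneumothorax", LabelState.POSITIVE, "radiologist", (120, 80, 200, 150)),
    FindingLabel("img-2", "pneumothorax", LabelState.UNCERTAIN, "report_nlp"),
    FindingLabel("img-1", "pleural_effusion", LabelState.POSITIVE, "report_nlp"),
    FindingLabel("img-3", "pleural_effusion", LabelState.POSITIVE, "report_nlp"),
    FindingLabel("img-4", "pleural_effusion", LabelState.NEGATIVE, "radiologist"),
]
rates = prevalence(labels, n_images=4)
```

Because the source and state travel with each label, the same records can be re-aggregated under a different uncertainty policy, or restricted to radiologist-assigned labels, without re-annotating the images.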
High-quality CXR datasets are rigorously de-identified: protected health information (PHI) is removed from DICOM headers and from any burned-in annotations, while diagnostic image quality is preserved. Demographic and scanner diversity is essential, because models trained on a single institution's equipment frequently fail to generalize across hospitals. On GetDATA, researchers and medical-imaging companies post chest X-ray requests that specify projections, label taxonomy, annotation type (image-level, bounding box, or segmentation), class balance, report linkage, and minimum image counts. Verified hospitals fulfill those requests with compliant, quality-scored CXR data.
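The request fields listed above can be sketched as a small data structure with a check against a candidate dataset. This is an illustrative sketch only: `CxrRequest`, `DatasetSummary`, `unmet_requirements`, and every field name and value below are hypothetical and do not represent GetDATA's actual request format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CxrRequest:
    """Hypothetical chest X-ray data request."""
    projections: Set[str]
    findings: Set[str]                      # label taxonomy
    annotation_type: str                    # "image_level", "bounding_box", or "segmentation"
    min_images: int
    min_positive_fraction: Dict[str, float] = field(default_factory=dict)
    require_reports: bool = False


@dataclass
class DatasetSummary:
    """Hypothetical summary of a dataset offered against a request."""
    projections: Set[str]
    findings: Set[str]
    annotation_type: str
    n_images: int
    positive_fraction: Dict[str, float]
    has_reports: bool


def unmet_requirements(request: CxrRequest, summary: DatasetSummary) -> List[str]:
    """Return human-readable reasons the dataset falls short of the request."""
    problems: List[str] = []
    missing_views = request.projections - summary.projections
    if missing_views:
        problems.append(f"missing projections: {sorted(missing_views)}")
    missing_findings = request.findings - summary.findings
    if missing_findings:
        problems.append(f"missing findings: {sorted(missing_findings)}")
    if summary.annotation_type != request.annotation_type:
        problems.append(f"annotation type is {summary.annotation_type}")
    if summary.n_images < request.min_images:
        problems.append(f"only {summary.n_images} images")
    for finding, floor in request.min_positive_fraction.items():
        actual = summary.positive_fraction.get(finding, 0.0)
        if actual < floor:
            problems.append(
                f"class balance below target for {finding}: {actual:.2f} < {floor:.2f}"
            )
    if request.require_reports and not summary.has_reports:
        problems.append("linked reports required")
    return problems


# Made-up request and dataset summary.
request = CxrRequest(
    projections={"PA", "AP", "LATERAL"},
    findings={"pneumothorax", "pleural_effusion"},
    annotation_type="bounding_box",
    min_images=5000,
    min_positive_fraction={"pneumothorax": 0.10},
    require_reports=True,
)
summary = DatasetSummary(
    projections={"PA", "AP"},
    findings={"pneumothorax", "pleural_effusion", "cardiomegaly"},
    annotation_type="bounding_box",
    n_images=12000,
    positive_fraction={"pneumothorax": 0.04, "pleural_effusion": 0.20},
    has_reports=True,
)
problems = unmet_requirements(request, summary)
```

Returning a list of reasons, rather than a single pass or fail flag, lets a data provider see exactly which part of a request a dataset does not yet satisfy.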
Public benchmarks have repeatedly shown that label noise, position bias, and shortcut learning from view markers or chest drains can inflate apparent performance, so careful curation and external validation across institutions are indispensable. Browse the open chest radiograph requests below, or explore related cross-sectional and cardiac imaging categories.