Chest X-ray Datasets — CXR & Chest Radiograph Data

Chest X-ray (CXR) datasets are among the most widely used medical imaging resources in clinical AI. They capture two-dimensional radiographs of the thorax for evaluating the lungs, heart, mediastinum, pleura, and bony structures, and they are central to training models for triage, disease detection, and report generation because chest radiography is inexpensive, fast, and ubiquitous. A typical chest radiograph dataset includes frontal posteroanterior (PA) and anteroposterior (AP) projections plus lateral views, stored as DICOM with windowing, pixel-spacing, view-position, and acquisition-device metadata, and sometimes exported as PNG or JPEG for model training.

Clinically meaningful CXR datasets are labeled for a broad spectrum of findings: pneumonia and other consolidations, pulmonary edema, pleural effusion, pneumothorax, atelectasis, cardiomegaly, pulmonary nodules and masses, fibrosis, emphysema, rib and clavicle fractures, and support devices such as endotracheal tubes, central lines, and pacemakers, alongside normal studies. Labels may be extracted from radiology reports using natural-language processing or assigned directly by board-certified radiologists. The strongest datasets add bounding boxes or pixel-level segmentation masks that localize each finding, rather than image-level tags alone. High-value cohorts balance disease prevalence, document label provenance and uncertainty, and pair images with linked radiology reports for multimodal learning.
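As a minimal sketch of how prevalence and label uncertainty might be audited before training, the snippet below computes per-finding positive and uncertain rates from image-level labels. The encoding (1 for present, 0 for absent, -1 for uncertain, a missing key for "not mentioned") and the finding names are assumptions made for illustration, not a convention defined by this page.

```python
from collections import Counter


def prevalence(records, findings):
    """Return {finding: (positive_rate, uncertain_rate)} over all records.

    Assumed encoding (hypothetical): 1 = present, 0 = absent,
    -1 = uncertain, missing key = finding not mentioned.
    """
    n = len(records)
    if n == 0:
        return {}
    positive, uncertain = Counter(), Counter()
    for labels in records:
        for finding in findings:
            value = labels.get(finding)
            if value == 1:
                positive[finding] += 1
            elif value == -1:
                uncertain[finding] += 1
    return {f: (positive[f] / n, uncertain[f] / n) for f in findings}


# Toy image-level labels, one dict per radiograph.
records = [
    {"pneumothorax": 1, "pleural_effusion": 0},
    {"pneumothorax": 0, "pleural_effusion": -1},
    {"pneumothorax": 0, "pleural_effusion": 1},
    {"pneumothorax": 0},
]
stats = prevalence(records, ["pneumothorax", "pleural_effusion"])
for finding, (pos_rate, unc_rate) in stats.items():
    print(finding, pos_rate, unc_rate)
```

Reporting the uncertain rate separately, rather than folding it into positives or negatives, keeps the provenance of each label visible to whoever trains on the data.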

High-value cohorts are also rigorously de-identified: PHI is removed from DICOM headers and from any burned-in annotations while diagnostic image quality is preserved. Demographic and scanner diversity is essential, because models trained on a single institution's equipment frequently fail to generalize across hospitals. On GetDATA, researchers and medical-imaging companies post chest X-ray requests that specify projections, label taxonomy, annotation type (image-level, bounding box, or segmentation), class balance, report linkage, and minimum image counts. Verified hospitals fulfill those requests with compliant, quality-scored CXR data.
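The request fields listed above could be captured in a small validated structure. The sketch below is only an illustration: every class, field, and constant name is hypothetical and does not describe an actual GetDATA schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies mirroring the fields named in the text.
PROJECTIONS = {"PA", "AP", "LATERAL"}
ANNOTATION_TYPES = {"image_level", "bounding_box", "segmentation"}


@dataclass
class CxrRequest:
    projections: list
    label_taxonomy: list
    annotation_type: str
    min_images: int
    report_linkage: bool = False
    # Target fraction of images positive for each finding. Labels are
    # multi-label, so these fractions need not sum to 1.
    class_balance: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means the request is valid."""
        problems = []
        if not self.projections or set(self.projections) - PROJECTIONS:
            problems.append("projections must be a non-empty subset of PA/AP/LATERAL")
        if not self.label_taxonomy:
            problems.append("label_taxonomy must not be empty")
        if self.annotation_type not in ANNOTATION_TYPES:
            problems.append("unknown annotation_type: " + self.annotation_type)
        if self.min_images <= 0:
            problems.append("min_images must be positive")
        for finding, fraction in self.class_balance.items():
            if finding not in self.label_taxonomy:
                problems.append("class_balance names unknown finding: " + finding)
            elif not 0.0 <= fraction <= 1.0:
                problems.append("class_balance fraction out of range: " + finding)
        return problems


request = CxrRequest(
    projections=["PA", "AP"],
    label_taxonomy=["pneumothorax", "no_finding"],
    annotation_type="segmentation",
    min_images=20000,
    class_balance={"pneumothorax": 0.3},
)
print(request.validate())
```

Returning a list of problems instead of raising on the first one lets a requester see everything that needs fixing in a single pass.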

Public benchmarks have repeatedly shown that label noise, position bias, and shortcut learning from view markers or chest drains can inflate apparent performance, so careful curation and external validation across institutions are indispensable. Browse the open chest radiograph requests below, or explore related cross-sectional and cardiac imaging categories.
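One simple way to set up external validation is to hold out entire institutions rather than random images. The sketch below assumes each study record carries hypothetical "institution" and "patient" fields; it also checks for patients who appear on both sides of the split, since such overlap would weaken the external test.

```python
def split_by_institution(studies, held_out):
    """Split studies so held-out institutions never appear in training."""
    train = [s for s in studies if s["institution"] not in held_out]
    external = [s for s in studies if s["institution"] in held_out]
    return train, external


def shared_patients(train, external):
    """Patients present on both sides; should be empty for a clean split."""
    return {s["patient"] for s in train} & {s["patient"] for s in external}


# Toy study records; field names and values are illustrative only.
studies = [
    {"id": "s1", "institution": "hospital_a", "patient": "p1"},
    {"id": "s2", "institution": "hospital_a", "patient": "p2"},
    {"id": "s3", "institution": "hospital_b", "patient": "p3"},
    {"id": "s4", "institution": "hospital_c", "patient": "p2"},
]
train, external = split_by_institution(studies, held_out={"hospital_c"})
leaked = shared_patients(train, external)
print(len(train), len(external), leaked)
```

In this toy data, patient p2 was imaged at both hospital_a and hospital_c, so the overlap check flags that patient; one option is to drop such patients from the training side before evaluating.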


Open Chest X-ray requests

Multi-Label Thoracic Pathology Dataset with Paired Radiology Reports — 40,000 DICOM Chest X-Rays

High-Volume Tuberculosis Screening Chest X-Ray Dataset — 100,000 Images for Programmatic TB AI

Cardiomegaly and Cardiothoracic Ratio Measurement Dataset — 30,000 Annotated PA Chest Radiographs

Pulmonary Nodule and Lung Mass Dataset with Radiologist Bounding Box Annotations — 15,000 Images

Pneumothorax Detection Dataset with Pixel-Level Segmentation Masks — 20,000 PA/AP Radiographs

50,000 Chest X-Rays with Pneumonia Classification Labels and Pathology Metadata

Related categories