Browse Open Medical Data Requests

34 open requests from researchers and companies. No account is needed to browse; sign in to fulfill a request.

2,000 contrast-enhanced chest CT volumes with lung-nodule 3D segmentation masks (LIDC-IDRI style)

Open

We are developing a deep-learning pipeline for automated pulmonary nodule detection, characterization, and malignancy risk stratification, and we require a large, well-annotated chest CT dataset to train and validate our models. Specifically, we need volumetric, thin-slice chest CT series acquired with standard clinical protocols (slice thickness 0.625–1.25 mm, reconstructed in both lung window [WL –600 HU, WW 1500 HU] and soft-tissue window [WL 40 HU, WW 400 HU]). Both contrast-enhanced and non-contrast acquisitions are acceptable, though we prefer a mix to improve model generalizability. Each series must be accompanied by radiologist-confirmed 3D segmentation masks delineating all nodules ≥4 mm in longest diameter, including subsolid and ground-glass opacity (GGO) nodules. Annotation should follow LIDC-IDRI conventions: at least two independent radiologist reads per scan with consensus or majority-vote mask, and per-nodule attributes (subtlety, calcification, spiculation, malignancy suspicion on a 1–5 Likert scale). DICOM series must be fully de-identified per HIPAA Safe Harbor and DICOM PS3.15 guidelines, with all burned-in patient text removed from pixel data. NIfTI-converted volumes and JSON sidecar files containing nodule attributes are strongly preferred for ease of ingestion into our training infrastructure. We will also accept raw DICOM with accompanying NIfTI masks. Scans should span a diverse patient population (age, sex, smoking history where feasible) and include cases with benign nodules confirmed by at least 2-year follow-up stability as well as biopsy-confirmed malignant nodules to create a clinically representative label distribution. Primary use cases include training a 3D U-Net nodule segmentor, a false-positive reduction classifier, and a volumetric growth-rate tracker intended for integration into a lung-cancer screening workflow compliant with Lung-RADS 2022. 
Secondary use cases include RECIST 1.1 longest-diameter measurement automation and multiplanar reconstruction (MPR) visualization research. Tube voltage should be documented in DICOM metadata; both standard protocols at 120 kVp and low-dose acquisitions between 80 and 100 kVp are acceptable, and effective dose should be below 3 mSv for screening-protocol cases. Scanner manufacturer balance across GE, Siemens, Philips, and Canon platforms is requested to reduce scanner-specific bias. Inter-rater agreement metrics (Dice coefficient for segmentation masks, Cohen's kappa for malignancy ratings) must be reported per contributing site. Scans with motion artifact, severe streak artifact from metallic implants, or incomplete coverage from the lung apices through the posterior costophrenic angles should be excluded by the contributing site's radiologist before submission. Data will be processed on air-gapped GPU clusters and will not be redistributed. IRB waiver or equivalent ethics approval documentation must accompany each contributing site's submission.
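The window settings, consensus masks, and Dice metric named in this request can be illustrated with a minimal sketch. It assumes masks are flattened lists of 0/1 voxels, and the function names are hypothetical; it is not the requester's pipeline.

```python
from typing import List, Sequence, Tuple


def window_bounds(level_hu: float, width_hu: float) -> Tuple[float, float]:
    """Return the (low, high) HU range covered by a window level/width pair."""
    half = width_hu / 2.0
    return (level_hu - half, level_hu + half)


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Voxel-wise strict majority across readers' binary masks (flattened)."""
    n_readers = len(masks)
    return [1 if 2 * sum(votes) > n_readers else 0 for votes in zip(*masks)]


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient between two binary masks; 1.0 when both are empty."""
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if total == 0 else 2.0 * overlap / total


lung = window_bounds(-600, 1500)      # (-1350.0, 150.0)
soft_tissue = window_bounds(40, 400)  # (-160.0, 240.0)

# Illustrative reads only, not real annotations.
reader_a = [1, 1, 0, 0]
reader_b = [1, 0, 0, 0]
reader_c = [1, 1, 1, 0]
consensus = majority_vote([reader_a, reader_b, reader_c])  # [1, 1, 0, 0]
agreement_ab = dice(reader_a, reader_b)                    # 2 * 1 / (2 + 1)
```

With exactly two readers, a strict majority reduces to the intersection of the two masks, so a site with only two reads may prefer an explicit consensus review instead.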

Medical imaging · CT · DICOM · JSON · NIfTI
0 / 2000 scans (0%)

Multi-Label Thoracic Pathology Dataset with Paired Radiology Reports — 40,000 DICOM Chest X-Rays

Open

We are building the next generation of radiology report generation and multi-label thoracic pathology classification models, requiring a comprehensive chest radiograph dataset with both structured image-level labels and paired de-identified free-text radiology reports. The dataset design is inspired by publicly available benchmarks such as CheXpert and MIMIC-CXR but sourced from European and non-US institutions to ensure geographic and demographic generalizability. Technical specifications: posteroanterior (PA) radiographs are required as the primary view; lateral views from the same study encounter are strongly requested as a supplementary acquisition. Images must be provided in DICOM format at native acquisition resolution (minimum 2048x2048, 12-bit depth), with DICOM headers de-identified per the DICOM PS 3.15 Annex E Basic Application Level Confidentiality Profile. Each study must include a paired de-identified radiology report containing at minimum the Findings and Impression sections. Report de-identification must remove all direct and quasi-identifiers — patient name, dates of service, referring physician name, hospital name, radiologist name — while preserving all clinical content including anatomical descriptions, measurements, and diagnostic conclusions. Free-text reports must be provided in UTF-8 encoded plain text or structured JSON with clearly demarcated Findings and Impression fields. Labeling requirements: each image must carry structured labels for the following 14 thoracic conditions using CheXpert-style positive/negative/uncertain encoding: Atelectasis, Cardiomegaly, Consolidation, Edema, Enlarged Cardiomediastinum, Fracture, Lung Lesion, Lung Opacity, No Finding, Pleural Effusion, Pleural Other, Pneumonia, Pneumothorax, Support Devices. 
Labels may be extracted via NLP pipeline from the de-identified reports, following the CheXpert or NegBio labeling methodology, but must be reviewed for quality by a radiologist sample audit covering at least 10% of the labeled dataset. Bounding box annotations for at least Pleural Effusion and Cardiomegaly are desirable as an optional supplementary annotation layer, provided in COCO-format JSON. Acquisition standards and QA criteria: radiographs must be of diagnostic quality — adequate inspiration, no severe rotation, no collimator cutoff, and no significant motion artifact. Images should span a minimum of five calendar years to capture protocol and equipment evolution across contributing sites. At least three distinct scanner vendors (e.g., Philips, Siemens, Agfa, Canon) should be represented to reduce vendor-specific bias in trained models. Patient demographics should be recorded in aggregate: age distribution (decade bins), sex distribution, and primary clinical indication for the radiograph (routine check-up, cardiac follow-up, respiratory symptoms, pre-operative clearance). Pediatric images (under 18 years) should be flagged and may constitute up to 10% of the dataset. De-identification and compliance: DICOM header PHI must be removed using the Basic Application Level Confidentiality Profile (DICOM PS 3.15 Annex E) or an equivalent validated pipeline. For European sites, GDPR Article 9 applies to health data — a data processing agreement, records of processing activities under Article 30, and ethics committee approval or waiver documentation must be provided. Burned-in pixel annotations (patient name overlays, acquisition date stamps, institution watermarks embedded during image acquisition) must be confirmed absent or redacted using validated pixel-scrubbing software, with a sample audit confirming complete PHI removal. 
Use cases include multi-label classification model training, automated radiology report generation research, retrieval-augmented diagnostic reasoning systems, and cross-institutional domain adaptation studies. This is an academic research program with planned open publication. Contributing institutions will be acknowledged as dataset partners in resulting publications, and a data consortium agreement will govern shared governance of the aggregate dataset.
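A contributing site could check its structured labels against the 14-condition list before delivery. The sketch below assumes labels are stored as the strings "positive", "negative", and "uncertain" in a plain dictionary; that storage format and the function names are assumptions, since the request does not fix one.

```python
from typing import Dict, List

CONDITIONS = (
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "No Finding", "Pleural Effusion", "Pleural Other", "Pneumonia",
    "Pneumothorax", "Support Devices",
)
ALLOWED_VALUES = {"positive", "negative", "uncertain"}


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record is well formed."""
    problems = []
    for name in CONDITIONS:
        if name not in labels:
            problems.append(f"missing label: {name}")
        elif labels[name] not in ALLOWED_VALUES:
            problems.append(f"invalid value for {name}: {labels[name]!r}")
    for name in labels:
        if name not in CONDITIONS:
            problems.append(f"unexpected label: {name}")
    return problems


def audit_sample_size(n_labeled: int, percent: int = 10) -> int:
    """Smallest number of images that covers `percent` of the labeled set."""
    return -(-n_labeled * percent // 100)  # integer ceiling division


record = {name: "negative" for name in CONDITIONS}
record["Pleural Effusion"] = "positive"
record["Pneumonia"] = "uncertain"
assert validate_labels(record) == []
assert audit_sample_size(40000) == 4000
```

For the full 40,000-image target, the 10% radiologist audit therefore covers at least 4,000 images.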

Medical imaging · X-ray · DICOM · JSON
0 / 40000 scans (0%)

High-Volume Tuberculosis Screening Chest X-Ray Dataset — 100,000 Images for Programmatic TB AI

Open

We are a global health technology organization developing and validating AI-assisted tuberculosis (TB) screening tools for deployment in high-burden, low-resource settings across Sub-Saharan Africa and South-East Asia. We require the largest possible dataset of de-identified chest radiographs with TB-related labels to train models that must generalize across diverse patient populations, scanner types, and acquisition conditions. Technical specifications: both posteroanterior (PA) and anteroposterior (AP) projections are accepted, as programmatic TB screening commonly uses portable or mobile X-ray units producing AP images. Images may be provided in DICOM, PNG, or TIFF format. For PNG and TIFF formats, minimum resolution is 1024x1024 at 8-bit depth; DICOM files should retain native acquisition resolution, which may range from 1500x1500 to 3000x3000 pixels. Images acquired on CR (computed radiography), DR (digital radiography), and analog-digitized film are all acceptable and should be tagged by acquisition modality in accompanying metadata. JSON metadata must record: TB outcome label (positive/negative/indeterminate), label source (sputum culture confirmation, GeneXpert MTB/RIF molecular assay, smear microscopy, radiologist read, or programmatic classification), treatment status if known, and any prior TB history. Labeling requirements: image-level labels are the primary annotation type required. CheXpert-style uncertainty labels (positive/negative/uncertain) are acceptable. Radiological findings associated with active pulmonary tuberculosis — upper lobe consolidation or infiltrate, cavitation, hilar or mediastinal lymphadenopathy, miliary nodular pattern, pleural effusion, and post-primary fibronodular scarring — should be recorded as secondary structured labels in the JSON sidecar file where available from radiologist reads. 
Images from HIV-positive patients are particularly valuable and should be flagged with HIV co-infection status in an anonymized binary field (HIV-positive: yes/no) without disclosing any additional identifying information. Acquisition diversity and QA criteria: because this dataset targets deployment in low-resource settings, images from a wide range of scanner quality levels are acceptable, including older CR plate systems, mobile digital units, and even digitized film. However, images must be of sufficient diagnostic quality for a trained radiologist to render a clinical read. Severely underexposed or overexposed radiographs, images with significant patient motion artifact, or films with physical damage artifacts should be excluded. Metadata must record scanner type, manufacturer, and approximate year of manufacture where available, as model performance subgroup analysis by acquisition platform is a planned research output. Geographic site metadata (country and WHO TB burden tier) should be included to enable site-level stratification. De-identification and data governance: all DICOM PHI must be removed per HIPAA Safe Harbor or equivalent national standard. For African and Asian contributing sites operating under different national data protection frameworks, data sharing agreements must confirm compliance with applicable local regulations (e.g., Kenya Data Protection Act, India PDPB, Indonesia GR No. 71). Burned-in pixel-data annotations including patient name, hospital identifier, or date of examination embedded at acquisition must be confirmed absent or redacted before delivery. WHO data governance guidelines for health AI datasets apply. This dataset will be used to develop WHO-compliant AI screening tools targeting sensitivity of 90% or higher and specificity of 70% or higher for TB triage in programmatic settings. All models trained on this data will be evaluated against a geographically diverse held-out test set. 
Contributing institutions will receive a license to the trained model for non-commercial programmatic use. Full compliance with applicable national data protection regulations and WHO data governance guidelines is required.
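The stated triage targets (sensitivity of 90% or higher, specificity of 70% or higher) can be checked from confusion-matrix counts. This is a minimal sketch with hypothetical function names and made-up counts; it assumes indeterminate cases have already been excluded from the counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of TB-positive cases the model flags."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases in the evaluation set")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of TB-negative cases the model clears."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative cases in the evaluation set")
    return true_neg / (true_neg + false_pos)


def meets_triage_targets(true_pos: int, false_neg: int,
                         true_neg: int, false_pos: int,
                         min_sensitivity: float = 0.90,
                         min_specificity: float = 0.70) -> bool:
    """True when both targets from the request are met."""
    return (sensitivity(true_pos, false_neg) >= min_sensitivity
            and specificity(true_neg, false_pos) >= min_specificity)


# Illustrative counts only: 460/500 positives caught, 1200/1500 negatives cleared.
ok = meets_triage_targets(true_pos=460, false_neg=40, true_neg=1200, false_pos=300)
```

Running the same check per country, acquisition platform, or HIV status would match the subgroup analyses the request describes.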

Medical imaging · X-ray · DICOM · JSON · PNG / JPG
0 / 100000 scans (0%)

Cardiomegaly and Cardiothoracic Ratio Measurement Dataset — 30,000 Annotated PA Chest Radiographs

Open

Our cardiovascular research consortium is developing an automated cardiac silhouette measurement tool to compute the cardiothoracic ratio (CTR) from posteroanterior (PA) chest radiographs, enabling scalable screening for cardiomegaly in primary care and resource-limited settings. We require a large cohort of PA radiographs with expert cardiac and thoracic dimension annotations. Technical specifications: only posteroanterior (PA) projections are acceptable for this request, as CTR measurement is standardized for PA acquisition geometry. AP views systematically magnify the cardiac silhouette because the heart lies farther from the detector and the source-to-image distance is typically shorter, so they are excluded from this dataset. Images must be provided in DICOM format at full acquisition resolution (minimum 2048x2048, 12-bit), with DICOM header data preserved after de-identification. JSON annotations per image must include: maximum transverse cardiac diameter in pixels and millimeters, maximum transverse thoracic diameter measured at the right hemidiaphragm level, computed CTR value, and image-level cardiomegaly label (CTR greater than 0.5 indicates cardiomegaly, per established radiological convention). Cases with pericardial effusion, marked scoliosis, or significant rotation artifacts should be flagged, as these conditions affect CTR reliability and require exclusion from the primary training set. Additional clinical label requirements: datasets should record co-existing findings relevant to cardiac pathology, including pulmonary vascular congestion, pleural effusion, and interstitial edema. Patient demographic metadata (age decade, biological sex) and known clinical diagnosis (heart failure, dilated cardiomyopathy, hypertensive heart disease, valvular disease) should be included where available, encoded in a manner that does not create a re-identification risk. The expected case mix is approximately 30% cardiomegaly-positive (CTR greater than 0.5) and 70% normal or borderline. 
Acquisition and QA standards: images must demonstrate adequate inspiration — at least 8–9 posterior ribs visible above the right hemidiaphragm — as inadequate inspiration artificially increases the apparent cardiac silhouette and inflates CTR. Images must be acquired at 100–125 kVp with standard posteroanterior positioning at 1.8–2.0 meters SID. Images showing rotational artifact, where the spinous processes are not equidistant from the medial clavicular heads, should be flagged or excluded. Representation from at least four distinct DR or CR scanner platforms (e.g., Philips DigitalDiagnost, Siemens Ysio, GE Definium, Canon CXDI series) is requested to ensure vendor generalizability of the trained model. Radiographs acquired over at least a five-year span are desirable to capture acquisition protocol evolution. De-identification and compliance: all 18 HIPAA-defined PHI categories must be removed from DICOM headers per the Safe Harbor method. Burned-in pixel data annotations such as patient name banners, institutional logos, or laterality markers embedded during acquisition must be confirmed absent or removed prior to delivery. For European contributing institutions, a data processing agreement and records of processing activities (Article 30 GDPR) must be provided. Use cases include training regression and classification models for CTR estimation, validating model predictions against echocardiographic ground truth for left ventricular ejection fraction correlation, and generating normative CTR reference curves stratified by age and sex. This data will support a peer-reviewed publication and an open-source model release under a permissive research license. Institutional authorship acknowledgment will be offered to all contributing hospitals.
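The per-image annotation described in this request can be sketched as follows. The field names are hypothetical, and the pixel spacing is assumed to be isotropic and supplied by the site from the DICOM header; the request does not specify either.

```python
def ctr_annotation(cardiac_px: float, thoracic_px: float,
                   pixel_spacing_mm: float) -> dict:
    """Build a CTR record from the two transverse diameters measured in pixels."""
    if cardiac_px <= 0 or thoracic_px <= 0 or pixel_spacing_mm <= 0:
        raise ValueError("diameters and pixel spacing must be positive")
    # CTR is unitless, so it is computed directly from the pixel measurements.
    ctr = cardiac_px / thoracic_px
    return {
        "cardiac_diameter_px": cardiac_px,
        "cardiac_diameter_mm": round(cardiac_px * pixel_spacing_mm, 1),
        "thoracic_diameter_px": thoracic_px,
        "thoracic_diameter_mm": round(thoracic_px * pixel_spacing_mm, 1),
        "ctr": round(ctr, 3),
        # Strictly greater than 0.5, as stated in the request.
        "cardiomegaly": ctr > 0.5,
    }


# Illustrative measurements only.
borderline = ctr_annotation(1200, 2400, 0.14)  # CTR exactly 0.5: not labeled positive
enlarged = ctr_annotation(1300, 2400, 0.14)    # CTR about 0.542: labeled positive
```

The label is taken from the unrounded ratio, so a value just above 0.5 is not lost to rounding.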

Medical imaging · X-ray · DICOM · JSON
0 / 30000 scans (0%)

Pulmonary Nodule and Lung Mass Dataset with Radiologist Bounding Box Annotations — 15,000 Images

Open

We are a digital health startup developing a pulmonary nodule detection and malignancy risk-stratification model for integration into PACS worklist prioritization. We require chest radiographs with radiologist-annotated bounding boxes around pulmonary nodules and masses, covering a range of sizes, densities, and anatomical locations. Technical specifications: posteroanterior (PA) chest radiographs are the primary acquisition type required; lateral views for the same patient encounter are requested as supplementary data where available. Images must be supplied in DICOM format at full acquisition resolution (minimum 2048x2048, 14-bit depth). Bounding box annotations must be provided in JSON format, with each annotation record containing: bounding box coordinates (x, y, width, height in pixels), nodule diameter estimate in millimeters, location descriptor (upper/middle/lower zone, left/right), density category (solid, ground-glass, part-solid), and Fleischner Society size category. Cases must include confirmed nodules measuring 6mm or larger. Images with no detectable nodules should constitute 40–50% of the dataset to ensure realistic negative sampling. Clinical label requirements: each nodule case should carry an image-level label indicating benign, malignant, or indeterminate based on available follow-up imaging, biopsy, or multidisciplinary tumor board (MDT) consensus. If histopathological confirmation is available, this should be recorded in the JSON metadata. Annotations must originate from radiologists with subspecialty thoracic experience. AI pre-annotation is acceptable provided each bounding box was reviewed and approved by a radiologist. Acquisition and QA criteria: radiographs must meet a minimum image quality standard — adequate lung expansion (at least 8–10 posterior ribs visible), no severe rotation (the spinous processes should be equidistant from the medial clavicular ends), and no motion artifact degrading nodule visualization. 
Images acquired at 100–125 kVp with appropriate mAs are preferred. Scanner vendor metadata should be retained and diverse representation from multiple manufacturers including GE Healthcare, Siemens Healthineers, Fujifilm, and Philips is requested. Pediatric cases (under 18 years) should be excluded unless specifically from a TB-endemic population study; the target demographic is adults aged 40–80 with a smoking history or incidental nodule finding. Cases with prior lobectomy, pneumonectomy, or extensive pleural plaques overlying the nodule region should be excluded or flagged. De-identification and regulatory compliance: all DICOM headers must be de-identified per HIPAA Safe Harbor or equivalent standard, removing patient name, date of birth, institution name, and all date fields. Burned-in text overlays in the image pixel data — such as institution watermarks, patient demographic banners, or laterality labels embedded at acquisition — must be removed without cropping any portion of the lung parenchyma. Compliance documentation including IRB approval number and de-identification method attestation must accompany each contributing institution's data submission. Primary use cases: this dataset will train a nodule detection model targeting sensitivity of 90% or higher at clinically relevant specificity, for integration as a second-reader CAD tool. The model is intended for deployment in community radiology settings without subspecialty thoracic coverage. Data will not be shared beyond the contracted research team without explicit re-consent from the contributing institution.
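A bounding-box record of the kind listed in this request might be checked as sketched below. The sketch assumes (x, y) is the top-left corner of the box in pixel coordinates, which the request does not state, and all field names are hypothetical. The Fleischner size category is left out because the request does not define its values.

```python
from dataclasses import dataclass
from typing import List

ZONES = {"upper", "middle", "lower"}
SIDES = {"left", "right"}
DENSITIES = {"solid", "ground-glass", "part-solid"}


@dataclass
class NoduleBox:
    x: float            # assumed: left edge of the box, in pixels
    y: float            # assumed: top edge of the box, in pixels
    width: float
    height: float
    diameter_mm: float
    zone: str
    side: str
    density: str


def validate_box(box: NoduleBox, image_width: int, image_height: int,
                 min_diameter_mm: float = 6.0) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if box.width <= 0 or box.height <= 0:
        problems.append("box has non-positive size")
    if (box.x < 0 or box.y < 0
            or box.x + box.width > image_width
            or box.y + box.height > image_height):
        problems.append("box extends outside the image")
    if box.diameter_mm < min_diameter_mm:
        problems.append("nodule is smaller than the 6 mm minimum")
    if box.zone not in ZONES:
        problems.append(f"unknown zone: {box.zone!r}")
    if box.side not in SIDES:
        problems.append(f"unknown side: {box.side!r}")
    if box.density not in DENSITIES:
        problems.append(f"unknown density: {box.density!r}")
    return problems


# Illustrative record only.
example = NoduleBox(x=820, y=640, width=90, height=85, diameter_mm=11.5,
                    zone="upper", side="right", density="solid")
issues = validate_box(example, image_width=2048, image_height=2048)  # []
```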

Medical imaging · X-ray · DICOM · JSON
0 / 15000 scans (0%)

Pneumothorax Detection Dataset with Pixel-Level Segmentation Masks — 20,000 PA/AP Radiographs

Open

Our medical imaging company is building a real-time pneumothorax detection and severity-grading tool for use in emergency radiology workflows. We require a curated dataset of chest radiographs containing confirmed pneumothorax cases alongside age- and sex-matched normal controls, with pixel-level segmentation annotations delineating the pleural air collection. Technical specifications: both posteroanterior (PA) and anteroposterior (AP) projections are required, as pneumothorax is frequently captured on portable AP studies in intensive care and emergency settings. Images must be delivered in DICOM format at native acquisition resolution (minimum 1800x1800, 12-bit), with header metadata retained after de-identification. A minimum of 40% of images must be pneumothorax-positive. For each positive case, a binary segmentation mask (PNG, aligned pixel-for-pixel with the source image) must be provided indicating the pneumothorax region. JSON sidecar files must record laterality (left/right), estimated pneumothorax size category (small less than 15%, moderate 15–30%, large greater than 30% lung field), and whether a chest tube or other thoracic support device is visible. Labeling quality requirements: segmentation masks must be drawn by or reviewed by board-certified radiologists or trained radiologic technologists under radiologist supervision. Inter-annotator agreement metrics (Dice coefficient) should be reported if available. Cases with tension pneumothorax or bilateral pneumothorax should be flagged separately, as these represent clinically distinct high-priority findings. Lateral view images, if available for the same encounter, are highly desirable as supplementary data for model generalization. Acquisition and QA criteria: portable AP radiographs acquired at the bedside in ICU or emergency settings are especially valuable because they represent the real-world distribution where pneumothorax detection tools will be deployed. 
Images from digital radiography (DR) and computed radiography (CR) systems are both acceptable. Acquisition parameters including kVp, mAs, and SID should be retained in DICOM headers post de-identification. Exclusion criteria include severely rotated images (greater than 10-degree tilt), images with significant subcutaneous emphysema obscuring the lung outline, and any radiograph where pixel-level de-identification of burned-in text has altered the lung field region. Post-thoracotomy and post-lobectomy cases may be included but must be flagged, as pleural architecture differs from typical presentations. De-identification and compliance: all DICOM PHI must be removed per the HIPAA Safe Harbor standard or DICOM Basic Application Level Confidentiality Profile. For European contributing sites, GDPR Article 9 special-category health data provisions apply; a data processing agreement (DPA) and institutional review board approval or ethics committee waiver must be confirmed prior to transfer. Any burned-in pixel annotations (patient ID, laterality overlays) must be removed before delivery. This dataset will support both model training and regulatory submission as part of a 510(k) or CE-MDR pathway. Accordingly, provenance documentation — including acquisition site, scanner manufacturer and model, and annotation date — is required per image. A data use agreement will be executed with each contributing institution before data transfer.
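The size categories and sidecar fields in this request can be sketched as a small helper. It assumes the site already has an estimate of the pneumothorax as a percentage of the lung field; how that estimate is made is not specified in the request, and the JSON field names are hypothetical.

```python
import json


def size_category(percent_of_lung_field: float) -> str:
    """Map a size estimate to the request's small / moderate / large bands."""
    if not 0 < percent_of_lung_field <= 100:
        raise ValueError("percentage must be in the range (0, 100]")
    if percent_of_lung_field < 15:
        return "small"
    if percent_of_lung_field <= 30:
        return "moderate"
    return "large"


def sidecar_record(laterality: str, percent_of_lung_field: float,
                   support_device_visible: bool) -> dict:
    """Build one sidecar entry for a pneumothorax-positive image."""
    if laterality not in ("left", "right"):
        raise ValueError("laterality must be 'left' or 'right'")
    return {
        "laterality": laterality,
        "size_category": size_category(percent_of_lung_field),
        "support_device_visible": support_device_visible,
    }


# Illustrative case only.
record = sidecar_record("left", 22.0, support_device_visible=True)
sidecar_json = json.dumps(record, indent=2)
```

Bilateral and tension cases would need additional flags, since the request asks for them to be marked separately.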

Medical imaging · X-ray · DICOM · JSON · PNG / JPG
0 / 20000 scans (0%)

50,000 Chest X-Rays with Pneumonia Classification Labels and Pathology Metadata

Open

We are a pulmonary AI research group developing a deep-learning classifier for community-acquired pneumonia detection using posteroanterior (PA) chest radiographs. We require a large, demographically diverse dataset of de-identified chest X-rays with confirmed image-level pneumonia labels, sourced from emergency department and outpatient radiology workflows. Technical specifications: PA-view radiographs are strongly preferred, with AP-view images acceptable if clearly marked. Images must be provided in DICOM format, preserving full-resolution acquisition data (minimum 2048x2048 pixels, 12-bit depth). Each image must include DICOM header metadata (acquisition parameters, patient age range, biological sex) after de-identification per HIPAA Safe Harbor or equivalent regulatory standard. Accompanying JSON metadata per file must record the image-level label (pneumonia-positive / pneumonia-negative / indeterminate), label source (radiologist consensus, single radiologist, or structured report extraction), and any co-existing findings such as pleural effusion or pulmonary consolidation. Labeling requirements follow CheXpert-style conventions: each image-level label must indicate positive, negative, or uncertain for pneumonia. Where available, the de-identified radiology report impression section should be included as free text. Datasets with at least 30% positive cases are preferred to avoid extreme class imbalance. Labels must originate from board-certified radiologists; AI-generated labels are acceptable only as a secondary annotation layer, clearly flagged. Acquisition and QA criteria: images must pass a minimum quality threshold — no severe motion blur, no collimator cut-off, and adequate exposure index (EI). Radiographs acquired on digital radiography (DR) systems are preferred, although computed radiography (CR) plate-based images are acceptable. Scanner vendor and model must be recorded in metadata to enable downstream subgroup analysis by acquisition system. 
Images from at least three distinct scanner manufacturers (e.g., Philips, Siemens, GE Healthcare) are requested to ensure vendor diversity. Pediatric studies (patients under 18) must be flagged and may be included as a stratified subset, but adult studies age 18–85 constitute the core population. Exclusion criteria include post-pneumonectomy images, images with burned-in patient annotations that cannot be removed without cropping clinically relevant lung regions, and images where de-identification of DICOM PHI cannot be confirmed. De-identification compliance: all DICOM files must be de-identified according to HIPAA Safe Harbor (45 CFR § 164.514(b)) or the DICOM PS 3.15 Annex E Basic Profile, removing all 18 PHI categories including patient name, birth date, admission date, and device identifiers. Burned-in text annotations in pixel data (e.g., patient initials, laterality markers embedded via image acquisition) must be verified absent or redacted. GDPR-equivalent standards apply to European-sourced data. This dataset will be used to train and externally validate pneumonia triage models intended for low-resource clinical settings. Results will be published in peer-reviewed journals. Institutional data sharing agreement and IRB confirmation of waiver or approval will be provided prior to transfer. Hospitals must confirm the absence of re-identification risk under the provided de-identification protocol.
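A site can check the preferred class balance (at least 30% positive) before submitting. This sketch assumes indeterminate images stay in the denominator, which the request does not specify; the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence

LABELS = ("positive", "negative", "indeterminate")


def meets_positive_preference(labels: Sequence[str], min_percent: int = 30) -> bool:
    """True when positives make up at least `min_percent` of all images."""
    if not labels:
        raise ValueError("no labels supplied")
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Integer comparison avoids floating-point edge cases at exactly 30%.
    return counts["positive"] * 100 >= min_percent * len(labels)


# Illustrative label list only: 3 positive, 6 negative, 1 indeterminate.
batch = ["positive"] * 3 + ["negative"] * 6 + ["indeterminate"]
balanced = meets_positive_preference(batch)  # True: exactly 30% positive
```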

Medical imaging · X-ray · DICOM · JSON
0 / 50000 scans (0%)