Browse Open Medical Data Requests

34 open requests from researchers and companies. No account needed to browse — sign in to fulfill.

5,000 transthoracic echocardiogram studies with LVEF measurements for heart-failure screening AI

Open

We are developing a deep-learning model to automate left ventricular ejection fraction (LVEF) estimation from transthoracic echocardiography (TTE) studies acquired in routine clinical care. The primary intended use case is population-level heart-failure screening integrated into existing cardiology workflows, reducing the reporting burden on sonographers and cardiologists while enabling earlier intervention in at-risk patients. We require full cine-loop acquisitions from the apical 4-chamber (A4C) and apical 2-chamber (A2C) views, recorded at a minimum of 25 frames per second with spatial resolution no lower than 224×224 pixels. Each study should include at least three complete cardiac cycles per view. Harmonic B-mode imaging is preferred over fundamental mode because it improves endocardial border delineation. Data must be delivered in DICOM format with all protected health information removed or replaced per HIPAA Safe Harbor or equivalent GDPR pseudonymisation protocols, including stripping of DICOM tags (0010,0010) through (0010,0040) and any burned-in annotation text. A de-identification manifest confirming the specific method applied is required with each batch delivery. Clinical labels must include a cardiologist-verified LVEF value measured by the biplane Simpson's method of discs, New York Heart Association (NYHA) functional class where available, and a binary heart-failure diagnosis flag. Segmentation masks delineating the left ventricular endocardial border at end-diastole and end-systole in the A4C view are strongly preferred for the full dataset and mandatory for at least 30% of studies. Additional chamber measurements, namely left ventricular end-diastolic volume (LVEDV), left ventricular end-systolic volume (LVESV), left atrial volume index, and diastolic function grade per the ASE 2016 guidelines, are welcome as supplementary labels to broaden the model's downstream applicability.
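For contributors double-checking LVEF labels, here is a minimal Python sketch of the biplane method of discs and the ejection-fraction formula. It assumes disc diameters have already been measured in the A4C and A2C views at end-diastole and end-systole; the function names are ours, not part of any vendor or ASE tool, and the numbers in the usage lines are invented.

```python
import math
from typing import Sequence


def biplane_disc_volume_ml(a4c_diameters_mm: Sequence[float],
                           a2c_diameters_mm: Sequence[float],
                           long_axis_mm: float) -> float:
    """Biplane method of discs: V = (pi / 4) * sum(a_i * b_i) * (L / n).

    a_i and b_i are the orthogonal disc diameters from the A4C and A2C views,
    L is the LV long-axis length and n is the number of discs (usually 20).
    """
    if len(a4c_diameters_mm) != len(a2c_diameters_mm) or not a4c_diameters_mm:
        raise ValueError("both views need the same, non-zero number of discs")
    disc_height_mm = long_axis_mm / len(a4c_diameters_mm)
    volume_mm3 = sum(math.pi / 4.0 * a * b * disc_height_mm
                     for a, b in zip(a4c_diameters_mm, a2c_diameters_mm))
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """LVEF = (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Usage with made-up measurements (idealised cylinders, not patient data):
edv = biplane_disc_volume_ml([40.0] * 20, [40.0] * 20, 80.0)
esv = biplane_disc_volume_ml([28.0] * 20, [28.0] * 20, 70.0)
print(f"EDV {edv:.1f} mL, ESV {esv:.1f} mL, LVEF {lvef_percent(edv, esv):.1f}%")
```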
Patient demographic metadata should be retained in anonymised DICOM tags or an accompanying JSON sidecar: age decade, biological sex, body mass index category, and primary diagnosis (HFrEF, heart failure with reduced ejection fraction; HFpEF, heart failure with preserved ejection fraction; or no heart failure). Studies from multiple scanner vendors (GE Vivid, Philips EPIQ, Siemens Acuson, and Canon Aplio) are explicitly sought to ensure device-agnostic model generalisation. Quality control: studies in which more than 20% of frames are degraded by ultrasound dropout, rib shadowing, or patient motion artefact should be excluded, or flagged with a quality score below the acceptance threshold. A sonographer-assigned image quality rating (1 = poor, 2 = adequate, 3 = good) is requested for each cine loop.
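The quality and metadata rules above can be checked per loop before delivery. This sketch assumes a JSON sidecar with hypothetical field names (`age_decade`, `primary_diagnosis`, `quality_rating`, `n_frames`, `n_degraded_frames`); the request does not prescribe a schema, so adapt the names to whatever the contributor actually ships.

```python
import json

# Hypothetical sidecar field names; the request does not fix a schema.
REQUIRED_FIELDS = ("age_decade", "sex", "bmi_category", "primary_diagnosis")
ALLOWED_DIAGNOSES = {"HFrEF", "HFpEF", "no_HF"}
MAX_DEGRADED_FRACTION = 0.20  # more than 20% degraded frames -> exclude or flag


def review_cine_loop(sidecar_json: str) -> list[str]:
    """Return a list of problems for one cine loop; an empty list means it passes."""
    meta = json.loads(sidecar_json)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in meta]

    diagnosis = meta.get("primary_diagnosis")
    if diagnosis is not None and diagnosis not in ALLOWED_DIAGNOSES:
        problems.append(f"unknown primary_diagnosis: {diagnosis!r}")

    if meta.get("quality_rating") not in (1, 2, 3):
        problems.append("quality_rating must be 1 (poor), 2 (adequate) or 3 (good)")

    n_frames = meta.get("n_frames", 0)
    n_degraded = meta.get("n_degraded_frames", 0)
    if n_frames <= 0:
        problems.append("n_frames must be a positive integer")
    elif n_degraded / n_frames > MAX_DEGRADED_FRACTION:
        problems.append(f"{n_degraded / n_frames:.0%} of frames degraded: exclude or flag")
    return problems
```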

Medical imaging · Ultrasound · DICOM · JSON
0 / 5,000 scans (0%)

50,000 Single-Lead Wearable ECG Strips for Large-Scale Atrial Fibrillation Population Screening

Open

Consumer and clinical-grade wearable devices (smartwatches, chest patches, and handheld recorders) are increasingly used for opportunistic AF screening in primary care and community settings. However, models trained on clinical 12-lead ECGs perform poorly on single-lead data because of electrode placement variability, motion artefact, and the absence of spatial voltage information. We are developing a dedicated single-lead AF detection model targeting deployment in FDA Class II-cleared wearable devices. We require 50,000 single-lead ECG recordings, each equivalent to Lead I or a modified limb-lead configuration, with recording durations of 30 seconds to 5 minutes per strip. The minimum sampling rate is 200 Hz; 256 Hz or 300 Hz (typical of consumer wearable ECG front-ends) is preferred. 8-bit amplitude resolution is the floor; 12-bit is preferred. Preferred formats are CSV (one column per channel, with ISO 8601 timestamps) or JSON with a signal array, a sample-rate field, and a metadata object. Data may originate from any cleared handheld or wrist-worn single-lead recorder (AliveCor KardiaMobile, Withings ScanWatch, Zio patch, or an equivalent clinical Holter export truncated to a single channel). Each strip must carry one rhythm label: AF confirmed, AF not present, or technically inadequate (excessive artefact). Labels must be generated by a certified cardiac physiologist or electrophysiologist, not by the device's own algorithm, to avoid label noise from the very systems our model aims to replace. The labeling protocol requires human expert review on a validated browser-based or desktop annotation platform that displays the raw waveform; annotators must be blinded to the device's automatic interpretation. A minimum of 5% of all strips must undergo dual independent annotation for inter-rater reliability assessment; Cohen's kappa for the binary AF-confirmed versus AF-not-present decision must be ≥0.80.
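The 0.80 kappa target can be verified on the double-annotated subset with a few lines of Python. This is a plain implementation of Cohen's kappa; how strips that either reader marked technically inadequate enter the binary comparison is not specified, so the sketch assumes they are removed first. The labels in the usage lines are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of strips")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal frequency per category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Illustrative double-read subset of 100 strips (invented labels):
first = ["AF"] * 20 + ["no_AF"] * 80
second = ["AF"] * 18 + ["no_AF"] * 2 + ["AF"] * 4 + ["no_AF"] * 76
kappa = cohens_kappa(first, second)
print(f"kappa = {kappa:.3f}; meets 0.80 target: {kappa >= 0.80}")
```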
Strips flagged as technically inadequate must also be reviewed by a second annotator before final labeling, because false inadequacy labels artificially inflate the rejection rate and degrade the training signal. Strips with significant baseline wander, muscle artefact, or lead-off events are valuable as hard negatives and should be labelled as technically inadequate rather than discarded. Because wearable recordings are inherently susceptible to high-frequency motion noise during physical activity, recordings captured during walking, stair climbing, or light exercise (documented by device accelerometer data if available) are specifically solicited to build robustness at inference time. The hallmarks of AF in single-lead traces, namely irregular RR intervals, absent P-waves, fibrillatory baseline, and variable QRS amplitude, should be used as secondary annotation cues and documented in per-strip quality notes. De-identification must comply with HIPAA Safe Harbor or GDPR Article 89 pseudonymisation. All device-embedded PHI (patient name, date of birth, device serial number traceable to a named individual) must be removed or replaced with surrogate identifiers before delivery. Recordings must not include GPS coordinates or location data, even in embedded metadata fields. Subject-level metadata should include age, sex, BMI, and known AF history (paroxysmal, persistent, permanent, or no known AF), as these features will be used as auxiliary inputs to the model. The AF subtype must be documented wherever it is known, because paroxysmal AF captured mid-episode is of the highest clinical value and the most challenging to detect. We anticipate an AF prevalence of 15–25% in the supplied dataset, reflecting a screening-enriched population rather than a general community sample.
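The RR-interval cues listed above can be summarised per strip as simple irregularity statistics. This sketch assumes R-peak sample indices come from an upstream detector (not shown) and reports mean RR, SDNN, RMSSD, and the coefficient of variation; it implies no AF decision threshold, and the statistics are intended only as annotation aids.

```python
import math
from statistics import mean, pstdev


def rr_irregularity(r_peaks: list[int], fs_hz: float) -> dict[str, float]:
    """Summary RR statistics from the R-peak sample indices of one strip."""
    if len(r_peaks) < 3:
        raise ValueError("need at least three R peaks (two RR intervals)")
    rr_s = [(b - a) / fs_hz for a, b in zip(r_peaks, r_peaks[1:])]
    successive_diffs = [b - a for a, b in zip(rr_s, rr_s[1:])]
    mean_rr = mean(rr_s)
    return {
        "mean_rr_s": mean_rr,
        "sdnn_s": pstdev(rr_s),                                   # spread of RR intervals
        "rmssd_s": math.sqrt(mean(d * d for d in successive_diffs)),  # beat-to-beat change
        "cv_rr": pstdev(rr_s) / mean_rr,                          # scale-free irregularity
    }


# Usage on two invented 300 Hz strips: perfectly regular vs. irregular rhythm.
regular = rr_irregularity([0, 300, 600, 900], 300.0)
irregular = rr_irregularity([0, 240, 600, 780], 300.0)
print(regular["cv_rr"], irregular["cv_rr"])
```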
Downstream use cases include a consumer AF detection app embedded in a cleared smartwatch, a primary-care nurse-administered screening kiosk, population-level epidemiological AF prevalence tracking via wearable aggregation, and federated model training across wearable device manufacturer partnerships without centralising raw patient data.

Sensor / device data · ECG · CSV · JSON
0 / 50,000 scans (0%)

6,000 Pediatric 12-Lead ECGs Across Age Groups from Neonates to Adolescents with Diagnostic Labels

Open

Pediatric cardiology is a critically underserved domain in AI-driven ECG interpretation because pediatric ECG morphology differs substantially from adult norms: higher resting heart rates, right-ventricular dominance in neonates, an evolving QRS axis, and age-specific QTc reference ranges all mean that models trained on adult datasets perform poorly in children. We are building the first large-scale, age-stratified pediatric ECG AI classifier to screen for congenital heart disease, inherited channelopathies, and acquired conditions including Kawasaki disease and myocarditis. We require 6,000 resting 12-lead ECG recordings from patients aged 0–17 years, with the following minimum stratification: neonates and infants (0–12 months), 1,000 recordings; toddlers and pre-schoolers (1–5 years), 1,000 recordings; school-age children (6–12 years), 2,000 recordings; and adolescents (13–17 years), 2,000 recordings. The sampling rate must be ≥500 Hz, and the paper-speed equivalent (typically 25 mm/s) and gain (typically 10 mm/mV) must be documented. Records must be in EDF or WFDB format. Recording duration must be ≥10 seconds; longer strips are preferred for rhythm assessment. Each record must include cardiologist diagnostic labels from the following categories: normal for age, right bundle branch block, left ventricular hypertrophy, Wolff-Parkinson-White pattern, long-QT syndrome, supraventricular tachycardia, complete AV block, and congenital heart disease (specifying anatomy where known). Reports or structured cardiology findings summaries should accompany records where available, as they provide essential contextual supervision. Labeling must be carried out exclusively by pediatric cardiologists: primary interpretation is performed by a pediatric cardiology fellow or attending with electrophysiology training, and all abnormal findings must be over-read and confirmed by a senior, board-certified pediatric cardiologist.
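Progress against the stratification quotas can be tracked by mapping each record's age to a stratum. The stratum names are hypothetical, and because the request's 0–12 month and 1–5 year bands overlap at exactly 12 months, the sketch assumes a child who has completed 12 months counts as a toddler.

```python
from collections import Counter

# Minimum recordings per stratum, from the request (total 6,000).
QUOTAS = {
    "neonate_infant": 1000,     # 0-12 months
    "toddler_preschool": 1000,  # 1-5 years
    "school_age": 2000,         # 6-12 years
    "adolescent": 2000,         # 13-17 years
}


def age_stratum(age_months: int) -> str:
    """Assign a stratum from age in completed months.

    Assumption: a child who has completed 12 months is counted as a toddler.
    """
    if not 0 <= age_months < 18 * 12:
        raise ValueError("age must be 0-17 completed years")
    years = age_months // 12
    if years == 0:
        return "neonate_infant"
    if years <= 5:
        return "toddler_preschool"
    if years <= 12:
        return "school_age"
    return "adolescent"


def quota_shortfall(ages_months: list[int]) -> dict[str, int]:
    """Recordings still needed per stratum (0 once a quota is met)."""
    counts = Counter(age_stratum(m) for m in ages_months)
    return {stratum: max(0, quota - counts[stratum]) for stratum, quota in QUOTAS.items()}
```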
Diagnostic criteria must be referenced to published age-normative tables (Davignon, Rijnbeek, or equivalent peer-reviewed pediatric ECG reference ranges), because QRS duration, QTc limits, and R-wave amplitude thresholds differ substantially between age groups. Inter-annotator agreement must be assessed on a random subsample of at least 10%, with Cohen's kappa reported per diagnostic category and documented in the data release. Acquisition parameters must be fully documented per record: device manufacturer and model, paper speed setting, gain setting (typically 10 mm/mV for standard leads, 5 mm/mV for high-amplitude neonatal tracings), electrode placement protocol (standard limb positions or pediatric chest electrode spacing), and patient cooperation level (resting/awake, sleeping, or crying, since motion artefact in infants is a major confound). QTc values must be calculated using the Bazett correction for comparison with age-normative ranges, with the raw QT and preceding RR interval also provided. De-identification is strictly required under HIPAA or equivalent national regulation. Because children receive additional regulatory protections as research subjects, particular care must be taken to remove any free text that could identify the child or a parent. Only the following should be retained as metadata: age in months for those under two years, otherwise age in completed years; sex assigned at birth; body weight percentile; and relevant metabolic or genetic screening results (e.g., a channelopathy gene panel result, if available). We strongly encourage participation from tertiary paediatric cardiac centres, as these institutions concentrate the rare diagnoses most valuable to the classifier.
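The age rule for released metadata (months under two years, otherwise completed years) can be derived at the contributing site so that the birth date never leaves it. A minimal sketch, assuming a simple calendar rule in which a month counts as completed once the day of the month reaches the birth day; the function name is hypothetical.

```python
from datetime import date


def age_metadata(birth: date, recorded: date) -> dict[str, int]:
    """Release-ready age: months if under two years, otherwise completed years.

    The birth date is used only for this calculation and is not exported.
    """
    months = (recorded.year - birth.year) * 12 + (recorded.month - birth.month)
    if recorded.day < birth.day:
        months -= 1  # the current month is not yet completed
    if months < 0:
        raise ValueError("recording date precedes birth date")
    if months < 24:
        return {"age_months": months}
    return {"age_years": months // 12}
```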
Downstream use cases include deployment as a screening decision-support tool in general pediatric clinics and neonatal intensive care units, integration with wearable infant cardiac monitors, and a federated learning study across multiple pediatric hospitals to address data rarity.

Sensor / device data · ECG · EDF · WFDB
0 / 6,000 scans (0%)

5,000 Serial 12-Lead ECGs for QT-Interval Prolongation and Drug-Induced Arrhythmia Safety Monitoring

Open

A pharmaceutical research organisation is compiling a reference dataset to train and benchmark automated QTc-interval measurement algorithms intended for ICH E14-compliant thorough QT studies and ongoing cardiac safety surveillance during drug development. We require serial 12-lead ECG recordings from adult patients or healthy volunteers collected under controlled, medically supervised conditions. Each subject should contribute at least three recordings at defined time points (pre-dose baseline, time of peak plasma concentration, and ≥4 hours post-dose, or equivalent); paired time-point recordings are essential for QT correction modelling. A sampling rate of ≥1000 Hz is preferred to support accurate automated beat detection and interval measurement; 500 Hz is the minimum acceptable. Amplitude resolution must be 1 μV or finer. Files must be provided in EDF or CSV format with explicit timestamp alignment between recordings from the same subject. Lead II and the precordial leads (V1–V6) are the primary measurement channels. For each recording we require: automated and cardiologist over-read QTc measurements (Bazett and Fridericia corrections); individual beat-level RR and QT intervals (minimum 10 beats averaged); morphology flags for T-wave alternans, U-wave presence, and bifid T-waves; and an overall interpretive statement. Keypoint annotations for P-wave onset, QRS onset, and T-wave offset (tangent method) on Lead II are mandatory for algorithm benchmarking. The labeling protocol must comply with ISCE/ISHNE and ICH E14 guidance on ECG interval measurement in drug studies. Primary QT and QTc measurements must be performed by a trained ECG reader using a validated digital caliper tool; the over-read must be performed by a board-certified cardiologist with clinical pharmacology or cardiac electrophysiology subspecialty training.
For each recording, at least 10 consecutive sinus beats must be individually measured and averaged; ectopic beats, paced beats, and beats following a pause must be excluded from the average. Inter-reader variability for QTc measurement must be ≤5 ms mean absolute difference across a randomly sampled 10% re-annotation subset, and this metric must be reported in the dataset release documentation. De-identification must satisfy HIPAA Safe Harbor or equivalent GDPR pseudonymisation. Subject-level metadata must include age, sex, BMI, serum electrolyte values (potassium, magnesium, calcium) at the time of recording, concomitant medications at the drug-class level, and heart rate at each time point. Any clinical-trial identifiers or site codes must be replaced with anonymised surrogate codes. Data from Phase I healthy-volunteer studies are particularly valuable as a clean baseline population; data from patients with prolonged QTc at baseline (>450 ms in men, >470 ms in women) are equally important as high-sensitivity challenge cases. Recordings from patients currently receiving QT-prolonging agents (antiarrhythmics, antipsychotics, certain antibiotics) are especially valuable, as are recordings from subjects with known long-QT syndrome (congenital or acquired). Downstream use cases include regulatory-grade central ECG laboratory software for Phase I–III clinical trials, a precision-medicine tool stratifying drug candidates by proarrhythmic risk, and a QTc-monitoring dashboard embedded in hospital pharmacy systems to flag high-risk drug combinations in real time.
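The beat-selection and averaging rules, the two QTc corrections (Bazett, QT/√RR; Fridericia, QT/∛RR, with RR in seconds), and the inter-reader check can be sketched as follows. The `Beat` record and its labels are hypothetical; the sketch assumes its input is already one run of consecutive beats, and it corrects the averaged QT with the averaged RR, which is one of several reasonable conventions rather than a requirement stated in the request.

```python
import math
from dataclasses import dataclass


@dataclass
class Beat:
    qt_ms: float
    rr_ms: float         # RR interval preceding this beat
    beat_type: str       # hypothetical labels: "sinus", "ectopic", "paced"
    follows_pause: bool


def averaged_qtc(beats: list[Beat], min_beats: int = 10) -> dict[str, float]:
    """Average QT and RR over usable sinus beats, then apply Bazett and Fridericia."""
    usable = [b for b in beats if b.beat_type == "sinus" and not b.follows_pause]
    if len(usable) < min_beats:
        raise ValueError(f"{len(usable)} usable sinus beats; at least {min_beats} required")
    qt_ms = sum(b.qt_ms for b in usable) / len(usable)
    rr_s = sum(b.rr_ms for b in usable) / len(usable) / 1000.0
    return {
        "n_beats": len(usable),
        "qt_ms": qt_ms,
        "rr_ms": rr_s * 1000.0,
        "qtcb_ms": qt_ms / math.sqrt(rr_s),      # Bazett: QT / RR^(1/2)
        "qtcf_ms": qt_ms / rr_s ** (1.0 / 3.0),  # Fridericia: QT / RR^(1/3)
    }


def mean_absolute_difference(reader_a_ms: list[float], reader_b_ms: list[float]) -> float:
    """Inter-reader MAD for paired QTc values; the request's target is <= 5 ms."""
    if len(reader_a_ms) != len(reader_b_ms) or not reader_a_ms:
        raise ValueError("need paired, non-empty measurements")
    return sum(abs(a - b) for a, b in zip(reader_a_ms, reader_b_ms)) / len(reader_a_ms)
```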

Sensor / device data · ECG · CSV · EDF
0 / 5,000 scans (0%)

15,000 12-Lead ECGs with STEMI and NSTEMI Labels for Acute MI Detection AI

Open

We are building a real-time, point-of-care AI system to detect ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI) from the initial 12-lead ECG acquired in the emergency department. Delayed STEMI recognition is a critical bottleneck in door-to-balloon time, so our system targets integration with existing ECG cart software to raise an alert within 10 seconds of acquisition completion. We require 15,000 12-lead ECG recordings at a sampling rate of ≥500 Hz and ≥12-bit amplitude resolution, with a minimum recording duration of 10 seconds. Files must be provided in WFDB or EDF format; machine-readable XML exports from GE MUSE or Philips TraceMaster systems accompanied by raw voltage data are also acceptable. The case mix must include confirmed STEMI (classified by culprit-artery territory: anterior, inferior, lateral, or posterior), confirmed NSTEMI, unstable angina with ECG changes, and normal/benign controls. Confirmed diagnoses must be backed by troponin results and, where available, catheterisation or echocardiography findings referenced in an accompanying clinical summary. Annotation requirements per record: ST-segment elevation or depression measurements (in mV, per lead), territory classification, cardiologist final diagnosis, and Killip class if available. Keypoint annotations for QRS onset, the J-point, and the ST-segment measurement point 60–80 ms after the J-point are required for training measurement-regression heads. The labeling protocol requires at least two independent cardiologist readers per record: initial annotations are produced by an interventional cardiologist or senior cardiology registrar, and a second interventional cardiologist performs a blinded over-read. STEMI/NSTEMI disagreements are adjudicated by a third senior cardiologist.
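For the ST measurement keypoint, the deviation 60–80 ms after the J-point can be computed from a single lead once QRS onset and the J-point are annotated. This sketch assumes the isoelectric reference is the mean of a short window ending at QRS onset (the late PR segment); other references, such as the TP segment, are also used in practice. The function name and defaults are our own.

```python
def st_deviation_mv(signal_mv: list[float], fs_hz: float, qrs_onset: int, j_point: int,
                    offset_ms: float = 60.0, baseline_ms: float = 20.0) -> float:
    """ST deviation (mV) at J-point + offset_ms, relative to a pre-QRS baseline.

    Assumption: the isoelectric reference is the mean of the `baseline_ms` window
    that ends at QRS onset (i.e. the late PR segment).
    """
    if not 60.0 <= offset_ms <= 80.0:
        raise ValueError("the request specifies a point 60-80 ms after the J-point")
    n_baseline = max(1, round(baseline_ms * fs_hz / 1000.0))
    start = qrs_onset - n_baseline
    st_index = j_point + round(offset_ms * fs_hz / 1000.0)
    if start < 0 or st_index >= len(signal_mv):
        raise ValueError("annotations too close to the edge of the signal")
    baseline = sum(signal_mv[start:qrs_onset]) / n_baseline
    return signal_mv[st_index] - baseline


# Invented 500 Hz trace: 0.05 mV baseline, 0.25 mV plateau after the J-point.
trace = [0.05] * 100 + [0.25] * 100
st = st_deviation_mv(trace, 500.0, qrs_onset=50, j_point=120)
print(f"{st:.2f} mV = {st * 10:.1f} mm at 10 mm/mV")
```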
ST-elevation thresholds must follow the 2018 Fourth Universal Definition of Myocardial Infarction: new ST elevation at the J-point in two contiguous leads of ≥1 mm in all leads other than V2–V3; in leads V2–V3, ≥2 mm in men aged 40 or older, ≥2.5 mm in men under 40, and ≥1.5 mm in women regardless of age. A new left bundle branch block pattern with hemodynamic compromise also qualifies. Inter-reader kappa for the binary STEMI label must be ≥0.85 across the contributed records. De-identification is mandatory: all HIPAA-specified PHI, including patient ID, name, date of birth, and institution identifiers, must be removed from DICOM-ECG or MUSE XML headers, and admission dates must be shifted to a relative offset. Only age, sex, cardiovascular risk factors (smoking status, hypertension, dyslipidaemia, diabetes, prior MI or PCI), symptom-onset-to-ECG time in minutes, and peak troponin value (categorical: negative, mildly elevated, markedly elevated) may be retained; age, sex, and the major cardiovascular risk factors are essential. We are particularly interested in recordings obtained within the first two hours of symptom onset, as early MI ECGs are substantially underrepresented in public datasets such as PTB-XL and MIMIC-IV-ECG. Downstream use cases include integration into emergency department triage workflows as a clinical decision support tool, training a multi-label classifier capable of simultaneously identifying STEMI territory, Wellens syndrome, and de Winter T-wave patterns, and serving as a benchmark dataset for regulatory submissions to the FDA under 510(k) substantial equivalence evaluation for AI-based ECG analysis software.
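The lead-pair thresholds can be encoded directly. This is a simplified sketch: the contiguous-lead pairs are our assumption (adjacent precordial leads plus common limb-lead groupings), the criteria are defined at the J-point rather than the 60–80 ms point annotated above, and posterior leads, reciprocal depression, and the bundle-branch-block rule are not modelled. Sex codes (`"F"`/`"M"`) are hypothetical.

```python
MM_PER_MV = 10.0  # standard calibration: 10 mm/mV

# Simplified contiguity (assumption): neighbouring precordial leads plus common limb groupings.
CONTIGUOUS_PAIRS = [
    ("V1", "V2"), ("V2", "V3"), ("V3", "V4"), ("V4", "V5"), ("V5", "V6"),
    ("II", "III"), ("II", "aVF"), ("III", "aVF"), ("I", "aVL"),
]


def st_threshold_mm(lead: str, sex: str, age_years: int) -> float:
    """J-point ST-elevation cut-point for one lead."""
    if lead in ("V2", "V3"):
        if sex == "F":
            return 1.5
        return 2.5 if age_years < 40 else 2.0
    return 1.0


def meets_st_elevation_criterion(st_elevation_mv: dict[str, float],
                                 sex: str, age_years: int) -> bool:
    """True if both leads of any contiguous pair reach their own cut-points."""
    def elevated(lead: str) -> bool:
        value_mv = st_elevation_mv.get(lead)
        return value_mv is not None and value_mv * MM_PER_MV >= st_threshold_mm(lead, sex, age_years)

    return any(elevated(a) and elevated(b) for a, b in CONTIGUOUS_PAIRS)


# Usage: the same 2.2 mm V2-V3 elevation qualifies for a 50-year-old man but not a 35-year-old.
print(meets_st_elevation_criterion({"V2": 0.22, "V3": 0.22}, "M", 50),
      meets_st_elevation_criterion({"V2": 0.22, "V3": 0.22}, "M", 35))
```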

Sensor / device data · ECG · CSV · EDF · WFDB
0 / 15,000 scans (0%)

8,000 Holter 24-Hour Ambulatory ECG Recordings for Arrhythmia Burden Quantification

Open

Our research group is developing an automated arrhythmia-burden analysis pipeline for long-duration ambulatory ECG data. We require a dataset of continuous 24-hour (or longer) Holter recordings collected from adult patients referred for ambulatory cardiac monitoring, covering a broad spectrum of rhythm disturbances including paroxysmal atrial fibrillation, premature ventricular contractions (PVCs), supraventricular ectopy, second- and third-degree AV block, and ventricular tachycardia runs. Technical requirements: recordings must include at least two channels (Lead II and a modified V5 derivation); three-channel recordings are preferred. The sampling rate must be ≥200 Hz, with amplitude resolution of 2.5 μV or finer. The preferred file format is EDF or WFDB multi-segment; raw binary exports accompanied by a full header describing gain, offset, and channel labels are acceptable. Beat-level annotations following the AAMI EC57 scheme (N, S, V, F, Q classes), produced by a certified cardiac physiologist and confirmed by a supervising cardiologist, are mandatory. Episode-level labels indicating AF burden (percentage of recording time in AF), total PVC count, longest VT run duration, and overall arrhythmia classification are also required. The annotation workflow must enforce a two-stage review: an initial beat-by-beat annotation generated by a validated automated Holter analysis system (GE MARS, Spacelabs Oxford, or equivalent) must be manually reviewed and corrected by a credentialed cardiac physiologist, with a supervising electrophysiologist adjudicating all rhythm episodes longer than 30 seconds. Inter-annotator reliability metrics, including percentage agreement for N, V, and S class beats across a 5% random re-annotation subset, must be reported with the dataset. The label taxonomy must align with the AAMI EC57 and EC38 standards to ensure compatibility with benchmark evaluations. De-identification must comply with applicable HIPAA or GDPR requirements.
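The episode-level labels can be derived from the beat annotations. In this sketch the input structures are hypothetical: time-stamped AAMI classes and non-overlapping AF episodes. The PVC count includes every V-class beat, run duration is measured from the first to the last ventricular beat, and a VT run is taken as ≥3 consecutive V beats, a common convention that the request does not itself define.

```python
def holter_summary(beats: list[tuple[float, str]],
                   af_episodes_s: list[tuple[float, float]],
                   recording_s: float) -> dict[str, object]:
    """beats: time-ordered (time_s, AAMI class); af_episodes_s: non-overlapping (start_s, end_s)."""
    pvc_count = 0
    run_length = 0
    run_start_s = 0.0
    longest_run_beats = 0
    longest_run_s = 0.0
    for time_s, label in beats:
        if label == "V":
            pvc_count += 1
            if run_length == 0:
                run_start_s = time_s  # first beat of a new ventricular run
            run_length += 1
            if run_length > longest_run_beats:
                longest_run_beats = run_length
                longest_run_s = time_s - run_start_s
        else:
            run_length = 0  # any non-V beat ends the run
    af_seconds = sum(end - start for start, end in af_episodes_s)
    return {
        "pvc_count": pvc_count,
        "longest_v_run_beats": longest_run_beats,
        "longest_v_run_s": longest_run_s,
        "has_vt_run": longest_run_beats >= 3,
        "af_burden_pct": 100.0 * af_seconds / recording_s,
    }
```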
The following must be preserved as structured metadata: patient age in completed years at the time of recording, sex, body-mass index, primary indication for monitoring (palpitations, syncope, breathlessness, post-ablation follow-up, or hypertension surveillance), and structural heart disease status. Free-text diary entries or patient event logs must be reviewed and redacted before delivery; only event timing and general symptom category (palpitation, dizziness, presyncope, chest pain) should be retained. QA exclusion criteria: any 24-hour recording with more than 2 hours of uninterpretable signal due to electrode detachment or severe motion artefact must be flagged; recordings with less than 18 hours of annotatable signal are excluded from the primary count but may be included as a supplementary low-quality subset. We strongly prefer recordings that include patient-activated event markers aligned to symptoms, as these allow supervised training of symptom-correlated arrhythmia models. Institutions contributing ≥500 recordings with complete beat-level annotation will receive priority payment processing. The target use case is a commercial-grade Holter analysis SaaS product currently in FDA Breakthrough Device evaluation, with a secondary research application in AF-burden-guided anticoagulation decision support integrated into cardiology electronic health record systems.
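The two QA thresholds depend on how much signal is uninterpretable, which is easiest to compute by merging overlapping noise segments so that they are not double-counted. A minimal sketch, assuming noise segments are given as hypothetical `(start_h, end_h)` pairs:

```python
def merged_hours(segments: list[tuple[float, float]]) -> float:
    """Total length of the union of (start_h, end_h) intervals; overlaps count once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(segments):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start  # close the previous merged block
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # overlapping: extend the current block
    if current_end is not None:
        total += current_end - current_start
    return total


def holter_qa(recording_h: float, uninterpretable: list[tuple[float, float]]) -> dict[str, object]:
    """Apply the request's thresholds: flag > 2 h bad signal; primary subset needs >= 18 h."""
    bad_h = merged_hours(uninterpretable)
    annotatable_h = recording_h - bad_h
    return {
        "uninterpretable_h": bad_h,
        "flag_poor_signal": bad_h > 2.0,
        "subset": "primary" if annotatable_h >= 18.0 else "supplementary_low_quality",
    }
```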

Sensor / device data · ECG · EDF · WFDB
0 / 8,000 scans (0%)

12,000 Resting 12-Lead ECG Recordings with Expert Atrial Fibrillation Annotations

Open

We are seeking a large, well-annotated dataset of resting 12-lead ECG recordings from adult patients to train and validate a deep-learning classifier for atrial fibrillation (AF) detection. The intended model architecture is a convolutional-recurrent network that operates directly on raw voltage traces, and its performance is highly sensitive to dataset size, annotation quality, and demographic diversity. Each recording must capture all twelve standard leads (I, II, III, aVR, aVL, aVF, V1–V6) at a minimum sampling rate of 500 Hz, with amplitude resolution of 1 μV or finer (12-bit ADC or better). Recording duration must be ≥10 seconds per strip; longer 30-second captures are strongly preferred. Accepted file formats are WFDB (PhysioNet/MIT-BIH-style header and signal files) or EDF; CSV exports with a standardised column schema are acceptable as a secondary option. Each record must be accompanied by a cardiologist-confirmed rhythm label (at minimum a binary AF / non-AF tag), with additional labels for atrial flutter, supraventricular tachycardia, normal sinus rhythm, and sinus bradycardia strongly preferred. Keypoint annotations marking P-wave onset and offset, QRS complex onset, peak, and offset, and T-wave end are highly desirable for training auxiliary tasks. The labeling protocol must follow a two-stage review: a primary annotation produced by a board-certified cardiologist or credentialed cardiac physiologist, followed by an independent over-read by a second annotator; disagreements must be adjudicated by a senior electrophysiologist. Inter-rater agreement (Cohen's kappa) should be reported per rhythm class and included in the dataset documentation. All annotations must use a standardised label taxonomy aligned with the AHA/ACC ECG terminology guidelines to ensure compatibility with publicly available benchmarks such as the PhysioNet Challenge datasets and the MIMIC-IV-ECG corpus.
De-identification must satisfy HIPAA Safe Harbor or an equivalent EU GDPR pseudonymisation standard: no patient name, date of birth, facility name, or accession number may appear in signal file headers or companion metadata files. Any free-text physician notes attached to a recording must be scrubbed using a validated PHI-detection NLP pipeline before delivery. Age bucket (decade), biological sex, and comorbidity flags (hypertension, heart failure, diabetes) should be retained as structured metadata fields. Quality exclusion criteria: recordings must be flagged or removed if they show electrode reversal artefact detectable from lead polarity inversion, lead-off noise affecting more than two contiguous leads, baseline wander exceeding 0.5 mV peak-to-peak, or signal clipping. The AF-to-non-AF ratio must be no more imbalanced than 1:3, and we encourage inclusion of paroxysmal AF cases captured during or immediately after an episode, as these are clinically the hardest to classify and the most valuable for model generalisation. Demographic balance across age groups (18–40, 41–60, 61–80, and >80 years) and sex is mandatory. Downstream use cases include a real-time AF alert integrated into hospital ECG cart software, a cloud-based clinical decision support API, and federated training experiments across multiple institution nodes.
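Some of the quality criteria and the class-balance rule can be screened automatically. This sketch is deliberately crude and stdlib-only: baseline wander is estimated as the peak-to-peak excursion of a 1-second moving average, clipping as a run of samples at the ADC rails, and lead-off and electrode-reversal detection are not attempted. Window length, run length, and ADC limits are assumptions to tune, and the function names are hypothetical.

```python
def moving_average(x: list[float], window: int) -> list[float]:
    """Centred moving average using a prefix sum; windows shrink at the edges."""
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    half = window // 2
    smoothed = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def baseline_wander_mv(lead_mv: list[float], fs_hz: float, window_s: float = 1.0) -> float:
    """Peak-to-peak of a ~1 s moving average, which attenuates QRS and T-wave content."""
    baseline = moving_average(lead_mv, max(1, int(window_s * fs_hz)))
    return max(baseline) - min(baseline)


def is_clipped(lead_counts: list[int], adc_min: int, adc_max: int, min_run: int = 5) -> bool:
    """True if `min_run` or more consecutive samples sit at an ADC rail."""
    run = 0
    for value in lead_counts:
        run = run + 1 if value <= adc_min or value >= adc_max else 0
        if run >= min_run:
            return True
    return False


def af_ratio_ok(n_af: int, n_non_af: int) -> bool:
    """No worse than 1:3: the minority class is at least a third of the majority class."""
    return 3 * min(n_af, n_non_af) >= max(n_af, n_non_af)
```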

Sensor / device data · ECG · EDF · WFDB
0 / 12,000 scans (0%)

2,500 abdomen-pelvis CT volumes for kidney stone detection, stone composition classification, and urolithiasis burden scoring

Open

Our urology AI research group is assembling a comprehensive urolithiasis dataset to train models that automatically detect, localize, size, and classify renal and ureteral stones on non-contrast CT of the abdomen and pelvis (NCCT-KUB protocol). Kidney stones typically present as hyperdense foci of 200 to over 1000 HU depending on composition (uric acid stones appear at lower density, 200–400 HU, while calcium oxalate stones exceed 800 HU), which makes NCCT the gold-standard imaging modality for urolithiasis evaluation. Accurate Hounsfield unit measurement is essential for predicting stone composition and guiding treatment selection (shock-wave lithotripsy vs. ureteroscopy vs. percutaneous nephrolithotomy). Imaging protocol: NCCT acquisitions at 120 kVp (low-dose 80–100 kVp protocols are acceptable), slice thickness ≤2.5 mm, reconstructed in both a soft-tissue window (WL 40 HU, WW 400 HU) and a bone window (WL 400 HU, WW 1800 HU) for stone conspicuity. Coronal and sagittal MPR series are strongly encouraged. Each study must cover the region from the superior poles of the kidneys through the urinary bladder, including the ureterovesical junctions. Dual-energy CT studies with stone composition maps are particularly valuable and should be flagged separately; thin-section reconstructions at ≤1.25 mm for dual-energy cases enable virtual monochromatic image generation at 40–70 keV for optimal stone conspicuity and composition discrimination. Volumetric DICOM series are required. Annotation requirements: bounding-box localization of each stone with an anatomical location label (upper/mid/lower pole calyx, renal pelvis, proximal/mid/distal ureter, bladder), maximum axial stone diameter in millimeters per RECIST convention, mean HU value, and a stone composition prediction label (calcium oxalate, calcium phosphate, uric acid, struvite, cystine, or mixed) where dual-energy or prior metabolic workup data are available.
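The two windows in the protocol translate into simple intensity mappings once stored pixel values are converted to HU with the DICOM RescaleSlope and RescaleIntercept attributes. This sketch uses a plain linear window rather than the exact DICOM VOI LUT formula, and the stored value in the usage lines is invented. It illustrates why the bone window is requested: a 900 HU stone saturates the soft-tissue window but keeps internal contrast in the bone window.

```python
def to_hu(stored_value: int, rescale_slope: float, rescale_intercept: float) -> float:
    """Modality rescale: HU = stored value * RescaleSlope + RescaleIntercept."""
    return stored_value * rescale_slope + rescale_intercept


def window_to_uint8(hu: float, level: float, width: float) -> int:
    """Simplified linear window: [level - width/2, level + width/2] -> [0, 255]."""
    lower = level - width / 2.0
    upper = level + width / 2.0
    if hu <= lower:
        return 0
    if hu >= upper:
        return 255
    return round((hu - lower) / width * 255)


SOFT_TISSUE = (40, 400)  # (WL, WW) in HU, from the request
BONE = (400, 1800)

stone_hu = to_hu(1924, 1.0, -1024.0)  # hypothetical stored value -> 900 HU
print(window_to_uint8(stone_hu, *SOFT_TISSUE), window_to_uint8(stone_hu, *BONE))
```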
Total stone burden (number of stones and cumulative volume in cubic millimeters) should be recorded per patient in the JSON sidecar. Negative studies (no stone) must make up at least 25% of the dataset. Inter-rater agreement for stone localization (bounding-box IoU ≥0.60) and mean HU measurement (within ±50 HU) must be reported. QA exclusion criteria include scans with severe streak artifact from bilateral hip prostheses obscuring the ureters, incomplete coverage of the KUB field, or missing acquisition kVp in the DICOM metadata. De-identification must follow HIPAA Safe Harbor. Ureteral stent or nephrostomy tube presence must be flagged in the JSON sidecar, as this hardware directly affects stone visibility and HU measurement accuracy. Acceptable formats are DICOM and NIfTI. The trained model targets integration into automated radiology reporting pipelines to generate structured urolithiasis reports, reducing radiologist workload while improving measurement reproducibility. Scanner diversity across GE, Siemens, Philips, and United Imaging platforms is required; contributions from institutions in multiple geographic regions are welcome to capture dietary and demographic variation in stone epidemiology, as stone composition prevalence differs markedly between Western and East Asian populations.
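The localization agreement target (IoU ≥0.60) and the per-patient burden fields can be computed as follows. The request does not say whether bounding boxes are 2D or 3D; this sketch assumes axis-aligned 3D boxes in millimetres, and per-stone volumes are assumed to be measured upstream.

```python
# Axis-aligned 3D box (x0, y0, z0, x1, y1, z1) in mm, with x0 < x1, y0 < y1, z0 < z1.
Box = tuple[float, float, float, float, float, float]


def box_volume(box: Box) -> float:
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def box_iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    return intersection / (box_volume(a) + box_volume(b) - intersection)


def stone_burden(stone_volumes_mm3: list[float]) -> dict[str, float]:
    """Per-patient burden: stone count and cumulative volume in mm^3."""
    return {"stone_count": len(stone_volumes_mm3),
            "cumulative_volume_mm3": sum(stone_volumes_mm3)}
```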

Medical imaging · CT · DICOM · JSON · NIfTI
0 / 2,500 scans (0%)

800 CTPA studies with pulmonary embolism segmentation masks and clot-burden scoring for AI detection

Open

We are developing a deep-learning system for automated detection and clot-burden quantification of acute pulmonary embolism (PE) on CT pulmonary angiography (CTPA). PE manifests as intraluminal filling defects within the pulmonary arteries, typically appearing as low-attenuation regions (–20 to +80 HU) surrounded by contrast-enhanced blood (200–400 HU at peak enhancement). Accurate segmentation of emboli in the main, lobar, segmental, and subsegmental pulmonary arteries is the central annotation task. Required imaging protocol: CTPA acquired with bolus-tracked contrast injection (iodinated contrast, 100–120 mL at 4–5 mL/s), slice thickness ≤1.25 mm, reconstructed in mediastinal window (WL 40 HU, WW 400 HU) and lung window (WL –600 HU, WW 1500 HU). Each study must include a complete volumetric DICOM series with full coverage from the lung apices to the costophrenic angles. Incidental findings such as pleural effusion, right heart strain (RV/LV ratio ≥0.9), and pulmonary infarct should be flagged in the JSON metadata but do not require pixel-level annotation. Tube voltage should be 100–120 kVp with automated tube-current modulation; studies acquired at non-standard voltages must include CTDIvol and DLP values in the DICOM metadata for dose normalization. Volumetric series must be isotropic or near-isotropic (≤1.25 mm reconstructed slice thickness) to enable accurate three-dimensional vessel-tree segmentation and embolus localization. Annotation requirements: 3D segmentation masks of all emboli, centreline labeling of affected vessel segments, and a computed modified Miller index (mMI) or Qanadli clot-burden score. Negative CTPA studies (no PE) should constitute at least 30% of the dataset to enable specificity optimization. Cases confirmed by ventilation-perfusion (V/Q) scan or catheter angiography are particularly valuable and should be flagged accordingly.
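For the Qanadli score, each lung is modelled with 10 segmental arteries, each scored 0 (patent), 1 (partial occlusion), or 2 (total occlusion); a proximal embolus counts for every segment it supplies, giving a maximum of 40 and an obstruction percentage of score / 40 × 100. In this sketch the segment names (`R1` to `R10`, `L1` to `L10`) are hypothetical placeholders, mapping proximal arteries to their segments is left to the caller, and keeping the higher degree when a segment is affected twice is our assumption. The RV/LV flag uses the 0.9 cut-off from the request.

```python
SEGMENTS = [f"R{i}" for i in range(1, 11)] + [f"L{i}" for i in range(1, 11)]
MAX_SCORE = len(SEGMENTS) * 2  # 20 segmental arteries x maximum weight 2 = 40


def empty_tree() -> dict[str, int]:
    """All segmental arteries patent (degree 0)."""
    return {segment: 0 for segment in SEGMENTS}


def apply_embolus(tree: dict[str, int], supplied_segments: list[str], degree: int) -> None:
    """Record an embolus against every segment distal to it (1 = partial, 2 = total)."""
    if degree not in (1, 2):
        raise ValueError("degree must be 1 (partial) or 2 (total occlusion)")
    for segment in supplied_segments:
        tree[segment] = max(tree[segment], degree)


def qanadli_index(tree: dict[str, int]) -> tuple[int, float]:
    """Return (raw score out of 40, obstruction percentage)."""
    score = sum(tree.values())
    return score, 100.0 * score / MAX_SCORE


def right_heart_strain(rv_diameter_mm: float, lv_diameter_mm: float, cutoff: float = 0.9) -> bool:
    return rv_diameter_mm / lv_diameter_mm >= cutoff
```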
Inter-rater agreement for embolus segmentation must reach a Dice coefficient of ≥0.70 on lobar and segmental arteries; subsegmental PE cases may be annotated by consensus read given known inter-observer variability. Scanner balance across GE, Siemens, and Philips CTPA protocols is requested, and at least 15% of studies should originate from institutions in different countries to capture contrast injection protocol variations. QA exclusion criteria include studies with inadequate arterial opacification (main pulmonary artery attenuation below 200 HU), respiratory motion artifact degrading vessel conspicuity, or missing coverage of the pulmonary arterial trunk. De-identification per HIPAA Safe Harbor and DICOM PS3.15, with consistent pseudonymization for patients with follow-up imaging. Delivery in NIfTI-2 format with DICOM originals included is preferred. JSON sidecars must encode PE acuity (acute vs. chronic), Wells score, D-dimer value, and outcome (30-day mortality, need for thrombolysis) where available. The algorithm will be validated for integration into emergency radiology AI triage workflows and submitted for regulatory clearance. A formal data-use agreement and institutional ethics approval are prerequisites before data transfer commences.
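The Dice target and the opacification rule can be checked per case with a few lines. This sketch works on flattened binary masks of equal length; treating two empty masks as perfect agreement is a convention we chose, not something the request specifies.

```python
from typing import Sequence


def dice(mask_a: Sequence[bool], mask_b: Sequence[bool]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) on flattened binary masks of equal size."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


def adequately_opacified(mpa_hu: float, minimum_hu: float = 200.0) -> bool:
    """QA rule from the request: main pulmonary artery attenuation must reach 200 HU."""
    return mpa_hu >= minimum_hu
```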

Medical imaging · CT · DICOM · JSON · NIfTI-2
0 / 800 scans (0%)

5,000 chest CT scans with COVID-19 and viral pneumonia ground-glass opacity segmentation for AI triage research

Open

This request seeks a large, diverse, multi-site chest CT dataset to support research into AI-assisted diagnosis and severity scoring of COVID-19 pneumonia and other viral lower respiratory tract infections. The hallmark finding of interest is ground-glass opacity (GGO): hazy areas of increased attenuation that do not obscure the underlying bronchovascular structures, with attenuation of approximately –600 to –200 HU in affected regions compared with about –850 HU in normal lung parenchyma. Consolidation, crazy-paving pattern, and subpleural distribution are additional features to be captured. Imaging requirements: axial thin-section CT (slice thickness 0.625–1.5 mm) with lung window reconstruction (WL –600 HU, WW 1500 HU); non-contrast and low-dose protocols are both accepted. Each volume must carry at least one of the following annotation tiers: (a) lobe-level GGO and consolidation segmentation masks in NIfTI format, (b) whole-lung segmentation mask, (c) per-scan severity score (CT severity index 0–25 or equivalent), or (d) RT-PCR confirmed diagnosis label (COVID-19 positive, influenza, other viral, bacterial, non-infectious). We strongly prefer scans with all four annotation tiers but will accept partial annotation with appropriate metadata flags. De-identification must comply with GDPR Recital 26 for European institutions and HIPAA Safe Harbor for US contributors. Longitudinal series from the same patient (admission, day 5, discharge) are highly valuable and should be pseudonymized with a consistent patient key so temporal progression can be modeled. JSON sidecars should include acquisition date relative to symptom onset, vaccination status if available, ICU admission flag, and oxygen saturation at time of scan. Volumetric DICOM series delivered as complete studies with all reconstructed series (lung kernel, soft-tissue kernel) are preferred; NIfTI-converted volumes are also acceptable.
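Tier (c) asks for a 0–25 CT severity index. One widely used convention scores each of the five lobes 0–5 by percentage involvement and sums the lobar scores. The sketch below assumes that banding (0%, under 5%, 5–25%, 26–50%, 51–75%, over 75%), represents each lobe as a flat list of HU values taken from its lobe mask, and counts only voxels in the –600 to –200 HU GGO range cited above. That is a simplification: clinical severity scores usually also count consolidation, and a fixed HU window can pick up partial-volume voxels at vessel and airway walls. Function names are hypothetical.

```python
GGO_HU_RANGE = (-600.0, -200.0)  # approximate GGO attenuation cited in the request
LOBES = ("RUL", "RML", "RLL", "LUL", "LLL")


def involvement_percent(lobe_hu: list[float]) -> float:
    """Percentage of voxels inside one lobe mask whose HU falls in the GGO range."""
    if not lobe_hu:
        raise ValueError("empty lobe mask")
    lo, hi = GGO_HU_RANGE
    hits = sum(1 for hu in lobe_hu if lo <= hu <= hi)
    return 100.0 * hits / len(lobe_hu)


def lobe_score(percent: float) -> int:
    """Map lobar involvement to a 0-5 score (one common banding; an assumption)."""
    if percent <= 0:
        return 0
    if percent < 5:
        return 1
    if percent <= 25:
        return 2
    if percent <= 50:
        return 3
    if percent <= 75:
        return 4
    return 5


def ct_severity_score(hu_by_lobe: dict[str, list[float]]) -> int:
    """Sum the five lobar scores into a 0-25 CT severity score."""
    missing = set(LOBES).difference(hu_by_lobe)
    if missing:
        raise ValueError(f"missing lobes: {sorted(missing)}")
    return sum(lobe_score(involvement_percent(hu_by_lobe[lobe])) for lobe in LOBES)


# Illustrative, synthetic HU lists (not real patient data).
example = {
    "RUL": [-850.0] * 10,                   # 0% involved  -> 0
    "RML": [-850.0] * 7 + [-400.0] * 3,     # 30% involved -> 3
    "RLL": [-400.0] * 10,                   # 100% involved -> 5
    "LUL": [-850.0] * 99 + [-400.0],        # 1% involved  -> 1
    "LLL": [-850.0] * 5 + [-300.0] * 5,     # 50% involved -> 3
}
print(ct_severity_score(example))  # 12
```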
Tube voltage is typically 100–120 kVp for standard chest CT; low-dose screening protocols at 80 kVp are acceptable provided noise characteristics are documented. Scanner diversity is essential: contributions from GE, Siemens, Philips, and Canon sites are all welcome, and geographic diversity spanning Europe, North America, and Asia is prioritized to capture population-level variation in disease presentation. Inter-rater agreement on GGO percentage must be reported and must reach an intraclass correlation coefficient (ICC) of at least 0.85. QA exclusion criteria include scans with more than 20% motion-corrupted slices, incomplete lung coverage, or absence of a confirmed microbiological diagnosis. The resulting model will support real-time triage scoring integrated into PACS worklist systems, enabling prioritization of deteriorating patients in high-volume pandemic or endemic disease scenarios. Data will not be used for any commercial purpose beyond the stated AI research scope, and results will be published with appropriate attribution to contributing institutions.
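The request does not say which ICC form it expects. A minimal pure-Python sketch of ICC(2,1) (Shrout and Fleiss: two-way random effects, absolute agreement, single rater), one common choice for reader studies like this, is shown below; rows are scans and columns are each rater's GGO percentage. The function name, input layout, and example ratings are assumptions for illustration.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's GGO percentage for scan i.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 scans and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: scans (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


MIN_ICC = 0.85  # threshold stated in the request

# Illustrative ratings: rater 2 reads consistently 2 points higher than rater 1.
ratings = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
icc = icc_2_1(ratings)
print(round(icc, 3), icc >= MIN_ICC)  # 0.98 True
```

Because ICC(2,1) measures absolute agreement, the constant 2-point offset above pulls it below 1 even though both readers rank the scans identically; a consistency form such as ICC(3,1) would ignore that offset.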

Medical imaging · CT · DICOM · JSON · NIfTI
0 / 5000 scans (0%)

1,200 contrast-enhanced abdominal CT volumes for liver lesion segmentation and RECIST measurement

Open

We are constructing a benchmark dataset for automated liver lesion detection and volumetric segmentation on contrast-enhanced CT of the abdomen, targeting hepatocellular carcinoma (HCC), colorectal liver metastases (CRLM), and benign focal liver lesions (hemangioma, cysts, FNH). Imaging protocol must include arterial-phase and portal-venous-phase acquisitions, with slice thickness ≤2 mm and pixel spacing ≤0.8 mm in-plane. The Hounsfield unit dynamic range of interest is –200 to +300 HU (liver parenchyma typically 50–70 HU in portal phase; hypervascular HCC peaks at 80–120 HU in arterial phase). Multiplanar reconstruction (MPR) in coronal and sagittal planes is welcome but not mandatory. Annotation requirements are demanding: each lesion must have a 3D segmentation mask generated or confirmed by an abdominal radiologist with at least 5 years of subspecialty experience. Masks should be provided in NIfTI or NIfTI-2 format, with a JSON metadata file encoding lesion type, RECIST 1.1 longest axial diameter in millimeters, lesion number, LI-RADS category for HCC cases, and whether the patient had prior locoregional therapy (TACE, ablation). Liver parenchyma whole-organ masks are strongly encouraged as an additional annotation layer to facilitate liver-volume normalization during training. De-identification must satisfy HIPAA Safe Harbor (Method 1) with removal of all 18 identifiers and re-mapping of DICOM UIDs. Cases with prior abdominal surgery or transplant should be flagged in metadata. We require a minimum of 20% negative cases (no focal lesion) to anchor the model's specificity. Tube voltage should be documented for each acquisition phase, with standard portal-venous-phase protocols at 100–120 kVp using automated tube-current modulation. 
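The JSON metadata asks for the RECIST 1.1 longest axial diameter. The sketch below assumes lesion masks are available per axial slice as 2D binary arrays with known in-plane pixel spacing. It takes the largest distance between foreground pixel centres, which can underestimate the edge-to-edge diameter by roughly one pixel width, and applies the RECIST 1.1 threshold of 10 mm for a measurable lesion on CT. In practice the measurement would be taken on the slice where the lesion is longest; the function names are hypothetical.

```python
from itertools import combinations
from math import hypot

# RECIST 1.1 minimum longest diameter for a measurable lesion on CT (slice thickness <= 5 mm).
MEASURABLE_MIN_MM = 10.0


def longest_axial_diameter_mm(
    mask: list[list[int]], spacing_mm: tuple[float, float]
) -> float:
    """Largest distance between foreground pixel centres on one axial slice.

    mask: 2D binary lesion mask (rows x columns).
    spacing_mm: (row, column) pixel spacing in millimetres.
    """
    row_mm, col_mm = spacing_mm
    points = [
        (r * row_mm, c * col_mm)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    # Brute-force O(n^2) pair search: fine for a sketch; a convex hull
    # would scale better for large lesions.
    return max(
        (hypot(a[0] - b[0], a[1] - b[1]) for a, b in combinations(points, 2)),
        default=0.0,
    )


def is_measurable(diameter_mm: float) -> bool:
    return diameter_mm >= MEASURABLE_MIN_MM


# An 11-pixel-wide lesion at 0.8 mm spacing spans 8 mm centre to centre: not measurable.
d = longest_axial_diameter_mm([[1] * 11], (0.8, 0.8))
print(round(d, 2), is_measurable(d))  # 8.0 False
```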
Contrast agent type (iodinated, concentration in mg/mL), injection rate (mL/s), and delay time (seconds from injection to scan start) must be recorded in the JSON sidecar for each phase, as these parameters directly affect lesion-to-liver contrast and Hounsfield unit values at the time of acquisition. Scanner heterogeneity across multiple vendors (GE, Siemens, Philips) and geographic diversity across contributing sites are required to prevent the model from overfitting to a single institution's acquisition style. QA exclusion criteria include studies with gross motion artifact, incomplete hepatic coverage, or absence of a portal-venous phase. Inter-rater segmentation agreement (Dice ≥ 0.80 on lesions ≥10 mm) must be documented per contributing site. The dataset will be used to develop a clinical decision-support tool for oncology multidisciplinary tumor boards, enabling automated lesion tracking across treatment cycles in compliance with RECIST 1.1 response assessment criteria. Data will be processed within an ISO/IEC 27001-certified cloud environment under a fully executed data-use agreement (DUA).
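The per-phase acquisition parameters and the portal-venous QA criterion lend themselves to an automated sidecar check at intake. The request lists the required quantities but not their JSON key names, so the keys below (`phases`, `portal_venous`, `delay_s`, and so on) are hypothetical, as are the example values. This is a sketch of a validator, not a schema the requester has published.

```python
import json

# Hypothetical per-phase keys; the request lists the quantities, not the key names.
REQUIRED_PHASE_FIELDS = {
    "contrast_agent": str,
    "iodine_concentration_mg_per_ml": (int, float),
    "injection_rate_ml_per_s": (int, float),
    "delay_s": (int, float),
    "tube_voltage_kvp": (int, float),
}


def validate_sidecar(text: str) -> list[str]:
    """Return a list of problems found in one study's JSON sidecar."""
    phases = json.loads(text).get("phases", {})
    errors = []
    if "portal_venous" not in phases:
        errors.append("missing portal-venous phase")
    for name, phase in phases.items():
        for field, expected in REQUIRED_PHASE_FIELDS.items():
            if field not in phase:
                errors.append(f"{name}: missing {field}")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(phase[field], bool) or not isinstance(phase[field], expected):
                errors.append(f"{name}: {field} has the wrong type")
    return errors


# Illustrative values only.
example = json.dumps({
    "phases": {
        "portal_venous": {
            "contrast_agent": "iohexol",
            "iodine_concentration_mg_per_ml": 350,
            "injection_rate_ml_per_s": 3.0,
            "delay_s": 70,
            "tube_voltage_kvp": 120,
        }
    }
})
print(validate_sidecar(example))  # []
```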

Medical imaging · CT · JSON · NIfTI · NIfTI-2
0 / 1200 scans (0%)

3,500 non-contrast head CT scans with intracranial hemorrhage labels and hemorrhage-subtype segmentation

Open

Our neuroradiology AI team is building a real-time triage algorithm for the emergency detection of intracranial hemorrhage (ICH) on non-contrast computed tomography (NCCT) of the brain. Acute blood appears hyperdense on NCCT (50–80 HU), making CT the first-line modality in stroke and trauma settings. We require axial NCCT series acquired with standard emergency-department protocols: tube voltage 120–140 kVp, slice thickness ≤5 mm (preferably 2.5 mm or thinner for posterior-fossa coverage), reconstructed with both brain window (WL 40 HU, WW 80 HU) and subdural window (WL 75 HU, WW 200 HU). Scout localizer images should be excluded from the de-identified package. Each scan must carry one or more of the following hemorrhage subtype labels: epidural hematoma (EDH), subdural hematoma (SDH), subarachnoid hemorrhage (SAH), intraparenchymal hemorrhage (IPH), or intraventricular hemorrhage (IVH). Pixel-level 2D or 3D segmentation masks of the hemorrhagic region are required for at least 60% of positive cases; the remainder may carry bounding-box annotations. Negative (no hemorrhage) cases should constitute 35–40% of the total dataset to reflect realistic emergency-department case mix and to enable balanced training. All DICOM files must be de-identified in compliance with GDPR Article 89 and HIPAA Safe Harbor, with UIDs re-mapped using a consistent pseudonymization scheme so longitudinal cases (admission plus follow-up) can be linked internally. Accompanying JSON sidecar files should encode subtype, hemorrhage volume estimate in milliliters, midline shift in millimeters, GCS score where available, and scan acquisition timestamp relative to symptom onset. NIfTI conversion is acceptable in addition to or instead of DICOM. Head CT volumes must be defaced, either by removing the facial region or by skull stripping, before delivery to eliminate re-identification risk through facial reconstruction.
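Two quantities in this request are simple to compute from the image data: the display windows and the hemorrhage volume for the JSON sidecar. The sketch below assumes HU values have already been rescaled from stored pixel values and that the hemorrhage mask's voxel count and voxel spacing are known. The window function is a plain linear ramp, which differs slightly from the DICOM linear VOI LUT definition (that formula uses c - 0.5 and w - 1); the function names are hypothetical.

```python
BRAIN_WINDOW = (40.0, 80.0)      # (WL, WW) in HU, from the request
SUBDURAL_WINDOW = (75.0, 200.0)  # (WL, WW) in HU, from the request


def apply_window(hu: float, level: float, width: float) -> float:
    """Linearly map an HU value to display intensity in [0, 1]."""
    lower = level - width / 2
    upper = level + width / 2
    if hu <= lower:
        return 0.0
    if hu >= upper:
        return 1.0
    return (hu - lower) / width


def hemorrhage_volume_ml(
    voxel_count: int, spacing_mm: tuple[float, float, float]
) -> float:
    """Segmented hemorrhage volume: voxel count x voxel volume (1 mL = 1000 mm^3)."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz / 1000.0


# Blood at 65 HU sits near the top of the narrow brain window
# but mid-range in the wider subdural window.
print(apply_window(65.0, *BRAIN_WINDOW), apply_window(65.0, *SUBDURAL_WINDOW))  # 0.8125 0.45
print(hemorrhage_volume_ml(10_000, (0.5, 0.5, 2.5)))  # 6.25
```

The wider subdural window keeps thin extra-axial collections distinguishable from the adjacent bone, which saturates at the top of the brain window.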
Scanner diversity across standard head CT protocols on GE Discovery, Siemens SOMATOM, and Philips Brilliance platforms is desirable. Volumetric DICOM series reconstructed at isotropic or near-isotropic resolution (≤2.5 mm) enable multiplanar reformatting for 3D lesion characterization. Annotation inter-rater reliability must achieve a minimum Dice similarity coefficient of 0.75 on hemorrhage masks across independent neuroradiologist reads, with adjudication by a third reader for discordant cases. QA exclusion criteria include scans with severe beam-hardening artifact from dental implants obscuring supratentorial structures, incomplete brain coverage, or imaging performed more than 24 hours after initial ictus without temporal metadata. The trained model will be deployed as a CE-marked and FDA 510(k)-pathway medical device for acute ICH flagging in radiology worklist prioritization systems. No patient data will leave the secure processing environment; a signed data-use agreement will be provided to every contributing institution.
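The Dice requirement and third-reader adjudication can be combined into a routing step. This sketch assumes each reader's mask is available as a set of voxel indices and that "discordant" means a Dice below the 0.75 threshold; the request does not define discordance further, and the case IDs, masks, and function names are hypothetical. Treating two empty masks as perfect agreement is a design choice so that concordant negative reads are not sent for adjudication.

```python
Voxel = tuple[int, int, int]

MIN_DICE = 0.75  # inter-reader threshold stated in the request


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient between two masks given as voxel-index sets."""
    if not a and not b:
        return 1.0  # both readers agree there is no hemorrhage
    return 2 * len(a & b) / (len(a) + len(b))


def cases_for_adjudication(
    reads: dict[str, tuple[set[Voxel], set[Voxel]]],
) -> list[str]:
    """Case IDs whose two independent reads fall below the Dice threshold."""
    return sorted(
        case_id for case_id, (r1, r2) in reads.items() if dice(r1, r2) < MIN_DICE
    )


# Illustrative masks: case-001 overlaps well (Dice 6/7), case-002 poorly (Dice 0.4).
a = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
reads = {
    "case-001": (a, a - {(0, 1, 1)}),
    "case-002": (a, {(0, 0, 0)}),
    "case-003": (set(), set()),
}
print(cases_for_adjudication(reads))  # ['case-002']
```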

Medical imaging · CT · DICOM · JSON · NIfTI
0 / 3500 scans (0%)