Echocardiogram Datasets — Cardiac Ultrasound (ECHO) Data

Echocardiogram datasets contain cardiac ultrasound (echo) recordings: real-time imaging of the heart's structure and function acquired with high-frequency sound waves. They are essential for developing models that assess cardiac anatomy, quantify function, and detect disease from a modality that involves no ionizing radiation. Echo data spans several approaches and modes: transthoracic echocardiography (TTE) and transesophageal echocardiography (TEE); two-dimensional B-mode cine loops; M-mode tracings; color, continuous-wave, and pulsed-wave Doppler; tissue Doppler imaging; and speckle-tracking strain.

Canonical acquisition windows include the parasternal long-axis and short-axis, apical two-, three-, four-, and five-chamber views, and subcostal and suprasternal views. Files are typically stored as DICOM cine loops with embedded frame rate, probe, and ECG-gating metadata, and may include raw or compressed video. Clinically valuable echocardiogram datasets carry expert annotations and measurements: left-ventricular ejection fraction (LVEF), end-diastolic and end-systolic volumes, global longitudinal strain, wall-motion scoring, chamber dimensions, valve area and gradients, regurgitation and stenosis grading, diastolic-function parameters, and estimated pulmonary pressures.

Label schemas frequently address heart failure with reduced or preserved ejection fraction, valvular disease, cardiomyopathies, pericardial effusion, and congenital abnormalities. Because echo is highly operator-dependent, robust datasets document acquisition quality, view labels, and frame-level segmentation of the endocardial and epicardial borders, which is critical for automated ejection-fraction estimation and view classification. High-quality cohorts are demographically diverse, de-identified to strip patient identifiers from DICOM headers and burned-in pixel text, and quality-scored for image clarity and completeness.

On GetDATA, clients post precise echocardiography requests, specifying views, modes, measurements, label taxonomy, and minimum case counts, and verified providers fulfill them with compliant cardiac ultrasound data. Increasingly, datasets also support automated pipelines for view classification, beat segmentation, and Doppler-trace digitization, and they may include paired electronic-health-record variables such as NT-proBNP, troponin, and follow-up outcomes that enable multimodal and prognostic modeling beyond single-study interpretation. Browse the open echocardiogram requests below, or explore related cardiac and cross-sectional imaging categories.

Open Echocardiogram requests

10,000 echocardiogram cine loops with view labels and chamber segmentation masks for foundation model pre-training

Open

We are pre-training a large cardiac ultrasound foundation model intended to serve as a general-purpose feature extractor for a broad range of downstream echocardiography AI tasks, including LVEF regression, valvular disease severity grading, diastolic function classification, and structural congenital anomaly detection. Diversity of acquisition views, patient demographics, scanner vendors, image quality levels, and disease states is the primary data requirement; this dataset is explicitly designed to span the full real-world distribution of clinical echo data rather than a curated high-quality subset, ensuring robust representation learning across the full spectrum of clinical practice. Required acquisitions cover all standard transthoracic echocardiography views: apical 4-chamber (A4C), apical 2-chamber (A2C), and apical 3-chamber (A3C); parasternal long-axis (PLAX); parasternal short-axis at aortic valve level (PSAX-AV), mitral valve level (PSAX-MV), and papillary muscle level (PSAX-PM); subcostal 4-chamber (SC4C) and subcostal inferior vena cava (SCIVS); and suprasternal notch (SSN). B-mode cine loops are the primary modality. Color Doppler overlays, pulsed-wave Doppler spectral tracings, continuous-wave Doppler recordings, and M-mode sweeps through the left ventricle and mitral valve are also accepted and must be labelled by modality. Cine loop duration may range from one to ten cardiac cycles; single-frame still images without temporal context are excluded from this request. DICOM format is mandatory to preserve acquisition metadata embedded in standard tags — imaging depth, transducer frequency, mechanical index, scanner manufacturer and model, gain and time-gain compensation settings — all of which will serve as auxiliary conditioning inputs during foundation model pre-training. 
De-identification must comply with the DICOM PS3.15 Annex E Basic Application Level Confidentiality Profile, with explicit removal of patient name, date of birth, institution name, referring physician, device serial number, and any burned-in annotation text overlaying pixel data. A per-batch de-identification certificate confirming the method and software version used is required. GDPR-compliant pseudonymisation is acceptable for European institutions in lieu of full anonymisation, provided a subject pseudonym key is retained securely by the contributing institution and never transferred. Annotation requirements are intentionally lightweight to enable dataset scale: each cine loop requires only a view-classification label drawn from the controlled vocabulary (A4C, A2C, A3C, PLAX, PSAX-AV, PSAX-MV, PSAX-PM, SC4C, SCIVS, SSN, or Other) and a sonographer-assigned image quality score on a three-point scale (1 poor, 2 adequate, 3 good). For a 20% random stratified subsample of 2,000 studies, pixel-level segmentation masks of the left ventricle endocardium and epicardium, right ventricle endocardium, and left atrial endocardium at end-diastole are required to enable supervised fine-tuning experiments in parallel with self-supervised pre-training. Anatomical keypoints — medial and lateral mitral annular hinge points and the LV apex — are requested for the same 2,000-study subsample to facilitate alignment-based data augmentation. Scanner vendor diversity targets: minimum 2,000 studies each from GE, Philips, Siemens, and Canon platforms, with remaining studies from other vendors or mixed sources.
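The lightweight per-loop annotation described above (one view label from the controlled vocabulary plus a three-point quality score) can be checked mechanically before delivery. The following is a minimal sketch only: the `CineLoopAnnotation` record and its field names are hypothetical, since the request does not define a file schema.

```python
from dataclasses import dataclass

# Controlled vocabulary and quality scale exactly as listed in the request text.
VIEW_LABELS = {
    "A4C", "A2C", "A3C", "PLAX", "PSAX-AV", "PSAX-MV",
    "PSAX-PM", "SC4C", "SCIVS", "SSN", "Other",
}
QUALITY_SCORES = {1: "poor", 2: "adequate", 3: "good"}


@dataclass
class CineLoopAnnotation:
    # Hypothetical field names, chosen for illustration.
    loop_id: str
    view: str
    quality: int


def validate_annotation(ann: CineLoopAnnotation) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    if ann.view not in VIEW_LABELS:
        problems.append(f"{ann.loop_id}: unknown view label {ann.view!r}")
    if ann.quality not in QUALITY_SCORES:
        problems.append(f"{ann.loop_id}: quality score {ann.quality!r} is not 1, 2, or 3")
    return problems
```

Returning a list of problems rather than raising on the first one lets a provider report every issue in a batch in a single pass.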

Medical imaging, Ultrasound, DICOM, JSON, PNG / JPG
0 / 10000 scans (0%)

1,800 stress echocardiography paired studies (rest and peak stress) for ischemia classification

Open

Stress echocardiography remains a cornerstone of non-invasive ischemia assessment, yet visual wall-motion scoring is highly operator-dependent and shows significant inter-reader variability even among experienced cardiologists. We are developing an automated wall-motion abnormality detection system trained on paired rest-and-peak-stress cine-loop acquisitions, targeting sensitivity and specificity benchmarks comparable to Level III expert readers for detecting hemodynamically significant coronary artery disease across all three major coronary territories. Each study pair must include resting and peak-stress cine-loop acquisitions — obtained via exercise treadmill, upright cycle ergometer, or pharmacological dobutamine infusion protocol with or without atropine augmentation — in at least four standard views: apical 4-chamber, apical 2-chamber, parasternal long-axis, and parasternal short-axis at the mid-papillary muscle level. Apical 3-chamber views are requested where acquired. Side-by-side quad-screen DICOM files in the standard stress-echo cine display format — rest and stress loops displayed simultaneously at matched cardiac cycles — are acceptable and preferred, as they reflect the real-world reporting workflow and facilitate direct comparison learning. Frame rates must be sufficient to resolve individual cardiac phases at elevated heart rates, requiring a minimum of 50 fps at peak stress and 25 fps at rest. Second harmonic B-mode imaging is required; studies acquired with ultrasound contrast agent (UCA) — specifically SonoVue/Lumason or Definity/Luminity — are explicitly welcomed alongside non-contrast acquisitions and must be flagged in the metadata with contrast agent name and dose administered. Doppler tissue imaging (DTI) of the mitral annulus at rest is requested as a supplementary acquisition where available, providing diastolic functional context alongside ischemia assessment. 
Mandatory structured clinical labels include: stress protocol type, peak heart rate achieved, percentage of age-predicted maximum heart rate, Duke Treadmill Score where applicable, rate-pressure product at peak stress, wall-motion score index (WMSI) at rest and at peak stress per the ASE 17-segment left ventricular model, and the overall study conclusion classified as normal, inducible ischemia, fixed scar, or non-diagnostic. Per-segment wall-motion labels at rest and peak stress (normokinesis, hypokinesis, akinesis, or dyskinesis) are required for all 1,800 study pairs, structured as JSON arrays indexed to ASE segment numbering. Invasive coronary angiography correlation data, including percentage stenosis per vessel and SYNTAX score where available within 12 months of the stress study, should be linked pseudonymously to the imaging data to provide ground-truth coronary anatomy labels for ischemia territory mapping. QA exclusion criteria: studies with suboptimal image quality precluding wall-motion assessment in more than two of the 17 segments at peak stress must be excluded or clearly flagged as non-diagnostic to prevent label noise during model training.
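Several of the structured labels above are simple derived quantities. The sketch below shows one way to compute them from the per-segment arrays, under stated assumptions: the 1 to 4 scoring of the four wall-motion labels follows the conventional scheme, `None` is a hypothetical marker for a segment that could not be assessed, and the age-predicted maximum heart rate uses the common 220 minus age estimate. None of these conventions is specified by the request itself.

```python
# Conventional wall-motion scores (assumed mapping; confirm against the lab protocol).
SEGMENT_SCORES = {"normokinesis": 1, "hypokinesis": 2, "akinesis": 3, "dyskinesis": 4}


def wall_motion_score_index(segments):
    """Mean score over assessable segments of a 17-segment label array.

    `segments` holds one label per segment; None marks a segment that
    could not be assessed (hypothetical convention for this sketch).
    """
    if len(segments) != 17:
        raise ValueError("expected 17 segment labels")
    scored = [SEGMENT_SCORES[s] for s in segments if s is not None]
    if not scored:
        raise ValueError("no assessable segments")
    return sum(scored) / len(scored)


def is_non_diagnostic(peak_segments):
    """QA rule from the request: more than two unassessable segments at peak stress."""
    return sum(s is None for s in peak_segments) > 2


def rate_pressure_product(peak_hr_bpm, peak_systolic_bp_mmhg):
    """Heart rate multiplied by systolic blood pressure at peak stress."""
    return peak_hr_bpm * peak_systolic_bp_mmhg


def percent_age_predicted_max_hr(peak_hr_bpm, age_years):
    """Percentage of predicted maximum heart rate, assuming 220 - age."""
    return 100.0 * peak_hr_bpm / (220 - age_years)
```

For example, a peak-stress array with 14 normokinetic, two hypokinetic, and one akinetic segment gives a WMSI of 21/17, roughly 1.24.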

Medical imaging, Ultrasound, DICOM, JSON
0 / 1800 scans (0%)

4,000 pediatric and congenital heart disease echocardiogram studies for structural anomaly detection

Open

Congenital heart disease (CHD) affects approximately 1% of live births and represents one of the most diagnostically challenging domains in clinical ultrasound, where accurate and timely diagnosis is critical for surgical planning and outcomes. We are constructing a multi-label classification model capable of identifying common structural anomalies (ventricular septal defect (VSD), atrial septal defect (ASD), tetralogy of Fallot, transposition of the great arteries, hypoplastic left heart syndrome, and coarctation of the aorta) directly from neonatal and pediatric TTE cine loops without requiring manual feature extraction. Required acquisitions span the complete standard pediatric echo protocol: subcostal 4-chamber and short-axis views for septal integrity assessment, apical 4-chamber view, parasternal long-axis and short-axis views at multiple levels including the great vessels, mitral valve annulus, and papillary muscles, and suprasternal notch views for aortic arch and ductal anatomy assessment. Color Doppler overlays demonstrating shunt flow direction and velocity, outflow tract obstruction, or valvular regurgitation jets are strongly requested for each structurally abnormal study and are mandatory for VSD, ASD, and outflow tract lesions. Frame rates of at least 30 fps are required; neonatal studies at higher frame rates of 60–80 fps are welcomed and preferred. All cine loops must be delivered as DICOM files with the original pixel data fully intact and without lossy recompression. Clinical labels must include the confirmed CHD diagnosis or a normal label, age at acquisition in months rather than exact date of birth to protect identity, biological sex, and an age-group category coded as neonate (under 1 month), infant (1–12 months), child (1–12 years), or adolescent (13–18 years).
Segmentation masks of all four cardiac chambers and great vessel origins — aortic root, pulmonary trunk — are required for at least 1,000 studies to support anatomical landmark learning and chamber volumetry in small hearts. Studies should be accompanied by structured echocardiographic report summaries in plain text where available, with all personal identifying information stripped prior to transfer. Doppler-derived hemodynamic measurements including peak VSD jet velocity, estimated right ventricular pressure, and pulmonary-to-systemic flow ratio (Qp:Qs) should be included as structured JSON labels where clinically measured. Inclusion of longitudinal follow-up studies from the same patient, pseudonymously linked by a consistent de-identified subject ID, is highly valuable for disease progression and post-operative remodelling research. Studies from multiple institutions across diverse geographic regions are preferred to capture variation in patient ethnicity, altitude-related physiology, and institutional scanning protocols. De-identification must comply with DICOM PS3.15 Annex E and must include removal of all burned-in annotation text overlaying pixel data.
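The four coded age bands in the clinical labels (neonate, infant, child, adolescent) can be derived from the age-in-months field, so that no date of birth is needed. The sketch below is illustrative only. The request leaves the band edges ambiguous (12 months appears in both "infant 1–12 months" and "child 1–12 years"), so the cut-offs used here are assumptions and are marked as such in the comments.

```python
def age_category(age_months: float) -> str:
    """Map age at acquisition (in months) to the request's four coded bands.

    Boundary handling is an assumption of this sketch:
      - exactly 12 months is coded as "child"
      - "13-18 years" is read as up to, but not including, the 19th birthday
    """
    if age_months < 0:
        raise ValueError("age cannot be negative")
    if age_months < 1:
        return "neonate"
    if age_months < 12:
        return "infant"
    if age_months < 13 * 12:
        return "child"
    if age_months < 19 * 12:
        return "adolescent"
    raise ValueError("age is outside the pediatric range covered by the request")
```

Raising an error for out-of-range ages, rather than silently assigning a band, keeps adult studies from being mislabeled in a pediatric cohort.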

Medical imaging, Ultrasound, DICOM, JSON
0 / 4000 scans (0%)

2,500 speckle-tracking echocardiography studies with global longitudinal strain values

Open

Global longitudinal strain (GLS) derived from speckle-tracking echocardiography (STE) is an emerging biomarker for subclinical left ventricular dysfunction, cardiotoxicity monitoring in oncology patients, and early cardiomyopathy detection before overt systolic impairment develops. We are building a regression model that predicts GLS directly from standard B-mode apical cine loops, eliminating the dependency on proprietary vendor speckle-tracking software and enabling GLS estimation at sites without dedicated post-processing workstations. We require high-quality B-mode cine-loop acquisitions from the apical 4-chamber, apical 2-chamber, and apical 3-chamber (apical long-axis, A3C) views, captured at 50–80 frames per second, with 50 fps as the minimum, to ensure adequate speckle coherence and tracking stability across frames throughout the cardiac cycle. Spatial resolution should be 600×800 pixels or higher. Each study should provide at least five consecutive cardiac cycles free from respiratory motion artefact and with consistent probe position. DICOM files with uncompressed or losslessly compressed pixel data are mandatory; lossy JPEG compression must not be applied, as it irreversibly degrades the high-frequency speckle patterns that are critical for accurate myocardial tracking and strain computation. The mandatory label for each study is the GLS value expressed as a negative percentage (for example −18.5%) computed by the acquiring institution using their validated STE software platform (EchoPAC, TOMTEC 2D Cardiac Performance Analysis, or an equivalent vendor-validated tool), with the software name and version number recorded in the accompanying JSON metadata sidecar file. Segmental longitudinal strain values for all 18 ASE myocardial segments are requested where available to support regional dysfunction mapping.
Segmentation masks of the myocardial wall delineating both the endocardial and epicardial borders at end-diastole are requested for a minimum of 500 studies to support geometric normalisation and wall-thickness estimation experiments. Oncology patients undergoing anthracycline chemotherapy or trastuzumab (Herceptin) therapy represent a particularly valuable subpopulation for cardiotoxicity surveillance applications; institutions are encouraged to flag such cases with a treatment-context label (drug class, cumulative dose, and number of cycles completed) while fully preserving patient anonymity in compliance with HIPAA and GDPR requirements. Baseline and follow-up studies from the same pseudonymised patient are highly sought. Demographic balance across age bands (30–49, 50–69, 70+), biological sex, and underlying cardiomyopathy aetiology (ischaemic, dilated, hypertrophic, normal) should be targeted to ensure model generalisability.
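The JSON metadata sidecar described above carries the mandatory GLS label, the STE software name and version, and optionally the 18 segmental strain values. A basic consistency check might look like the following sketch. The key names (`gls_percent`, `ste_software`, `ste_software_version`, `segmental_strain_percent`) are hypothetical, because the request does not fix a sidecar schema.

```python
import json

# Hypothetical key names; the request does not define the sidecar schema.
REQUIRED_KEYS = ("gls_percent", "ste_software", "ste_software_version")


def validate_gls_sidecar(text: str) -> list[str]:
    """Return a list of problems found in one JSON sidecar document."""
    record = json.loads(text)
    problems = [f"missing value: {key}" for key in REQUIRED_KEYS if record.get(key) is None]

    gls = record.get("gls_percent")
    if gls is not None:
        if isinstance(gls, bool) or not isinstance(gls, (int, float)):
            problems.append("gls_percent must be a number")
        elif gls >= 0:
            problems.append("gls_percent must be negative, for example -18.5")

    # Segmental values are optional, but if present all 18 segments are expected.
    segments = record.get("segmental_strain_percent")
    if segments is not None and len(segments) != 18:
        problems.append("segmental_strain_percent must contain 18 values")
    return problems
```

A sign check is worth having because some software reports strain magnitudes as positive numbers, which would silently corrupt a regression target defined as a negative percentage.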

Medical imaging, Ultrasound, DICOM, JSON
0 / 2500 scans (0%)

3,000 Doppler echocardiography studies for aortic stenosis and mitral regurgitation grading

Open

This data request supports the training and external validation of a multimodal classification network designed to grade the severity of left-sided valvular heart disease — specifically aortic stenosis (AS) and mitral regurgitation (MR) — directly from raw echocardiographic cine loops combined with spectral and color Doppler frames. Accurate automated grading would reduce inter-reader variability and accelerate triage in high-volume cardiology laboratories. Required acquisitions include parasternal long-axis (PLAX) and parasternal short-axis (PSAX) cine loops at the level of the aortic valve, continuous-wave (CW) Doppler tracings across the aortic valve, and color Doppler overlays of the mitral valve from the apical 4-chamber view. Pulsed-wave (PW) Doppler recordings at the left ventricular outflow tract (LVOT) are also required to enable computation of the dimensionless velocity index and aortic valve area by the continuity equation. Frame rate for B-mode cine loops should be at least 30 fps; Doppler sweeps should capture a minimum of three consecutive cardiac cycles at a standard sweep speed of 100 mm/s. DICOM format is mandatory for all modalities so that embedded Doppler velocity scale metadata, depth setting, and Nyquist limit can be extracted programmatically during preprocessing. Each study must be accompanied by a clinical label indicating AS severity categorized as none, mild, moderate, or severe per the 2014 AHA/ACC guideline criteria — specifically mean gradient, peak velocity, and aortic valve area — and MR grade categorized as none, mild, moderate, or severe per effective regurgitant orifice area (EROA) quantification or qualitative color jet area assessment. Studies with concurrent moderate-to-severe tricuspid regurgitation should be flagged to support multi-label classification experiments. Pulmonary artery systolic pressure estimated from peak tricuspid regurgitation velocity is requested as a supplementary hemodynamic label. 
Hospitals are encouraged to include studies spanning the full severity spectrum; a roughly balanced distribution across severity grades is preferred, with a minimum of 150 studies per severity class per valve disease. Inclusion of serial studies from patients following transcatheter aortic valve replacement (TAVR) or surgical mitral valve repair provides longitudinal value and should be pseudonymously linked. Full de-identification compliant with DICOM PS3.15 Annex E Basic Application Level Confidentiality Profile, including removal of institution name, referring physician, and device serial number tags, is required before any transfer.
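The Doppler-derived quantities named in this request follow standard formulas: the continuity equation for aortic valve area, the dimensionless velocity index as a ratio of LVOT to aortic velocity-time integrals, and the simplified Bernoulli relation for pulmonary artery systolic pressure. The sketch below is a minimal illustration. Function and parameter names are hypothetical, the LVOT is assumed to be circular, and the right atrial pressure is an input that must be estimated separately, since the request does not say how it is obtained.

```python
import math


def aortic_valve_area_cm2(lvot_diameter_cm, lvot_vti_cm, av_vti_cm):
    """Continuity equation: AVA = LVOT area * VTI_LVOT / VTI_AV.

    Assumes a circular LVOT cross-section.
    """
    lvot_area = math.pi * (lvot_diameter_cm / 2) ** 2
    return lvot_area * lvot_vti_cm / av_vti_cm


def dimensionless_velocity_index(lvot_vti_cm, av_vti_cm):
    """Ratio of LVOT (pulsed-wave) to aortic valve (continuous-wave) VTI."""
    return lvot_vti_cm / av_vti_cm


def pasp_mmhg(tr_peak_velocity_m_s, right_atrial_pressure_mmhg):
    """Simplified Bernoulli: PASP = 4 * v^2 + estimated right atrial pressure."""
    return 4 * tr_peak_velocity_m_s ** 2 + right_atrial_pressure_mmhg
```

With an LVOT diameter of 2.0 cm, an LVOT VTI of 20 cm, and an aortic VTI of 80 cm, the valve area works out to π/4, about 0.79 cm², with a velocity index of 0.25.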

Medical imaging, Ultrasound, DICOM, JSON
0 / 3000 scans (0%)

5,000 transthoracic echocardiogram studies with LVEF measurements for heart-failure screening AI

Open

We are developing a deep-learning model to automate left ventricular ejection fraction (LVEF) estimation from transthoracic echocardiography (TTE) studies acquired in routine clinical care. The primary intended use case is population-level heart-failure screening integrated into existing cardiology workflows, reducing the reporting burden on sonographers and cardiologists while enabling earlier intervention in at-risk patients. We require full cine-loop acquisitions from the apical 4-chamber (A4C) and apical 2-chamber (A2C) views, recorded at a minimum of 25 frames per second with spatial resolution no lower than 224×224 pixels. Each study should include at least three complete cardiac cycles per view. Harmonic B-mode imaging is preferred over fundamental mode to improve endocardial border delineation. Data must be delivered in DICOM format with all protected health information removed or replaced per HIPAA Safe Harbor or equivalent GDPR pseudonymisation protocols, including stripping of DICOM tags 0010,0010 through 0010,0040 and any burned-in annotation text. A de-identification manifest confirming the specific method applied is required alongside each batch delivery. Clinical labels must include a cardiologist-verified LVEF value measured by the biplane Simpson's method of discs, New York Heart Association (NYHA) functional class where available, and a binary heart-failure diagnosis flag. Segmentation masks delineating the left ventricular endocardial border at end-diastole and end-systole in the A4C view are strongly preferred for the full dataset and mandatory for at least 30% of studies. Additional chamber measurements — left ventricular end-diastolic volume (LVEDV), end-systolic volume (LVESV), left atrial volume index, and diastolic function grade per ASE 2016 guidelines — are welcomed as supplementary labels to broaden the model's downstream applicability. 
Patient demographic metadata including age decade, biological sex, body mass index category, and primary diagnosis (heart failure with reduced ejection fraction HFrEF, heart failure with preserved ejection fraction HFpEF, or no heart failure) should be retained in anonymised DICOM tags or an accompanying JSON sidecar. Studies from multiple scanner vendors — GE Vivid, Philips EPIQ, Siemens Acuson, and Canon Aplio — are explicitly sought to ensure device-agnostic model generalisation. Quality control: studies with greater than 20% of frames degraded by ultrasound dropout, rib shadow, or patient motion artefact should be excluded or flagged with a quality score below threshold. A sonographer-assigned image quality rating (1 poor, 2 adequate, 3 good) is requested for each cine loop.
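The LVEF label in this request is defined by the biplane Simpson's method of discs, which stacks elliptical discs along the ventricular long axis using paired diameters from the A4C and A2C views. The sketch below shows the arithmetic in its usual form. Function names are hypothetical, 20 discs per view is the customary choice rather than something the request states, and diameters are assumed to have already been extracted from the endocardial contours.

```python
import math


def biplane_simpson_volume_ml(a4c_diameters_cm, a2c_diameters_cm, long_axis_cm):
    """Method of discs: V = (pi / 4) * sum(a_i * b_i) * (L / n).

    a_i and b_i are the disc diameters (cm) at the same level in the
    A4C and A2C views; L is the long-axis length (cm). 1 cm^3 = 1 mL.
    """
    if len(a4c_diameters_cm) != len(a2c_diameters_cm):
        raise ValueError("both views need the same number of discs")
    n = len(a4c_diameters_cm)
    if n == 0:
        raise ValueError("at least one disc is required")
    disc_height = long_axis_cm / n
    total = sum(a * b for a, b in zip(a4c_diameters_cm, a2c_diameters_cm))
    return math.pi / 4 * total * disc_height


def ejection_fraction_percent(edv_ml, esv_ml):
    """LVEF = (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```

As a sanity check, 20 discs of 4 cm diameter in both views along an 8 cm axis describe a cylinder of volume 32π, about 100.5 mL, and volumes of 120 mL and 60 mL give an LVEF of 50%.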

Medical imaging, Ultrasound, DICOM, JSON
0 / 5000 scans (0%)

Related categories