15,000 12-Lead ECGs with STEMI and NSTEMI Labels for Acute MI Detection AI

Open

Overview

We are building a real-time, point-of-care AI system to detect ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI) from the initial 12-lead ECG acquired in the emergency department. Delayed recognition of STEMI is a critical bottleneck in door-to-balloon time, so our system targets integration with existing ECG cart software and must produce an alert within 10 seconds of the end of acquisition.

We require 15,000 12-lead ECG recordings sampled at ≥500 Hz, with ≥12-bit amplitude resolution and a minimum recording duration of 10 seconds. Files must be provided in WFDB or EDF format; machine-readable XML exports from GE MUSE or Philips TraceMaster systems are also acceptable if accompanied by raw voltage data.

The case mix must include confirmed STEMI (labelled by culprit-artery territory: anterior, inferior, lateral, or posterior), confirmed NSTEMI, unstable angina with ECG changes, and normal/benign controls. Confirmed diagnoses must be backed by troponin results and, where available, by catheterisation or echocardiography findings referenced in an accompanying clinical summary.

Each record must be annotated with ST-segment elevation or depression measurements (in mV, per lead), territory classification, the cardiologist's final diagnosis, and Killip class if available. Keypoint annotations for QRS onset, J-point, and the ST-segment measurement point 60–80 ms after the J-point are required for training measurement regression heads.

The annotation protocol requires a minimum of two independent cardiologist readers per record. Initial annotations are produced by an interventional cardiologist or senior cardiology registrar; a second interventional cardiologist performs a blinded over-read. Where the two readers disagree on the STEMI/NSTEMI label, a third senior cardiologist adjudicates.

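As an illustration of the acquisition requirements stated above (12 leads, ≥500 Hz, ≥12-bit, ≥10 s), a contributor could pre-screen files with a check along these lines. This is a minimal sketch, not our acceptance tooling: `RecordingInfo` and its field names are hypothetical, and filling it from a WFDB or EDF header is left to whichever reader library is in use. The helper `st_window_samples` is likewise an assumption; it only converts the 60–80 ms post-J-point window into sample offsets.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the request text.
REQUIRED_LEADS = 12
MIN_SAMPLING_HZ = 500.0
MIN_BIT_DEPTH = 12
MIN_DURATION_S = 10.0


@dataclass
class RecordingInfo:
    """Hypothetical summary of one recording's header (names are illustrative)."""
    sampling_hz: float
    bit_depth: int
    n_samples: int  # samples per lead
    n_leads: int


def spec_violations(rec: RecordingInfo) -> List[str]:
    """Return the reasons a recording fails the spec; an empty list means it passes."""
    problems = []
    if rec.n_leads != REQUIRED_LEADS:
        problems.append(f"expected {REQUIRED_LEADS} leads, got {rec.n_leads}")
    if rec.sampling_hz < MIN_SAMPLING_HZ:
        problems.append(f"sampling rate {rec.sampling_hz} Hz is below {MIN_SAMPLING_HZ} Hz")
    if rec.bit_depth < MIN_BIT_DEPTH:
        problems.append(f"resolution {rec.bit_depth} bit is below {MIN_BIT_DEPTH} bit")
    # Duration is derived from the sample count; guard against a zero rate.
    duration_s = rec.n_samples / rec.sampling_hz if rec.sampling_hz > 0 else 0.0
    if duration_s < MIN_DURATION_S:
        problems.append(f"duration {duration_s:.1f} s is below {MIN_DURATION_S} s")
    return problems


def st_window_samples(sampling_hz: float,
                      start_ms: float = 60.0,
                      end_ms: float = 80.0) -> Tuple[int, int]:
    """Convert the post-J-point ST measurement window from ms to sample offsets."""
    return (round(start_ms * sampling_hz / 1000.0),
            round(end_ms * sampling_hz / 1000.0))
```

For example, a 5,000-sample, 16-bit, 12-lead recording at 500 Hz yields no violations, and at 500 Hz the 60–80 ms window corresponds to 30–40 samples after the J-point.
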
ST-elevation thresholds must follow the 2018 ESC Fourth Universal Definition of Myocardial Infarction criteria (1 mm = 0.1 mV at standard 10 mm/mV calibration): J-point elevation of ≥1 mm in two contiguous leads other than V2–V3; in leads V2–V3, ≥2 mm in men aged 40 or over, ≥2.5 mm in men under 40, or ≥1.5 mm in women; or a new left bundle branch block pattern with haemodynamic compromise. Inter-reader kappa for the STEMI binary label must be ≥0.85 across the contributed subset.

De-identification is mandatory: all HIPAA-defined protected health information (PHI) must be removed from DICOM-ECG or MUSE XML headers, including patient ID, name, date of birth, and institution identifiers, and admission dates must be shifted to a relative offset. Only age, sex, cardiovascular risk factors (smoking status, hypertension, dyslipidaemia, diabetes, prior MI or PCI), symptom-onset-to-ECG time in minutes, and peak troponin value (categorical: negative, mildly elevated, markedly elevated) may be retained. Of these, age, sex, and the major cardiovascular risk factors are essential metadata fields.

We are particularly interested in recordings obtained within the first two hours of symptom onset, as early MI ECGs are substantially underrepresented in public datasets such as PTB-XL and MIMIC-IV-ECG.

Downstream use cases include integration into emergency department triage workflows as a clinical decision support tool; training a multi-label classifier that simultaneously identifies STEMI territory, Wellens syndrome, and de Winter T-wave patterns; and serving as a benchmark dataset for FDA 510(k) submissions (substantial equivalence evaluation) for AI-based ECG analysis software.
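
The inter-reader agreement requirement above (kappa ≥0.85 on the STEMI binary label) could be checked before submission roughly as follows. The request does not say which kappa statistic is meant; this sketch assumes Cohen's kappa between the initial reader and the blinded over-reader, and the function name and inputs are illustrative.

```python
from typing import Sequence

KAPPA_THRESHOLD = 0.85  # minimum agreement stated in the request


def cohens_kappa(reader_a: Sequence[bool], reader_b: Sequence[bool]) -> float:
    """Cohen's kappa for two readers' binary labels (True = STEMI).

    Assumption: one label per record from each reader, in the same order.
    """
    n = len(reader_a)
    if n == 0 or n != len(reader_b):
        raise ValueError("both readers need the same non-zero number of labels")

    # Observed agreement: fraction of records where the labels match.
    observed = sum(bool(a) == bool(b) for a, b in zip(reader_a, reader_b)) / n

    # Chance agreement from each reader's marginal STEMI rate.
    p_a = sum(bool(a) for a in reader_a) / n
    p_b = sum(bool(b) for b in reader_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    if expected == 1.0:
        # Both readers gave every record the same single label;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def meets_agreement_requirement(reader_a: Sequence[bool],
                                reader_b: Sequence[bool]) -> bool:
    """True if the subset reaches the required kappa."""
    return cohens_kappa(reader_a, reader_b) >= KAPPA_THRESHOLD
```

Note that kappa corrects for chance agreement, so a subset with 90% raw agreement can still fall well short of 0.85 when STEMI cases are a minority of the records.
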

Sensor / device data, ECG, Cardiac, CSV, EDF, WFDB

Progress

0 / 15000 scans (0%)

Data Specifications

Category: Sensor / device data
Required quantity: 15000
Data types: Sensor / device data, ECG, Cardiac, CSV, EDF, WFDB
Budget: EUR 90000.00
Deadline: 2026-10-30

Use Cases

  • Training and validating AI/ML models on sensor / device data
  • Benchmarking detection and segmentation algorithms for sensor / device data
  • Building de-identified sensor / device data research datasets for academic studies
  • Augmenting existing sensor / device datasets to reduce class imbalance