50,000 Single-Lead Wearable ECG Strips for Large-Scale Atrial Fibrillation Population Screening

Open

Overview

Consumer and clinical-grade wearable devices (smartwatches, chest patches, and handheld recorders) are increasingly used for opportunistic atrial fibrillation (AF) screening in primary care and community settings. However, models trained on clinical 12-lead ECGs perform poorly on single-lead data because of electrode placement variability, motion artefact, and the absence of spatial voltage information. We are developing a dedicated single-lead AF detection model for deployment in FDA Class II-cleared wearable devices.

We require 50,000 single-lead ECG recordings, each equivalent to Lead I or a modified limb-lead configuration, lasting 30 seconds to 5 minutes per strip. The minimum sampling rate is 200 Hz; 256 Hz or 300 Hz, typical of consumer wearable ECG chips, is preferred. Amplitude resolution must be at least 8 bits; 12 bits is preferred. Preferred formats are CSV (an ISO 8601 timestamp column plus one column per channel) or JSON (a signal array, a sample-rate field, and a metadata object). Data may originate from any cleared handheld, wrist-worn, or patch-based single-lead recorder (AliveCor KardiaMobile, Withings ScanWatch, Zio patch, or equivalent), or from a clinical Holter export truncated to a single channel.

Each strip must carry one rhythm label: AF confirmed, AF not present, or technically inadequate (excessive artefact). Labels must be assigned by a certified cardiac physiologist or electrophysiologist, not by the device's own algorithm, to avoid importing label noise from the very systems our model aims to replace. The labelling protocol requires human expert review on a validated browser-based or desktop annotation platform that displays the raw waveform; annotators must be blinded to the device's automatic interpretation. At least 5% of all strips must be annotated independently by two annotators to assess inter-rater reliability, and Cohen's kappa for the binary AF-confirmed versus AF-not-present decision must be at least 0.80.
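As a minimal sketch of the CSV layout described above, the Python below writes one strip as an ISO 8601 timestamp column followed by a single signal column. The helper `strip_to_csv`, the column names `timestamp` and `lead_I_mv`, and the millivolt units are hypothetical choices; the listing does not fix them. The start time in the usage line is an arbitrary placeholder: a real recording date is likely a date element that Safe Harbor de-identification would require removing or shifting.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def strip_to_csv(samples_mv, fs_hz, start, channel_name="lead_I_mv"):
    """Render one single-lead strip as CSV text.

    Hypothetical layout: an ISO 8601 timestamp column followed by one column
    per channel (a single column here, since the strip is single-lead).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", channel_name])
    for i, value in enumerate(samples_mv):
        # Sample i is taken i / fs seconds after the strip start.
        t = start + timedelta(seconds=i / fs_hz)
        writer.writerow([t.isoformat(timespec="microseconds"), value])
    return buf.getvalue()


if __name__ == "__main__":
    placeholder_start = datetime(2000, 1, 1, tzinfo=timezone.utc)  # not a real recording date
    print(strip_to_csv([0.1, 0.2, 0.15], fs_hz=250, start=placeholder_start))
```

Microsecond timestamps are used because at 256 Hz the sample period (3.90625 ms) is not a whole number of milliseconds, so millisecond precision would make the spacing between rows look uneven.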
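A supplier might check each JSON strip against the recording requirements above before delivery. The following is a sketch under stated assumptions: the key names (`sample_rate_hz`, `signal`, `metadata`, `label`) and the label codes are hypothetical, because the listing names the required fields but not their keys. It checks the 200 Hz minimum, the 30 s to 5 min duration window, the presence of a metadata object, and membership in the three rhythm classes.

```python
import json

# Hypothetical key names and label codes: the listing asks for "a signal array,
# sample-rate field, and metadata object" but does not name the keys.
ALLOWED_LABELS = {"af_confirmed", "af_not_present", "technically_inadequate"}
MIN_FS_HZ = 200.0     # minimum sampling rate in the specification
MIN_SECONDS = 30.0    # shortest accepted strip
MAX_SECONDS = 300.0   # longest accepted strip (5 minutes)


def validate_strip(record):
    """Return a list of specification problems for one strip record (empty if none)."""
    problems = []
    fs = record.get("sample_rate_hz")
    signal = record.get("signal")
    if not (isinstance(fs, (int, float)) and fs >= MIN_FS_HZ):
        problems.append(f"sample rate {fs!r} is missing or below {MIN_FS_HZ:g} Hz")
    if not isinstance(signal, list) or not signal:
        problems.append("signal array is missing or empty")
    elif isinstance(fs, (int, float)) and fs > 0:
        # Duration follows from the sample count and the declared rate.
        duration_s = len(signal) / fs
        if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
            problems.append(f"duration {duration_s:.1f} s is outside 30-300 s")
    if not isinstance(record.get("metadata"), dict):
        problems.append("metadata object is missing")
    if record.get("label") not in ALLOWED_LABELS:
        problems.append(f"label {record.get('label')!r} is not a recognised rhythm class")
    return problems


def validate_json_file(path):
    """Load one JSON strip file and validate it."""
    with open(path, encoding="utf-8") as fh:
        return validate_strip(json.load(fh))
```

For example, a 300 Hz record with 9,000 samples (exactly 30 s), a metadata object, and the label `af_not_present` passes with an empty problem list; bit depth is not checked here because the listing does not say how it would be declared in the file.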
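The dual-annotation requirement can be illustrated with a short sketch that draws at least 5% of strips for independent second review and computes Cohen's kappa as (observed agreement minus chance agreement) divided by (1 minus chance agreement). The helper names, the fixed random seed, and the label codes are hypothetical. Restricting the kappa to strips where both annotators chose one of the two binary labels is also an assumption; the listing does not say how a "technically inadequate" label from either annotator should be counted.

```python
import math
import random
from collections import Counter

# Hypothetical label codes for the two binary rhythm classes.
BINARY_LABELS = ("af_confirmed", "af_not_present")


def dual_annotation_subset(strip_ids, percent=5, seed=0):
    """Randomly pick at least `percent`% of strips for independent second annotation."""
    ids = list(strip_ids)
    k = math.ceil(len(ids) * percent / 100)  # round up so the minimum is always met
    return random.Random(seed).sample(ids, k)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same strips."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    classes = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in classes) / n**2
    if p_expected == 1.0:
        return float("nan")  # both annotators used one identical class; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def binary_af_kappa(label_pairs):
    """Kappa on strips where both annotators chose AF confirmed or AF not present."""
    kept = [(a, b) for a, b in label_pairs if a in BINARY_LABELS and b in BINARY_LABELS]
    return cohens_kappa([a for a, _ in kept], [b for _, b in kept])


if __name__ == "__main__":
    af, no = BINARY_LABELS
    first = [af, af, af, no, no, no, no, no, no, no]
    second = [af, af, no, no, no, no, no, no, no, no]
    kappa = binary_af_kappa(list(zip(first, second)))
    print(round(kappa, 3), kappa >= 0.80)  # → 0.737 False
```

The toy example shows why kappa rather than raw agreement is specified: the two annotators agree on 9 of 10 strips, yet kappa is about 0.74 (below the 0.80 threshold) because much of that agreement on the majority "AF not present" class is expected by chance. For 50,000 strips the subset function returns 2,500 strip IDs.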
Strips flagged as technically inadequate must also be reviewed by a second annotator before final labelling, because false inadequacy labels artificially inflate the rejection rate and degrade the training signal. Strips with significant baseline wander, muscle artefact, or lead-off events are valuable as hard negatives and should be labelled as technically inadequate rather than discarded. Because wearable recordings are inherently susceptible to motion noise during physical activity, recordings captured during walking, stair climbing, or light exercise (documented by device accelerometer data where available) are specifically solicited to build robustness at inference time. The waveform hallmarks of AF in single-lead traces (irregular RR intervals, absent P waves, a fibrillatory baseline, and variable QRS amplitude) should be used as secondary annotation cues and documented in per-strip quality notes.

De-identification must comply with the HIPAA Safe Harbor method or with GDPR Article 89 pseudonymisation. All device-embedded PHI (patient name, date of birth, and any device serial number traceable to a named individual) must be removed or replaced with surrogate identifiers before delivery. Recordings must not include GPS coordinates or other location data, including in embedded metadata fields.

Subject-level metadata should include age, sex, BMI, and known AF history (paroxysmal, persistent, permanent, or no known AF), as these features will be used as auxiliary model inputs. AF subtype must be documented wherever known, because recordings that capture a paroxysmal AF episode in progress have the highest clinical value and are the most challenging to detect. We anticipate an AF prevalence of 15–25% in the supplied dataset, reflecting a screening-enriched population rather than a general community sample.
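To document the irregular-RR cue in per-strip quality notes, an annotator's tooling could compute simple irregularity descriptors from R-peak times. This is a sketch that assumes R peaks have already been located by some QRS detector (not shown); the two descriptors, RMSSD and the coefficient of variation of RR intervals, are common heart-rate-variability measures swapped in for illustration, not a method named in the listing, and the listing specifies no thresholds for them.

```python
import statistics


def rr_irregularity(r_peak_times_s):
    """RR-interval irregularity descriptors from R-peak times (in seconds).

    Assumes R peaks were already located by some QRS detector (not shown).
    These are annotation aids only; no decision thresholds are implied.
    """
    if len(r_peak_times_s) < 3:
        raise ValueError("need at least three R peaks (two RR intervals)")
    rr = [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    mean_rr = statistics.fmean(rr)
    return {
        "mean_rr_s": mean_rr,
        # Root mean square of successive RR differences.
        "rmssd_s": (sum(d * d for d in diffs) / len(diffs)) ** 0.5,
        # Coefficient of variation: sample SD of RR divided by mean RR.
        "cv_rr": statistics.stdev(rr) / mean_rr,
    }


if __name__ == "__main__":
    regular = [0.0, 0.75, 1.5, 2.25, 3.0]    # evenly spaced beats
    irregular = [0.0, 0.6, 1.5, 2.0, 3.1]    # "irregularly irregular" spacing
    print(rr_irregularity(regular))
    print(rr_irregularity(irregular))
```

Evenly spaced beats give zero RMSSD and zero coefficient of variation, while the irregular sequence gives a coefficient of variation of roughly 0.36. RR irregularity alone does not distinguish AF from, for example, frequent ectopic beats, which is why the listing treats it as a secondary cue alongside P-wave and baseline findings.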
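One way to meet the identifier and location rules above is sketched below: direct identifiers and location fields are dropped (including inside nested metadata objects), and the device serial number is replaced with a random surrogate ID whose lookup table stays with the supplier. The metadata key names and the `DEV-` prefix are hypothetical, since real device exports use vendor-specific fields. A random token is used rather than a hash of the serial number because HIPAA's re-identification-code provision requires codes not derived from information about the individual. Safe Harbor also requires ages over 89 to be aggregated into a single "90 or older" category; that step is not shown.

```python
import secrets

# Hypothetical metadata keys; real device exports use vendor-specific names.
DROP_KEYS = {"patient_name", "date_of_birth",               # direct identifiers
             "gps", "latitude", "longitude", "location"}    # location data
SERIAL_KEY = "device_serial"


class SurrogateMapper:
    """Issues random surrogate IDs; the lookup table stays with the supplier."""

    def __init__(self):
        self._table = {}

    def surrogate(self, real_id):
        if real_id not in self._table:
            # Random token, not derived from the identifier (e.g. not a hash of it).
            self._table[real_id] = "DEV-" + secrets.token_hex(8)
        return self._table[real_id]


def deidentify(metadata, mapper):
    """Return a copy of a metadata dict with identifiers dropped or replaced."""
    clean = {}
    for key, value in metadata.items():
        name = key.lower()
        if name in DROP_KEYS:
            continue  # removed outright; nested objects are handled by recursion below
        if name == SERIAL_KEY:
            clean["device_surrogate_id"] = mapper.surrogate(str(value))
        elif isinstance(value, dict):
            clean[key] = deidentify(value, mapper)
        else:
            clean[key] = value
    return clean


if __name__ == "__main__":
    mapper = SurrogateMapper()
    raw = {"patient_name": "Jane Doe", "device_serial": "SN-0001", "age": 67,
           "device": {"model": "example", "latitude": 51.5}}
    print(deidentify(raw, mapper))
```

Reusing one `SurrogateMapper` across a whole delivery keeps every strip from the same device under the same surrogate ID, which preserves subject-level grouping without exposing the serial number. A key-based filter like this only catches fields it knows about, so vendor exports still need manual review for free-text or unexpected identifier fields.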
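A supplier may want to compare a batch against the anticipated 15–25% AF prevalence. The listing does not say whether technically inadequate strips belong in the denominator, so this small sketch reports both readings. The label codes and function names are hypothetical.

```python
from collections import Counter


def af_prevalence(labels):
    """AF share of a labelled batch, reported against two possible denominators."""
    counts = Counter(labels)
    total = sum(counts.values())
    adequate = counts["af_confirmed"] + counts["af_not_present"]
    if total == 0 or adequate == 0:
        raise ValueError("no usable labels")
    return {
        "of_all_strips": counts["af_confirmed"] / total,
        "of_adequate_strips": counts["af_confirmed"] / adequate,
    }


def within_expected_range(share, low=0.15, high=0.25):
    """True if a prevalence estimate falls inside the anticipated range."""
    return low <= share <= high


if __name__ == "__main__":
    batch = (["af_confirmed"] * 200 + ["af_not_present"] * 700
             + ["technically_inadequate"] * 100)
    shares = af_prevalence(batch)
    print({k: round(v, 3) for k, v in shares.items()})
    print({k: within_expected_range(v) for k, v in shares.items()})
```

In this hypothetical batch of 1,000 strips, AF makes up 20% of all strips but about 22% of adequate strips; the gap grows as the share of technically inadequate strips rises.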
Downstream use cases include a consumer AF detection app embedded in a cleared smartwatch, a primary-care nurse-administered screening kiosk, population-level epidemiological AF prevalence tracking via wearable aggregation, and federated model training across wearable device manufacturer partnerships without centralising raw patient data.


Progress

0 / 50000 scans (0%)

Data Specifications

Category: Sensor / device data
Required quantity: 50,000
Data types: Sensor / device data, ECG, Cardiac, CSV, JSON
Budget: USD 75,000.00
Deadline: 2027-03-29

Use Cases

  • Training and validating AI/ML models on sensor/device data
  • Benchmarking detection and segmentation algorithms on sensor/device data
  • Building de-identified sensor/device research datasets for academic studies
  • Augmenting existing sensor/device datasets to reduce class imbalance