Abstract
Efforts to improve care for people living with sickle cell disease (SCD) have led to the development of several registries; however, many are dependent on time-limited funding and lack coordination. Consequently, existing data sets are fragmented and do not provide the comprehensive, longitudinal insights achievable through well-integrated registries. Although relevant data exist within electronic medical records across institutions, aggregation is limited by poor interoperability, inconsistent use of common data elements, and poor translation of natural language into codified data. These barriers hinder population-level research and contribute to gaps in understanding the lifelong progression of SCD. Creating a new common data system risks losing years of valuable data, highlighting the need to optimize existing data resources. This study aimed to develop a privacy-preserving method to securely link 3 of the largest SCD data collection efforts in the United States. Conducted at the University of Alabama at Birmingham Lifespan Sickle Cell Center, the study leveraged institutional review board-approved access to the Sickle Cell Data Collection project, the American Society of Hematology Research Collaborative Data Hub, and the Globin Research Network for Data and Discovery. Identity tokens were generated and hashed using Secure Hash Algorithm (SHA)-256 to enable secure linkage without sharing protected health information. A total of 8026 records were identified across the 3 registries. Deterministic matching of hashed tokens identified 1080 unique individuals appearing in at least 2 data sets. This study demonstrates the first privacy-preserving linkage of multiple SCD registries. Secure data integration enhances interoperability and enables richer longitudinal analyses critical for advancing SCD research and treatment development.
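The tokenization and deterministic matching steps described above can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not state which identifiers make up a token, how they were normalized, or whether a shared secret was used, so the fields (first name, last name, date of birth), the normalization rules, the `salt` parameter, and the function names `make_token` and `link_registries` are all assumptions made for illustration.

```python
import hashlib
import unicodedata
from collections import defaultdict


def make_token(first_name, last_name, dob, salt):
    """Return a SHA-256 hex digest of a normalized identity string.

    Assumption: the token fields, the normalization, and the salt are
    hypothetical; the source only states that tokens were hashed with SHA-256.
    """
    parts = [first_name, last_name, dob]
    # Normalize so trivial formatting differences do not break exact matching.
    normalized = "|".join(
        unicodedata.normalize("NFKC", p).strip().upper() for p in parts
    )
    return hashlib.sha256((salt + "|" + normalized).encode("utf-8")).hexdigest()


def link_registries(registries):
    """Deterministic matching: keep hashed tokens seen in at least 2 registries.

    `registries` maps a registry label to the set of hashed tokens it holds.
    Only hashes are compared; no identifiers are exchanged at this step.
    """
    seen = defaultdict(set)
    for label, tokens in registries.items():
        for token in tokens:
            seen[token].add(label)
    return {token: labels for token, labels in seen.items() if len(labels) >= 2}


# Fictional example data; the labels and people are placeholders.
salt = "shared-secret"
registries = {
    "registry_a": {make_token("Jane", "Doe", "1990-01-01", salt)},
    "registry_b": {
        make_token(" jane", "DOE", "1990-01-01", salt),
        make_token("John", "Roe", "1985-05-05", salt),
    },
    "registry_c": {make_token("Ann", "Poe", "2001-12-31", salt)},
}
linked = link_registries(registries)
print(len(linked))
```

In this sketch each site would compute tokens locally and contribute only the digests, so exact equality of digests stands in for identity. Because the match is deterministic, any difference in the underlying identifiers that survives normalization (for example, a misspelled name) yields a different digest and a missed link.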