The technical components of the MIRACUM data integration centres (DIC) are defined as parts of a modular architecture. They interact with each other and exchange data through ETL processes (Extraction, Transformation, Loading) and standardized application programming interfaces (REST service interfaces).
The major components of the DIC architecture are built upon the Medical Informatics ReusAble eCO-system of open source Linkable and Interoperable software tools (MIRACOLIX).
MIRACOLIX is composed of a large set of scalable and interoperable software tools, which are designed, developed, refined, deployed and implemented step by step by the MIRACUM partners. We place great importance on reusing as many open source software tools as possible, in particular tools that have already been used successfully in other international research projects.
Currently, MIRACOLIX consists of the following software modules (03/2022):
Data Repositories, Exploration and Visualization
Informatics for Integrating Biology and the Bedside (i2b2) is a project sponsored in the USA by the National Institutes of Health (NIH), which has been run as the National Center for Biomedical Computing (NCBC) since 2004 as part of the NIH Roadmap for Medical Research. i2b2 is an extensible open source tool suite that is now successfully used in numerous international research networks, either as a single implementation for hospital-based data integration or as a node in a research network.
Within the MIRACUM DIC, i2b2 is provided as a platform for feasibility studies, cohort identification and support in patient recruitment.
TranSMART is a translational research platform. As a database with integrated analysis tools, it can be used to analyze both clinical and genomic data. It is an open source data warehouse that stores large amounts of data so that they can be shared for translational research. Among other things, the platform offers more than 30 predefined workflows for analyzing and graphically presenting study data, all accessible through a graphical user interface.
For more information on tranSMART:
- MIRACUM tranSMART Website [in German]
The open source tool cBioPortal is another translational research platform for interactive exploration of multidimensional cancer genomics datasets. Developed by Memorial Sloan Kettering Cancer Center (MSKCC), cBioPortal provides a web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal’s intuitive web interface provides access to complex genomic profiles without the need for bioinformatics skills.
OHDSI is a global open-science community that aims to transfer available observational data (e.g. diagnoses, procedures, laboratory values, measurements, medication) from patient care into a standardized format so that they can be used internationally for research. The core component of the OHDSI software framework is the OMOP Common Data Model, which serves as the data basis. Data are stored in a standardized format using standard vocabularies such as SNOMED, LOINC or RxNorm. The OHDSI community provides a large portfolio of open source components for characterizing cohorts as well as for evaluation, analysis and prediction.
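To illustrate what storing data "in a standardized format and vocabulary" means in practice, the following minimal Python sketch converts a local laboratory result into a record shaped like a row of the OMOP measurement table. The field names are a small subset modelled on that table; the local codes and the concept IDs in `LOCAL_TO_CONCEPT` are placeholders invented for this example, not real vocabulary entries. Using concept ID 0 for unmapped codes follows the OMOP convention for "no matching concept".

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from local lab codes to standard concept IDs.
# The IDs are placeholders, not real OMOP vocabulary entries.
LOCAL_TO_CONCEPT = {"GLUC": 1001, "NA": 1002}


@dataclass
class Measurement:
    """Subset of fields modelled on the OMOP measurement table."""
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    value_as_number: float
    measurement_source_value: str  # original local code, kept for traceability


def to_measurement(person_id, local_code, value, measured_on):
    """Map one local lab result to a standardized measurement record."""
    # Unmapped codes get concept ID 0 ("no matching concept").
    concept_id = LOCAL_TO_CONCEPT.get(local_code, 0)
    return Measurement(person_id, concept_id, measured_on, float(value), local_code)
```

Keeping the source value next to the standard concept ID lets an unmapped or wrongly mapped code be traced back and corrected later.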
In everyday medical practice, diverse imaging procedures produce ever-increasing numbers of medical images, volume data sets and image sequences at a wide variety of locations. Answering questions in patient care or research on the basis of these images places high demands on data quality and often requires examples for comparison. Because most image data are both highly heterogeneous and widely distributed in terms of storage location, experiments at one university hospital often cannot benefit from the wealth of data held at other sites.
The XNAT project focuses on enabling scientists to archive and exchange image and video (meta-)data. At the same time, tools for automatic quality control as well as for the preparation of the data are provided to ensure centralized analysis in connection with the other systems and repositories.
Thus, research questions can now be answered holistically and in the best possible way, taking into account all existing (patient) data.
More information on XNAT:
MIRACUM Mapper & LabVisualizer
With the MIRACUM Mapper, the DIC provides a universal tool for manually mapping non-standard hospital data to standard terminologies. This is important for making data comparable in the context of research. The tool can be widely configured for different validation workflows, allowing comprehensive quality assurance measures to be implemented.
Since 2019, the DIC has been working with the laboratories on mapping the approximately 10,000 Swisslab laboratory parameters to the LOINC standard. For this purpose, the tool was extended with a visualization function for the local laboratory data (MIRACUM LabVisualizer), which makes it easier to assign the correct LOINC codes.
The Data Quality Analysis-Tool (DQA-Tool) is a web application that is currently being developed within the MIRACUM project. It is used to check the data quality of a research data repository with a focus on the so-called “Extract-Transform-Load” (ETL) processes with which these research databases are filled. For this purpose, the tool implements current concepts published in the scientific literature (see M.G. Kahn et al.).
The DQA tool is in an experimental stage and is under continuous active development with the goal of providing open source software that can be flexibly applied to a wide range of data sources and thus contribute to an improvement in data quality across many subsystems.
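One kind of check that a data quality analysis of ETL processes can include is a before/after comparison of a data element in the source system and in the target repository. The Python sketch below is an assumption-laden illustration of that idea, not the DQA tool's implementation; the function name and the result keys are hypothetical.

```python
def compare_element(source_rows, target_rows, key):
    """Compare one data element before and after an ETL step.

    source_rows / target_rows: lists of dicts, one dict per record.
    key: name of the data element (column) to compare.
    """
    src_values = {row[key] for row in source_rows}
    tgt_values = {row[key] for row in target_rows}
    return {
        "rows_source": len(source_rows),
        "rows_target": len(target_rows),
        "distinct_source": len(src_values),
        "distinct_target": len(tgt_values),
        # Values that were present in the source but lost during ETL.
        "missing_in_target": sorted(src_values - tgt_values),
        "passed": len(source_rows) == len(target_rows) and src_values == tgt_values,
    }
```

A failed check does not say which side is wrong; it only flags the element for review.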
More information on the DQA-Tool:
- Kapsner et al. – Moving Towards an EHR Data Quality Framework: The MIRACUM Approach.
Project and Trial Management
MIRACUM Trial Registry
At the Erlangen site, a central study registry was developed to display all studies taking place at the ten MIRACUM sites on a single website. The registry consists of several modules, including a central FHIR-compatible server as a data store, which accepts study information in the form of Fast Healthcare Interoperability Resources (FHIR) via a web interface. In addition, a “Multisite Merger” merges the records of multicenter studies reported from multiple sites so that each is presented as a single study with multiple study sites. The merged studies are then displayed on the study registry website. To participate in the central registry, each local study registry must provide an interface that sends the required study information to the central study server in the specified format (FHIR).
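The merging step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: study reports are plain dictionaries rather than FHIR resources, and a shared `study_id` (a hypothetical field) identifies reports that belong to the same multicenter study. How the actual Multisite Merger matches studies is not described here.

```python
def merge_multisite(reports):
    """Merge per-site study reports into one entry per study.

    Each report is a dict with the (hypothetical) keys
    "study_id", "title" and "site".
    """
    merged = {}
    for report in reports:
        study = merged.setdefault(
            report["study_id"],
            {"study_id": report["study_id"], "title": report["title"], "sites": []},
        )
        # List each participating site only once.
        if report["site"] not in study["sites"]:
            study["sites"].append(report["site"])
    return list(merged.values())
```

Reports are processed in the order received, so the title of the first report for a study is the one that is displayed.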
More information on the MIRACUM Trial Registry:
- MIRACUM Trial Registry Website
- Gulden et al. – Prototypical Clinical Trial Registry Based on Fast Healthcare Interoperability Resources (FHIR): Design and Implementation Study.
- Hasselblatt H et al. – Establishing an Interoperable Clinical Trial Information System Within MIRACUM.
- Sommer M et al. – Design and Implementation of a Single Source Multipurpose Hospital-Wide Clinical Trial Registry.
Project Proposal Management
In order to use clinical data for research purposes from medical departments, hospitals and other medical facilities, research applications must be submitted and checked by various committees before data can be made accessible for researchers.
This application process is not yet harmonized. To overcome this, the Medical Informatics Group, as part of MIRACUM, is developing ProSkive, a tool that assists and simplifies project proposal management. The entire application process can be tracked, and scientists can easily apply for biomaterial or clinical data via a web interface. All steps from project proposal to project closure are thus transparent to all parties involved and can easily be adapted to different needs. By December 2020, six releases of ProSkive had been distributed to the MIRACUM partner sites.
More information on ProSkive
The E-PIX® (Enterprise Identifier Cross-Referencing) Service allows precise management of person-identifying data (PII), including linkage of patient records in central and federated scenarios. It follows the principles of a master patient index in order to identify and match persons from single or federated study sites. This identity management includes the correction of synonym errors, which occur when data of one person are stored in two or more independent patient records. The probabilistic record linkage process uses demographic information (e.g. first name, surname, birthdate) and/or local identifiers (e.g. insurance or hospital case number). In addition, Privacy Preserving Record Linkage (PPRL) based on Bloom filters is supported. E-PIX® is applied at a large number of MII sites and is provided by the Trusted Third Party of University Medicine Greifswald under an open source license (AGPLv3).
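The idea behind Bloom-filter-based record linkage can be shown with a small Python sketch: each name is split into character bigrams, every bigram sets a few bit positions in a filter, and two filters are compared with the Dice coefficient, so similar names yield similar filters without the names themselves being exchanged. This is a generic textbook-style illustration, not the E-PIX® implementation; the filter size, the number of hash functions and the unkeyed SHA-256 hashing are assumptions (a real deployment would typically use keyed hashes).

```python
import hashlib


def bigrams(text):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{text.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(text, size=1024, num_hashes=4):
    """Return the set of bit positions that `text` sets in a Bloom filter."""
    bits = set()
    for gram in bigrams(text):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

With these definitions, `dice(bloom_encode("Meyer"), bloom_encode("Meier"))` is expected to be clearly higher than the score for two unrelated names, which is what allows spelling variants of the same person to be matched.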
More information on E-PIX®:
The gPAS® (generic Pseudonym Administration Service) enables a data trustee to generate and manage pseudonyms. As required for various application scenarios, the pseudonym generation process is highly configurable and supports comprehensive customisation (e.g. in terms of the algorithms, alphabets, pseudonym composition and length used). Additionally, gPAS® can generate multiple hierarchical pseudonyms per person, allowing the use of different pseudonyms for different data sources, target systems, data types or study sites. gPAS® is applied at a large number of MII sites and is provided by the Trusted Third Party of University Medicine Greifswald under an open source license (AGPLv3).
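A minimal Python sketch can make the notions of configurable pseudonym generation and hierarchical pseudonyms more concrete. The class `PseudonymDomain` and its parameters are hypothetical and chosen for illustration only; they do not reflect the gPAS® interfaces or algorithms. Each domain keeps its own mapping, and a second-level domain can pseudonymise the pseudonyms of a first-level domain.

```python
import secrets


class PseudonymDomain:
    """One pseudonym namespace with its own alphabet, length and prefix."""

    def __init__(self, name, alphabet="0123456789ABCDEF", length=8, prefix=""):
        self.name = name
        self.alphabet = alphabet
        self.length = length
        self.prefix = prefix
        self._by_original = {}
        self._by_pseudonym = {}

    def get_or_create(self, original):
        """Return the existing pseudonym for `original` or generate a new one."""
        if original in self._by_original:
            return self._by_original[original]
        while True:
            body = "".join(secrets.choice(self.alphabet) for _ in range(self.length))
            pseudonym = self.prefix + body
            if pseudonym not in self._by_pseudonym:  # avoid collisions
                break
        self._by_original[original] = pseudonym
        self._by_pseudonym[pseudonym] = original
        return pseudonym

    def resolve(self, pseudonym):
        """Look up the original value; only the data trustee should do this."""
        return self._by_pseudonym[pseudonym]


# Hierarchical use: the research domain pseudonymises the lab pseudonym.
lab = PseudonymDomain("lab", prefix="LAB_")
research = PseudonymDomain("research", prefix="RES_", length=10)
lab_psn = lab.get_or_create("patient-0001")
research_psn = research.get_or_create(lab_psn)
```

Because the research domain only ever sees `lab_psn`, resolving a research pseudonym back to the patient requires both mappings.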
More information on gPAS®:
Consent Management Tool
The gICS® (generic Informed Consent Service) supports the management of informed consents (IC) and withdrawals. gICS® facilitates all IC-related workflows, from fully electronic capture to the digitisation of paper-based consent forms. All consent-related processes are based on policies and reusable modules, which allows automated real-time checks of consent validity (e.g. concerning the storage of medical information, the collection of biosamples, or re-contact) as well as checks for full or policy-specific withdrawals. gICS® is applied at a large number of MII sites and is provided by the Trusted Third Party of University Medicine Greifswald under an open source license (AGPLv3).
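A policy-based validity check of this kind might look like the following Python sketch. The data model (`Consent`, its fields, and the policy names used in the test data) is a simplified assumption for illustration and is not the gICS® data model or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Consent:
    signed_on: date
    policies: dict  # policy name -> True (accepted) or False (declined)
    withdrawn_policies: set = field(default_factory=set)
    fully_withdrawn: bool = False


def is_permitted(consent, policy, on_date):
    """Check whether `policy` is covered by a valid consent on `on_date`."""
    if consent.fully_withdrawn or on_date < consent.signed_on:
        return False
    if policy in consent.withdrawn_policies:  # policy-specific withdrawal
        return False
    # Policies that were never asked for count as not consented.
    return consent.policies.get(policy, False)
```

Treating unknown policies as "not permitted" is a deliberately conservative default in this sketch.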
More information on gICS®:
Keycloak is used as the Federated Authentication Service (FAS) in MIRACUM. It is a widely used open source product that provides access management and single sign-on for applications. To exchange authentication information, Keycloak uses the OpenID Connect authentication layer, which is based on the OAuth 2.0 authorization protocol, as well as the XML-based SAML framework. Keycloak provides adapters, available for various application servers, for the applications that users authenticate to. Alternatively, an application can address Keycloak directly from its source code.
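As an illustration of addressing an OpenID Connect provider from application code, the Python sketch below builds the URL to which a user's browser is redirected at the start of an authorization code flow. The query parameters are the standard OpenID Connect ones. The host, realm and client names in the usage note are made up, and the path layout `/realms/<realm>/protocol/openid-connect/auth` is an assumption that matches recent Keycloak releases but may differ for a given installation.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(base_url, realm, client_id, redirect_uri, state=None):
    """Build an OpenID Connect authorization request URL.

    Returns the URL and the `state` value, which the application must
    compare with the value returned to its redirect URI.
    """
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",  # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid",
        "state": state,
    })
    # Assumed endpoint layout; check the provider's discovery document.
    endpoint = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/auth"
    return f"{endpoint}?{query}", state
```

For example, `build_authorization_url("https://sso.example.org", "demo-realm", "demo-client", "https://app.example.org/callback")` returns a URL on the hypothetical host `sso.example.org` together with a freshly generated `state` value.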
More information on Keycloak:
Software Pipeline for Analysis of OMICs Data
In order to ensure and support transparent data integration and decision making for Molecular Tumor Boards (MTBs) across the MIRACUM sites, we have developed the MIRACUM-Pipe.
This is an automated analysis workflow for whole-exome sequencing (WES) and targeted gene panel sequencing (tNGS), which provides reliable, standardized and reproducible results across different facilities. The results are presented in summary form in an interactive PDF report and can be used by the MTB members to prepare a tumor board meeting.
The MIRACUM-Pipe has been tested successfully twice in the Next-Generation Sequencing “Ringversuch” (interlaboratory comparison) and accordingly received the certificate from the Berufsverband Deutscher Humangenetiker e.V. in both 2019 and 2020.
Data Integration Tools
CC-FS - Connector Component Federated Search
As part of the MIRACOLIX tool box, the MIRACUM site Frankfurt develops the components to implement a consortium-wide federated search. This is a connector component for federated searches (CC-FS) on the one hand, and a corresponding querybroker component on the other hand. The querybroker is linked with ProSkive as well as different graphical querybuilders. Currently, OHDSI ATLAS and i2b2 are provided, but thanks to a generic approach, other querybuilders can be connected with reasonable effort. Unlike the centrally deployed querybroker, the CC-FS instances are integrated at the MIRACUM partner sites and are connected to the data management systems on site. Depending on the query schema, CC-FS relays the query to the responsible data management system and gathers the results to make them available in the consortium – given the appropriate consent. The MIRACUM Federated Authentication Service (FAS) is used to authenticate the components.
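The relaying behaviour described above, where the query schema determines which local data management system answers, can be sketched in Python as a simple dispatcher. The class `Connector`, the query layout and the schema names in the usage note are hypothetical and are not taken from the CC-FS code base; authentication and consent checks are left out.

```python
class Connector:
    """Relays a query to the backend registered for its schema."""

    def __init__(self):
        self._backends = {}

    def register(self, schema, handler):
        """Register a callable that answers queries of the given schema."""
        self._backends[schema] = handler

    def relay(self, query):
        """Forward the query criteria and wrap the backend's result."""
        handler = self._backends.get(query["schema"])
        if handler is None:
            raise ValueError(
                f"no data management system registered for schema {query['schema']!r}"
            )
        return {"schema": query["schema"], "patient_count": handler(query["criteria"])}
```

A site would register one handler per local system, for example `connector.register("omop", run_omop_query)`, where `run_omop_query` stands for a site-specific function that executes the criteria and returns a count.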
In clinical research, medical data that are available in standard terminologies (e.g., ICD10, OPS) are not sufficient; data that are not standardized (e.g., laboratory analytes) are also required. To enable exchange or joint analysis, these data must be mapped to a standard (e.g. to LOINC in the case of laboratory data).
However, manual mapping is a very time-consuming task. Matching such codes requires not only fundamental terminological knowledge, but also good knowledge of the conditions at the respective site and requires constant exchange of information with colleagues. Manual maintenance of code lists is not expedient.
For this reason, a semi-automatic approach for the creation of such mappings was developed within the framework of MIRACUM. The MIRACUM Mapper is a generic tool that allows collaborative and asynchronous creation and editing of mappings. In its first application, approximately 10,000 laboratory codes were mapped to the LOINC standard.
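What "semi-automatic" can mean is illustrated by the Python sketch below: for each local parameter name, the program proposes the most similar target terms, and a human reviewer accepts or rejects the proposal. The string-similarity ranking via `difflib.SequenceMatcher` is a stand-in chosen for this example and is not the matching method of the MIRACUM Mapper; the target codes in the usage note are placeholders, not LOINC codes.

```python
from difflib import SequenceMatcher


def suggest(local_name, targets, top_n=3):
    """Rank target terms by string similarity to a local parameter name.

    targets: dict mapping a target code to its display name.
    Returns up to `top_n` (code, score) pairs, best match first.
    """
    scored = [
        (SequenceMatcher(None, local_name.lower(), label.lower()).ratio(), code)
        for code, label in targets.items()
    ]
    scored.sort(reverse=True)
    return [(code, round(score, 2)) for score, code in scored[:top_n]]
```

With the placeholder targets `{"T-1": "Glucose in serum", "T-2": "Sodium in serum"}`, the call `suggest("Glucose serum", targets)` ranks `T-1` first; the final decision remains with the reviewer.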
NLP Tool Health Discovery
Health Discovery is a text mining and machine learning platform from Averbis GmbH, a German company specialising in knowledge and content technologies in the biomedical field. Health Discovery processes large amounts of clinical narratives, analyses their textual content and produces a structured output consisting of codes (e.g. ICD-10, TNM) in context (e.g. negation). It is based on the Apache UIMA (Unstructured Information Management Architecture) framework, which allows the addition of custom extraction components as well as the use of external terminologies. Thanks to its REST interface, Health Discovery can easily be integrated into existing information architectures. In MIRACUM, Health Discovery supports the data integration centres in acquiring coded information from textual sources in clinical information systems.
The MIRACUM consortium requires a central metadata repository (M-MDR) to tackle interoperability challenges and support data harmonization processes within the network. The central M-MDR includes all core data sets defined by the German Medical Informatics Initiative (MII) as well as by the consortium, and it will be extended continuously depending on current use cases. Researchers can use these harmonized data sets to define queries for data inquiries. Consequently, the M-MDR is a major component for federated searches across institutional boundaries. Furthermore, the metadata repository can assist with other tasks, such as ETL (extract, transform, load) processes or data quality reports. Overall, the central M-MDR is intended to simplify data integration and data exchange across partner sites.