MIRACOLIX Tools

The technical components of the MIRACUM data integration centres (DIC) are defined as parts of a modular architecture. They may interact with each other and exchange data via ETL processes (Extraction, Transformation, Loading) and standardized application programming interfaces (REST service interfaces).
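
As an illustration of the ETL pattern mentioned above, the following minimal Python sketch extracts rows from a CSV export, maps local codes to harmonised values, and loads the result into a SQLite table. The column names, the code mapping and the target table are assumptions made for this example only; they are not taken from MIRACOLIX and do not represent the project's actual pipelines.

```python
import csv
import io
import sqlite3

# Hypothetical source export; column names and codes are illustrative only.
SOURCE_CSV = """patient_id,sex,birth_year
p1,m,1950
p2,f,1972
p3,w,1988
"""

# Hypothetical mapping from local sex codes to a harmonised value list.
SEX_MAP = {"m": "male", "f": "female", "w": "female"}


def extract(text):
    """Extraction: read rows from a CSV export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: map local codes to harmonised values and cast types."""
    return [
        (row["patient_id"], SEX_MAP.get(row["sex"], "unknown"), int(row["birth_year"]))
        for row in rows
    ]


def load(records, conn):
    """Loading: write the transformed records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient "
        "(patient_id TEXT PRIMARY KEY, sex TEXT, birth_year INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO patient VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
rows = conn.execute(
    "SELECT patient_id, sex, birth_year FROM patient ORDER BY patient_id"
).fetchall()
print(rows)
```

Keeping the three steps as separate functions means that a different source (for example a REST endpoint) or a different target repository only requires replacing `extract` or `load`, while the mapping logic stays untouched.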

The major components of the DIC architecture are built upon a Medical Informatics ReusAble eCO-system of open source Linkable and Interoperable software tools (MIRACOLIX).

MIRACOLIX is composed of a large set of scalable and interoperable software tools, which the MIRACUM partners will design, develop, refine, deploy and implement step by step. During the development and funding phase, the DIC architecture will thus grow in five yearly phases and be continuously enhanced. Depending on their previous experience and competences, different MIRACUM competence centres, located at each of the MIRACUM partner sites, will provide the individual software tools, adapt them for use within a MIRACUM DIC, and deploy and support them.

For this ecosystem, we aim to reuse as many open source software tools as possible that have already been applied successfully in other international research projects.

MIRACUM starts with a basic architecture based on the release of MIRACOLIX 0.9, which has already been implemented in the conceptual phase as a proof of concept. Upcoming releases (MIRACOLIX 1.0, 2.0, 3.0, 4.0) may bring functional upgrades of already established architecture components, moving them to a higher level of maturity, as well as new components for the DIC architecture.

MIRACOLIX 0.9 comprises the following software modules:

Repositories
  • Long-Term Research Data Archive:
    Shall support the long-term storage of data (and of the respective analysis software artefacts/queries) that have been used in a research project. Requires close integration with the hospital-/faculty-wide trial-/project registry and the project proposal management tool.

    • iRODS: Wellcome Trust Sanger Institute, University College London, Utrecht University
      compare Rajasekar et al. 2010, Chiang et al. 2011
    • CKAN: UK Government open data portal (data.gov.uk), University of Lincoln
      compare Winn 2013
  • Data Repositories, Exploration & Visualization:
    Depending on the scenarios for data integration, cohort identification and data exploration, and on the types of data to be supported, different data integration and data exploration repositories will be established:

    • i2b2, IDRT: Informatics for Integrating Biology & the Bedside
      compare Murphy et al. 2010; Kohane et al. 2012; Ganslandt et al. 2011; Bauer et al. 2016
    • tranSMART: tranSMART Foundation
      compare Scheufele et al. 2014; Canuel et al. 2015; Herzinger et al. 2017; He et al. 2016; Dunn et al. 2016; Christoph et al. 2017; Knell et al. 2017
    • OMOP DB (OHDSI):
      compare Hripcsak et al. 2015; Hripcsak et al. 2016
    • XNAT:
      compare Marcus et al. 2007
    • cBioPortal:
      compare Gao et al. 2013; Cerami et al. 2012
Project & Trial Management
  • Hospital-/Faculty-wide Trial-/Project Registry
    • Shall provide the single source of information about all research projects and clinical trials established at a medical faculty/university hospital
    • Shall comprise all relevant metadata of such research projects/trials, including e.g. the trial's eligibility criteria, which shall be further applied in the IT-supported patient recruitment tool
    • Will be newly developed within the MIRACUM project based on prototypes existing in Erlangen, Frankfurt, and Freiburg
      compare e.g. Trinczek et al. 2014; Schreiweis et al. 2014
  • Project Proposal Management
    • Preliminary version based on the Atlassian Confluence collaboration platform
    • Shall support the management of all incoming project proposals aiming at the use of predefined data sets from the DIC
    • Shall support workflow mechanisms for the UAC to approve or reject project proposals
    • Requires close integration with the Hospital-/Faculty-wide Trial-/Project Registry
Data Protection
  • ID-Management Tool
    • Proprietary development of FAU Erlangen-Nürnberg; shall later be migrated to the Mainzelliste
  • Data Anonymization Tool
    • Supports the anonymization of datasets by application of statistical disclosure methods adapted to the specific requirements of each use case; e.g. ARX: http://arx.deidentifier.org/
      compare Prasser et al. 2016a, 2016b, 2016c; Kohlmayer et al. 2015; Prasser et al. 2014
  • Consent Management Tool
    • Shall support the electronic management and provision of graduated levels of patient consent in patient care as well as in different research contexts
    • Shall be based on gICS (generic informed consent service), which was developed and applied within the MOSAIC/MAGIC projects
      compare Bialke et al. 2015
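
To make the idea behind the Data Anonymization Tool more concrete, the following Python sketch checks k-anonymity, one common privacy model used in statistical disclosure control. It does not use the ARX API; the records, the choice of quasi-identifiers and the generalization rules (10-year age bands, 3-digit ZIP prefix) are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"age": 34, "zip": "91054", "diagnosis": "I10"},
    {"age": 36, "zip": "91052", "diagnosis": "E11"},
    {"age": 38, "zip": "91058", "diagnosis": "I10"},
    {"age": 61, "zip": "60590", "diagnosis": "J45"},
    {"age": 67, "zip": "60596", "diagnosis": "E11"},
]

# Attributes assumed to allow re-identification when combined.
QUASI_IDENTIFIERS = ("age", "zip")


def generalize(record):
    """Coarsen the quasi-identifiers: 10-year age bands, 3-digit ZIP prefix."""
    low = record["age"] // 10 * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],
    }


def k_of(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


generalized = [generalize(r) for r in RECORDS]
print(k_of(RECORDS))      # k for the raw records
print(k_of(generalized))  # k after generalization
```

A data set is k-anonymous if `k_of` returns at least the required k. How coarse the generalization has to be depends on the use case, which is why the tool is meant to adapt its methods to each project.
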
IT-Infrastructure
  • Federated Authentication
    • Shall allow users to authenticate against central as well as local components. Accounts issued by the home institutions can be used via DFN-AAI; e.g. OAuth
      compare Choi et al. 2016b; Rieger 2009
    • Auth (MAGIC project)
  • Tools for Development, Deployment and Monitoring of IT Solutions
    • Continuous Integration Test-Pipeline
    • Shall support the development and testing of dedicated MIRACOLIX software solutions at one MIRACUM site, their integration testing within the general DIC architecture, their deployment to all other MIRACUM sites, and continuous monitoring in operation
  • Sharing and Deployment of Software Pipelines for Analysis of OMICs Data
    • Shall allow for sharing bioinformatics pipelines and enable the MIRACUM sites to learn from one another
    • Various already established pipelines can be run and evaluated at all sites to compare the different results and to build the ground for the definition of quality parameters and requirements
    • Deployable analysis infrastructure: Galaxy, Galaxy Tool Shed
      compare Cock et al. 2013; Lazarus et al. 2012
    • Docker: Sharing of bioinformatics pipelines
      compare Leprevost et al. 2017; O'Connor et al. 2017
Data Integration Tools
  • ETL Tools
    • Preliminary versions based e.g. on the IDRT project
    • Shall support the definition and application of automated pipelines for the extraction, transformation and loading of data into data integration repositories
    • Open source tool Talend Open Studio
      compare Bauer et al. 2016
    • Additionally, IHE- and FHIR-based ETL processes, depending on the local MIRACUM site environment
  • Connector Components
    • Preliminary version of i2b2 already part of MIRACOLIX 0.9
    • Shall provide the functionalities required to integrate a local DIC data integration and data exploration repository into a federated network
    • Shall be based on i2b2, Samply Broker/Connector (German Lung Research Network (DZL), German Cancer Research Network (DKTK), ADOPT BBMRI-ERIC, GBN, GBA)
    • Tools have been prototypically tested in the DZL, DKTK and GBN projects and will be further extended in GBA and ADOPT
      compare Mate et al. 2017a, 2017b
  • Natural Language Processing Tool
    • Shall support the analysis of free-text information (typically provided in physician discharge letters, clinical notes, radiology/pathology reports and other documents of the EHR) and the annotation of such narrative text documents with structured data elements defined within the M-MDR
    • Shall be based on the Averbis Information Discovery tool
      compare e.g. Seuss et al. 2017; López-Garcia et al. 2016; Christoph et al. 2015; Kreuzthaler et al. 2015; Schulz et al. 2013, 2011
  • Data Harmonisation/Mapping Tool
    • Its functionality is closely linked to the MDR
    • Shall support the mapping between the different local data elements with their respective values (value lists) and the centrally defined core data set
    • Its development shall be based on MOLGENIS (BiobankConnect) and the Erlangen Ontology-based Mapping Tools
      compare Pang et al. 2015a; Pang et al. 2015b; Pang et al. 2016; Swertz et al. 2010; Mate et al. 2011; Mate et al. 2015
  • MIRACUM Metadata Repository