The technical components of the MIRACUM data integration centres (DIC) are defined as parts of a modular architecture. They may interact with each other and exchange data through ETL processes (Extraction, Transformation, Loading) and standardized application programming interfaces (REST service interfaces).

The major components of the DIC architecture are built upon a Medical Informatics ReusAble eCO-system of open source Linkable and Interoperable software tools (MIRACOLIX).

MIRACOLIX is composed of a large set of scalable and interoperable software tools, which the MIRACUM partners will design, develop, refine, deploy and implement step by step. The DIC architecture will therefore grow in five (yearly) phases during the development and funding phase and will be continuously enhanced. The individual software tools will be provided, adapted for use within a MIRACUM DIC, deployed and supported by MIRACUM competence centres located at the MIRACUM partner sites, according to each centre's previous experience and competences.

For this ecosystem, we aim to reuse as many open source software tools as possible, in particular tools that have already been successfully applied in other international research projects.

MIRACUM starts with a basic architecture that was already implemented in the conceptual phase as a proof of concept and is based on the release of MIRACOLIX 0.9. Upcoming releases (MIRACOLIX 1.0, 2.0, 3.0, 4.0) may bring functional upgrades of already established architecture components, moving them to a higher level of maturity, as well as the introduction of new components into the DIC architecture.

MIRACOLIX 0.9 comprises the following software modules:

  • MIRACUM Metadata Repository (M-MDR); based on the Samply.MDR
  • ID-Management Tool (proprietary development of FAU Erlangen-Nürnberg; shall later be migrated to the Mainzelliste)
  • Project Proposal Management Tool:
    (preliminary version based on the Atlassian Confluence collaboration platform):
    shall support the management of all incoming project proposals aiming at the use of predefined data sets from the DIC; shall support workflow mechanisms for the UAC to approve or reject project proposals; needs a close integration with the Hospital-/Faculty-wide Trial-/Project Registry;
  • Data Integration and Data Exploration Repositories:
    (i2b2 and OMOP DB already part of MIRACOLIX 0.9)
    Depending on the scenarios for data integration, cohort identification and data exploration, and on the types of data to be supported, different data integration and data exploration repositories will be established;

    • i2b2, IDRT: Informatics for Integrating Biology & the Bedside,
      (compare Murphy et al. 2010; Kohane et al. 2012; Ganslandt et al. 2011; Bauer et al. 2016);
    • tranSMART: tranSMART Foundation,
      (compare Scheufele et al. 2014; Canuel et al. 2015; Herzinger et al. 2017; He et al. 2016; Dunn et al. 2016; Christoph et al. 2017; Knell et al. 2017);
    • OMOP DB (OHDSI):
      (compare Hripcsak et al. 2015; Hripcsak et al. 2016);
    • XNAT:
      (compare Marcus et al. 2007)
    • cBioportal:
      (compare Gao et al. Signal. 2013 & Cerami et al. Cancer Discov. 2012)
  • ETL Tools:
    (preliminary versions based e.g. on the IDRT project)
    shall support the definition and application of automated pipelines for the extraction, transformation and loading of data into data integration repositories; open source tool Talend Open Studio (compare Bauer et al. 2016), but also IHE- and FHIR-based ETL processes, depending on the local MIRACUM site environment
  • IT infrastructure to share and easily deploy software pipelines for the analysis of omics data:
    shall allow for sharing bioinformatics pipelines and enable the MIRACUM sites to learn from one another; various already established pipelines can be run and evaluated at all sites to compare the different results and to lay the ground for the definition of quality parameters and requirements;

    • Deployable analysis infrastructure: Galaxy, Galaxy Tool Shed
      (compare Cock et al. 2013, Lazarus et al. 2012)
    • Sharing of bioinformatics pipelines: Docker
      (compare Leprevost et al. 2017, O'Connor et al. 2017)
  • Tools for Data Quality Analysis, Reporting and Visualisation:
    (ACHILLES Heel already part of MIRACOLIX 0.9)
    shall support the continuous monitoring of data quality in the DIC data repositories with respect to the requirements defined through the MIRACUM use case and the NSG interoperability requirements;

    • ACHILLES Heel:
      (compare Huser et al. 2016, Hripcsak et al. 2015);
    • DCC tools:
      (compare Davidson 2015);
  • Connector Component(s):
    (preliminary version of li2b2 already part of MIRACOLIX 0.9)
    shall provide the functionalities required to integrate a local DIC data integration and data exploration repository into a federated network; shall be based on li2b2 and Samply Broker/Connector (German Lung Research Network (DZL), German Cancer Research Network (DKTK), ADOPT BBMRI ERIC, GBN, GBA); the tools have been prototypically tested in the DZL, DKTK and GBN projects and will be further extended in GBA and ADOPT
    (compare Mate et al. 2017a, 2017b)
  • Quality Management/ SOP System:
    (preliminary version already part of MIRACOLIX 0.9 based on the Atlassian Confluence Collaboration Platform)
    in order to apply only validated and quality-assured processes and IT components, MIRACUM will implement a set of Standard Operating Procedures and a continuous quality management process;
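
As an illustration of the ETL tools listed above, the three steps (extraction, transformation, loading) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the source records, the `diagnosis` table and the function names are hypothetical, an in-memory SQLite database stands in for a data integration repository, and the sketch reflects neither Talend Open Studio nor the IDRT pipelines.

```python
import sqlite3

# Hypothetical source records, standing in for an export from a clinical system.
SOURCE_ROWS = [
    {"patient_id": "p1", "code": "e11.9 ", "date": "2017-03-01"},
    {"patient_id": "p2", "code": "I10", "date": "2017-03-02"},
    {"patient_id": "p3", "code": "", "date": "2017-03-03"},  # incomplete record
]


def extract(rows):
    """Extraction: yield raw records from the source."""
    yield from rows


def transform(records):
    """Transformation: normalise codes and drop records without a code."""
    for rec in records:
        code = rec["code"].strip().upper()
        if code:
            yield (rec["patient_id"], code, rec["date"])


def load(conn, facts):
    """Loading: write the transformed facts into the target repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diagnosis (patient_id TEXT, code TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)", facts)
    conn.commit()


def run_etl(rows):
    """Run the three steps against an in-memory database (assumed target)."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(rows)))
    return conn


conn = run_etl(SOURCE_ROWS)
loaded = conn.execute(
    "SELECT patient_id, code FROM diagnosis ORDER BY patient_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions means that a site-specific extraction (for example from an IHE or FHIR source) could be swapped in without touching the transformation and loading steps.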

The fully released MIRACOLIX 4.0 based DIC architecture (at the end of year 4) shall comprise at least the following further technical components:

Data Anonymization Tool:
shall support the anonymization of datasets by applying statistical disclosure methods adapted to the specific requirements of each use case; e.g. ARX: http://arx.deidentifier.org/ (compare Prasser et al. 2016a, 2016b, 2016c; Kohlmayer et al. 2015; Prasser et al. 2014)
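
To make the idea of a statistical disclosure measure more concrete, the following sketch computes k-anonymity, one commonly used measure, before and after generalising a quasi-identifier. It is an illustrative assumption only: the records, the choice of `age` and `zip` as quasi-identifiers and the helper names are hypothetical, and the code does not use or represent the ARX API.

```python
from collections import Counter


def generalise_age(age, band=10):
    """Replace an exact age with a band such as '40-49'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values())


# Hypothetical records; 'age' and 'zip' act as quasi-identifiers.
records = [
    {"age": 41, "zip": "91054", "diagnosis": "I10"},
    {"age": 47, "zip": "91054", "diagnosis": "E11.9"},
    {"age": 52, "zip": "91054", "diagnosis": "I10"},
    {"age": 58, "zip": "91054", "diagnosis": "J45"},
]

k_before = k_anonymity(records, ["age", "zip"])
generalised = [{**rec, "age": generalise_age(rec["age"])} for rec in records]
k_after = k_anonymity(generalised, ["age", "zip"])
print(k_before, k_after)
```

With exact ages every record is unique (k = 1); after generalising to ten-year bands each record shares its quasi-identifiers with one other record (k = 2). Which measure and threshold are appropriate depends on the use case.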

Data Harmonisation/Data Mapping Tool:
Its functionality is closely linked to the MDR; it is supposed to support the mapping processes between different local data elements and their respective values (value lists) and the centrally defined core data set; its development shall be based on MOLGENIS (BiobankConnect) and the Erlangen Ontology-based Mapping Tools (compare Pang et al. 2015a; Pang et al. 2015b; Pang et al. 2016; Swertz et al. 2010; Mate et al. 2011; Mate et al. 2015)
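
The mapping between local data elements (and their value lists) and the core data set can be pictured as a lookup table. The sketch below is a simplified assumption for illustration: the site names, element names and the `VALUE_MAP` structure are hypothetical and do not reflect MOLGENIS, BiobankConnect or the Erlangen mapping tools.

```python
# Hypothetical mapping from (site, local element, local value) to the core data set.
VALUE_MAP = {
    ("site_a", "sex", "m"): ("gender", "male"),
    ("site_a", "sex", "w"): ("gender", "female"),
    ("site_b", "geschlecht", "1"): ("gender", "male"),
    ("site_b", "geschlecht", "2"): ("gender", "female"),
}


def map_to_core(site, element, value):
    """Return the (core element, core value) pair, or None if no mapping is defined."""
    return VALUE_MAP.get((site, element, value.strip().lower()))


def harmonise(site, record):
    """Split a local record into harmonised values and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for element, value in record.items():
        target = map_to_core(site, element, value)
        if target is None:
            unmapped[element] = value
        else:
            core_element, core_value = target
            mapped[core_element] = core_value
    return mapped, unmapped


mapped, unmapped = harmonise("site_b", {"geschlecht": "2", "station": "4b"})
print(mapped, unmapped)
```

Returning the unmapped elements separately, rather than dropping them silently, makes gaps in the mapping visible so that they can be added to the metadata repository.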

Consent Management System:
shall support the electronic management and provision of graduated levels of patient consent in patient care as well as in different research contexts; shall be based on gICS (generic informed consent service), which was developed and applied within the MOSAIC/MAGIC projects (compare Bialke et al. 2015)
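
A check against graduated consent levels might look as follows. This is a deliberately simplified sketch: the level names, the strict ordering of levels and the in-memory registry are assumptions made for illustration, and the code is not the gICS interface.

```python
from enum import IntEnum


class ConsentLevel(IntEnum):
    """Hypothetical graduated consent levels, ordered from narrowest to broadest."""
    NONE = 0
    PATIENT_CARE = 1
    LOCAL_RESEARCH = 2
    SHARED_RESEARCH = 3


# Hypothetical consent registry keyed by pseudonym.
CONSENTS = {
    "psn-001": ConsentLevel.SHARED_RESEARCH,
    "psn-002": ConsentLevel.PATIENT_CARE,
}


def is_permitted(pseudonym, required):
    """A use is permitted if the recorded consent is at least as broad as required."""
    return CONSENTS.get(pseudonym, ConsentLevel.NONE) >= required


def filter_cohort(pseudonyms, required):
    """Keep only the pseudonyms whose consent covers the requested use."""
    return [p for p in pseudonyms if is_permitted(p, required)]


cohort = filter_cohort(
    ["psn-001", "psn-002", "psn-003"], ConsentLevel.LOCAL_RESEARCH
)
print(cohort)
```

Patients without a recorded consent default to `NONE` and are therefore excluded. Real consent models are usually more fine-grained than a single ordered scale, for example with separate policies per data type or purpose.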

Natural Language Processing Tool:
shall support the analysis of free-text information (typically provided in physician discharge letters, clinical notes, radiology/pathology reports and other documents of the EHR) and the annotation of such narrative text documents with structured data elements defined within the M-MDR; shall be based on the Averbis Information Discovery tool (compare e.g. Seuss et al. 2017, López-Garcia et al. 2016, Christoph et al. 2015, Kreuzthaler et al. 2015, Schulz et al. 2013, 2011)

Hospital-/Faculty-wide Trial-/Project Registry:
shall provide the single source of information about all research projects and clinical trials established at a medical faculty/university hospital; shall comprise all the relevant metadata of such research projects/trials, including e.g. the trial's eligibility criteria, which shall be further applied in the IT-supported patient recruitment tool; to be newly developed within the MIRACUM project based on prototypes existing at GUF, UKFr and UME (compare e.g. Trinczek et al. 2014, Schreiweis et al. 2014)
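
How structured eligibility criteria from the registry could be reused by a recruitment tool is sketched below. The `Trial` fields, the example criteria and the patient record are hypothetical; the registry itself is still to be developed, so this is only one conceivable shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Hypothetical registry entry holding structured eligibility criteria."""
    trial_id: str
    min_age: int
    max_age: int
    required_codes: frozenset


def is_eligible(patient, trial):
    """Check the age range and that all required diagnosis codes are present."""
    return (
        trial.min_age <= patient["age"] <= trial.max_age
        and trial.required_codes <= set(patient["codes"])
    )


registry = [
    Trial("T-1", 18, 65, frozenset({"E11.9"})),
    Trial("T-2", 50, 80, frozenset({"I10"})),
]

patient = {"age": 55, "codes": ["I10", "J45"]}
matches = [t.trial_id for t in registry if is_eligible(patient, t)]
print(matches)
```

Storing the criteria once in the registry and evaluating them in the recruitment tool avoids maintaining the same criteria in two places.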

Modules for innovative user-friendly and efficient patient care process visualization:
a set of generic and easily configurable patient care process visualization modules, such as sunburst diagrams, a triptych panel, patient timelines and geo visualizations, built e.g. with the R statistical package and its ggplot2 and Shiny add-ons; compare Icahn School of Medicine at Mount Sinai http://ehdviz.dudleylab.org/ (Badgeley et al. 2016)

Long-Term Research Data Archive:
shall support the long-term storage of data (but also respective analysis software artefacts/queries) which have been applied in a research project; requires a close integration with the hospital-/faculty-wide trial-/project registry and the project proposal management tool;

  • iRODS (Wellcome Trust Sanger Institute, University College London, Utrecht University)
    (compare Rajasekar et al. 2010, Chiang et al. 2011)
  • CKAN (UK Government open data portal data.gov.uk, University of Lincoln)
    (compare Winn 2013)

Federated Authentication:
shall allow users to authenticate against central as well as local components; accounts issued by the home institutions can be used via DFN-AAI; e.g. OAuth (compare Choi et al. 2016b, Rieger 2009); Samply.Auth (MAGIC project)

Tools for Development, Deployment and Monitoring of IT Solutions:
(Continuous Integration Test-Pipeline)
shall support the development and testing of dedicated MIRACOLIX software solutions at one MIRACUM site, their integration testing within the general DIC architecture, their deployment to all other MIRACUM sites and their continuous monitoring in operation;