Medical Data Science

The innovation area focuses on AI and data science methodologies for digital health in drug discovery, preclinical and clinical research. At Fraunhofer ITMP, medical data science is centered on the four major domains of the Fraunhofer 4D concept in health research: Drugs, Devices, Data, and Diagnostics (see also 4D Clinic). The innovation area Medical Data Science deals with the handling and analysis of various kinds of medical data, such as data from clinics and clinical trials, omics technologies, electronic health records, medical imaging and wearables. Our core competencies include machine learning, knowledge graphs, federated learning, and FAIR (Findable, Accessible, Interoperable, Reusable) handling of medical data.

A special focus lies on the investigation of immune-mediated diseases in cooperation with clinicians, pharmaceutical companies and academic partners. Cutting-edge machine learning algorithms are leveraged for the diagnosis, prognosis and precision therapy of immune-mediated diseases. Fraunhofer ITMP has strong expertise in designing software and hardware solutions (including its high-throughput laboratories) for open platforms that serve both research and industry. These platforms facilitate the exploration and practical testing of digital health research concepts and commercial offerings.

Core competencies:

  • Machine learning for 4D (Drugs, Devices, Data, and Diagnostics)
  • Knowledge graphs and graph neural networks for medical research
  • FAIR (Findable, Accessible, Interoperable, Reusable) medical data management and knowledge graphs
  • Generative AI and synthetic medical data
  • Biostatistical support of clinical and preclinical studies
  • Federated learning infrastructure and medical data science platform

Federated infrastructure for health

The healthcare industry is increasingly opening up to data exchange and is testing a variety of digital solutions including federated learning infrastructures. Federated learning for healthcare is a machine learning paradigm that addresses the challenge of medical data governance and privacy by training algorithms collaboratively without exchanging the data itself between different stakeholders, e.g., clinics, pharma companies, academic and other public institutions.

Federated learning yields insights, such as a consensus model, through a central aggregation server without moving medical or patient data beyond the firewalls of the institutions that hold them. The machine learning model is trained locally at each participating institution, and only model characteristics (e.g., parameters, gradients) are transferred. In short, in federated learning the AI model travels between the participating clients, not the data. We support the systematic planning of such an infrastructure for our customers and consortia in a practical way by offering to build and operate prototype research platforms.
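
The round trip described above can be illustrated with a minimal sketch of federated averaging. It is not the infrastructure operated by Fraunhofer ITMP: the three "sites", their simulated data, the one-dimensional linear model and all function names are assumptions chosen only to keep the example small. Each site trains on its own data, and the server only ever sees model parameters and sample counts.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """Train y = w * x + b on one site's private data, starting from the global model."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average the client parameters, weighted by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_site_data(n, seed):
    """Simulated (hypothetical) measurements that never leave the site."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


def run_federated_training(sites, rounds=50):
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only parameters travel to the server; the raw records stay local.
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


sites = [make_site_data(40, seed=1), make_site_data(60, seed=2), make_site_data(25, seed=3)]
w, b = run_federated_training(sites)
print(f"consensus model: y = {w:.2f} * x + {b:.2f}")
```

Weighting by sample count is one common design choice; real deployments typically add secure transport, access control and often further privacy safeguards, none of which are shown here.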

Fraunhofer ITMP is driving initiatives to enable the exchange of research data in line with German requirements (e.g., the GDPR, known in Germany as DSGVO, and ethics applications) and at the European level (European Health Data Space, GAIA-X, European Open Science Cloud and International Data Spaces Association). We provide solutions for a federated, scalable and interoperable data infrastructure to establish a new paradigm of heterogeneous health research, enabling collaboration among healthcare providers, researchers, and industry partners (see also Medical Data Space).

Study and cohort analyses

Fraunhofer ITMP supports clinical and preclinical studies and cohort analyses with its expertise in:

  • Analysis and regulation of Phase I to IV and proof-of-concept (POC) studies
  • AI and machine learning
  • Identification of suitable target populations and optimized endpoints
  • Mathematical modelling and statistical methods
  • Project-specific input and output formats and dashboards

A broad AI toolbox of commercial and proprietary software tools is used. Fraunhofer ITMP's competence rests on the integration of data scientists into clinical routine at its Frankfurt location, with a focus on immune-mediated and inflammatory diseases. In collaboration with Fraunhofer SCAI, Fraunhofer ISST and other Fraunhofer institutes, large amounts of data are analyzed with cutting-edge artificial intelligence and machine learning methods, and these techniques are applied early to improve clinical care and knowledge gain.

Knowledge graphs for drug discovery and drug repurposing

Knowledge graphs (KGs) are advanced forms of networks that capture the semantics of their constituent entities and the interactions among them. In the context of biomedicine and the life sciences, KGs represent disease-associated biological and pathophysiological phenomena by systematically assembling inter-related entities such as proteins and their biological processes, molecular functions and pathways, and chemicals and their mechanisms of action and adverse effects. They have been deployed in several use cases and downstream analyses in healthcare, pharmaceutical and clinical settings. However, creating KGs is expensive and time-consuming because it requires extensive manual curation. Moreover, machine-aided methods such as text-mining workflows and large language models (LLMs) have their own shortcomings and are improving only gradually.
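
As a rough illustration of how such a graph stores typed relations between entities, the sketch below holds a KG as subject-predicate-object triples and follows a chain of relations from a compound to associated diseases. All entity names (drug_A, protein_X and so on), the relation labels and the helper functions are hypothetical placeholders, not content from a curated database or from any Fraunhofer ITMP tool.

```python
from collections import defaultdict

# Hypothetical placeholder triples: (subject, predicate, object).
TRIPLES = [
    ("drug_A", "inhibits", "protein_X"),
    ("drug_B", "inhibits", "protein_Y"),
    ("protein_X", "participates_in", "pathway_P"),
    ("protein_Y", "participates_in", "pathway_Q"),
    ("pathway_P", "associated_with", "disease_D1"),
    ("pathway_P", "associated_with", "disease_D2"),
    ("drug_A", "causes", "adverse_effect_E"),
]


def build_index(triples):
    """Index outgoing edges by subject so that traversal is a dictionary lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def find_paths(index, start, predicates):
    """Follow the given sequence of predicates from `start`; return every complete path."""
    paths = [[start]]
    for predicate in predicates:
        paths = [
            path + [obj]
            for path in paths
            for pred, obj in index.get(path[-1], [])
            if pred == predicate
        ]
    return paths


index = build_index(TRIPLES)
chain = ["inhibits", "participates_in", "associated_with"]
for path in find_paths(index, "drug_A", chain):
    print(" -> ".join(path))
```

Because each edge carries a predicate, the same structure can answer different questions (mechanism, adverse effects, disease association) simply by changing the list of predicates that is followed.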

We have developed a fully automated workflow, the Knowledge Graph Generator (KGG), for creating KGs that represent the chemotype and phenotype of diseases. The KGG embeds the underlying schemas of curated public databases to retrieve relevant knowledge, which is regarded as the gold standard for high-quality data. Graph neural networks can be used for link and node prediction in the KG to support preclinical drug discovery, the understanding of disease mechanisms and comorbidities, and drug repurposing.
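
To make the idea of link prediction concrete, the following sketch ranks drug-protein pairs that are missing from a small graph. It deliberately uses a simple neighbourhood heuristic (Jaccard similarity between the target sets of drugs) as a stand-in for a graph neural network, which would learn such scores from data instead. The edges and all names are hypothetical and are not taken from the KGG.

```python
# Hypothetical drug-target edges; a missing pair is a candidate link.
EDGES = [
    ("drug_A", "protein_X"),
    ("drug_A", "protein_Y"),
    ("drug_B", "protein_X"),
    ("drug_B", "protein_Y"),
    ("drug_B", "protein_Z"),
    ("drug_C", "protein_Z"),
]


def jaccard(a, b):
    """Overlap of two sets relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_targets(edges):
    """Map each drug to the set of proteins it is linked to."""
    targets = {}
    for drug, protein in edges:
        targets.setdefault(drug, set()).add(protein)
    return targets


def rank_candidate_links(edges):
    """Score each missing drug-protein pair by how similar the drug is to
    the drugs already linked to that protein."""
    targets = build_targets(edges)
    proteins = sorted({protein for _, protein in edges})
    candidates = []
    for drug in sorted(targets):
        for protein in proteins:
            if protein in targets[drug]:
                continue  # link already present in the graph
            score = sum(
                jaccard(targets[drug], targets[other])
                for other in targets
                if other != drug and protein in targets[other]
            )
            candidates.append((drug, protein, round(score, 3)))
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


ranking = rank_candidate_links(EDGES)
for drug, protein, score in ranking:
    print(drug, protein, score)
```

In this toy graph the highest-ranked missing link is the one between drug_A and protein_Z, because drug_A shares most of its targets with drug_B, which is already linked to protein_Z. A repurposing hypothesis of this kind would still need experimental validation.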

The KGG builds on our previous contributions to the BY-COVID project, where we developed workflows for the identification of bioactive analogs of fragments identified in COVID-NMR studies (Berg, H et al., 2022) and for the representation of Mpox biology (Karki, R et al., 2023).

FAIR handling and analysis of medical data

Extracting information and insights from unstructured and "unclean" data requires a FAIR (Findable, Accessible, Interoperable, Reusable) data management system that makes use of adapted system alignments and agreements, standardizations and ontology catalogs, as well as tools and workflows for exploratory data analysis (EDA). In our projects we transfer data validation and method validation into validation studies. Moderating the "questions to the data" is highly valuable, particularly when AI projects are being formulated. Defining, distributing and generating training or validation data sets is a prerequisite for AI development and can be supported by our real-world evidence or synthetic cohorts.
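
One small building block of FAIR data management is checking whether a dataset description carries the metadata each principle relies on. The sketch below is a simplified illustration: the mapping of fields to principles, the field names and the example record are assumptions made for this example and do not reproduce an official FAIR checklist or a Fraunhofer ITMP schema.

```python
# Assumed, simplified mapping from FAIR principles to metadata fields.
REQUIRED_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["access_url"],
    "Interoperable": ["data_format", "ontology_terms"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(record):
    """Return, per principle, the fields that are absent or empty in the record."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical dataset description with placeholder values.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example cohort",
    "access_url": "https://example.org/datasets/1",
    "data_format": "CSV",
    "ontology_terms": [],  # no controlled vocabulary assigned yet
    "license": "CC-BY-4.0",
}

gaps = missing_fair_fields(record)
for principle, fields in gaps.items():
    print(f"{principle}: missing {', '.join(fields)}")
```

A check like this only covers the presence of metadata; whether identifiers resolve, vocabularies are appropriate or licences permit the intended reuse still has to be assessed separately.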

IDERHA: Integration of heterogeneous data and evidence towards regulatory and HTA acceptance

IDERHA is a European public-private partnership launched in April 2023. This pioneering project addresses the obstacles in accessing, integrating and analyzing health data to maximize their value for patient care and medical research.

An open, disease-agnostic, federated data space will be developed to enable connectivity, access, use and reuse of digital health data. In IDERHA, consensus policy recommendations on health data access and on heterogeneous health research data, such as real-world evidence (RWE), are developed for regulatory and HTA decision making.

Partners: IDERHA is led by Fraunhofer ITMP and Johnson & Johnson Medical GmbH, in a consortium of 33 academic, clinical, medtech, pharmaceutical, and IT partners, as well as patient advocacy organizations and public authorities, including Fraunhofer institutes SCAI and ISST.

SYNTHIA: Synthetic data generation framework for integrated validation of use cases and AI healthcare applications

SYNTHIA is an ambitious collaboration between public and private institutions to facilitate the responsible use of Synthetic Data (SD) in healthcare applications. The project will improve the methodological and technical aspects of SD Generation (SDG) by developing new techniques and advancing established ones for different data modalities, including genomics and imaging, to improve the generation of realistic multimodal and longitudinal data.

The open SYNTHIA federated platform will facilitate responsible SD use by the health research community, in particular long-term access to extensively validated, reusable synthetic datasets, as well as to SDG workflows and SD assessment frameworks. A multidisciplinary collaboration of SDG developers, FAIR data experts, clinical researchers, developers of therapies and data-based tools, legal experts, socio-economic analysts, regulatory, policy advocacy, and communication experts will provide a 360° vision on how to advance healthcare applications through SD use.

Partners: Consortium of 43 academic, clinical, pharmaceutical, IT and public partners, including Fraunhofer institutes ITMP, SCAI and MEVIS.

FAIRplus

The vast amounts of data generated in life science research have the potential to add to our understanding of disease and help advance drug development. Yet most data is hidden away in proprietary databases and stored in different formats. The goal of FAIRplus is to deliver guidelines and tools to facilitate the application of FAIR principles to data from certain IMI projects and datasets from pharmaceutical companies. FAIR stands for Findable, Accessible, Interoperable, Reusable. The project will therefore make it easier for other researchers to find the data and integrate it into their own research. The project will also organise training courses for data scientists in academia, small and medium-sized enterprises (SMEs) and pharmaceutical companies. Ultimately, the project hopes to change the culture of data management in the life sciences sector.

Rischke S, Schäfer SMG, König A, Ickelsheimer T, Köhm M, Hahnefeld L, Zaliani A, Scholich K, Pinter A, Geisslinger G, Behrens F, Gurke R.
Metabolomic and lipidomic fingerprints in inflammatory skin diseases - Systemic illumination of atopic dermatitis, hidradenitis suppurativa and plaque psoriasis.
Clin Immunol. 2024 Aug;265:110305
doi: 10.1016/j.clim.2024.110305

Karki R, Gadiya Y, Gribbon P, Zaliani A.
Pharmacophore-Based Machine Learning Model To Predict Ligand Selectivity for E3 Ligase Binders.
ACS Omega. 2023 Aug 9;8(33):30177-30185
doi: 10.1021/acsomega.3c02803

Rocca-Serra P, Gu W, Ioannidis V, Abbassi-Daloii T, Capella-Gutierrez S, Chandramouliswaran I, Splendiani A, Burdett T, Giessmann RT, Henderson D, Batista D, Emam I, Gadiya Y, Giovanni L, Willighagen E, Evelo C, Gray AJG, Gribbon P, Juty N, Welter D, Quast K, Peeters P, Plasterer T, Wood C, van der Horst E, Reilly D, van Vlijmen H, Scollen S, Lister A, Thurston M, Granell R; FAIR Cookbook Contributors; Sansone SA
The FAIR Cookbook - the essential resource for and by FAIR doers.
Sci Data. 2023 May 19;10(1):292
doi: 10.1038/s41597-023-02166-3

Karki R, Gadiya Y, Zaliani A, Gribbon P.
Mpox Knowledge Graph: a comprehensive representation embedding chemical entities and associated biology of Mpox.
Bioinform Adv. 2023 Apr 3;3(1):vbad045
doi: 10.1093/bioadv/vbad045

Berg H, Wirtz Martin MA, Altincekic N, Alshamleh I, Kaur Bains J, Blechar J, Ceylan B, de Jesus V, Dhamotharan K, Fuks C, Gande SL, Hargittay B, Hohmann KF, Hutchison MT, Marianne Korn S, Krishnathas R, Kutz F, Linhard V, Matzel T, Meiser N, Niesteruk A, Pyper DJ, Schulte L, Trucks S, Azzaoui K, Blommers MJJ, Gadiya Y, Karki R, Zaliani A, Gribbon P, da Silva Almeida M, Dinis Anobom C, Bula AL, Bütikofer M, Putinhon Caruso Í, Caterina Felli I, Da Poian AT, Cardoso de Amorim G, Fourkiotis NK, Gallo A, Ghosh D, Gomes-Neto F, Gorbatyuk O, Hao B, Kurauskas V, Lecoq L, Li Y, Cunha Mebus-Antunes N, Mompeán M, Cristtina Neves-Martins T, Ninot-Pedrosa M, Pinheiro AS, Pontoriero L, Pustovalova Y, Riek R, Robertson AJ, Jose Abi Saad M, Treviño MÁ, Tsika AC, Almeida FCL, Bax A, Henzler-Wildman K, Hoch JC, Jaudzems K, Laurents DV, Orts J, Pierattelli R, Spyroulias GA, Duchardt-Ferner E, Ferner J, Fürtig B, Hengesbach M, Löhr F, Qureshi N, Richter C, Saxena K, Schlundt A, Sreeramulu S, Wacker A, Weigand JE, Wirmer-Bartoschek J, Wöhnert J, Schwalbe H.
Comprehensive Fragment Screening of the SARS-CoV-2 Proteome Explores Novel Chemical Space for Drug Development.
Angew Chem Int Ed Engl. 2022 Nov 14;61(46):e202205858
doi: 10.1002/anie.202205858

Khorchani T, Gadiya Y, Witt G, Lanzillotta D, Claussen C, Zaliani A.
SASC: A simple approach to synthetic cohorts for generating longitudinal observational patient cohorts from COVID-19 clinical data.
Patterns (N Y). 2022 Apr 8;3(4):100453
doi: 10.1016/j.patter.2022.100453