The platform and the catalogue - how they work in practice
One of the project’s principal outcomes is a web-accessible IT platform with a browsable ‘data catalogue’, which researchers can search for existing datasets that could help answer their research questions. The catalogue shows them what kind of data is available, and holds metadata for two main types of data source: electronic health records, and cohort studies, meaning data about groups of people involved in studies of particular diseases.
To use the EMIF platform, researchers are invited to create an account where they can initiate new studies based on one of these two sources. Researchers wishing to carry out studies using real-world digital patient records start by searching the catalogue for the most suitable data source. A form allows them to ‘ask’ the data source questions, like whether or not the data includes, for example, the hospital where patients were admitted, or whether biobank information comes with DNA or other material. Using the tool, they can then make a proposal that outlines the aim of their study and their preferred data sources.
This input is then taken on board by EMIF, which helps to support the negotiation with the data owner. EMIF provides templates that can be used to create a full study protocol, as well as a document repository and a project management tool for everyone involved in the study. If the data sources chosen by the researcher are formatted and structured differently, the EMIF platform can harmonise them either by mapping the data sources to each other (made available by ADVANCE) or by mapping them to the OMOP Common Data Model. The researcher runs a script against the data, and the platform returns an anonymised dataset, which is then uploaded to a private (remote) environment. Here, the researcher can run analyses on the dataset using different analysis tools.
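The idea of harmonising differently structured sources by mapping them to a common model can be illustrated with a minimal Python sketch. It is not EMIF's actual implementation: the two source layouts, the field names and the `CommonPerson` record are hypothetical, and the record is only loosely modelled on the person table of the OMOP Common Data Model. The gender codes are assumed to be the standard OMOP concept IDs.

```python
from dataclasses import dataclass

# Hypothetical common record, loosely modelled on the OMOP "person" table.
@dataclass(frozen=True)
class CommonPerson:
    person_id: str
    gender_concept_id: int
    year_of_birth: int

# Assumed standard OMOP gender concept IDs; 0 is used here for "unknown".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}
UNKNOWN_CONCEPT = 0

def from_hospital_a(row: dict) -> CommonPerson:
    # Hypothetical source A: sex coded as "M"/"F", date of birth as an ISO string.
    sex = {"M": "male", "F": "female"}.get(row["sex"])
    return CommonPerson(
        person_id=f"A-{row['patient_no']}",
        gender_concept_id=GENDER_CONCEPTS.get(sex, UNKNOWN_CONCEPT),
        year_of_birth=int(row["dob"][:4]),
    )

def from_registry_b(row: dict) -> CommonPerson:
    # Hypothetical source B: gender spelled out, birth year stored as an integer.
    return CommonPerson(
        person_id=f"B-{row['id']}",
        gender_concept_id=GENDER_CONCEPTS.get(row["gender"].lower(), UNKNOWN_CONCEPT),
        year_of_birth=row["birth_year"],
    )

def harmonise(source_a: list, source_b: list) -> list:
    # Map both sources into the same shape so they can be analysed together.
    return [from_hospital_a(r) for r in source_a] + [from_registry_b(r) for r in source_b]
```

Once every source has its own mapping function into the shared record, the same analysis script can run against any of them without knowing how the original data was laid out.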
If the researcher is looking for cohort data, they search for the disease ‘community’ on the EMIF platform (if a given community can’t be found, they can create it). They then browse for relevant cohorts based on their needs, create a study protocol and request access. They can then carry out private, secure and remote analyses with statistical analysis tools like tranSMART, or even log in to the remote environment and perform analyses using their own tools. Again, the data from the cohorts is harmonised by EMIF.
Two test cases: identifying new biomarkers linked with resilience to Alzheimer’s and obesity
To make sure the platform was suitable for use in real-life scenarios, the EMIF team used research queries about two common diseases, Alzheimer’s disease and obesity, as test cases. For Alzheimer’s, they created an overview of the existing patient cohorts and integrated it with the EMIF data catalogue and secure environment. The team reused 14 existing cohort datasets for which extensive information on Alzheimer’s biomarkers was available, and combined them with newly collected data on 300 cognitively ‘normal’ people aged 60-80. The aim was to understand how symptoms and biomarkers observed in younger and older groups of people with dementia relate to those seen in extremely elderly people (over 90 years of age) who don’t have dementia. The idea was to identify new biomarkers linked with not developing Alzheimer’s in the older group, and they looked at beta amyloid load, hippocampal atrophy and cognitive markers like rate of cognitive decline.
The EMIF platform enabled the team to quickly locate 1,200 blood samples that ultimately contributed to the development of a new blood test to determine whether someone has Alzheimer’s before they develop symptoms, something that would make clinical trials of anti-dementia drugs more effective. In this regard, EMIF saved years of work. The researchers also showed, definitively and for the first time, that prediabetes and Alzheimer’s are linked through the basic mechanisms of disease, and not because people with diabetes get strokes or other changes to their brain that mimic Alzheimer’s.
The other critical research area that the EMIF project used as a test case was obesity. Many obese individuals don’t end up developing the common complications of the disease, such as type 2 diabetes, cardiovascular disease or some cancers, while, conversely, many non-obese individuals do. The EMIF researchers sought to identify biomarkers that might point to the risk of complications like these, in order to find out which mechanisms were related to this difference in outcomes. They identified molecules related to insulin secretion capacity, insulin resistance and non-alcoholic fatty liver disease (NAFLD), as well as a potential new therapeutic for NAFLD. These biomarkers were tested in small and medium-sized cohorts and then in larger populations. Validating such biomarkers could make for better and more focused clinical trials, and could help with decisions about the risk-benefit of new drugs by targeting treatments to those at high risk.
What’s next?
The EMIF platform is available to any researcher who wants to sign up. Knowledge gained in EMIF is currently being used in the European Health Data and Evidence Network (EHDEN) project, which is putting together a federated network of Data Partners and will run multiple studies on their data.