1. What is the role of your team in the FAIRVASC project?
I work across the three main teams in FAIRVASC: the FAIRVASC Implementation Team (FIT), the Harmonisation Team (HIT) and the Query Implementation Team (QIT). My main role is to manage the FIT and oversee the technical implementation of the work conducted by the HIT (and informed by the QIT). This consists of converting the HIT outcomes into an ontology, and then working with the registries' IT specialists to implement mappings, which are used to perform uplift (i.e. the execution of mappings) of the tabular registry data into RDF. I also work with the QIT, translating the data requirements developed by the QIT (i.e. the project clinicians) into SPARQL query templates. These are then implemented in the FAIRVASC Dashboard, a web interface, so the results of the queries can be visualised and interrogated by the clinicians and the FAIRVASC team. This process creates a feedback loop, where new queries drive the development of new requirements for harmonisation of data. The FIT is also responsible for implementing other aspects of the interface, such as authentication and logging of queries.
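To give a feel for what uplift involves, here is a minimal sketch in plain Python that turns tabular records into RDF triples in N-Triples syntax. It is a stand-in for illustration only: in FAIRVASC the uplift is done with R2RML mappings, and the namespaces, column names, property names, code table and sample rows below are all made up for this example.

```python
import csv
import io

# Hypothetical namespaces, for illustration only (not the real FAIRVASC IRIs).
BASE = "http://example.org/registry/patient/"
ONT = "http://example.org/fairvasc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical pre-processing table: local registry codes -> harmonised terms.
DIAGNOSIS_CODES = {
    "GPA": "GranulomatosisWithPolyangiitis",
    "MPA": "MicroscopicPolyangiitis",
}


def uplift_row(row):
    """Turn one tabular record into a list of N-Triples lines."""
    subject = f"<{BASE}{row['patient_id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{ONT}Patient> ."]

    # Normalise the local code before looking up the harmonised term.
    diagnosis = DIAGNOSIS_CODES.get(row["diagnosis"].strip().upper())
    if diagnosis is not None:
        triples.append(f"{subject} <{ONT}hasDiagnosis> <{ONT}{diagnosis}> .")

    # Emit a typed literal only when the cell is not empty.
    if row["year_of_birth"].strip():
        year = int(row["year_of_birth"])
        triples.append(f'{subject} <{ONT}yearOfBirth> "{year}"^^<{XSD_INT}> .')
    return triples


def uplift(csv_text):
    """Uplift every row of a CSV document into N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [triple for row in reader for triple in uplift_row(row)]


# Made-up sample rows; the second one has no year of birth.
SAMPLE = "patient_id,diagnosis,year_of_birth\nP001,gpa,1956\nP002,MPA,\n"

if __name__ == "__main__":
    for line in uplift(SAMPLE):
        print(line)
```

The point of the sketch is that the mapping does more than copy cells: it normalises codes and skips missing values, which is exactly the kind of step that later needs to be validated.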
2. What tasks have been completed by your team so far and which of them have been the most challenging ones?
Our team has developed the FAIRVASC ontology, R2RML mappings based on this ontology for uplifting the registries' data into RDF, and the FAIRVASC Dashboard, an interface for running queries and returning results. The biggest challenge remains ensuring not only that the data identified by the harmonisation team is uplifted into RDF correctly, but also that the developed queries return the correct data. The R2RML mappings must do some pre-processing of data, so it is important to ensure that data is transformed correctly. Similarly, the SPARQL queries can become complex, and the resulting data must be checked against the original data. This requires a process of validation, whereby both the results of uplift and the results of queries are examined.
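One simple form such a validation check could take is comparing counts in the original table with counts in the uplifted data. The sketch below assumes triples are available as plain (subject, predicate, object) tuples and reuses a hypothetical code table; the function names, the `hasDiagnosis` property and the sample records are illustrative assumptions, not the project's actual validation tooling.

```python
from collections import Counter


def source_counts(rows, column):
    """Count records per value of one column in the original tabular data."""
    return Counter(row[column] for row in rows if row[column])


def uplifted_counts(triples, predicate):
    """Count object values for one predicate in the uplifted data."""
    return Counter(obj for _subj, pred, obj in triples if pred == predicate)


def validate(rows, triples, column, predicate, code_map):
    """Return {term: (expected, actual)} for every term whose counts differ."""
    expected = Counter()
    for code, n in source_counts(rows, column).items():
        # Unmapped codes are kept as-is so they surface as mismatches.
        expected[code_map.get(code, code)] += n
    actual = uplifted_counts(triples, predicate)
    terms = expected.keys() | actual.keys()
    return {t: (expected[t], actual[t]) for t in terms if expected[t] != actual[t]}


# Made-up example: three source records, but only two were uplifted.
CODE_MAP = {"GPA": "GranulomatosisWithPolyangiitis", "MPA": "MicroscopicPolyangiitis"}
ROWS = [
    {"patient_id": "P001", "diagnosis": "GPA"},
    {"patient_id": "P002", "diagnosis": "MPA"},
    {"patient_id": "P003", "diagnosis": "GPA"},
]
TRIPLES = [
    ("P001", "hasDiagnosis", "GranulomatosisWithPolyangiitis"),
    ("P002", "hasDiagnosis", "MicroscopicPolyangiitis"),
]

if __name__ == "__main__":
    print(validate(ROWS, TRIPLES, "diagnosis", "hasDiagnosis", CODE_MAP))
```

An empty result means the two sides agree for that column; any entry points at a record lost or mis-transformed during uplift, or at a query that does not return what the source data says it should.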
3. Each team is made up of people from different European centres. How is the work of such a group organised?
The FIT meets every second week over Zoom, following a HIT call the week before. The work completed by the HIT is first implemented as R2RML mappings for the Irish RKD registry, and these are then presented to the FIT members, who must implement their own mappings. The SPARQL queries are also presented to the FIT members, who must then test them on their own uplifted data, locally on their own triplestore. This way, any issues or errors in the ontology, mappings or queries can be identified and addressed. Once the team is happy with the implementation of the mappings and queries, these are then implemented in the interface. The QIT can then interrogate the outcomes of the queries and provide feedback on the Dashboard. New queries are then discussed and developed, and the iterative development process continues.
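A shared query that each registry runs against its own triplestore can be pictured as a parameterised SPARQL string. The sketch below fills such a template using Python's standard `string.Template`; the `fv:` namespace, the property names and the list of allowed terms are hypothetical placeholders rather than the actual FAIRVASC ontology or query set.

```python
from string import Template

# Hypothetical template: counts patients with a given diagnosis born in or
# after a given year. Placeholders use $name; SPARQL variables use ?name.
QUERY_TEMPLATE = Template("""\
PREFIX fv: <http://example.org/fairvasc#>
SELECT (COUNT(DISTINCT ?patient) AS ?n)
WHERE {
  ?patient a fv:Patient ;
           fv:hasDiagnosis fv:$diagnosis ;
           fv:yearOfBirth ?year .
  FILTER (?year >= $min_year)
}
""")

# Only known terms may be substituted, so free text never reaches the query.
ALLOWED_DIAGNOSES = {"GranulomatosisWithPolyangiitis", "MicroscopicPolyangiitis"}


def build_query(diagnosis, min_year):
    """Return a concrete SPARQL query string for the given parameters."""
    if diagnosis not in ALLOWED_DIAGNOSES:
        raise ValueError(f"unknown diagnosis term: {diagnosis!r}")
    return QUERY_TEMPLATE.substitute(diagnosis=diagnosis, min_year=int(min_year))


if __name__ == "__main__":
    print(build_query("MicroscopicPolyangiitis", 1950))
```

Because every site receives the same template, a difference in results between registries points to the local mappings or data rather than to the query text.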
4. How does the FIT team's work help realise the requirements of clinicians and answer FAIRVASC clinical questions?
By implementing the outcomes of the HIT, the FIT makes it possible to run queries over the combined data on the rare disease anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV), thus providing access to harmonised data relating to unprecedented numbers of patients with this rare disease. By using aggregated and pseudonymised data, analysis of AAV data is achieved in a manner which protects patient privacy. For additional security, the federated querying approach is augmented with a method for auditing queries (and the uplift process), which uses the provenance ontology (PROV-O) to track when queries and changes occur, and who made them. Clinicians can therefore use FAIRVASC to run new analyses over this data that were not possible before, adding to the understanding of AAV and potentially leading to new treatments.
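As an illustration of what a PROV-O audit entry for a single query might look like, the sketch below builds one as N-Triples lines. The PROV-O terms used (`prov:Activity`, `prov:Agent`, `prov:Entity`, `prov:wasAssociatedWith`, `prov:used`, `prov:startedAtTime`) are part of the W3C vocabulary, but the audit namespace, the IRI scheme and the choice of which statements to record are assumptions made for this example, not a description of how FAIRVASC stores its audit trail.

```python
import hashlib
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
AUDIT = "http://example.org/audit/"  # hypothetical namespace for audit IRIs


def audit_record(query_text, user_id, started_at):
    """Describe one query execution as PROV-O statements (N-Triples lines).

    `started_at` must be a timezone-aware datetime.
    """
    utc = started_at.astimezone(timezone.utc)
    digest = hashlib.sha256(query_text.encode("utf-8")).hexdigest()[:16]

    # The query text is identified by a hash; the execution by hash plus time.
    query = f"<{AUDIT}query/{digest}>"
    activity = f"<{AUDIT}activity/{digest}-{utc.strftime('%Y%m%dT%H%M%SZ')}>"
    agent = f"<{AUDIT}agent/{user_id}>"
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%SZ")

    return [
        f"{activity} <{RDF_TYPE}> <{PROV}Activity> .",
        f"{activity} <{PROV}wasAssociatedWith> {agent} .",
        f"{activity} <{PROV}used> {query} .",
        f'{activity} <{PROV}startedAtTime> "{stamp}"^^<{XSD_DATETIME}> .',
        f"{agent} <{RDF_TYPE}> <{PROV}Agent> .",
        f"{query} <{RDF_TYPE}> <{PROV}Entity> .",
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for line in audit_record("SELECT * WHERE { ?s ?p ?o }", "clinician-1", now):
        print(line)
```

Recording the run as an activity linked to an agent and a timestamp is what makes it possible to answer "who ran what, and when" with an ordinary SPARQL query over the audit data.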
5. How can the work of this team be applied to new registries in a future federation process?
The methodology for the development of the ontology, the mappings and the interface is proven, and can be repeated for any registry that wishes to integrate its data into the FAIRVASC infrastructure. By making use of semantic web technologies, the FAIRVASC ontology is findable, accessible, interoperable and reusable, and so too is the data generated according to it. The mappings and query templates will also be made available to registries that wish to join the FAIRVASC approach. In addition, FAIRVASC is working closely with the European Joint Programme on Rare Diseases to align the FAIRVASC ontology with their Common Data Model, which could support greater interoperability with wider rare disease data models.