Drift detection in graph streams and its applications in healthcare
Project Description
Existing background work
Graphs have become a useful tool for representing information in many application domains. Social, computer, sensor and transport networks; molecular structures and business processes – all can be represented as attributed graphs. One of the characteristics of such graphs is dynamism – the graph structure, as well as the attributes of nodes and edges can change over time. The accuracy of predictive and inference models built over dynamic graphs depends on the ability of the models to adapt to these changes. This project will propose novel methods for detecting changes in graphs over time (also known as drifts) and demonstrate their usefulness in downstream machine learning and process mining tasks performed over dynamic graphs. The work will build upon the methods recently proposed by the principal supervisor based on graphs and deep learning for process mining https://doi.org/10.1109/ACCESS.2020.3025999 and drift detection in business processes https://doi.org/10.3390/e24070910. The PhD student will take this previous work as a basis to both advance the theoretical approach to drift detection in graph streams and demonstrate its generalizability by applying to a new domain, namely, discovering disease trajectories. Disease trajectories represented as graphs can help predict disease progression, risk of developing comorbid disorders and patient outcomes more accurately Kusuma et al. Existing solutions for discovering disease trajectories are based on statistical analysis https://doi.org/10.1038/ncomms5022 and knowledge graphs https://doi.org/10.1186/s13326-020-00228-8, thus computationally expensive and not scalable. Furthermore, these solutions are not capable of adapting to changes over time (e.g. changes in disease progression due to events such as the coronavirus pandemic or introducing a new drug/vaccination). Finally, there is currently no solution exists based on the UK population data. The methods built in this project will be applied to Hospital Episode Statistics (HES) data, thus revealing the healthcare picture of the entire UK population.
Main objectives of the project
- Develop novel methods for drift detection in graph streams to outperform the state-of-the-art methods according to at least one metric: accuracy, scalability, computational time.
- Demonstrate the generalizability of the developed methods by applying them to several domains such as detecting drifts in business processes and disease trajectories, aiming to outperform the state-of-the-art methods in respective areas according to at least one metric: accuracy, scalability, computational time.
- Develop software allowing non-domain experts query process graphs to answer a range of research/business questions.
Details of Software/Data Deliverables
This project will deliver a disease trajectory browser, similar to the one proposed in https://doi.org/10.1038/s41467-020-18682-4 but based on more advanced process discovery and drift detection in graph streams methods developed as part of this project and leveraging UK population data. The proposed process discovery and drift detection methods will be developed in Python. The discovered graphs will be sorted in a graph database such as Neo4j. The frontend will be built using a range of JavaScript libraries, including Cytoscape.js and Dagre.js to represent graphs. The browser will be released as an open-source software under the MIT license.