Robust Bayesian Inference at Scale
Project Description
### Existing background work
Robustness refers to the ability of a model to perform well on unseen data, or data that is different from the data it was trained on. It is an ever-evolving challenge for practitioners of statistical and machine learning methods, who need to deal with large, complex, and un-curated data sets and build tools that are reliable in uncontrolled environments. It is also particularly important in critical applications, such as medical diagnosis or weather prediction, where a lack of robustness can have a severe impact. In this project, we will focus on the robustness of Bayesian machine learning. In Bayesian inference, we start with a prior belief about a quantity of interest, and then update this belief based on our model of the world and new evidence in the form of data. This allows us to formally describe our uncertainty and make reliable predictions. However, a crucial assumption is that the model can truly represent the data-generating process. When this assumption is violated, the model is called misspecified, and our predictions become unreliable. To remedy this issue, a wide range of approaches have been proposed to make Bayesian methods more robust, including modifications of standard likelihoods and generalised Bayesian approaches.
Main objectives of the project
The main aim of this project will be to develop novel robust alternative to existing probabilistic machine learning algorithms which are used across the physical sciences, ranging from statistical emulators such as Gaussian processes to algorithms for large-scale filtering such as the Kalman filter. The focus will be on carefully selecting the generalised Bayesian update rules to ensure these algorithms have the same, or even lower, computational complexity than their existing counter-parts based on Bayesian updates, and that they can be efficiently implemented on modern scientific computing hardware and infrastructure.
Details of Software/Data Deliverables
The main development of software will be a package for robust Bayesian methods in Python. The aim will be to make the algorithms developed as part of this project easily available to the scientific community, and to make it straightforward for algorithms to be rigorously and fairly compared against existing competitors.