# Advanced Computational Methods for Bayesian Calibration of High-Dimensional Dynamical Models

## Project Description

### Existing background work

Professor Alexandros Beskos has made fundamental research contributions regarding:

- the development of computationally intensive Sequential Monte Carlo (SMC) algorithms that can overcome the curse of dimensionality that characterises standard approaches [1];
- the extension of the applicability of SMC algorithms to high-dimensional models, across a spectrum of application fields, including physics-driven Data Assimilation problems in atmospheric sciences [6], whole-genome applications in biomedicine [5], and multivariate Graphical models in finance [3]. This research has been carried out in collaboration with expert colleagues in the respective fields (Professor Dan Crisan and Dr Nikolas Kantas at Imperial College, UK; Professor Ajay Jasra at the Chinese University of Hong Kong in Shenzhen, China; Professor Maria de Iorio at the National University of Singapore; Professor Stephan Beck at the Cancer Institute, UCL).

## Direction of atmospheric sciences

SMC methods, and Particle Filters (PFs) in particular, have only very recently begun to be used in high-dimensional Data Assimilation applications. Ensemble Kalman Filters (EnKFs), the particle-based algorithms typically used in Data Assimilation, are based on Gaussian approximations. In contrast, PFs make use of the correct model and can recover the strong non-linearities that characterise dynamical systems, especially at high resolutions [7]. However, standard PF algorithms are known to suffer from the curse of dimensionality.
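The contrast between a Gaussian-based EnKF update and PF importance weighting can be illustrated on a scalar toy problem. The sketch below is a minimal Python/NumPy example under assumed settings (a bimodal prior, a single observation with unit noise, and variable names chosen here purely for illustration); it is not taken from any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                      # ensemble / particle count (assumed)
obs, obs_std = 1.0, 1.0       # single scalar observation (assumed)

# Bimodal prior: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
comp_means = np.array([-2.0, 2.0])
comp_std = 0.5
ensemble = rng.choice(comp_means, size=n) + comp_std * rng.standard_normal(n)

# Stochastic EnKF analysis step: every member is shifted linearly,
# using a gain computed from the ensemble variance (Gaussian assumption).
prior_var = ensemble.var(ddof=1)
gain = prior_var / (prior_var + obs_std ** 2)
perturbed_obs = obs + obs_std * rng.standard_normal(n)
enkf_mean = np.mean(ensemble + gain * (perturbed_obs - ensemble))

# PF update: keep the members fixed and weight them by the exact likelihood.
log_w = -0.5 * (obs - ensemble) ** 2 / obs_std ** 2
w = np.exp(log_w - log_w.max())
w /= w.sum()
pf_mean = np.sum(w * ensemble)

# Exact posterior mean of this Gaussian-mixture model, for reference.
total_var = comp_std ** 2 + obs_std ** 2
evidence = np.exp(-0.5 * (obs - comp_means) ** 2 / total_var)
post_w = evidence / evidence.sum()
post_means = comp_means + (comp_std ** 2 / total_var) * (obs - comp_means)
exact_mean = np.sum(post_w * post_means)

print(f"exact={exact_mean:.3f}  PF={pf_mean:.3f}  EnKF={enkf_mean:.3f}")
```

In this toy setting the weighted PF estimate should land close to the exact posterior mean, whereas the EnKF ensemble mean is pulled towards the value implied by a single Gaussian fit to the bimodal prior.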

Recent research has focused on the development of a new generation of PFs that can be effective for Data Assimilation applications in high dimensions. Such PFs are termed “Localised Particle Filters” (LPFs). LPFs can overcome the curse of dimensionality by replacing a single high-dimensional global importance sampling step (across the whole domain of the signal and for the full set of arriving observations) with a number of ‘local’ importance sampling steps, each of lower dimension, that only make use of the subset of data that is informative for the chosen subdomain. The localised approach is based on the simple remark that observations obtained at a given position of the domain of the signal contain minimal information about parts of the domain that are far from that position.
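The idea of replacing one global importance sampling step with several local ones can be sketched as follows. This is a deliberately simplified Python/NumPy illustration, assuming a periodic one-dimensional grid, one Gaussian observation per grid point, and independent resampling per block; the function names, block size and radius are hypothetical choices, and the scheme is not the algorithm of any specific cited paper.

```python
import numpy as np

def local_log_weights(particles, obs, obs_std, lo, hi, radius):
    """Log-weights for grid block [lo, hi), using only the observations
    within `radius` grid points of the block (periodic grid assumed)."""
    dim = particles.shape[1]
    idx = np.arange(lo - radius, hi + radius) % dim
    resid = obs[idx] - particles[:, idx]
    return -0.5 * np.sum(resid ** 2, axis=1) / obs_std ** 2

def ess(log_w):
    """Effective sample size from unnormalised log-weights."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / np.sum(w ** 2)

def block_local_resample(particles, obs, obs_std, block_size, radius, rng):
    """Resample each block of coordinates separately with its local weights."""
    n, dim = particles.shape
    out = np.empty_like(particles)
    local_ess = []
    for lo in range(0, dim, block_size):
        hi = min(lo + block_size, dim)
        log_w = local_log_weights(particles, obs, obs_std, lo, hi, radius)
        local_ess.append(ess(log_w))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ancestors = rng.choice(n, size=n, p=w)
        out[:, lo:hi] = particles[ancestors, lo:hi]
    return out, np.array(local_ess)

# Toy comparison of global and local weights (all settings assumed).
rng = np.random.default_rng(1)
n, dim, obs_std = 500, 200, 2.0
particles = rng.standard_normal((n, dim))
obs = rng.standard_normal(dim) + obs_std * rng.standard_normal(dim)

global_ess = ess(-0.5 * np.sum((obs - particles) ** 2, axis=1) / obs_std ** 2)
resampled, local_ess = block_local_resample(
    particles, obs, obs_std, block_size=5, radius=2, rng=rng)
print(f"global ESS: {global_ess:.1f}, mean local ESS: {local_ess.mean():.1f}")
```

The global weights depend on all 200 coordinates and are expected to collapse onto very few particles, while each local step only involves a handful of coordinates. Independent resampling per block is the crudest form of localisation and can introduce discontinuities at block boundaries; it is used here only to keep the sketch short.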

Such methodological advances have led to LPFs very recently being tested in operational settings in Numerical Weather Forecasting (NWF) [8], where states can be of dimension O(10^9) or higher.

## Direction of biomedicine

There is a need for the development of statistical models for genome-wide epigenetics data and of the accompanying computational methods for their calibration. Such computations can involve the use of state-of-the-art scalable SMC algorithms and High-Performance Computing (HPC), as data sizes can be of the order of 10^8 or more. Appropriate models recently proposed in the literature include, e.g., change-point dynamics [5] that track DNA methylation patterns jointly over cases and controls across the whole genome, and can identify particular positions along the genome where cases and controls have different patterns.

Full Bayesian inference for genome-wide models can provide much more information about underlying patterns than competing methods. So far, biologists have relied on off-the-shelf approaches, even though the cost of producing the data in the lab is enormous, which makes maximal extraction of information from the available data highly significant. In general, there appears to be a lack of expertise in the development of suitable statistical models in this field and in the corresponding scalable computational methods for fitting these models.

## Direction of finance

High-dimensional applications involving dynamical models are abundant in finance; see e.g. [9]. Recent advances in generic computational SMC methodology have yet to be fully appreciated by researchers focused on applications in this area.

## Main objectives of the project

The main objectives of the project will depend on the particular application domain of interest, and can be summarised as follows:

- Investigate state-of-the-art SMC algorithms. Such a study is important given the several new methods proposed in recent research, both in the field of Data Assimilation (see e.g. [4]) and in a general context (see e.g. [2]).
- Explore the key strengths and weaknesses of SMC algorithms and contrast them against alternative biased methods (e.g. EnKFs in atmospheric sciences).
- Study the performance of advanced SMC methods across a number of benchmark model scenarios.
- Develop novel SMC algorithms that make use of expert methodology available in the SMC community but not yet fully exploited by researchers focused on particular application domains. Algorithmic tools already shown to greatly improve the performance of SMC include, e.g., tempering accompanied by jittering of particles to better disperse them across the space; adaptation techniques; and data-driven improved proposals for particles. Such procedures aim to generate particles within a Monte Carlo framework that represent the hidden signal effectively. This signal can be modelled, e.g., via PDE/SPDE dynamics in atmospheric sciences, change-point models in genome-wide applications in biomedicine, and time-evolving Graphical models in finance.
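To make the tempering, jittering and adaptation tools named in the last objective concrete, the following is a minimal Python/NumPy sketch of an adaptive tempered SMC sampler on a toy Gaussian problem. It assumes a standard construction (temperatures chosen by bisection on the effective sample size, multinomial resampling, random-walk Metropolis jitter); all function names, tuning constants and the toy model are illustrative assumptions rather than the project's method.

```python
import numpy as np

def ess_fraction(log_w):
    """Effective sample size of unnormalised log-weights, as a fraction of N."""
    w = np.exp(log_w - np.max(log_w))
    return (w.sum() ** 2 / np.sum(w ** 2)) / len(w)

def next_temperature(log_lik_vals, phi, target=0.5, tol=1e-6):
    """Adaptation: bisect for the next temperature so that the incremental
    weights keep the ESS fraction near `target`; jump to 1.0 when possible."""
    if ess_fraction((1.0 - phi) * log_lik_vals) >= target:
        return 1.0
    lo, hi = phi, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess_fraction((mid - phi) * log_lik_vals) >= target:
            lo = mid
        else:
            hi = mid
    return hi  # strictly greater than phi, so the schedule always advances

def tempered_smc(log_prior, log_lik, particles, rng, step=0.3, n_jitter=2):
    """Move prior samples towards prior(x) * lik(x)^phi, phi from 0 to 1."""
    x = particles.copy()
    n = x.shape[0]
    phi, schedule = 0.0, [0.0]
    while phi < 1.0:
        ll = log_lik(x)
        phi_new = next_temperature(ll, phi)
        log_w = (phi_new - phi) * ll
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        x = x[rng.choice(n, size=n, p=w)]          # resample
        phi = phi_new
        for _ in range(n_jitter):                  # jitter: Metropolis moves
            prop = x + step * rng.standard_normal(x.shape)
            log_ratio = (log_prior(prop) + phi * log_lik(prop)
                         - log_prior(x) - phi * log_lik(x))
            accept = np.log(rng.uniform(size=n)) < log_ratio
            x[accept] = prop[accept]
        schedule.append(phi)
    return x, schedule

# Toy model (assumed): N(0, I) prior in 5 dimensions, observation y = x + noise.
y, obs_std = 2.0 * np.ones(5), 0.5

def log_prior(x):
    return -0.5 * np.sum(x ** 2, axis=1)

def log_lik(x):
    return -0.5 * np.sum((y - x) ** 2, axis=1) / obs_std ** 2

rng = np.random.default_rng(2)
samples, schedule = tempered_smc(
    log_prior, log_lik, rng.standard_normal((2000, 5)), rng)
print(len(schedule) - 1, "temperatures; posterior mean estimate:",
      samples.mean(axis=0).round(2))
```

For this conjugate toy model the exact posterior mean is 1.6 in every coordinate, which gives a simple check on the output. The jitter moves leave each tempered target invariant, so they restore particle diversity after resampling without changing the distribution being approximated.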

## Details of Software/Data deliverables

There is an apparent lack of software implementing advanced SMC methods in all three application areas highlighted above (and beyond). Provision of such software, for the chosen application field, will be a key output of the PhD project. Indicatively, in the field of atmospheric sciences, the comprehensive review work in [4] provides a package on GitHub (https://github.com/thiery-lab/data-assimilation), but the purpose of this software seems to be reproducibility of the results in the accompanying paper rather than general-purpose use. In the case of genome-wide applications in biomedicine, Professor Alexandros Beskos has been involved in the development of the package at https://github.com/ucl-medical-genomics/hygeia, but this is still in its infancy. Thus, the proposed project will provide an opportunity for a trainee in Computational Modelling to develop software that fills a gap in an important application field, selected based on the applicant’s interests. As such, the produced software can have a large impact. By their very nature, SMC algorithms require state-of-the-art parallelisation and High-Performance Computing (HPC) implementations. Computations will be carried out on CPU or GPU architectures, depending on the problem at hand, or on the Cloud.

## References

[1] Beskos, A., Crisan, D., Jasra, A. (2014). On the stability of sequential Monte Carlo methods in high dimensions. Annals of Applied Probability 24, 1396-1445.

[2] Finke, A., Thiery, A. (2023). Conditional sequential Monte Carlo in high dimensions. Annals of Statistics 51, 437-463.

[3] Franzolini, B., Beskos, A., De Iorio, M., Poklewski Koziell, W., Grzeszkiewicz, K. (2024). Change point detection in dynamic Gaussian graphical models: the impact of COVID-19 pandemic on the US stock market. Annals of Applied Statistics 18, 555-584.

[4] Graham, M., Thiery, A. (2019). A scalable optimal-transport based local particle filter. arXiv preprint arXiv:1906.00507.