## Shape Constraints Meet Kernel Machines

**Speaker**: Zoltan Szabo (London School of Economics)

Shape constraints (such as non-negativity, monotonicity, convexity, or supermodularity) provide a principled way to encode prior information in predictive models, with numerous successful applications in econometrics, finance, biology, reinforcement learning, and game theory. Incorporating this side information in a hard way (for instance, at all points of an interval), however, is an extremely challenging problem. In this talk I will present a unified and modular convex optimization framework for encoding hard affine constraints on function values and derivatives into the flexible class of reproducing kernel Hilbert spaces. The efficiency of the technique is illustrated in the context of joint quantile regression (analysis of aircraft departures), convoy localization, and safety-critical control (piloting an underwater vehicle while avoiding obstacles). [This is joint work with Pierre-Cyril Aubin-Frankowski.]
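To make the idea concrete, here is a minimal sketch of hard shape constraints in a kernel model: kernel ridge regression with a monotonicity (non-negative derivative) constraint, enforced on a finite grid via a generic solver. Note that this naive discretization only guarantees the constraint at the grid points, not everywhere on the interval as the framework presented in the talk does; the kernel, bandwidth, grid, and regularisation parameter below are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def k(x, y, s=0.5):                       # Gaussian kernel
    return np.exp(-(x - y) ** 2 / (2 * s ** 2))

def dk(t, y, s=0.5):                      # derivative of k w.r.t. its first argument
    return -(t - y) / s ** 2 * k(t, y, s)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 20))
y = x + 0.2 * rng.normal(size=20)         # noisy samples of a monotone function

K = k(x[:, None], x[None, :])             # Gram matrix
t = np.linspace(0, 1, 30)                 # finite grid of constraint points
D = dk(t[:, None], x[None, :])            # rows give f'(t_j) = D @ alpha

lam = 1e-3
def objective(alpha):                     # regularised squared loss
    r = K @ alpha - y
    return r @ r + lam * alpha @ K @ alpha

res = minimize(objective, np.zeros(len(x)), method="SLSQP",
               constraints={"type": "ineq", "fun": lambda a: D @ a},
               options={"maxiter": 500})  # f'(t_j) >= 0 at every grid point
```

The resulting coefficient vector `res.x` defines a fitted function whose derivative is non-negative at all grid points; the talk's contribution is, in essence, a principled tightening that upgrades such finite-grid guarantees to guarantees over the whole continuum.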

**Time and Place**: Thursday 4th November at 3 pm (online)

### Previous talks

#### Toward a Perpetual Learning Machine in Continual Control

**Speaker**: Shane Gu (Google Brain)

**Details**

Many supervised learning and generative modeling applications (e.g. computer vision, NLP, molecular biology) have experienced exponential progress alongside exponential growth in data and computation. Recent models such as DALL-E and GPT-3 are essentially perpetual learning machines, capable of learning new concepts and capabilities by simply ingesting more data, without much additional human engineering. Reinforcement learning (RL) applications such as robotic control, however, are mostly limited to successes in narrow domains, where learned knowledge is often non-transferable. How can we develop a continual learning system for RL agents that automatically grows in capability without human intervention? I will discuss progress and challenges on topics including (1) algorithmic RL research and sample and compute efficiency, (2) reset-free learning, (3) environment diversity and robotics, (4) environment engineering and optimizability, (5) the role of physics simulators, (6) self-supervised RL algorithms, and (7) universal metrics for robot intelligence.

**Time and Place**: Wednesday 20th October at 10 am (online)

#### Monotonic Alpha-Divergence Variational Inference

**Speaker**: Kamélia Daudel (University of Oxford)

**Details**

Variational Inference methods have made it possible to construct fast algorithms for posterior distribution approximation.
Yet, the theoretical results and empirical performances of Variational Inference methods are often impacted by two factors:
one, an inappropriate choice of the objective function appearing in the optimisation problem and
two, a search space that is too restrictive to match the target at the end of the optimisation procedure.
In this talk, we explore how we can remedy these two issues in order to build improved Variational Inference methods.
More specifically, we suggest selecting the alpha-divergence as a more general class of objective functions and
we propose several ways to enlarge the search space beyond the traditional framework used in Variational Inference.
The specificity of our approach is that we derive numerically advantageous algorithms that provably ensure a systematic
decrease in the alpha-divergence at each step.
In addition, our framework allows us to unravel important connections with gradient-based schemes from the optimisation
literature, as well as with an integrated EM algorithm from the importance sampling literature.
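For reference, one common textbook convention for the alpha-divergence between a variational candidate $q$ and a target $p$ is (the exact parameterisation used in the talk may differ):

```latex
D_{\alpha}(q \,\|\, p)
  = \frac{1}{\alpha(\alpha - 1)}
    \left( \int q(\theta)^{\alpha}\, p(\theta)^{1-\alpha}\, \mathrm{d}\theta \;-\; 1 \right),
\qquad \alpha \in \mathbb{R} \setminus \{0, 1\},
```

with the limits $\alpha \to 1$ and $\alpha \to 0$ recovering the forward and reverse Kullback-Leibler divergences, so the family interpolates between mass-covering and mode-seeking behaviour.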

**Time and Place**: Thursday 16th September 2021 at 3.00pm on Zoom (contact Motonobu Kanagawa for the link)

#### Modeling Knowledge Incorporation into Topic Models and Their Evaluation

**Speaker**: Silvia Terragni (University of Milano-Bicocca)

**Details**

Topic models are statistical methods that aim at extracting the themes, or “topics”, from large collections of documents.
We may have some knowledge associated with the documents (e.g. document labels or pre-trained representations) that can be exploited to improve the quality of the resulting topics.
In this talk, I will review different methods to incorporate knowledge into topic models.
Moreover, due to their stochastic and unsupervised nature, topic models are difficult to evaluate.
Therefore, I will discuss the issues of their evaluation and show how to guarantee a fairer comparison between the models.

**Time and Place**: Thursday 17th June 2021 at 3.00pm on Zoom (contact Motonobu Kanagawa for the link)

#### Explainable Fact Checking for Statistical and Property Claims

**Speaker**: Paolo Papotti (Professor at EURECOM)

**Details**

Fact checkers are overwhelmed by the amount of false content that is produced online every day. To support fact checkers in their efforts, we are creating data-driven verification methods that use structured datasets to assess claims and explain their decisions. For statistical claims, we translate text claims into SQL queries on relational databases. For property claims, we use the rich semantics in knowledge graphs (KGs) to verify claims and produce explanations. Experiments show that both methods enable efficient and effective labeling of claims with interpretable explanations, both in simulations and in real-world user studies, with a 50% decrease in verification time. Our algorithms are demonstrated in a fact-checking website (https://coronacheck.eurecom.fr), which has been used by more than twelve thousand users to verify claims related to the spread and effects of the coronavirus disease (COVID-19).
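The "statistical claim to SQL query" idea can be sketched in a few lines. The table, data, and claim below are entirely hypothetical toy stand-ins, not the actual pipeline from the talk; the point is only that the claim becomes a query whose result decides the label.

```python
import sqlite3

# Toy relational table of hypothetical case counts per region
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (region TEXT, week INTEGER, count INTEGER)")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)",
                 [("A", 1, 120), ("A", 2, 90), ("B", 1, 80), ("B", 2, 140)])

# Claim: "Region A reported more total cases than region B."
# Translated into a SQL query whose boolean result decides the verdict.
claim_sql = """
SELECT (SELECT SUM(count) FROM cases WHERE region = 'A')
     > (SELECT SUM(count) FROM cases WHERE region = 'B')
"""
verdict = bool(conn.execute(claim_sql).fetchone()[0])
```

A key advantage this illustrates: the query itself doubles as an interpretable explanation of the verdict, since it names exactly which rows and aggregates were consulted.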

**Time and Place**: Thursday 27th May 2021 at 3.00pm on Zoom (contact Motonobu Kanagawa for the link)

#### Interpretable Comparison of Generative Models

**Speaker**: Wittawat Jitkrittum (Research Scientist at Google Research)

**Details**

Given two generative models (e.g., two GAN models), and a set of target observations (e.g., real images), how do we know which model is better? In this talk, I will introduce recently developed kernel-based distance measures that will help us answer this question. These measures can be used to construct a nonparametric, computationally efficient statistical test to systematically measure the relative goodness of fit of the two candidate models. As a unique advantage, the test can produce a set of examples showing where one model fits significantly better than the other. No deep background knowledge on kernel methods or statistical testing is needed for this talk. All prerequisites will be introduced.
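A stripped-down version of the underlying comparison can be sketched with the (biased, V-statistic) maximum mean discrepancy, or MMD: whichever model's samples sit closer to the target in MMD fits better. This omits the calibrated significance test and the witness-based example selection from the talk; all distributions and parameters below are illustrative.

```python
import numpy as np

def mmd2(X, Y, s=1.0):
    """Biased (V-statistic) squared MMD with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * s ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(300, 2))     # "real" observations
model_p = rng.normal(0.2, 1.0, size=(300, 2))    # samples from model P (close)
model_q = rng.normal(1.5, 1.0, size=(300, 2))    # samples from model Q (far)

# Relative goodness of fit: smaller MMD to the target means a better fit.
better_is_p = mmd2(model_p, target) < mmd2(model_q, target)
```

The tests discussed in the talk go further by quantifying whether the gap between the two MMD-like statistics is statistically significant, rather than just comparing point estimates.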

**Time and Place**: Thursday 5th November 2020 at 3.00pm on Zoom (contact Motonobu Kanagawa for the link)

**Slides**: Slides of the talk are available here.

#### Explaining the Explainer: A First Theoretical Analysis of LIME

**Speaker**: Damien Garreau (Assistant Professor at the Université Côte d’Azur)

**Details**

Machine learning is used more and more often for sensitive applications, sometimes replacing humans in critical decision-making processes. As such, interpretability of these algorithms is a pressing need. One popular algorithm for providing interpretability is LIME (Local Interpretable Model-Agnostic Explanation). In this talk, I will present a first theoretical analysis of LIME. In particular, we derive closed-form expressions for the coefficients of the interpretable model when the function to explain is linear. The good news is that these coefficients are proportional to the gradient of the function to explain: LIME indeed discovers meaningful features. However, our analysis also reveals that poor choices of parameters can lead LIME to miss important features.
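The linear-case result can be checked empirically with a simplified LIME: sample perturbations around the instance, weight them by proximity, and fit a weighted linear surrogate. This sketch skips LIME's interpretable-feature binning, so here the surrogate recovers the gradient exactly rather than up to the bandwidth-dependent proportionality factor analysed in the talk; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])      # gradient of the linear black box
f = lambda X: X @ w_true + 3.0           # the model to explain (linear case)

x0 = np.array([1.0, 1.0, 1.0])           # the instance being explained
X = x0 + rng.normal(size=(2000, 3))      # LIME-style local perturbations
sw = np.exp(-((X - x0) ** 2).sum(1) / 4.0) ** 0.5   # sqrt of locality weights

# Weighted least-squares surrogate with an intercept column
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A * sw[:, None], f(X) * sw, rcond=None)
beta = coef[1:]                          # the interpretable coefficients
```

Since `f` is exactly linear, `beta` matches `w_true`, which is the "LIME discovers meaningful features" part of the result; the failure modes appear once the locality kernel bandwidth and discretization are chosen badly.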

#### Variable Prioritization in Nonlinear Black Box Methods, with Applications in Genomics and to Interpreting Deep Neural Networks

**Speaker**: Seth Flaxman (Lecturer in Statistics at the Department of Mathematics, Imperial College London)

**Details**

I will present two recent papers (https://arxiv.org/abs/1801.07318 and https://arxiv.org/abs/1901.09839) describing our work on developing new methods to interpret nonlinear Bayesian machine learning models. In the first paper, we address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the “RelATive cEntrality” (RATE) measure to prioritize candidate genetic variables that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Gaussian process regression, but the methodological innovations apply to other “black box” methods. In the second paper, we extend these methods to deep neural networks (DNNs) and computer vision. DNNs are successful across a variety of domains, yet our ability to explain and interpret these methods is limited. We propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision).

#### Learning on Aggregate Outputs with Kernels

**Speaker**: Dino Sejdinovic (Associate Professor at the Department of Statistics, University of Oxford)

**Details**

While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same level of granularity, many applications, including the global mapping of disease, only have access to outputs at a much coarser level than that of the inputs.
Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence. Joint work with Leon Law, Ewan Cameron, Tim CD Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu.

**Biography**: Dino Sejdinovic is an Associate Professor at the Department of Statistics, University of Oxford, a Fellow of Mansfield College, Oxford, and a Turing Fellow of the Alan Turing Institute. He previously held postdoctoral positions at the Gatsby Computational Neuroscience Unit, University College London (2011-2014) and at the Institute for Statistical Science, University of Bristol (2009-2011), and worked as a data science consultant in the financial services industry. He received a PhD in Electrical and Electronic Engineering from the University of Bristol (2009) and a Diplom in Mathematics and Theoretical Computer Science from the University of Sarajevo (2006).

#### Sparse Approximate Inference for Spatio-Temporal Point Process Models with Application to Armed Conflict

**Speaker**: Andrew Zammit Mangion (Senior Research Fellow at the University of Wollongong - NIASRA, Australia)

**Details**

Spatio-temporal log-Gaussian Cox process models play a central role in the analysis of spatially distributed systems in several disciplines. Yet, scalable inference remains computationally challenging, both due to the high-resolution modelling generally required and the analytically intractable likelihood function. In this talk I will demonstrate a novel way of solving this problem, which involves combining ideas from variational Bayes, message passing on factor graphs, expectation propagation, and sparse-matrix optimisation. The proposed algorithm scales well with the state dimension and the length of the temporal horizon, with moderate loss in distributional accuracy. It hence provides a flexible and faster alternative both to non-linear filtering-smoothing type algorithms and to approaches that implement the Laplace method (such as INLA) on (block) sparse latent Gaussian models. I will demonstrate its implementation in simulation studies with point-process observations, and use it to describe the micro-dynamics of armed conflict in Afghanistan using data from the WikiLeaks Afghan War Diary. This work was done in collaboration with Botond Cseke (Microsoft Research), Guido Sanguinetti (University of Edinburgh), and Tom Heskes (University of Nijmegen).

**Biography**: Andrew Zammit Mangion is a Senior Research Fellow at the National Institute for Applied Statistics Research Australia (NIASRA) at the University of Wollongong, Australia. His research focuses on spatial and spatio-temporal modelling of environmental phenomena and the computational tools that enable it. He has recently co-authored a book on spatio-temporal modelling (https://www.crcpress.com/Spatio-Temporal-Statistics-with-R/Wikle-Zammit-Mangion-Cressie/p/book/9781138711136), and in 2017 was awarded a Discovery Early Career Researcher Award by the Australian Research Council.