## Groupe de travail des thésards du LPSM (LPSM PhD students' working group)

#### Day, time and place

Mondays at 17:00, Jussieu, Salle Paul Lévy, corridor 16-26, room 209

#### Contact(s)

gtt [AT] lpsm.paris

#### GTT mailing list

If you would like to receive all the information about GTT events, feel free to subscribe to the mailing list. Simply add your name and email at the following address:

### Past sessions

#### 2022

Monday 21 March 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Kimia Nadjahi + Jean-David Jacques** (LPSM) *Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections (Kimia NADJAHI) + B-series over the set of compactly supported multi-indices, (S)PDEs, and pre-Lie algebras (Jean-David JACQUES)*

The Sliced-Wasserstein distance (SW) is increasingly used in machine learning applications as an alternative to the Wasserstein distance, offering significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on the problem of image generation.
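For reference, the usual Monte Carlo baseline that the talk compares against can be sketched in a few lines of Python with NumPy. This is not the deterministic approximation from the talk; the function name `sliced_wasserstein_mc` and the restriction to two point clouds with the same number of equally weighted samples are assumptions made so that each one-dimensional optimal transport problem is solved simply by sorting the projections.

```python
import numpy as np


def sliced_wasserstein_mc(x, y, n_projections=200, seed=None):
    """Monte Carlo estimate of SW_2^2 between two empirical measures.

    x, y: arrays of shape (n, d) with the same number n of samples, each
    sample carrying weight 1/n (assumption that keeps the 1D step simple).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Directions drawn uniformly on the unit sphere S^{d-1}
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project every sample on every direction, then sort along the sample axis
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # In 1D, W_2^2 between equal-size empirical measures pairs sorted samples
    w2_per_direction = np.mean((px - py) ** 2, axis=0)
    # Average over the random directions (the Monte Carlo step)
    return w2_per_direction.mean()
```

For a pure translation `y = x + m`, every projection is shifted by `θ·m`, so the estimate fluctuates around `|m|^2 / d` (equal to 1 for `m = (1, ..., 1)`). Reducing this fluctuation requires more projections, which is the sampling cost the talk's approach aims to avoid.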

B-series over the set of compactly supported multi-indices, (S)PDEs, and pre-Lie algebras (Jean-David JACQUES)

In this talk, I will give a short overview of some recent developments in the field of renormalization of SPDEs via regularity structures. To that aim, I will introduce the category of pre-Lie algebras and give a few concrete examples.

Organizers: Ludovic Arnould, Jérôme Carrand, Lucas Ducrot, Robin Khanfir, Ariane Marandon-Carlhian, Antonio Ocello

Monday 7 March 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Pierre Le Bris + Marine Demangeot** (LPSM) *Uniform in time propagation of chaos for the generalized Dyson Brownian motion and 1D Riesz gases (Pierre LE BRIS) + Continuous simulation of storm processes (Marine DEMANGEOT)*

We consider a one-dimensional N-particle system in mean-field interaction, with a singular and repulsive interaction, and we wish to understand the limit, as N goes to infinity, of the empirical measure of the system. After an introduction presenting the motivation and some “usual” methods used to tackle this sort of problem, we describe a rather short proof that relies only on the well-posedness of the SDE and, in particular, requires no study of the nonlinear limit PDE.

Continuous simulation of storm processes (Marine DEMANGEOT)

Spatial extreme value theory helps model and predict the frequency of extreme events in a spatial context, for instance extreme precipitation, extreme temperatures or high concentrations of air pollution. In this presentation, we focus on storm processes, which constitute prototype models for spatial extremes. They are classically simulated on a finite number of points within a given domain. We propose a new algorithm that performs this task everywhere, not just anywhere, in continuous domains such as hyperrectangles or balls, in arbitrary dimension. It consists in generating basic ingredients that can subsequently be used to assign a value at any and every point of the simulation field. Therefore, the resolution of a single simulation can be refined indefinitely; this is particularly appropriate to investigate the geometrical properties of storm processes. Particular attention is paid to efficiency: by introducing and exploiting the notion of the domain of influence of each storm, the running time is considerably reduced. Besides, most parts of the algorithm are designed to be parallelizable.

Monday 21 February 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Matthieu Dolbeault + Camila Fernandez** (LPSM) *Random sampling for weighted least-squares approximation (Matthieu DOLBEAULT) + Time-to-Event Analysis (Camila FERNANDEZ)*

We investigate the problem of approximating a function in L^2 with a polynomial of degree N, using only evaluations at M chosen points, with M of the order of N. A first approach, based on weighted least squares at i.i.d. random points, provides a near-best approximation thanks to a matrix concentration inequality, but requires M of order N log(N). To reduce the sample size while preserving the quality of approximation, we will need a recent result on sums of rank-one matrices, which answers a conjecture from quantum physics dating back to 1959.
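The first approach mentioned above (weighted least squares at i.i.d. random points) can be sketched as follows, assuming the uniform probability measure on [-1, 1] and an orthonormal Legendre basis of `n` functions (polynomials of degree below `n`). The points are drawn from the density proportional to the sum of the squared basis functions and weighted by the inverse density ratio; this sampling choice is an assumption borrowed from the weighted least-squares literature, the function names are hypothetical, and the talk's reduced-sample construction based on sums of rank-one matrices is not shown.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(x, n):
    """Values of L_0..L_{n-1}, orthonormal w.r.t. the uniform probability dx/2 on [-1, 1]."""
    # legvander gives P_0..P_{n-1}; multiplying P_j by sqrt(2j + 1) normalises it
    return legendre.legvander(x, n - 1) * np.sqrt(2 * np.arange(n) + 1)


def sample_optimal_density(n, m, rng):
    """m i.i.d. draws from rho = (1/n) sum_j L_j^2 (relative to dx/2), by mixture + rejection."""
    points = np.empty(m)
    for i in range(m):
        j = int(rng.integers(n))  # choose the mixture component L_j^2 uniformly
        while True:
            x = rng.uniform(-1.0, 1.0)
            # L_j^2 = (2j + 1) P_j^2 <= 2j + 1, so accept with probability P_j(x)^2
            if rng.uniform() <= legendre.legval(x, [0] * j + [1]) ** 2:
                points[i] = x
                break
    return points


def weighted_least_squares(f, n, m, rng):
    """Coefficients of the weighted least-squares fit of f in the basis L_0..L_{n-1}."""
    x = sample_optimal_density(n, m, rng)
    basis = orthonormal_legendre(x, n)
    weights = n / np.sum(basis ** 2, axis=1)  # w = n / sum_j L_j^2 = d(uniform) / d(rho)
    sqrt_w = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(sqrt_w[:, None] * basis, sqrt_w * f(x), rcond=None)
    return coeffs
```

With at least `n` distinct points the weighted system has full rank, so a polynomial of degree below `n` is recovered exactly; the requirement of `m` of order `n log(n)` from the abstract is what makes the fit near-best for general functions with high probability.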

Time-to-Event Analysis (Camila FERNANDEZ)

Time-to-event analysis is a branch of statistics that has grown in popularity over the last decades thanks to its many fields of application, such as predictive maintenance, customer churn prediction and population lifetime estimation. In this presentation, we review and compare the performance of widely used prediction methods for time-to-event analysis. These consist of semi-parametric and parametric statistical models and machine learning approaches. The comparison is carried out on three different datasets using two different scores (the integrated Brier score and the concordance index). Moreover, we show how aggregation methods, which surprisingly have not yet been much studied in time-to-event analysis, can improve prediction accuracy. Finally, we present simulation results that complete the comparison between the two scores, varying the number of samples and the percentage of censored data to show their impact.
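One of the two scores used in the comparison, the concordance index, can be computed directly from its pairwise definition. The sketch below implements Harrell's version for right-censored data in O(n^2) time; the function name is hypothetical, ties in event times are ignored for simplicity (an assumption), and the integrated Brier score is not shown.

```python
import numpy as np


def concordance_index(times, events, risk):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when t_i < t_j and subject i had the event
    (events[i] == 1); it is concordant when risk[i] > risk[j]. Ties in risk
    count as one half.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        later = times > times[i]  # subjects still at risk after t_i
        comparable += later.sum()
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return concordant / comparable
```

A value of 1 means the predicted risks order all comparable subjects perfectly, 0.5 corresponds to an uninformative (constant) predictor.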

Monday 7 February 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Emilien Bodiot + Luca Brusa** (LPSM) *Two-dimensional Gaussian Markov Processes, an operadic approach (Emilien BODIOT) + A tempered Expectation-Maximization algorithm for discrete latent variable models (Luca BRUSA)*

When working with one-dimensional Markov processes, the right and left eigenvectors of the transfer matrix provide a good notion of invariant boundary conditions. Unfortunately, as far as we can tell, no such tool exists for two-dimensional Markov processes. Recent work by Damien Simon fills this gap using a higher algebraic approach based on the theory of operads. In this talk, we will present the algebraic objects that arise in this new formalism and how they define invariant boundary conditions. We will discuss the case of Gaussian Markov processes.

A tempered Expectation-Maximization algorithm for discrete latent variable models (Luca BRUSA)

Although maximum likelihood estimation of many discrete latent variable models can be performed using the Expectation-Maximization (EM) algorithm, a well-known drawback of this estimation method is the multimodality of the log-likelihood function. As a consequence, the algorithm may converge to a local maximum that does not correspond to the global maximum. We propose a Tempered EM (T-EM) algorithm, which is able to explore the parameter space adequately. It consists of rescaling the objective function depending on a parameter known as the temperature, which controls the prominence of global and local maxima. By properly tuning the sequence of temperature values, the target function is gradually attracted toward the global maximum, escaping local sub-optimal solutions. We rely on a careful Monte Carlo simulation study to compare the proposal with the standard EM algorithm, evaluating both the ability to hit the global maximum and the computational time of the proposed algorithm. We conclude that the proposal outperforms the standard EM algorithm, improving the chance of reaching the global maximum in the overwhelming majority of the cases considered.
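To make the role of the temperature concrete, here is a minimal tempered EM for a one-dimensional Gaussian mixture with unit variances assumed known (a simple discrete latent variable model), written with NumPy. It uses one common form of tempering, raising the joint class probabilities to the power 1/T in the E-step, with a geometric schedule that decreases T to 1. The mixture model, the schedule and the function name are assumptions; this is not the T-EM algorithm or temperature sequence from the talk.

```python
import numpy as np


def tempered_em_gmm(x, mu_init, n_iter=200, t0=3.0, n_temper=50):
    """Tempered EM for a 1D Gaussian mixture with unit variances (assumed known).

    Estimates the mixture weights and the component means. During the first
    n_temper iterations the temperature decreases geometrically from t0 to 1;
    the remaining iterations are standard EM (temperature 1).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu_init, dtype=float).copy()
    k = len(mu)
    weights = np.full(k, 1.0 / k)
    temps = np.concatenate([np.geomspace(t0, 1.0, n_temper), np.ones(n_iter - n_temper)])
    for temp in temps:
        # E-step: log(pi_k * N(x | mu_k, 1)) up to a constant, shape (n, k)
        log_joint = np.log(weights) - 0.5 * (x[:, None] - mu) ** 2
        # Tempering: divide by T before normalising, i.e. raise the joint to the power 1/T
        scaled = log_joint / temp
        scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(scaled)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted proportions and means
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
    return weights, mu
```

With `t0 = 1` the loop reduces to standard EM; a larger `t0` flattens the responsibilities in the early iterations, which is the mechanism that lets the iterates move away from a poor basin before the final untempered iterations.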

Monday 24 January 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Nicolas Bouchot + Eddy Ella Mintsa** (LPSM) *Self-attracting polymer in homogeneous and random environment (Nicolas BOUCHOT) + Plug-in classification procedures for diffusion paths (Eddy ELLA MINTSA)*

A polymer can be modelled as a random walk (RW) whose trajectories are weighted by a density e^(H), where H is a Hamiltonian. Studying the partition function (the expectation of e^(H) under the RW law) is a way to obtain information about the typical configurations of the polymer.

In this talk, I will detail the study of a Hamiltonian that penalises trajectories whose range (set of visited sites) is large (hence self-attracting), leading to a competition between the diffusivity of the RW and this “self-attraction”. I will prove scaling limits for the fluctuations around an optimal range size and for the centre of the range. Afterwards, I will place the polymer in a Gaussian environment that rewards trajectories visiting sites with larger associated Gaussian variables. I will prove scaling limits in probability for the extrema of the range and for the fluctuations around these optimal positions, holding for almost all realisations of the environment.
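The partition function described above can be computed by brute force for very short walks, which makes the definition concrete. The sketch below assumes the simplest self-attracting Hamiltonian H = -h |R_n| with h ≥ 0, where R_n is the range of the simple random walk on Z after n steps; the talk's exact Hamiltonian, dimension and scaling are not specified here, and the function name is hypothetical. Enumeration costs 2^n operations, so this is only a sanity check, not a tool for the scaling limits in the talk.

```python
import itertools
import math


def partition_function(n, h):
    """Z_n(h) = E[exp(-h * |R_n|)] for the simple random walk on Z (exact enumeration).

    R_n is the set of sites visited by the walk up to time n, origin included.
    """
    total = 0.0
    for steps in itertools.product((-1, 1), repeat=n):
        position, visited = 0, {0}
        for step in steps:
            position += step
            visited.add(position)
        total += math.exp(-h * len(visited))
    return total / 2 ** n  # each of the 2^n paths has probability 2^-n
```

For `n = 2`, the paths `++` and `--` visit 3 sites and the paths `+-` and `-+` visit 2, so `Z_2(h) = (2e^{-3h} + 2e^{-2h}) / 4`; increasing `h` shifts weight toward paths with a small range.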

Plug-in classification procedures for diffusion paths (Eddy ELLA MINTSA)

Recent advances in technology have generated labeled data, recorded at high frequency, that can be modelled as functional data. This work focuses on the multiclass classification problem for functional data modelled by a stochastic differential equation. The drift function depends on the label of the class $Y \in \{1, \dots, K\}$, $K \in \mathbb{N} \setminus \{0, 1\}$. An observation is a solution $X = (X_t)_{t \in [0,1]}$ of the time-homogeneous stochastic differential equation $dX_t = b_Y(X_t)\,dt + \sigma(X_t)\,dW_t$, $X_0 = 0$, $t \in [0,1]$, with unknown drift functions $(b_k)_{k=1,\dots,K}$ and unknown diffusion coefficient $\sigma$. Furthermore, we assume that the law $p = (p_1, \dots, p_K)$ of the label $Y$ is unknown. From a learning sample $\mathcal{D}_N = (X^i, Y_i)_{i \in [1,N]}$ consisting of $N$ independent copies of the pair $(X, Y)$, our aim is to build an implementable plug-in nonparametric classification procedure and to derive upper bounds on its excess risk over Hölder spaces.

Few works have investigated the classification of functional data in the stochastic differential equation framework. In [3], this is done from a parametric point of view with a known diffusion coefficient. Here, we deal with a more challenging problem: the drift function is nonparametric, and both the diffusion coefficient and the distribution $p = (p_1, \dots, p_K)$ are unknown.

Our classification procedure relies on the nonparametric estimation of the functions $b_k$, $k = 1, \dots, K$, and $\sigma^2$ by minimizing a least-squares contrast over a spline basis (as in [1] for the estimation of the drift function). We establish the consistency of the resulting empirical classifier as a function of $N$, the size of the learning sample, and $n \in \mathbb{N}^*$, the number of discrete observations for each path. We obtain rates of convergence under mild assumptions. These computations rely on stochastic calculus, in particular on fine estimates of the transition densities. Finally, the empirical classifier is implemented and successfully evaluated on simulated data.

Monday 10 January 2022, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Arthur Blanc-Renaudie + Félix Cheysson** (LPSM) *Stick breaking constructions: study of D-trees and ICRT (Arthur BLANC-RENAUDIE) + Evolution of groups at risk of death from Covid-19 using hospital data (Félix CHEYSSON)*

A tree is a trunk with large branches, small branches, even smaller branches… and (outside of winter) a few leaves. I will present two algorithms based on this principle, which have theoretical applications to the additive coalescent, Galton–Watson trees and the configuration model. I will also explain several methods to study these algorithms, which I use to prove scaling limits, bound the heights, prove compactness and compute some fractal dimensions.

Evolution of groups at risk of death from Covid-19 using hospital data (Félix CHEYSSON)

The Health Data Warehouse (Entrepôt de Santé, EDS) of the AP-HP (Assistance Publique - Hôpitaux de Paris) is collecting and enriching information related to the hospitalisation of patients for whom a positive diagnosis of Covid-19 has been established (via PCR analysis or a lung x-ray). In order to improve patient management, it is essential to identify the risk factors of the disease and to determine whether they evolve with the different waves of the pandemic. In this context, we focus on the estimation of death rates for the groups at higher risk of death from Covid-19, using binary classification trees built from the CART algorithm (Breiman et al., 1984).

To study the temporal evolution of death rates among these groups and thus adapt their healthcare, we propose a hypothesis test to compare CART trees and detect changes in the death rates. We show that a bootstrap version of this test has good empirical properties, and illustrate it with numerical experiments and an application to the first wave of the pandemic. Finally, we present some theoretical insight into the distributional properties of the test statistic, using results derived from the theory of U-statistics.

#### 2021

Tuesday 14 December 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Thibault Randrianarisoa + Alexandra Lefebvre** (LPSM) *Optional Pólya trees: posterior rates and uncertainty quantification (Thibault Randrianarisoa) + Exact inference in probabilistic graphical models and extensions over polynomials. Application to genetics and segmentation (Alexandra Lefebvre)*

We consider statistical inference in the density estimation model using a tree-based Bayesian approach, with Optional Pólya trees as prior distribution. We derive near-optimal convergence rates for corresponding posterior distributions with respect to the supremum norm. For broad classes of Hölder-smooth densities, we show that the method automatically adapts to the unknown Hölder regularity parameter. We consider the question of uncertainty quantification by providing mathematical guarantees for credible sets from the obtained posterior distributions, leading to near-optimal uncertainty quantification for the density function, as well as related functionals such as the cumulative distribution function. The results are illustrated through a brief simulation study.

Exact inference in probabilistic graphical models and extensions over polynomials. Application to genetics and segmentation (Alexandra Lefebvre)

Probabilistic graphical models play a central role in reasoning about complex systems involving latent variables. They provide a graphical representation of the dependency structure of a joint distribution. In this talk, we start with an introduction to the sum-product algorithm, which exploits the dependency structure to reduce the time complexity of exact inference from exponential in the number of variables to exponential in the treewidth of a triangulation of the graph. Such computations thus become tractable in graphs of reasonable treewidth. We then turn to extensions of the algorithm to polynomial quantities, with two illustrations: 1) computing the derivatives of the likelihood in parametric Bayesian networks; 2) relaxing the constraint on the prior distribution of the number of segments in sequence segmentation.
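On a chain, the simplest graph (treewidth 1), the sum-product algorithm reduces to passing a single message from left to right. The sketch below computes the normalising constant of a chain-structured model with shared pairwise potentials in O(n k^2) operations and checks it against brute-force enumeration over all k^n configurations; the potentials and function names are illustrative assumptions, and the polynomial extensions from the talk are not shown.

```python
import itertools

import numpy as np


def chain_partition_sum_product(unary, pairwise):
    """Sum over all configurations of prod_i unary[i, x_i] * prod_i pairwise[x_i, x_{i+1}].

    unary: array (n, k) of nonnegative node potentials; pairwise: array (k, k)
    shared by every edge of the chain. Cost O(n k^2) instead of O(k^n).
    """
    message = unary[0].copy()
    for i in range(1, len(unary)):
        # Sum out x_{i-1}: m_i(b) = unary[i, b] * sum_a m_{i-1}(a) * pairwise[a, b]
        message = (message @ pairwise) * unary[i]
    return message.sum()


def chain_partition_brute_force(unary, pairwise):
    """Same quantity by explicit enumeration of the k^n configurations."""
    n, k = unary.shape
    total = 0.0
    for config in itertools.product(range(k), repeat=n):
        weight = np.prod([unary[i, config[i]] for i in range(n)])
        weight *= np.prod([pairwise[config[i], config[i + 1]] for i in range(n - 1)])
        total += weight
    return total
```

The two functions agree, but only the message-passing version stays cheap as the chain grows, which is the complexity reduction described above in its simplest form.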

Tuesday 30 November 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Iqraa Meah** (LPSM) *Online multiple testing with super-uniformity reward (Iqraa Meah)*

Valid online inference is an important problem in contemporary multiple testing research, to which various solutions have been proposed recently. It is well-known that these methods can suffer from a significant loss of power if the null p-values are conservative. This occurs frequently, for instance whenever discrete tests are performed. To reduce conservatism, we introduce the method of super-uniformity reward (SURE). This approach works by incorporating information about the individual null cumulative distribution functions (or upper bounds of them), which we assume to be available. Our approach yields several new “rewarded” procedures that theoretically control online error criteria based either on the family-wise error rate (FWER) or the marginal false discovery rate (mFDR). We prove that the rewarded procedures uniformly improve upon the non-rewarded ones, and illustrate their performance for simulated and real data.
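As a point of comparison, the simplest online procedure controlling the family-wise error rate is alpha-spending (online Bonferroni), sketched below. It is only a non-rewarded baseline: the spending sequence γ_i = 6/(π² i²) is an arbitrary illustrative choice, the function name is hypothetical, and the SURE reward is not implemented. When a null p-value is discrete, P(p_i ≤ α γ_i) can be well below α γ_i, and this unused budget is the conservatism the abstract refers to.

```python
import math


def online_bonferroni(p_values, alpha=0.05):
    """Alpha-spending: test i (1-based) at level alpha * gamma_i with gamma_i = 6 / (pi^2 i^2).

    Since sum_i gamma_i = 1, a union bound gives FWER <= alpha for any number of
    tests and any dependence, provided each null p-value is super-uniform.
    """
    decisions = []
    for i, p in enumerate(p_values, start=1):
        level = alpha * 6.0 / (math.pi ** 2 * i ** 2)  # budget spent on test i
        decisions.append(p <= level)
    return decisions
```

Each decision is made as soon as its p-value arrives, without looking at future tests; the price is that the levels shrink quickly, which is why recovering unused budget matters.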

Monday 22 November 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Antoine Heranval + Lucas Broux** (LPSM) *Application of Generalized Pareto Regression Trees to the cost prediction of floods in France (Antoine HERANVAL) + The Sewing Lemma for all gamma > 0 (Lucas BROUX)*

In this work, we use Generalized Pareto Regression Trees to predict the cost of floods in France on a real dataset. The aim of this study is to improve the cost prediction of a flood event, shortly after its occurrence, for the entire French market. Indeed, following a natural catastrophe, it can be difficult to evaluate the scale and the cost of an event. To do so, we use GPD Regression Trees to put a special focus on extreme events and to gain further insight into the heterogeneity of their severity. Thanks to a partnership with the French Federation of Insurance (FFA), and more specifically with one of its dedicated technical bodies, the association of French insurance undertakings for natural risk knowledge and reduction (Mission Risques Naturels, MRN), we have access to a large volume of events. These are all the flood events that have been officially recognized as natural catastrophes in France over the past 20 years. The cost of these events comes from the claims reported by insurance companies.

The Sewing Lemma for all gamma > 0 (Lucas BROUX)

Introduced by Massimiliano Gubinelli in 2004, the Sewing Lemma is a fundamental result in the theory of Rough Paths.

In this talk, I will present this lemma in its original form, along with a recently obtained version in the regime 0 < gamma ≤ 1. I will also briefly introduce the theory of Rough Paths and explain how this new version of the Sewing Lemma can provide an elegant solution to the so-called “extension problem” of extending any Hölder path to a Rough Path.
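For reference, the classical statement (in the regime where the exponent exceeds 1) can be written as follows. The notation (Ξ, δΞ, θ, C_θ) is chosen here for illustration, the γ in the talk's title may be indexed differently, and the new regime discussed in the talk is not covered by this statement.

```latex
\begin{aligned}
&\delta\Xi_{s,u,t} := \Xi_{s,t} - \Xi_{s,u} - \Xi_{u,t}, \qquad 0 \le s \le u \le t \le T,\\
&|\delta\Xi_{s,u,t}| \le C\,|t-s|^{\theta} \ \text{ with } \theta > 1
\;\Longrightarrow\; \exists!\, I : [0,T] \to \mathbb{R}^d,\ I_0 = 0, \ \text{ such that}\\
&|I_t - I_s - \Xi_{s,t}| \le C_\theta\, C\, |t-s|^{\theta}
\quad\text{and}\quad
I_t - I_s = \lim_{|\pi| \to 0} \sum_{[t_i, t_{i+1}] \in \pi} \Xi_{t_i, t_{i+1}},
\end{aligned}
```

where π ranges over partitions of [s, t] with mesh |π| and C_θ depends only on θ. For example, the germ Ξ_{s,t} = Y_s (X_t - X_s) satisfies δΞ_{s,u,t} = (Y_s - Y_u)(X_t - X_u), so for an α-Hölder path Y and a β-Hölder path X with α + β > 1 the lemma yields the Young integral of Y against X.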

Monday 15 November 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Pierre Marion + Sergi Burniol Clotet** (LPSM) *Framing RNN as a kernel method: A neural ODE approach (Pierre MARION) + Horospheres and horocyclic flows in nonpositive curvature (Sergi BURNIOL CLOTET)*

We study the behavior of a class of neural networks called recurrent neural networks (RNNs). Building on their interpretation as a discretization of a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of an RNN can be viewed as a linear function of a specific feature set of the input, known as the signature. This connection allows us to frame an RNN as a linear method in a suitable Hilbert space. As a consequence, we obtain theoretical guarantees of generalization and stability. Our results are illustrated on simulated datasets.
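The signature mentioned above is the sequence of iterated integrals of the input path. As a rough illustration (not the construction used in the talk), the sketch below computes its first two levels for a piecewise-linear path using Chen's relation; truncating at level 2 and the function name are assumptions.

```python
import numpy as np


def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path in R^d.

    path: array of shape (n_points, d). Each linear segment with increment delta
    has signature (1, delta, delta (x) delta / 2); segments are combined with
    Chen's relation S2(a * b) = S2(a) + S2(b) + S1(a) (x) S1(b).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Use the level-1 term of the path so far before adding the new segment
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2
```

For the path (0, 0) → (1, 0) → (1, 1), level 1 is the total increment (1, 1) and level 2 is [[0.5, 1], [0, 0.5]]; its antisymmetric part, (S¹² - S²¹)/2 = 0.5, is the Lévy area, which the increment alone cannot detect.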

Horospheres and horocyclic flows in nonpositive curvature (Sergi BURNIOL CLOTET)

I will present geodesic flows on rank 1 nonpositively curved manifolds, which are examples of weakly hyperbolic systems of geometric origin. Then I will explain two results obtained in my thesis, which generalize what was known in the strongly hyperbolic case. The first is the equidistribution of horospheres under the action of the geodesic flow, and the second is the unique ergodicity of the horocyclic flow for a certain class of surfaces.

Monday 18 October 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Miguel Martinez Herrera + Loïc Bethencourt** (LPSM) *Inference and tests for Multivariate Hawkes processes with Inhibition, application to neuroscience and genomics (Miguel MARTINEZ HERRERA) + Stable limit theorems for additive functionals of one-dimensional diffusion processes (Loïc BETHENCOURT)*

Originally introduced to study earthquakes, the Hawkes process models interactions in which the occurrence of an event has a direct impact on the phenomenon itself. The well-studied “excitation” increases the event rate each time a point is observed, whereas “inhibition”, the main interest of this work, has the opposite effect. The main goal is to obtain a parametric estimation method for multidimensional processes, for which it is important to consider both probabilistic and statistical approaches. Such methods would provide a better understanding of the inner workings of interacting phenomena, such as neural activity networks.
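To illustrate the difference between excitation and inhibition, here is a sketch of a univariate Hawkes process with an exponential kernel, simulated by Ogata's thinning algorithm. The intensity is taken as λ(t) = max(0, μ + Σ α e^{-β(t - t_i)}), so a negative α produces inhibition; this positive-part form, the one-dimensional setting and the parameter names are assumptions, and the multivariate estimation methods from the talk are not shown.

```python
import numpy as np


def simulate_hawkes_exp(mu, alpha, beta, t_max, rng):
    """Ogata thinning for lambda(t) = max(0, mu + sum_i alpha * exp(-beta (t - t_i))).

    alpha > 0: self-excitation; alpha < 0: self-inhibition. Requires mu > 0, beta > 0.
    """
    events = []
    t = 0.0
    s = 0.0  # current value of sum_i alpha * exp(-beta (t - t_i)); same sign as alpha
    while True:
        # Between events s decays towards 0, so mu + max(s, 0) bounds the intensity
        bound = mu + max(s, 0.0)
        wait = rng.exponential(1.0 / bound)
        if t + wait > t_max:
            break
        t += wait
        s *= np.exp(-beta * wait)
        intensity = max(mu + s, 0.0)
        if rng.uniform() * bound < intensity:  # accept with probability intensity / bound
            events.append(t)
            s += alpha
    return np.array(events)
```

With μ = 1 and β = 1, taking α = 0.5 roughly doubles the long-run event rate (the stationary rate of a linear Hawkes process is μ / (1 - α/β)), while α = -0.5 keeps the intensity below the baseline μ.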

Stable limit theorems for additive functionals of one-dimensional diffusion processes (Loïc BETHENCOURT)

We consider a positive recurrent one-dimensional diffusion process with continuous coefficients and establish stable central limit theorems for a certain type of additive functionals of this diffusion. In other words, we find explicit conditions on the additive functional under which its fluctuations behave like an alpha-stable process in large time, for alpha in (0,2].

Monday 14 June 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Yoan Tardy + Francesco Bonacina** (LPSM) *Collisions of the supercritical Keller-Segel particle system (Yoan Tardy) + Influenza Decline During COVID-19 Pandemic: a Global Analysis Leveraging Classification and Regression Trees (Francesco Bonacina)*

We study a particle system naturally associated with the 2-dimensional Keller-Segel equation. It consists of N Brownian particles in the plane, interacting through a binary attraction in 1/r, where r stands for the distance between two particles. We will discuss two cases, the subcritical and the supercritical ones, which correspond to an attraction factor smaller than 2 and greater than 2, respectively. In particular, we will first see that in the subcritical case there are only collisions between pairs of particles, which are not an obstacle to defining the solution properly in the classical sense, and secondly that in the supercritical case there is an explosion due to a collision between several particles, which we will study precisely.

Influenza Decline During COVID-19 Pandemic: a Global Analysis Leveraging Classification and Regression Trees (Francesco Bonacina)

The COVID-19 pandemic has caused a profound shock to the ecology of infectious diseases; in particular, some studies highlighted that the circulation of influenza dramatically decreased in specific countries after the emergence of COVID-19. They also pointed out that the phenomenon could be associated with the non-pharmaceutical interventions (NPIs) applied by governments to control the pandemic. Here we address the problem at the global scale by analyzing the FluNet influenza public repository for the periods before (2015-19) and during (2020-21) the COVID-19 pandemic. First, we map the space-time variation of influenza and find that the percentage of positive tests decreased globally by 98.6%, with very heterogeneous patterns across countries and seasons. Then, we use Random Forests and Classification And Regression Trees to link the variation of influenza incidence with several covariates such as COVID-19 incidence, strictness of NPIs, change in human mobility, demography, season and geographical region.

Organizers: Emilien Bodiot, Lucas Broux, Gloria Buritica, David Lee, Thibault Randrianarisoa, Yoan Tardy

Monday 31 May 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Sébastien Farkas + Toyomou Matsuda** (LPSM) *Regression trees algorithms tailored for Generalized Pareto Distribution estimation and application to Cyber risk quantification (Sébastien Farkas) + Parabolic Anderson models with singular potentials (Toyomou Matsuda)*

With the rise of the cyber insurance market, there is a need for better quantification of the economic impact of this risk and of its rapid evolution. We focus particularly on severe/extreme claims, by combining a Generalized Pareto model, justified by Extreme Value Theory, with a regression tree approach. We will introduce the methodology, discuss some hypotheses, comment on simulation results and explore some perspectives.

Parabolic Anderson models with singular potentials (Toyomou Matsuda)

The aim of this talk is twofold. Firstly, we will review some important facts about the parabolic Anderson model (PAM). The PAM is a heat equation with a random potential and it exhibits interesting properties such as intermittency. We will also review Schrödinger operators with random potentials, called Anderson Hamiltonians, focusing on their connections with PAMs.

Secondly, we will discuss PAMs and Anderson Hamiltonians with singular random potentials. In particular, we will discuss my ongoing work with Willem van Zuijlen to construct Anderson Hamiltonians with such singular potentials and to study asymptotics of the corresponding PAMs.

Monday 17 May 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Carlo Bellingeri** (TU Berlin) *From Ordinary Differential Equations to Rough Differential Equations (Carlo Bellingeri)*

Starting from deterministic concepts of ordinary differential equations, we will introduce the concepts of rough differential equation “à la Davie” and geometric rough paths. These notions allow us to formulate a stochastic differential equation without having to resort to Itô's calculus and have served as an inspiration for the study of much more sophisticated equations in recent years. In conclusion, we will discuss possible extensions of these concepts to non-commutative probabilities.

Monday 3 May 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Henri Elad Altman + Pierre Marion** (Imperial College London / LPSM) *Scaling limits of additive functionals of mixed fractional Brownian motions (Henri Elad Altman) + “Who plays Gandalf in LOTR?” - Natural Language Processing on structured data (Pierre Marion)*

In this talk we will present new methods to analyse the long-time behaviour of additive functionals of stochastic processes. As an application, we obtain scaling limit results for additive functionals of mixed fractional Brownian motions. This is based on joint work with Khoa Lê (TU Berlin).

“Who plays Gandalf in LOTR?” - Natural Language Processing on structured data (Pierre Marion)

Most recent approaches to language tasks (translation, text generation, etc.) model text as a stream of tokens and learn a meaningful representation of these tokens via mechanisms like attention. However, some tasks require handling structured data, such as graphs or tables. One such example is knowledge graphs, which are a common way to encode real-world facts. The largest public knowledge graph is Wikidata (90M nodes, 1.2B edges). Finding information in these graphs, for instance to answer questions phrased in natural language, is a challenging task, which benefits from a more structured approach to language modeling (semantic parsing, context embedding). The talk will give an overview of these different notions, and present recent results on the task of question answering grounded in Wikidata.

Monday 19 April 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**William Da Silva + Aude Sportisse** (LPSM) *Hamburgers, cheeseburgers and Brownian excursions (William Da Silva) + Debiasing Averaged Stochastic Gradient Descent to handle missing values (Aude Sportisse)*

We give an elementary introduction to a bijection, due to Scott Sheffield, between loop-decorated planar maps and hamburger-cheeseburger inventory trajectories. Going through the looking glass, one may use these burger walks to study the convergence of loop-decorated random planar map models (this is the so-called peanosphere topology). In the scaling limit, the hamburger and cheeseburger walks encoding a particular model of random planar maps reveal intriguing connections between planar maps and Brownian excursions, which are reminiscent of the mating-of-trees story for space-filling explorations of Liouville quantum gravity surfaces. If time allows, I will describe a branching process arising when slicing half-planar Brownian excursions at heights.
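The inventory side of Sheffield's bijection is easy to simulate. The sketch below reads a finite word in the symbols H and C (a hamburger or cheeseburger is produced and placed on top of the stack), h and c (an order eats the topmost burger of that type) and F (a flexible order eats the topmost burger), and records the net numbers of hamburgers and cheeseburgers after each symbol. The string encoding and the function name are assumptions, and the bijection with loop-decorated planar maps itself is not shown.

```python
def burger_walk(word):
    """Net (hamburgers, cheeseburgers) after each symbol of a finite burger word.

    Raises ValueError if an order finds no matching burger; the full model avoids
    this by working with bi-infinite words.
    """
    stack = []  # bottom ... top
    counts = {"H": 0, "C": 0}
    walk = []
    for symbol in word:
        if symbol in ("H", "C"):
            stack.append(symbol)  # production: new burger on top
            counts[symbol] += 1
        elif symbol in ("h", "c", "F"):
            wanted = {"h": "H", "c": "C", "F": None}[symbol]
            # Scan from the top of the stack for the first acceptable burger
            for pos in range(len(stack) - 1, -1, -1):
                if wanted is None or stack[pos] == wanted:
                    counts[stack.pop(pos)] -= 1
                    break
            else:
                raise ValueError(f"order {symbol!r} finds no matching burger")
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
        walk.append((counts["H"], counts["C"]))
    return walk
```

For instance, in `"HCFh"` the flexible order eats the cheeseburger on top and the hamburger order then eats the remaining hamburger, giving the walk (1, 0), (1, 1), (1, 0), (0, 0).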

Debiasing Averaged Stochastic Gradient Descent to handle missing values (Aude Sportisse)

The stochastic gradient algorithm is a key ingredient of many machine learning methods, and is particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit of being free from any need to model the data distribution and of accounting for heterogeneous missing proportions. In both streaming and finite-sample settings, we prove that this algorithm achieves a convergence rate of O(1/n) at iteration n, the same as without missing values. We show the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including those collected from medical registers.
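One way to make the stochastic gradient unbiased despite missing entries is sketched below for least-squares linear regression. It assumes that values are missing completely at random with known per-feature observation probabilities p_j, imputes missing entries by 0, rescales observed entries by 1/p_j and subtracts the diagonal bias that this rescaling creates in x x^T; iterates are then averaged (Polyak-Ruppert). These assumptions, the constant step size and the function name are illustrative and may differ from the algorithm in the talk.

```python
import numpy as np


def debiased_averaged_sgd(x_obs, mask, y, p, step):
    """Averaged SGD for least squares with MCAR missing features.

    x_obs: (n, d) features with missing entries set to 0; mask: (n, d) booleans,
    True where observed; p: (d,) known observation probabilities (assumption).
    """
    n, d = x_obs.shape
    beta = np.zeros(d)
    beta_avg = np.zeros(d)
    for k in range(n):
        x_tilde = mask[k] * x_obs[k] / p  # inverse-probability rescaling
        # Unbiased estimate of the full-data gradient x (x^T beta - y)
        grad = x_tilde * (x_tilde @ beta - y[k]) - (1.0 - p) * x_tilde ** 2 * beta
        beta = beta - step * grad
        beta_avg += (beta - beta_avg) / (k + 1)  # running Polyak-Ruppert average
    return beta_avg
```

The correction works because E[x̃ x̃^T] = x x^T + diag((1/p_j - 1) x_j²), and E[(1 - p_j) x̃_j²] equals the same diagonal term, so the corrected gradient has the same expectation as the gradient computed on complete data.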

Monday 22 March 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Joseph De Vilmarest + Helena Kremp** (LPSM + Freie Universität Berlin) *Adaptive Forecasting by Kalman Filter, Application to Electricity Consumption During Lockdown (Joseph de Vilmarest) + A weak solution concept for singular SDEs (Helena Kremp)*

As electricity storage capacities are still negligible compared to demand, electricity production must always balance consumption, which makes load forecasting a crucial task. During the coronavirus crisis, the electricity consumption patterns of most countries changed abruptly and the average load dropped. This highlights the need for adaptive forecasting methods that take recent observations into account in an online manner. We consider here generalized additive models, which have demonstrated their efficiency in predicting electricity load, and we present an adaptive version based on Kalman filtering. We apply the proposed method to the French electricity load and test it before, during and after the lockdown of March 2020.
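The adaptive idea can be illustrated with the textbook Kalman filter for a linear regression whose coefficients follow a random walk, θ_t = θ_{t-1} + w_t and y_t = x_t^T θ_t + v_t. In the talk the features come from a generalized additive model; here they are raw inputs, and the noise variances `q` and `r` as well as the function name are assumptions chosen for illustration.

```python
import numpy as np


def kalman_regression(x, y, q, r, p0=1.0):
    """Kalman filter for y_t = x_t^T theta_t + v_t with random-walk coefficients.

    q: variance of each coefficient's random-walk step; r: observation noise
    variance; p0: initial prior variance. Returns the filtered theta_t for each t.
    """
    n, d = x.shape
    theta = np.zeros(d)
    cov = p0 * np.eye(d)
    history = np.empty((n, d))
    for t in range(n):
        cov = cov + q * np.eye(d)  # predict: coefficients may drift
        xt = x[t]
        s = xt @ cov @ xt + r  # variance of the one-step forecast error
        gain = cov @ xt / s
        theta = theta + gain * (y[t] - xt @ theta)  # correct with the forecast error
        cov = cov - np.outer(gain, xt @ cov)
        history[t] = theta
    return history
```

A larger `q` lets the coefficients move faster, trading reactivity after an abrupt change (such as a lockdown) against noisier estimates in stable periods.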

A weak solution concept for singular SDEs (Helena Kremp)

Since the works of Delarue and Diel (2016) and Cannizzaro and Chouk (2018) (in the Brownian noise setting), and our previous work (in the Lévy noise case), existence and uniqueness of solutions to the martingale problem associated with multidimensional SDEs with additive $\alpha$-stable Lévy noise for $\alpha \in (1,2]$ and rough Besov drift of regularity $\beta > (2-2\alpha)/3$ are known. Motivated by the equivalence, for SDEs with bounded measurable drift, between probabilistic weak solutions and solutions to the martingale problem, we define a (non-canonical) weak solution concept for singular Lévy diffusions and prove its equivalence to martingale solutions both in the Young regime $\beta > (1-\alpha)/2$ and in the rough regime $\beta > (2-2\alpha)/3$. This turns out to be highly non-trivial in the rough case and requires us to define certain rough stochastic sewing integrals. In particular, we show that the canonical weak solution concept (also introduced by Athreya, Butkovsky and Mytnik (2018) in the Young case), which is well-posed in the Young case, yields non-uniqueness of solutions in the rough case.

This is ongoing work together with Nicolas Perkowski.

Monday 8 March 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**David Lee** (LPSM) *A Functional Calculus via the Extension technique*

In the pioneering work [Caffarelli and Silvestre, Comm. Partial Differential Equations 32 (2007)], the following problem was raised: which types of linear operators can be realized as the Dirichlet-to-Neumann operator of a Dirichlet problem on the half-space? Even though this is an analysis problem, tools from excursion theory turn out to be very useful here.

In this talk, I will present a probabilistic solution to the above problem. This work was done jointly with Daniel Hauer, from the University of Sydney.
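For orientation, the model example behind this question is the classical Caffarelli-Silvestre extension for the fractional Laplacian, recalled here as background rather than taken from the talk. For $s \in (0,1)$, let $u$ be the bounded solution of

$$
\begin{cases}
\operatorname{div}\big(y^{1-2s}\,\nabla u(x,y)\big) = 0, & (x,y) \in \mathbb{R}^n \times (0,\infty),\\
u(x,0) = f(x), & x \in \mathbb{R}^n.
\end{cases}
$$

Then, for a positive constant $c_s$,

$$
-\lim_{y \to 0^+} y^{1-2s}\,\partial_y u(x,y) = c_s\,(-\Delta)^s f(x),
$$

so the fractional Laplacian $(-\Delta)^s$ is realized as a (weighted) Dirichlet-to-Neumann operator. For $s = 1/2$ the weight disappears, $u$ is the harmonic extension of $f$, and the Dirichlet-to-Neumann map is exactly $(-\Delta)^{1/2}$.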

Monday 22 February 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Aaraona Rakotoarivony + Ludovic Arnould** (LPSM) *Modigliani Miller Theorem, a stochastic control approach (Aaraona Rakotoarivony) + Analyzing the tree-layer structure of Deep Forests (Ludovic Arnould)*

In their seminal paper, Modigliani and Miller showed that in a perfect market the value of a company is independent of its capital structure and of its dividend policy. In this talk, we investigate how this result changes when frictions are introduced in the capital market. We adopt a stochastic control approach in which the manager of a company can act on the cash process through dividend payments and capital or debt issuance.

Analyzing the tree-layer structure of Deep Forests (Ludovic Arnould)

Random forests on the one hand, and neural networks on the other hand, have met with great success in the machine learning community for their predictive performance. Combinations of both have been proposed in the literature, notably leading to the so-called deep forests (DF). We investigate the mechanisms at work in DF and highlight that DF architectures can generally be simplified into simpler and computationally more efficient shallow forest networks. Despite some instability, the latter may outperform standard predictive tree-based methods. To precisely quantify the improvement achieved by these light network configurations over standard tree learners, we theoretically study the performance of a shallow tree network made of two layers, each composed of a single centered tree. We provide tight theoretical lower and upper bounds on its excess risk. These theoretical results show the benefit of tree-network architectures for well-structured data, provided that the first layer, acting as a data encoder, is rich enough.
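To fix ideas, the sketch below implements a centered tree: each cell is cut at its midpoint along a coordinate drawn uniformly at random, independently of the data, and a leaf predicts the mean label of its training points. It also builds a hypothetical two-layer composition in which the second tree receives the raw features together with the first tree's output. How the layers are actually connected in the talk is not specified here, so `fit_two_layer`, like the convention of predicting 0 in empty leaves, is an assumption of this sketch.

```python
import numpy as np

def _build(lo, hi, depth, rng):
    """Recursively draw the (data-independent) centered splits of [lo, hi]."""
    if depth == 0:
        return None
    j = int(rng.integers(len(lo)))        # split coordinate, uniform
    mid = 0.5 * (lo[j] + hi[j])           # split at the cell's midpoint
    left_hi, right_lo = hi.copy(), lo.copy()
    left_hi[j], right_lo[j] = mid, mid
    return (j, mid, _build(lo, left_hi, depth - 1, rng),
            _build(right_lo, hi, depth - 1, rng))

def _leaf(node, x):
    """Sequence of left/right decisions identifying the leaf containing x."""
    path = []
    while node is not None:
        j, mid, left, right = node
        go_right = bool(x[j] >= mid)
        path.append(go_right)
        node = right if go_right else left
    return tuple(path)

def fit_centered_tree(X, y, depth, rng):
    """Centered regression tree on [0,1]^d; returns a predict(x) function."""
    d = X.shape[1]
    root = _build(np.zeros(d), np.ones(d), depth, rng)
    sums, counts = {}, {}
    for x, target in zip(X, y):
        key = _leaf(root, x)
        sums[key] = sums.get(key, 0.0) + target
        counts[key] = counts.get(key, 0) + 1
    def predict(x):
        key = _leaf(root, x)
        # empty leaf -> 0.0 (a convention of this sketch)
        return sums[key] / counts[key] if key in counts else 0.0
    return predict

def fit_two_layer(X, y, depth1, depth2, rng):
    """Hypothetical two-layer network: the second tree sees the raw features
    plus the first tree's output (assumes y, hence that output, lies in [0,1])."""
    layer1 = fit_centered_tree(X, y, depth1, rng)
    Z = np.column_stack([X, [layer1(x) for x in X]])
    layer2 = fit_centered_tree(Z, y, depth2, rng)
    return lambda x: layer2(np.append(x, layer1(x)))


rng = np.random.default_rng(0)
X1 = rng.random((100, 1))
y1 = (X1[:, 0] >= 0.5).astype(float)       # a step at the first centered split
tree = fit_centered_tree(X1, y1, 1, rng)
X2 = rng.random((500, 2))
net = fit_two_layer(X2, np.full(500, 0.7), 3, 3, rng)
```

Because the splits never look at the labels, centered trees are simple enough to analyse theoretically, which is presumably why they are used as the building block of the studied network.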

Monday 1 February 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Eva Lawrence + Armand Bernou** (Sorbonne Université) *Entropy maximisation problems for multidimensional functional reconstruction (Eva Lawrence) + Asymptotic Behavior of Markov Processes: a Dive into the Subgeometric Case (Armand Bernou)*

In the framework of non-parametric reconstruction problems, we are interested in reconstructing a multidimensional function $f$ on a compact set $U$ such that $f$ satisfies a number of very general integral constraints. We propose to solve this reconstruction problem by casting it as a $\gamma$-entropy maximisation problem under constraints.

That is, we define a convex function $\gamma : \mathbb{R}^p \to \mathbb{R}_+$ with suitable properties, and we maximise, under the constraints, the quantity $I_\gamma(f) = -\int_U \gamma(f)\,dP$.

We explain how this problem can be linked to a related problem on signed measures $F$.

Such problems have been studied for the reconstruction of a single function or a single measure.

We propose to study the more general case of the reconstruction of a function or a measure with values in $\mathbb{R}^p$.
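As a toy illustration in the scalar case $p = 1$, take the quadratic choice $\gamma(t) = t^2/2$ and linear moment constraints $\int_U \varphi_i f \, dP = c_i$. These choices, the grid on $U = [0,1]$ with uniform $P$, and the constraint values are all assumptions for illustration. The maximiser is then a linear combination of the $\varphi_i$ whose coefficients solve a Gram system, which the NumPy sketch below checks numerically.

```python
import numpy as np

# Discretise U = [0, 1] with P uniform: integrals become weighted sums.
m = 1001
u = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / m)                 # weights of the discretised P

# Hypothetical moment constraints  int_U phi_i f dP = c_i
Phi = np.vstack([np.ones(m), u, u**2])  # phi_0 = 1, phi_1 = u, phi_2 = u^2
c = np.array([1.0, 0.6, 0.45])

# With gamma(t) = t^2 / 2, maximising -int gamma(f) dP under the constraints
# gives f = sum_i lambda_i phi_i, where G lambda = c and G_ij = int phi_i phi_j dP.
G = (Phi * w) @ Phi.T
lam = np.linalg.solve(G, c)
f = lam @ Phi

constraints = (Phi * w) @ f             # reproduces c
entropy = -np.sum(w * f**2 / 2)

# Any other feasible function, e.g. f + g with g orthogonal to every phi_i,
# has a strictly smaller gamma-entropy.
h = np.sin(7 * u)
g = h - np.linalg.solve(G, (Phi * w) @ h) @ Phi
other_entropy = -np.sum(w * (f + g) ** 2 / 2)
```

The quadratic case is only the simplest instance: for a general convex $\gamma$ and $\mathbb{R}^p$-valued $f$, the optimality conditions no longer reduce to a linear system in this way.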

Asymptotic Behavior of Markov Processes: a Dive into the Subgeometric Case (Armand Bernou)

In this talk, I will briefly introduce the notions required to discuss the stability structure of Markov processes on a general state space. These generalise the usual notions of recurrence, positive recurrence and aperiodicity from countable state space theory. I will then present some methods, based on the existence of Lyapunov functionals for the stochastic generator, which yield convergence of the Markov process towards its invariant measure at a subgeometric rate. This case differs from the geometric one in several respects and, time permitting, I will briefly discuss the main differences.
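For reference, one standard way to phrase the subgeometric Lyapunov approach (recalled as background; the exact assumptions of the talk may differ) is a drift condition on the generator $\mathcal{L}$. It asks for a function $V \ge 1$, a closed petite set $C$, a constant $b < \infty$ and a concave, increasing, differentiable function $\varphi$ such that

$$\mathcal{L}V \le -\varphi \circ V + b\,\mathbf{1}_C .$$

Under suitable irreducibility assumptions, the rate of convergence to the invariant measure is then governed by

$$r_\varphi(t) = \varphi \circ H_\varphi^{-1}(t), \qquad H_\varphi(u) = \int_1^u \frac{ds}{\varphi(s)} .$$

For example, $\varphi(v) = v^a$ with $a \in (0,1)$ gives $H_\varphi^{-1}(t) = \big(1 + (1-a)t\big)^{1/(1-a)}$ and hence the polynomial rate $r_\varphi(t) = \big(1 + (1-a)t\big)^{a/(1-a)}$, whereas a linear $\varphi(v) = cv$ gives $H_\varphi^{-1}(t) = e^{ct}$, i.e. the exponential (geometric) case.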

Monday 4 January 2021, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Bo Ning** (LPSM) *Spike and slab Bayesian sparse principal component analysis (Bo Ning)*

Sparse principal component analysis (PCA) is a popular tool for dimension reduction of high-dimensional data. Several recent works study Bayesian sparse PCA, but they are mostly theoretical, and efficient algorithms for practical use are lacking. In this talk, I will propose a new method for Bayesian sparse PCA. In addition to studying the posterior contraction rate of this method, I will introduce a PX-CAVI algorithm based on variational inference. The PX-CAVI algorithm applies parameter expansion to the principal components to avoid dealing directly with the orthogonality constraints between eigenvectors. This algorithm is fast and accurate: our simulation studies show that it outperforms existing penalized methods and an EM algorithm that uses a continuous spike and slab prior. The R code of this algorithm is available online.

#### Year 2020

Monday 14 December 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Jean-David Jacques** (LPSM) *Rota-Baxter algebras, Bohnenblust-Spitzer identity and application to probability (Jean-David Jacques)*

In 1955, Frank Spitzer studied combinatorial relations involving the maxima of partial sums of a sequence of real numbers. This combinatorial study led to a useful identity for the characteristic functions of real random walks. Later, Glen Baxter obtained the same identity by introducing an operator on the space of characteristic functions, the so-called “Baxter operator”. Finally, a complete algebraic study by the famous combinatorialist Gian-Carlo Rota led to what are nowadays called “Rota-Baxter algebras”.
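A quick way to see a Rota-Baxter operator at work, in the spirit of Spitzer's partial sums, is the partial-sum operator on real sequences with pointwise multiplication. The NumPy sketch below checks numerically that inclusive partial sums satisfy the Rota-Baxter relation of weight $-1$ and strict partial sums the relation of weight $+1$ (sign conventions for the weight vary across references). This is a standard textbook example, not the content of the talk.

```python
import numpy as np

def inclusive_sums(a):
    """P(a)_n = a_1 + ... + a_n."""
    return np.cumsum(a)

def strict_sums(a):
    """Q(a)_n = a_1 + ... + a_{n-1} (empty sum = 0 for n = 1)."""
    return np.concatenate(([0.0], np.cumsum(a)[:-1]))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)
P, Q = inclusive_sums, strict_sums

# Rota-Baxter relation R(x)R(y) = R(R(x)y + xR(y) + w xy) with weight w:
# w = -1 for inclusive partial sums, w = +1 for strict partial sums.
gap_P = P(a) * P(b) - P(P(a) * b + a * P(b) - a * b)
gap_Q = Q(a) * Q(b) - Q(Q(a) * b + a * Q(b) + a * b)
```

Both identities come from splitting the double sum over pairs of indices into the regions below, above and on the diagonal.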

Monday 7 December 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Joseph De Vilmarest** (LPSM) *Adaptive Forecasting by Kalman Filter, Application to Electricity Consumption During Lockdown (Joseph de Vilmarest)*

As electricity storage capacities are still negligible compared to demand, electricity production must always balance consumption, so forecasting the load is a crucial task. During the coronavirus crisis, the electricity consumption patterns of most countries changed abruptly and the average load dropped. This highlights the need for adaptive forecasting methods that take recent observations into account in an online manner. We consider generalized additive models, which have proved efficient at predicting the electricity load, and we present an adaptive version based on Kalman filtering. We apply the proposed method to the French electricity load and test it before, during and after the lockdown of March 2020.

Monday 30 November 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Lucas Iziquel** (LPSM) *Tree-like random metric spaces seen as fixed points of distributional equations (Lucas Iziquel)*

When studying the scaling limits of some models of random trees or graphs, we obtain random compact metric spaces. Starting from the Continuum Random Tree (CRT), a classical example of such random spaces, we will use its self-similarity property (a well-chosen subspace of the CRT has the same distribution as a rescaled copy of the whole CRT) to view the CRT as a fixed point of a particular distributional equation. From there, we will introduce a framework to study this kind of distributional fixed-point equation: the existence and uniqueness of fixed points, and the possible convergence towards them. Moreover, some geometric properties of the fixed point, for instance its almost sure fractal dimensions, can be deduced directly from the equation.

Image: A realization of the 1.07-stable looptree from the website of Igor Kortchemski

Monday 23 November 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Vasiliki Velona** (Universitat Pompeu Fabra and Universitat Politècnica de Catalunya / Visiting LPSM) *The broadcasting problem (Vasiliki Velona)*

Consider a large rooted tree whose root vertex is assigned a random bit value. Every other vertex has the same bit value as its parent with probability $1-q$ and the opposite value with probability $q$, where $q \in [0,1]$. The broadcasting problem consists in estimating the value of the root bit upon observing the unlabelled tree graph and the bit values associated with a subset of the vertices. I will discuss results of this kind for various classes of trees, in particular for random recursive trees generated either by the uniform attachment model or by the linear preferential attachment model.
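The setting can be simulated directly. The sketch below grows a random recursive tree by uniform attachment, broadcasts the root bit with flip probability `q`, and estimates the root bit by a simple majority vote over the leaves. Observing the leaves and using a majority vote are illustrative choices of a natural baseline; the estimators and results discussed in the talk may differ, and the linear preferential attachment model is not implemented.

```python
import numpy as np

def broadcast_on_rrt(n, q, rng):
    """Random recursive tree (uniform attachment) carrying broadcast bits.

    Vertex k >= 1 picks its parent uniformly among 0, ..., k-1 and copies the
    parent's bit, flipping it with probability q. Returns (parents, bits)."""
    parents = np.full(n, -1)
    bits = np.empty(n, dtype=int)
    bits[0] = rng.integers(2)                  # uniformly random root bit
    for k in range(1, n):
        parents[k] = rng.integers(k)           # uniform attachment
        bits[k] = bits[parents[k]] ^ int(rng.random() < q)
    return parents, bits

def majority_vote(bits, observed):
    """Guess the root bit as the majority bit among observed vertices
    (ties go to 1, an arbitrary convention)."""
    return int(2 * bits[observed].sum() >= len(observed))


rng = np.random.default_rng(0)
parents, bits = broadcast_on_rrt(1000, q=0.1, rng=rng)
leaves = np.setdiff1d(np.arange(1000), parents[1:])   # vertices without children
estimate = majority_vote(bits, leaves)
```

Repeating the simulation many times and comparing `estimate` with `bits[0]` gives an empirical success probability for this estimator as a function of `q`.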

Monday 16 November 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Robin Khanfir** (LPSM) *The range of branching random walks on Galton-Watson trees (Robin Khanfir)*

The scaling limit of critical Galton-Watson trees conditioned to be large is a universal random compact metric space, called the Continuum Random Tree (CRT). For supercritical Galton-Watson trees, no such scaling limit can exist, because they are infinite, and hence not compact, with positive probability. One can overcome this difficulty by randomly choosing a subtree of large fixed size of an infinite Galton-Watson tree. To do so, we consider a random walk on the infinite tree and look at the set of vertices visited by the walk before time n, which we call the range, or trace, of the walk. In this case, there is indeed a scaling limit, consisting of several CRTs glued together at their roots. An interesting question then arises: what happens if we replace the random walk by a branching random walk? In other words, we index our walk by a critical Galton-Watson tree conditioned to be large instead of by linear time.
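For intuition about the range, the sketch below runs a simple random walk on a Galton-Watson tree that is generated lazily as the walk explores it, and counts the distinct visited vertices. To avoid conditioning on survival, it assumes an offspring law with no mass at zero (here 1 or 2 children with equal probability, a hypothetical choice), so the tree is infinite almost surely. The branching random walk indexed by a critical tree, which is the actual subject of the talk, is not implemented.

```python
import random

def range_of_walk(n_steps, offspring, seed=None):
    """Number of distinct vertices visited in n_steps by a simple random walk
    started at the root of a Galton-Watson tree that is generated lazily.

    offspring(rng) returns a number of children; it is assumed to be >= 1,
    so the tree is infinite and every vertex has at least one neighbour."""
    rng = random.Random(seed)
    children, parent = {}, {0: None}
    next_id, current, visited = 1, 0, {0}
    for _ in range(n_steps):
        if current not in children:              # reveal the offspring on first visit
            k = offspring(rng)
            children[current] = list(range(next_id, next_id + k))
            for v in children[current]:
                parent[v] = current
            next_id += k
        neighbours = list(children[current])
        if parent[current] is not None:
            neighbours.append(parent[current])
        current = rng.choice(neighbours)         # uniform neighbour: simple random walk
        visited.add(current)
    return len(visited)


def one_or_two(rng):
    """Hypothetical offspring law: 1 or 2 children with probability 1/2 each."""
    return rng.choice([1, 2])


r = range_of_walk(10_000, one_or_two, seed=0)
```

Generating the tree lazily keeps memory proportional to the explored part of the tree rather than to the (infinite) tree itself.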

Monday 9 November 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Linus Bleistein** (ENS Ulm, LPSM) *Wasserstein-GANs and the Signature Transform (Linus Bleistein)*

Wasserstein-GANs (W-GANs), a class of generative adversarial models based on neural networks and the Wasserstein distance, have recently drawn a lot of attention because of their high generative power for images. However, little has been done regarding time series generation. We introduce the signature transform, a non-linear transformation that allows for simple representations of complex, high-dimensional paths, and combine it with W-GANs in order to generate time series.
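To make the signature transform concrete, here is a short NumPy sketch computing the signature of a piecewise-linear path truncated at depth 2 (increments and second-order iterated integrals), using Chen's relation segment by segment. Practical work would typically use deeper truncations and a dedicated library. Depth 2 is chosen only to keep the example readable, and how the features are fed to a W-GAN is not shown.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path.

    path : (m, d) array of points. Returns (S1, S2) with
    S1[i]    = total increment of coordinate i,
    S2[i, j] = iterated integral of dX^i_s dX^j_t over s < t.
    Chen's relation for a linear piece with increment delta:
    S2 += S1 (x) delta + delta (x) delta / 2, then S1 += delta."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        S2 += np.outer(S1, delta) + np.outer(delta, delta) / 2
        S1 += delta
    return S1, S2


# Right then up: the area between the path and its chord is 1/2.
S1, S2 = signature_level2([[0, 0], [1, 0], [1, 1]])
levy_area = (S2[0, 1] - S2[1, 0]) / 2
```

The symmetric part of `S2` always equals `np.outer(S1, S1) / 2`, so the genuinely new information at depth 2 is the antisymmetric part, the Lévy area.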

Monday 26 October 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Nicklas Werge** (LPSM) *AdaVol: An Adaptive Recursive Volatility Prediction Method (Nicklas Werge)*

Quasi-Maximum Likelihood (QML) procedures are theoretically appealing and widely used for statistical inference. While QML estimation in batch settings is extensively documented, QML estimation in streaming settings attracted little attention until recently. We investigate the convergence properties of the QML procedure in a general conditionally heteroscedastic time series model and extend the classical batch optimization routines to the framework of streaming and large-scale problems. We present an adaptive recursive estimation routine for GARCH models named AdaVol. The AdaVol procedure relies on stochastic approximations combined with the technique of Variance Targeting Estimation (VTE). This recursive method is computationally efficient, while VTE alleviates some convergence difficulties that the usual QML estimation encounters due to a lack of convexity. Empirical results on real-life data demonstrate a favorable trade-off between AdaVol’s stability and its ability to adapt to time-varying estimates.
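For readers less familiar with the ingredients, the sketch below writes out the GARCH(1,1) variance recursion under variance targeting and the Gaussian quasi-likelihood loss that a QML procedure minimises. It is a batch illustration only: the recursive stochastic-approximation updates that make up AdaVol are not reproduced, the returns are assumed to have zero mean, and the parameter values are hypothetical.

```python
import numpy as np

def garch_variances(returns, alpha, beta, target_var):
    """GARCH(1,1) conditional variances with variance targeting:
    sigma2_t = omega + alpha * r_{t-1}**2 + beta * sigma2_{t-1}, where
    omega = target_var * (1 - alpha - beta) pins the unconditional variance
    to target_var (requires alpha + beta < 1)."""
    omega = target_var * (1.0 - alpha - beta)
    sigma2 = np.empty(len(returns))
    sigma2[0] = target_var                      # start at the unconditional level
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def gaussian_qml_loss(returns, alpha, beta):
    """Mean of log(sigma2_t) + r_t**2 / sigma2_t, i.e. twice the average negative
    Gaussian quasi-log-likelihood up to an additive constant. The targeted
    variance is the sample second moment of the returns (VTE step)."""
    target_var = np.mean(returns ** 2)
    sigma2 = garch_variances(returns, alpha, beta, target_var)
    return np.mean(np.log(sigma2) + returns ** 2 / sigma2)


rng = np.random.default_rng(0)
r = rng.standard_normal(500)                    # placeholder returns
loss = gaussian_qml_loss(r, alpha=0.05, beta=0.9)
```

Variance targeting removes `omega` from the optimisation: only `alpha` and `beta` remain to be estimated by minimising the loss.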

Monday 19 October 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Isao Sauzedde** (LPSM) *Covariant Young's integration (Isao Sauzedde)*

We will present a new point of view on the Young integral which uses no approximation of the path. It yields a slight extension of the integral, with the additional property of being stable under smooth deformations of the plane. We will then go from Young to stochastic integration, and finally define the integral of some irregular random 1-forms along Brownian motion. The main role will be played by the winding of curves around points.
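Since the winding of curves around points plays the central role, here is a small, self-contained way to compute it for a closed polygonal curve (for instance a discretised path closed by its chord) by summing signed angles. This is standard plane geometry included for illustration, not the construction of the talk, and it assumes the point does not lie on the curve.

```python
import math

def winding_number(polygon, point):
    """Winding number of a closed polygonal curve around a point not on it:
    sum the signed angles swept by the vector from the point to the curve
    along each edge, then divide by 2*pi."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for k in range(n):
        x0, y0 = polygon[k][0] - px, polygon[k][1] - py
        x1, y1 = polygon[(k + 1) % n][0] - px, polygon[(k + 1) % n][1] - py
        # signed angle between consecutive vectors (cross, dot), in (-pi, pi]
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # counter-clockwise unit square
```

Each straight edge not passing through the point subtends an angle of absolute value less than $\pi$, which is why `atan2` on consecutive vectors recovers the total turning exactly.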

Monday 6 July 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Gloria Buritica** (LPSM) *Clustering of extreme events (Gloria Buritica)*

The occurrence of an extreme event usually triggers a sequence of extreme events within a short period. In practice, this phenomenon can have very negative consequences when the risk model does not account for the probability of time-clustering of extremes. For example, many floods occur after extreme rainfall has been recorded on consecutive days. Similarly, stock prices may crash for several days before returning to their usual dynamics. In the setting of regularly varying stationary time series, the extremal index is a parameter that, in most cases, fully describes the time-clustering of extremes of univariate time series. The multivariate setting builds on these results and defines time-clusters as portions of the series whose supremum norm exceeds a high threshold. We study a generalisation of this notion, considering time-blocks of data whose $L^p$ norm exceeds a high threshold. This new definition allows one to capture extreme behavior in more time-blocks than before.
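Two quantities from this discussion are easy to compute on data: the classical blocks estimator of the extremal index (number of blocks with an exceedance divided by the number of exceedances) and the set of blocks whose norm exceeds a threshold. The NumPy sketch below assumes a univariate series cut into disjoint blocks of a fixed, hypothetical size. It contrasts the supremum-norm and $L^p$-norm criteria but is not the estimator studied in the talk.

```python
import numpy as np

def _blocks(x, block_size):
    """Cut x into disjoint consecutive blocks, dropping the incomplete tail."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    return x[: n_blocks * block_size].reshape(n_blocks, block_size)

def blocks_extremal_index(x, block_size, threshold):
    """Blocks estimator: (# blocks with an exceedance) / (# exceedances)."""
    exceed = _blocks(x, block_size) > threshold
    n_exceed = exceed.sum()
    return exceed.any(axis=1).sum() / n_exceed if n_exceed else float("nan")

def extreme_blocks(x, block_size, threshold, p=np.inf):
    """Indices of blocks whose p-norm exceeds the threshold;
    p = np.inf is the supremum-norm criterion."""
    norms = np.linalg.norm(_blocks(x, block_size), ord=p, axis=1)
    return np.flatnonzero(norms > threshold)


# Toy series cut into blocks of length 3.
x = [0, 5, 5, 0, 0, 0, 5, 5, 0, 0, 0, 0]
theta_hat = blocks_extremal_index(x, 3, threshold=1.0)    # 2 blocks / 4 exceedances
z = [0, 3, 3, 0, 0, 0, 5, 0, 0, 0, 0, 0]
sup_blocks = extreme_blocks(z, 3, threshold=4.0)          # only the block containing 5
l2_blocks = extreme_blocks(z, 3, threshold=4.0, p=2)      # also the block (0, 3, 3)
```

Since the $L^p$ norm of a block dominates its supremum norm, the $L^p$ criterion can only select more blocks, as in the second example.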

Organizers: F. Bechtold, W. Da Silva, A. Fermanian, S. Has, Y. Yu

Wednesday 1 January 2020, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Past Speakers** (LPSM) *Past GTT talks, 2020*

KMT coupling for random walk bridges - Xuan Wu (15 June 2020)

Smoothing of Bayesian forest estimators in density estimation - Thibault Randrianarisoa (8 June 2020)

An averaging (path-by-path) approach to regularisation by noise for ODEs - Lucio Galeati (1 June 2020)

High Regularity Invariant Measures in PDEs - Mickaël Latocca (25 May 2020)

Wasserstein Random Forests and Applications in Heterogeneous Treatment Effects - Qiming Du (18 May 2020)

An approach to analyze the tail of the distribution of heterogeneous data and application to insurance - Sébastien Farkas (11 May 2020)

Pathwise Regularisation of McKean–Vlasov Equations - Avi Mayorcas (4 May 2020)

On a class of completely random measures (CRMs) and its role in Bayesian analysis - Riccardo Passeggeri (27 April 2020)

An introduction to statistical properties of expanding maps - Malo Jézéquel (20 April 2020)

Introduction on post hoc inference (online) - Marie Perrot-Dockès (6 April 2020)

Learning with signatures (online) - Adeline Fermanian (30 March 2020)

Introduction on post hoc inference (cancelled!) - Marie Perrot-Dockès (9 March 2020)

Towards a better understanding of Wasserstein GANs - Ugo Tanielian (24 February 2020)

Quantum Computing and Applications to Machine Learning - Jonas Landman (10 February 2020)

Polymer Pinning Model - Order of the phase transition in the inhomogeneous model - Alexandre Legrand (3 February 2020)

A kernel-based consensual regression aggregation method - Sothea HAS (3 February 2020)

A Bayesian hierarchical model for traffic flow and waiting time prediction in carpooling lines - Panayotis Papoutsis (27 January 2020)

On weighted sampling without replacement - Othmane Safsafi (20 January 2020)

Coupling methods for the convergence rate of Markov processes - Armand Bernou (13 January 2020)

Change-point analysis of copula models - Karen A. Vásquez Vivas (6 January 2020)

#### Year 2019

Tuesday 1 January 2019, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Past Speakers** (LPSM) *Past GTT talks, 2019*

Growth-fragmentation in Planar Brownian excursions - William Da Silva (9 December 2019)

The Calderón Problem through the eyes of a probabilist - David Lee (2 December 2019)

Statistical inference for a partially observed interacting system of Hawkes processes - Chenguang Liu (25 November 2019)

Long time dynamics for interacting oscillators on graphs - Fabio Coppini (18 November 2019)

Heat kernel on the infinite percolation cluster - Chenlin GU (4 November 2019)

Modified Runge-Kutta methods for pathwise approximations of SDEs - F. Bechtold (28 October 2019)

Informative missing data - Aude Sportisse (21 October 2019)

Random lifts and cutoff phenomenon - Guillaume Conchon-Kerjan (14 October 2019)

Semilinear SPDEs with unbounded diffusion - Florian Bechtold (24 June 2019)

Belief propagation in Bayesian networks and Markov chains and extensions with polynomials - Alexandra Lefebvre (17 June 2019)

Quasi-shuffle algebras, Brownian motion and rough paths - Carlo Bellingeri (3 June 2019)

Chains of oscillators and mean-field models - Alejandro Fernandez Montero (27 May 2019)

Sequential construction of minimum spanning trees - Othmane SAFSAFI (20 May 2019)

A mathematical model on black market - Chenlin GU (13 May 2019)

Rough paths, signature and statistical learning - Adeline Fermanian (6 May 2019)

Vanishing of the anchored isoperimetric constant of percolation at p_c - Barbara Dembin (15 April 2019)

On the diffusion of eigenvectors of random matrices - Lucas Benigni (8 April 2019)

Impact of tree choice in metagenomics differential abundance studies - Antoine Bichat (25 March 2019)

Some properties of ICRTs - Arthur Blanc-Renaudie (18 March 2019)

Concentration inequalities for functions of independent random variables - Antoine Marchina (11 March 2019)

K-means algorithm with Bregman divergences and constructing predictive models based on this algorithm - Sothea Has (4 March 2019)

Penalized likelihood methods applied to age-period-cohort analysis - Vivien Goepp (25 February 2019)

Optimal control and dynamic programming - Enzo Miller (18 February 2019)

Universality for critical kinetically constrained models - Ivailo Hartarsky (11 February 2019)

An introduction to Extreme Value Theory - Nicolas Meyer (4 February 2019)

Windings of Brownian motion - Isao Sauzedde (27 January 2019)

The Poland-Scheraga model for DNA denaturation - Alexandre Legrand (21 January 2019)

Spectral techniques in matrix completion - Simon Coste (14 January 2019)

Introduction of high-dimensional interpretable machine learning models and their applications - Simon Bussy (7 January 2019)

Organizers: A. Lefebvre, N. Meyer, O. Safsafi, T. Touati

#### Year 2018

Monday 1 January 2018, 17:00, Jussieu, Salle Paul Lévy (209), corridor 16-26 (2nd floor)

**Past Speakers** (LPSM) *Past GTT talks, 2018*

The Hitchhiker’s Guide to the Galaxy of Financial Risk Management: Risk Measures and Procyclicality - Marcel Bräutigam (3 December 2018)

Scaling limits of a random graph: critical configuration model with power-law degrees - Guillaume Conchon-Kerjan (26 November 2018)

Infection time in the Duarte model: the role of energy barriers - Laure Marêché (19 November 2018)

Numerical consistent estimates in the multivariate linear mixed-effects model and application to the malaria infection study - Eric Adjakossa (12 November 2018)

How to free the boundary - Clément Cosco (5 November 2018)

A brief introduction to Sequential Monte Carlo - Qiming Du (22 October 2018)

Records of the Fractional Brownian Motion - Assaf Shapira (15 October 2018)

Local convergence for random permutations: the case of uniform pattern-avoiding permutations - Jacopo Borga (27 June 2018)

Triangle, Star, Integrability - Paul Melotti (11 June 2018)

Introduction to the theory of mean field games (MFG) - Ziad Kobeissi (4 June 2018)

Random obstacle problems, and integration by parts formulae for the laws of Bessel bridges - Henri Elad-Altman (28 May 2018)

Bayesian inference of causality in gene regulatory networks - Flaminia Zane (23 May 2018)

Regularity of the Lyapunov exponent of a product of random matrices - Benjamin Havret (23 April 2018)

Asymptotic distribution of the least-squares estimator for linear models with dependent errors - Emmanuel Caron (9 April 2018)

A random graph to model speciation - François Bienvenu (26 March 2018)

Limit theorems for cluster functionals of extremes of weakly dependent random processes and fields - José-Gregorio Gomez (20 March 2018)

Stochastic Control and Statistical Learning: the example of middle-out portfolio management - Alexis Bismuth (19 March 2018)

Random trees, walks and laminations - Paul Thévenin (12 March 2018)

Cutoff of sparse Markov chains - Guillaume Conchon-Kerjan (5 March 2018)

An Itô formula with regularity structures - Carlo Bellingeri (26 February 2018)

Data-dependent partition estimation theory and the Rules Induction Partitioning Estimator - Vincent Margot (20 February 2018)

Minimal cardinality of a cutset in supercritical first-passage percolation - Barbara Dembin (19 February 2018)

Combs and exchangeable coalescents - Félix Foutel-Rodier (12 February 2018)

Organizers: C. Cosco, S. Coste, L. Marêché, P. Melotti, N. Meyer, B. Dembin, G. Conchon-Kerjan, F. Coppini, O. Safsafi