## Groupe de travail des thésards du LPSM

#### Day, hour and place

Monday at 17:00, Jussieu, Salle Paul Lévy, 16-26 209

#### Contact(s)

gtt [AT] lpsm.paris

#### GTT mailing list

If you want to receive information about GTT events, subscribe to the mailing list. You just need to enter your name and email address at the following link:

### Previous talks

#### Year 2023

Groupe de travail des thésards du LPSM

Tuesday November 21, 2023, 5:30PM, Sophie Germain - Salle 1016 (1er étage)

**Orphée Collin + Ibrahim Mérad** *A simple disordered system: the Random Field Ising Chain (O. Collin) + Robust Stochastic Optimization via Gradient Quantile Clipping (I. Mérad)*

The Ising Model is a classical model in statistical physics describing the behavior of ferromagnetic moments on a lattice interacting via a local interaction. When the lattice is one-dimensional and the nearest-neighbor interaction is homogeneous, the model is known to be exactly solvable (and simple). However, the disordered version of the one-dimensional Ising Model (called the Random Field Ising Chain, RFIC), where the chain interacts with an i.i.d. environment, is a much more challenging model. In particular, it exhibits a pseudo-phase transition as the strength Gamma of the inner interaction goes to infinity. A description of the typical configurations when Gamma is large has been given in the physics literature in terms of a renormalisation group fixed point. We study the RFIC and give mathematical evidence that, in accordance with the physicists’ description, typical configurations are close to the configuration given by the process of Gamma-extrema of Brownian motion when Gamma is large.

Ibrahim MÉRAD - Robust Stochastic Optimization via Gradient Quantile Clipping.

We introduce a clipping strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clipping thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber contamination. Our mathematical analysis leverages the connection between constant step size SGD and Markov chains and handles the bias introduced by clipping in an original way. For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high probability bounds on the final estimation error. In the non-convex case, we prove that the limit distribution is localized on a neighborhood with low gradient. We propose an implementation of this algorithm using rolling quantiles which leads to a highly efficient optimization procedure with strong robustness properties, as confirmed by our numerical experiments.
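As a rough illustration of the idea in this abstract, the sketch below clips each stochastic gradient at a rolling quantile of recent gradient norms. It is not the speaker's implementation: the function name `quantile_clipped_sgd`, the parameters `step`, `q` and `window`, and the choice to include the current norm in the rolling buffer are all assumptions made for this example.

```python
from collections import deque

import numpy as np


def quantile_clipped_sgd(grad_fn, theta0, data, step=0.1, q=0.9, window=50):
    """Constant-step SGD with gradients clipped at a rolling quantile of their norms.

    grad_fn(theta, sample) returns the stochastic gradient for one sample.
    All parameter names and defaults are illustrative assumptions.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    norms = deque(maxlen=window)  # rolling buffer of recent gradient norms
    for sample in data:
        g = np.asarray(grad_fn(theta, sample), dtype=float)
        norm = np.linalg.norm(g)
        norms.append(norm)
        # Clipping threshold: q-th quantile of the norms currently in the buffer.
        tau = np.quantile(list(norms), q)
        if norm > tau and norm > 0:
            g = g * (tau / norm)  # rescale so that the clipped norm equals tau
        theta -= step * g
    return theta
```

For instance, with the quadratic loss `0.5 * ||theta - x||^2` (gradient `theta - x`), a single wildly corrupted sample moves the iterate only by a step proportional to the current quantile threshold, rather than by a step proportional to the size of the outlier.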

Organisation: Nicolas Bouchot, Sonia Charabouska, Orphée Collin, Ali Ellouze, Romain Lacoste and Paul Liautaud

Groupe de travail des thésards du LPSM

Tuesday November 7, 2023, 5:30PM, Jussieu - salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Doctoral seminar team** *Opening session of the seminar*

These explanations will be followed by a talk given by one of the team members, and then by a reception where PhD students will be able to chat with one another.

Organisation: Nicolas Bouchot, Sonia Charabouska, Orphée Collin, Ali Ellouze, Romain Lacoste and Paul Liautaud

Groupe de travail des thésards du LPSM

Monday June 26, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Patrick Oliveira Santos + Camila Fernandez** *Free Probability, Random Matrices and Classical Probability (P. Oliveira Santos) + Online Learning Approach for Survival Analysis (C. Fernandez)*

The Central Limit Theorem states that a normalized sum of i.i.d. random variables converges in distribution to a standard Gaussian random variable. When the variables do not commute, however, such convergence fails to hold and a different limit may arise. This is the case for random matrices, for example. In this talk, we will describe a general framework, called Free Probability, and noncommutative probability spaces, in which we can work with noncommutative objects and their limits. If time permits, we will discuss some important examples, such as the adjacency matrix of regular random graphs and trees.

Camila FERNANDEZ - Online Learning Approach for Survival Analysis.

Time-to-event analysis is a branch of statistics that was first developed by medical researchers with the objective of predicting the lifetime of populations and the recurrence of diseases. This field has attracted attention from different communities due to its wide range of applications, such as predictive maintenance, customer churn prediction and actuarial science. The biggest issue in time-to-event analysis, and what distinguishes it from a classical regression problem, is that a portion of the dataset does not reach the critical time during the observed period, which results in a type of missing data. This phenomenon is called right censoring, and it is important to take it into account because most of the time the censored data represent at least 50% of the dataset. The censored time is a lower bound for the critical time. Finally, it is important to mention that using classical regression methods leads to a large bias and very poor performance, and that adapting the models, scoring rules and mathematical framework is not straightforward. In this presentation I will show an online mathematical setting that allows us to estimate the distribution of event times with a well-known online learning algorithm, Online Newton Step (ONS). In addition, we discuss how to choose the parameters of ONS and propose an aggregation method that improves the choice of the parameters and guarantees good regret bounds.

Organisation: Alexis Ayme, Loïc Béthencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Monday June 12, 2023, 5:30PM, Jussieu salle Émile Borel (201) couloir 15-16 (2ème étage)

**Nathan De Carvalho + Iqraa Meah** *Reconciling rough volatility with jumps (N. De Carvalho) + Consistent False Discovery Proportion bounds (I. Meah)*

We reconcile rough volatility models and jump models using a class of reversionary Heston models with fast mean reversions and large vol-of-vols. Starting from hyper-rough Heston models with a Hurst index H in (−1/2,1/2), we derive a Markovian approximating class of one-dimensional reversionary Heston-type models. Such proxies encode a trade-off between an exploding vol-of-vol and a fast mean-reversion speed controlled by a reversionary time-scale epsilon > 0 and an unconstrained parameter H in R. Sending epsilon to 0 yields convergence of the reversionary Heston model towards different explicit asymptotic regimes depending on the value of the parameter H. In particular, for H smaller than −1/2, the reversionary Heston model converges to a class of Lévy jump processes of Normal Inverse Gaussian type. Numerical illustrations show that the reversionary Heston model is capable of generating at-the-money skews similar to the ones generated by rough, hyper-rough and jump models.

Iqraa MEAH - Consistent False Discovery Proportion bounds.

Type I errors need to be carefully controlled in decision problems while still allowing the signal to be discovered. This paradigm is emphasized in multiple testing, where the number of errors can escalate with the number of tests performed if the uncorrected type I error budget $\alpha$ is used. The False Discovery Proportion (FDP) is a significant type I error criterion, offering powerful multiple-testing procedures. Since the FDP is a random quantity, classical multiple-testing procedures instead control the expectation of the FDP, the so-called False Discovery Rate (FDR), at level $\alpha$. FDR-controlling procedures, such as the popular BH procedure, are widely adopted by practitioners. Nonetheless, they restrict scientists from freely exploring the data, as they prescribe a rejection set for a fixed level $\alpha$ (ad-hoc approach). This limitation prevents scientists from expanding the outputted rejection set using domain knowledge, or from using a data-dependent level $\hat{\alpha}$. To address this issue and allow for more flexibility, post hoc analysis provides probabilistic bounds on the FDP simultaneously for any rejection subset. The quality of these bounds, particularly their tightness, can be improved for specific rejection subsets built by following a path indicated by an FDR-controlling procedure. This is done in the work of Katsevich and Ramdas (2019), where the authors bridge the gap between FDR control and simultaneous FDP bounds. We propose to go further by introducing the notion of consistency as a quality criterion for these bounds. Intuitively, consistent bounds should closely align with the control level $\alpha$ when applied to a rejection set generated by the $\alpha$-level FDR-controlling procedure. We propose such consistent bounds for different testing settings and compare them with other existing bounds on simulated data.
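For readers unfamiliar with the BH (Benjamini-Hochberg) procedure mentioned in this abstract, here is a minimal sketch of the classical step-up rule: sort the p-values, find the largest rank $k$ such that $p_{(k)} \le \alpha k / m$, and reject the $k$ smallest. The function name `benjamini_hochberg` is an assumption for this example, and the sketch covers only the fixed-level FDR procedure, not the FDP bounds discussed in the talk.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of the hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices sorting p-values increasingly
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for k = 1..m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        rejected[order[: k + 1]] = True            # reject all hypotheses up to that rank
    return rejected
```

Note the step-up behaviour: a p-value that exceeds its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.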

Organisation: Alexis Ayme, Loïc Béthencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Tuesday May 30, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Lionel Sogpoui + Alice L'Huillier** *Propagation of carbon tax in credit portfolio through macroeconomic factors (L. Sogpoui) + Semiparametric inference using fractional posteriors (A. L'Huillier)*

We study how the introduction of carbon taxes in a closed economy propagates in a credit portfolio and precisely describe how carbon tax dynamics affect the firm value and credit risk measures such as probability of default, expected and unexpected losses. We adapt a stochastic multisectoral model to take into account carbon taxes on both sectoral firms' production and sectoral households' consumption. Carbon taxes are calibrated on carbon prices, provided by the NGFS transition scenarios, as well as on sectoral households' consumption and firms' production, together with their related greenhouse gas emissions. For each sector, this yields the sensitivity of firms' production and households' consumption to carbon taxes and the relationships between sectors. Our model allows us to analyze the short-term effects of carbon taxes, as opposed to standard Integrated Assessment Models (such as REMIND), which are not only deterministic but also only capture long-term trends of climate transition policy. Finally, we use a Discounted Cash Flow methodology to compute firms' values, which we then use in the Merton model to describe how the introduction of carbon taxes impacts credit risk measures. We find that the introduction of carbon taxes distorts the distribution of the firm’s value, increases banking fees charged to clients (materialized by the level of provisions computed from the expected loss), and reduces banks' profitability (translated by the value of the economic capital calculated from the unexpected loss). In addition, the randomness introduced in our model provides extra flexibility to take into account uncertainties on productivity and on the different transition scenarios by sector. We also compute the sensitivities of the credit risk measures with respect to changes in the carbon taxes, yielding further criteria for a more accurate assessment of climate transition risk in a credit portfolio. This work provides a preliminary methodology to calculate the evolution of credit risk measures of a multisectoral credit portfolio, starting from a given climate transition scenario described by a carbon price.

Alice L'HUILLIER - Semiparametric inference using fractional posteriors.

I will start with a gentle introduction to Bayesian statistics, or more precisely to what is sometimes called “the frequentist analysis of posterior distributions”. Then I will present work on semiparametric inference using fractional posteriors. In this work, we establish a general Bernstein–von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent. This is joint work with Luke Travis, Ismaël Castillo and Kolyan Ray.

Organisation: Alexis Ayme, Loïc Béthencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Monday May 15, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Loïc Béthencourt + Nathan Doumèche** *Fractional diffusion limit for a kinetic Fokker-Planck equation with diffusive boundary conditions in the half-line (L. Béthencourt) + Convergence and error analysis of physics-informed neural networks (N. Doumèche)*

We will consider the following simple kinetic model: a particle living in $\mathbb{R}_+$, whose velocity is a positive recurrent diffusion as long as the particle is in $(0,\infty)$. When it hits the boundary $x=0$, the particle is restarted with a random, strictly positive velocity. For the model without reflection, it has been shown by Lebeau & Puel, and then by Fournier & Tardif, that the properly rescaled particle converges towards a stable process, whose time-marginals satisfy the fractional heat equation. We will show that for the reflected process, the scaling limit is a stable process reflected on its infimum.

Nathan DOUMÈCHE - Convergence and error analysis of physics-informed neural networks

Physics-informed neural networks (PINNs) are a promising approach that combines the power of neural networks with the interpretability of physical modeling. PINNs have shown good practical performance in solving partial differential equations (PDEs) and in hybrid modeling scenarios, where physical models enhance data-driven approaches. However, it is essential to establish their theoretical properties in order to fully understand their capabilities and limitations. In this study, we highlight that classical training of PINNs can suffer from systematic overfitting. This problem can be addressed by adding a ridge regularization to the empirical risk, which ensures that the resulting estimator is risk-consistent for both linear and nonlinear PDE systems. However, the strong convergence of PINNs to a solution satisfying the physical constraints requires a more involved analysis using tools from functional analysis and calculus of variations. In particular, for linear PDE systems, an implementable Sobolev-type regularization makes it possible to reconstruct a solution that not only achieves statistical accuracy but also maintains consistency with the underlying physics.

Organisation: Alexis Ayme, Loïc Béthencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Monday April 24, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Yueh-Sheng Hsu + Grâce Mayala** *Landau Hamiltonian perturbed by Gaussian white noise on full space R^2 (Y-S. Hsu) + Subsampled base learners in extreme regions (G. Mayala)*

The Landau Hamiltonian with noise is a random Schrödinger operator describing quantum systems in the presence of a magnetic field and a random potential V, a notable example of which is the quantum Hall effect. Despite being physically relevant, the case where V is a Gaussian white noise is mathematically ill-posed due to the low regularity and unboundedness of the noise on R^2. In this talk, the aim is to define the object in question as a self-adjoint operator. We will present a construction that does not resort to sophisticated theories of singular SPDEs. By decomposing the white noise into a globally Hölder distribution and a real-valued function growing at most quadratically at infinity, the construction is carried out in two steps: we first deal with the globally Hölder part using an exponential conjugation technique inspired by previous work of Hairer and Labbé, and then tackle the unbounded part using a classical result of Faris and Lavine.

Grâce MAYALA - Subsampled base learners in extreme regions

In this work, we focus on importance sampling, which allows us to overcome class imbalance issues in binary classification. We first tackle the problem of sampling for probability estimation, where we distinguish subsampling with and without replacement. We will also introduce some tools, the Hájek projection and the Hoeffding decomposition, that we will use extensively afterwards. Then, as in Wager et al., we exhibit the connection between random forests and U-statistics, and finally we provide asymptotic normality properties in the one-sample and two-sample cases using U-statistics and triangular array techniques.

Organisation: Alexis Ayme, Loïc Béthencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Monday April 17, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Lucas Journel + Gabriel Victorino-Cardoso** *Sampling of singular Gibbs measure (L. Journel) + Bias Reduced Self Normalised Importance sampling and applications (G. Victorino-Cardoso)*

Many physical systems can be described by a Hamiltonian of the form H = U + y^2/2. The goal of molecular dynamics is to study such systems through the computation of averages of some function f with respect to the Gibbs measure associated with H. One method among many is the kinetic Monte Carlo algorithm. In this talk, I will present weak convergence results for numerical schemes of the Langevin process in the case of a singular potential U.

Gabriel VICTORINO-CARDOSO - Bias Reduced Self Normalised Importance sampling and applications

Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. This is notably the case for posterior sampling in most Bayesian models. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this talk, I'll present a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without a significant increase in the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling-importance resampling (i-SIR) to form a bias-reduced version of the estimator.
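To make the baseline concrete, the sketch below implements plain self-normalized importance sampling, not the bias-reduced BR-SNIS method of the talk. The function name `snis_estimate` and its arguments are assumptions for illustration; the target density only needs to be known up to a constant, since the weights are normalized to sum to one.

```python
import numpy as np


def snis_estimate(f, log_target_unnorm, proposal_sample, proposal_logpdf, n, rng):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    log_target_unnorm: log of the target density, up to an additive constant.
    proposal_sample(rng, n): draws n independent samples from the proposal.
    proposal_logpdf: log density of the proposal.
    """
    x = proposal_sample(rng, n)
    log_w = log_target_unnorm(x) - proposal_logpdf(x)
    log_w = log_w - log_w.max()   # stabilise the exponentials; cancels after normalisation
    w = np.exp(log_w)
    w /= w.sum()                  # self-normalisation: unknown constants drop out here
    return np.sum(w * f(x))
```

Because of the normalisation step, adding any constant to `log_target_unnorm` leaves the estimate unchanged, which is exactly why SNIS applies to unnormalised posteriors. The price is that the estimator is a ratio of two averages and is therefore biased for finite `n`.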

Groupe de travail des thésards du LPSM

Monday March 27, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Martin Chak + Francesco Bonacina** *Optimal friction for underdamped Langevin sampling (M. Chak) + Forecasting the coupled dynamics of influenza strains (F. Bonacina)*

An adaptive method is given for underdamped Langevin dynamics to minimize asymptotic variance with respect to the friction parameter. The procedure is based on a formula for the gradient of the variance in terms of solutions to the Poisson equation. Reduced variance is demonstrated on toy and Bayesian inference problems.

Francesco BONACINA - Forecasting the coupled dynamics of influenza strains

Three main strains of influenza (A/H1N1, A/H3N2 and B) co-circulate around the world, resulting in irregular incidence dynamics due to the concurrence of multiple factors: contagion paths are strongly stochastic, population susceptibility changes over time as a result of previous infections, weather forcings intervene locally, and human behaviors also impact the spread of viruses.

Here, for several country-years, we consider the relative abundances of the 3 strains (%A/H1N1, %A/H3N2, %B) and we study their coupled dynamics.

Data points are 3-dimensional vectors living on a simplex, and classical statistics is not appropriate for analyzing this type of data. Thus, in a first pre-processing step, we apply a log-ratio transformation (from Compositional Data Analysis) to map these points into a nice Euclidean space (R²). Once this is done, it is easy to perform descriptive analyses to identify significant epidemiological phenomena: we recover the major epidemics of the last 20 years and we detect groups of countries with similar dynamics. We then perform time-series forecasting in order to predict the dominant strain of the next year in a given country. We test several methods, from naive estimators to more sophisticated Bayesian Hierarchical Vector Autoregressive models, which can make up for the lack of data over time by using information from similar countries.
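The abstract does not say which log-ratio transformation is used. As one possible illustration, the sketch below implements the isometric log-ratio (ilr) transform for 3-part compositions, which maps the interior of the simplex to R². The particular orthonormal basis `BASIS` and the function names are assumptions for this example. Strictly positive parts are assumed: a strain with zero abundance would need separate handling, since its logarithm is undefined.

```python
import numpy as np

# One orthonormal basis of the plane {v in R^3 : v1 + v2 + v3 = 0}.
# Other choices of basis give rotated but equivalent coordinates.
BASIS = np.array([
    [1 / np.sqrt(2), -1 / np.sqrt(2), 0.0],
    [1 / np.sqrt(6), 1 / np.sqrt(6), -2 / np.sqrt(6)],
])


def ilr(x):
    """Map 3-part compositions (last axis, strictly positive) to R^2."""
    log_x = np.log(np.asarray(x, dtype=float))
    clr = log_x - log_x.mean(axis=-1, keepdims=True)  # centred log-ratio
    return clr @ BASIS.T


def ilr_inverse(z):
    """Map points of R^2 back to compositions summing to 1."""
    clr = np.asarray(z, dtype=float) @ BASIS
    x = np.exp(clr)
    return x / x.sum(axis=-1, keepdims=True)
```

The transform depends only on ratios between parts, so it gives the same coordinates whether the abundances are expressed as proportions or as percentages.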

Groupe de travail des thésards du LPSM

Monday March 13, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Songbo Wang + David Lee** *Uniform-in-time propagation of chaos for mean-field Langevin dynamics (S. Wang) + Non-linear potential theory and Tug of War games (D. Lee)*

We study the long-time behaviour of mean-field (overdamped and underdamped) Langevin dynamics and show that the convergence of the particle approximation is uniform in time. We work under a convexity hypothesis on the energy functional, without assuming a small mean-field interaction.

David LEE - Non-linear potential theory and Tug of War games

We will present some basic ideas of how to define the fractional power of a nonlinear operator.

Groupe de travail des thésards du LPSM

Monday March 6, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Pierre Marion + Lucas Ducrot** (LPSM) *Neural network training with stochastic gradient descent in a two-timescale regime (P. Marion) + Estimation of survival in age-dependent genetic disease with sporadic cases from pedigree data (L. Ducrot)*

Risk minimization for neural networks is usually performed using a stochastic gradient descent algorithm. This algorithm is difficult to analyze due to the non-convexity of the optimization problem. In this work, we study the training dynamics of shallow neural networks trained by stochastic gradient descent in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those of the outer layer. We show global convergence and obtain as a consequence bounds on the theoretical risk in a simple univariate setting.

Lucas DUCROT - Estimation of survival in age-dependent genetic disease with sporadic cases from pedigree data.

In the context of genetic diseases with low allele frequency in the general population and high penetrance (i.e. Mendelian diseases), a family-based approach is convenient, as patients are often referred to geneticists because of their strongly affected pedigree. Moreover, the estimation of survival in age-dependent genetic disease has direct applications in the medical protocol of patient care. The main issue in these estimations is that genotypes are mostly unknown and must be treated as a latent variable. In the specific case where the disease does not present sporadic cases (i.e. only the carrier of a mutation can be affected by the disease), the problem is easier: an affected individual is necessarily a mutation carrier, so the genotype uncertainty concerns only the unaffected population (carrier or non-carrier). In this context, methods already exist based on the Expectation-Maximization and Elston-Stewart algorithms. However, some diseases (like breast cancer) affect both people with and without known deleterious mutations (like BRCA1/2), at different rates. The proposed method aims to take sporadic cases into account in order to generalize previous estimation methods of genetic disease survival. To do so, the method relies on two hypotheses: the hazard rate of the general population is piecewise constant and known, and the hazard ratio between carriers and non-carriers is also piecewise constant. The model is a survival mixture parameterized by the hazard ratio and the proportion of carriers. For fixed parameters, the hazard rates of carriers and non-carriers can be computed under the constraint given by the hazard rate of the general population, through a fixed-point method. With the pedigree data, maximum likelihood estimation is then performed to determine the values of the parameters.

Groupe de travail des thésards du LPSM

Monday February 20, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Roberta Flenghi + Margot Zaffran** *Central limit theorem over non-linear functionals of empirical measures: beyond the i.i.d. setting (R. Flenghi) + Conformal prediction with missing values (M. Zaffran)*

The central limit theorem is, with the strong law of large numbers, one of the two fundamental limit theorems in probability theory. Benjamin Jourdain and Alvin Tse have extended the central limit theorem, which is well known for linear functionals, to non-linear functionals of the empirical measure of independent and identically distributed random vectors. The main tool permitting this extension is the linear functional derivative, one of the notions of derivation on the Wasserstein space of probability measures that have recently been developed. The purpose of this work is to relax first the equal distribution assumption made by Jourdain and Tse and then the independence property, in order to deal with the successive values of an ergodic Markov chain.

Margot ZAFFRAN - Conformal prediction with missing values

Predicting a value for a new patient (e.g., predicting their blood platelet level) with a single value (e.g., it will be 250 000/mm3) does not give any insight into the underlying uncertainty of the model (e.g., how confident are you that it will be 250 000/mm3? Would you rather say that there is a 90% chance that it will be in [200 000/mm3, 300 000/mm3] or in [100 000/mm3, 400 000/mm3]?). Conformal prediction is a general and simple procedure for quantifying uncertainty with a finite number of data points and without assumptions, except that the data are exchangeable (or i.i.d.).

We study conformal prediction with missing values in the covariates (e.g. we may not know the patient's heart rate even though our predictive model needs this information), a setting that brings new challenges to uncertainty quantification. We first show that the marginal coverage guarantee of conformal prediction holds on imputed data for any missingness distribution and almost all imputation functions. However, we emphasize that the average coverage varies depending on the pattern of missing values: conformal methods tend to construct prediction intervals that under-cover the response conditionally on some missing patterns. This motivates our novel generalized conformalized quantile regression framework, missing data augmentation, which yields prediction intervals that are valid conditionally on the patterns of missing values, despite their exponential number. Using synthetic data and data from critical care, we corroborate our theory and report improved performance of our methods.
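As background for the basic procedure this abstract builds on, here is a minimal sketch of split conformal prediction with absolute-residual scores, in the complete-data case. It does not implement the missing-data method of the talk. The function name `split_conformal_interval` and the `predict` callable (any already-fitted regression function) are assumptions for this example.

```python
import numpy as np


def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Prediction intervals with marginal coverage >= 1 - alpha under exchangeability.

    predict: an already-fitted regression function, applied to arrays of inputs.
    (X_cal, y_cal): calibration data not used to fit `predict`.
    """
    scores = np.abs(np.asarray(y_cal, dtype=float) - predict(X_cal))
    n = scores.size
    # Rank of the calibration score used as the interval half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        q = np.inf  # too few calibration points for this level: trivial interval
    else:
        q = np.sort(scores)[k - 1]
    pred = predict(X_new)
    return pred - q, pred + q
```

The interval has the same width for every new point, which is the kind of limitation that quantile-regression-based variants, such as the conformalized quantile regression mentioned above, aim to address.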

Groupe de travail des thésards du LPSM

Monday February 6, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Robin Khanfir + Alexis Ayme** (LPSM) *Convergence of looptrees coded by excursions (R. Khanfir) + Naive imputation implicitly regularizes high-dimensional linear models (A. Ayme)*

A looptree is a metric space composed of tangent cycles, glued together according to a tree structure. Since their introduction by Curien & Kortchemski in 2014, these objects have found multiple uses, notably in the study of planar maps. In this talk, we will see how one can construct such a looptree using a real-valued function, and define a suitable setting to study their convergence. To do so, we define tree-looptree hybrid spaces which we call vernation trees. Finally, we will see how we can derive an invariance principle for discrete looptrees.

Alexis AYME - Naive imputation implicitly regularizes high-dimensional linear models.

Two different approaches exist to handle missing values for prediction: either imputation, prior to fitting any predictive algorithm, or dedicated methods able to natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. However, in practice, naive imputation exhibits good predictive performance. In this talk, we study the impact of imputation in a high-dimensional linear model with MCAR missing data. We prove that zero imputation performs an implicit regularization closely related to the ridge method, often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged SGD strategy, applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the d ≫ √n regime. Experiments illustrate our findings.

Groupe de travail des thésards du LPSM

Monday January 16, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Emilien Bodiot + Ariane Marandon-Carlhian** (LPSM + LPSM) *Discrete Gaussian Markov Processes: Fourier VS Invariant Measure (E. Bodiot) + Machine learning meets false discovery rate (A. Marandon-Carlhian)*

A discrete Gaussian Markov process is a simple statistical physics model. As for most statistical physics problems, people are usually interested in computing the partition function, the free energy, correlation functions… In the context of discrete Gaussian Markov processes, there are two main methods to perform such computations. The first one, and the most efficient, is to impose periodicity and use the Fourier transform. The second one is to take advantage of the Markov property of the model and look for invariant measures. Both methods make it possible to solve the same model, but one does not easily see any link between them. The Fourier transform is a global computation which totally hides local Markov properties, whereas the invariant measure is a local computation based on the Markov property of the model. After discussing some details of the two approaches, we will focus on a function which links Fourier with invariant measures in this context.

Ariane MARANDON-CARLHIAN - Machine learning meets false discovery rate.

Novelty/outlier detection is the problem of identifying observations that do not conform to a well-defined notion of normal behavior. More formally, the problem we consider is as follows: we have at hand a sample of observations Y_1, …, Y_n, each with a common unknown distribution P_0, which we call nominal observations, and a sample of observations X_1, …, X_m that may contain both nominals, i.e. observations marginally distributed according to P_0, and novelties otherwise. The aim is to identify the novelties. Specifically, the aim is to control the false discovery rate (FDR), defined as the expected proportion of detections that are false (true nominals declared as novelties), at some fixed error margin, say 10% for instance, while maximizing the number of detections under this constraint.

In this setting, Conformal Anomaly Detection (CAD; Bates et al., 2022, AOS) is a breakthrough novelty detection technique that guarantees finite-sample control of the FDR (for any error margin fixed beforehand). In this talk, I will mainly present CAD. Then, I will speak a bit about our contribution: we improve upon CAD by learning from the data at hand; we show in our paper that the FDR control is retained and prove power results.

Groupe de travail des thésards du LPSM

Monday January 9, 2023, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Elisa Ndiaye + Linus Bleistein** (CIRED/CMAP + INRIA) *Quantification of the impact of climate risks on credit risk (E. Ndiaye) + Learning the dynamics of sparsely observed interacting systems for real-time prediction (L. Bleistein)*

Since the industrial revolution, human-made greenhouse gas (GHG) emissions have kept increasing. As the greenhouse effect gets stronger, climate change becomes more and more tangible, with a surge in physical risks (floods, wildfires, extreme temperatures) threatening human lives and economies. Due to the higher frequency and severity of physical damages, regulators ought to step in and take measures aimed at significantly reducing GHG emissions. While the targets are disclosed, the number of possible pathways is endless; yet they will all require a profound structural change of the economy, also known as the energy transition, which poses risks for the financial world through different channels. These risks, commonly referred to as transition risks, will have an impact on traditional financial risks, in particular credit risk. The addition of this climate dimension creates new modelling challenges in the assessment of forward-looking credit risk. This demands a micro-economic simulation of counterparties' financials, conditional on a macro-economic climate scenario, but also a downscaling of the scenario in order to meet the granularity used, which generates uncertainty that will also be quantified.

Linus BLEISTEIN - Learning the dynamics of sparsely observed interacting systems for real-time prediction

We address the problem of learning the dynamics of an unknown non-parametric system linking a target and a feature time series. The feature time series is measured on a sparse and irregular grid, while we have access to only a few points of the target time series. Once learned, we can use these dynamics to perform real-time prediction of the target from the previous values of the feature time series. We frame this task as learning the solution map of a controlled differential equation (CDE). By leveraging the rich theory of signatures, we are able to cast this non-linear problem as a high-dimensional linear regression. We provide an oracle bound on the prediction error which exhibits explicit dependencies on the individual-specific sampling schemes. Our theoretical results are illustrated by simulations which show that our method outperforms existing algorithms for recovering the full time series while being computationally cheap. We conclude by demonstrating its potential on real-world healthcare data.

#### Year 2022

Groupe de travail des thésards du LPSM

Monday December 12, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Nicolas Bouchot + Ludovic Arnould** (LPSM + LPSM) *Polymer in a random poor solvent: edges localization (N. Bouchot) + The Future of Neural Network training (L. Arnould)*

I will present a one-dimensional model for a polymer in a poor solvent: the random walk (RW) in dimension 1 penalized by its range in a random environment. We consider a random field $\omega$ consisting of i.i.d. variables $\omega_z$ and we give any path of the RW of length $n$ a weight $\exp(-H_n)$, where $H_n$ is the sum of $h - \beta \omega_z$ over all the sites $z$ that were visited by this RW path. This new law for the path of the random walk is called the polymer measure, and is a random measure over the paths of length $n$. The parameters $h, \beta$ are assumed to be positive, meaning that under the polymer measure, the RW tends to fold itself on an optimal segment given by the random field. I will explain how we can get a rather precise result that states the following: the edges of typical paths under the polymer measure are given by $\omega$ through variational problems involving Brownian-related processes.
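For concreteness, the weight described in this abstract can be written out as follows. The symbols $S$ (a path of length $n$), $\mathcal{R}_n(S)$ (its range), $\mathbf{P}$ and $\mathbf{E}$ (the law of the underlying random walk and its expectation) and $Z_n^{\omega}$ (the normalising constant) are notation assumed here for illustration, not taken from the abstract.

$$
H_n(S) = \sum_{z \in \mathcal{R}_n(S)} \left( h - \beta\,\omega_z \right), \qquad \mathcal{R}_n(S) = \{S_0, S_1, \dots, S_n\},
$$

$$
\mathbf{P}_n^{\omega}(S) = \frac{e^{-H_n(S)}}{Z_n^{\omega}}\,\mathbf{P}(S), \qquad Z_n^{\omega} = \mathbf{E}\left[ e^{-H_n(S)} \right].
$$

With $\beta = 0$ this reduces to $H_n(S) = h\,|\mathcal{R}_n(S)|$, i.e. the walk is penalized only by the number of sites it visits.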

Ludovic ARNOULD - The Future of Neural Network training.

Neural networks are among the most popular machine learning models currently used and studied. A typical learning procedure involves the minimisation of an empirical risk via Stochastic Gradient Descent in a supervised setting. In other words, given labeled data (X_i, Y_i) drawn from an unknown distribution P_(X,Y), we fit the network f to the data by minimizing a loss on the data (for instance ||f(X_i) - Y_i||^2), which is done by updating the weights of the network along the gradient of the loss with respect to them. The true goal is to make good predictions on unseen data, i.e. to minimize the inaccessible risk E[||f(X) - Y||^2] (the empirical risk being an accessible proxy for this quantity).

Even without much refinement, it is possible to obtain in this way a trained NN that perfectly fits the training data (0 training error) while maintaining a good generalization score: the NN still performs well on new unseen data (X',Y'). However, the mechanisms underpinning this good empirical behavior remain poorly understood theoretically. As a consequence, it remains difficult in practice to design NN architectures and/or training procedures that specifically enhance the generalization power. The PAC-Bayes framework provides a powerful tool to analyze the “generalization gap”, the discrepancy between the theoretical risk and its empirical counterpart. In this talk, we will thus study a practical training scheme inspired by PAC-Bayes theory to improve the generalization capacities of any NN in a supervised setting.

Groupe de travail des thésards du LPSM

Monday December 5, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Lorenzo Croissant + Grégoire Szymanski** (CEREMADE + Polytechnique) *Diffusion limit control of high-frequency pure-jump processes (L. Croissant) + Rough volatility and optimal estimation of the Hurst parameter (G. Szymanski)*

Pure jump processes with a large number of jumps per unit of time are frequently encountered in modern industrial systems (e.g. financial markets, online advertising auctions, server scheduling…). These applications require sequential decision making, which is mathematically formalized as a control problem. Unfortunately, the non-local nature of jumps creates analytical difficulties for the control problem and makes computations infeasible in practice. Leveraging the diffusion limit regime, which avoids these difficulties, we present in this talk a characterization of the convergence to this limit, and some interesting perspectives that it opens.

Grégoire SZYMANSKI - Rough volatility and optimal estimation of the Hurst parameter

The goal of this talk is to present the estimation of the Hurst parameter in rough stochastic volatility models. First, we will recall what rough volatility is and why it is used in finance. In these models, we observe a stochastic diffusion driven by a Brownian motion. The volatility is a hidden fractional Brownian motion. Moreover, the diffusion is only observed at discrete times. We will explain how this problem relates to the somewhat easier observations of the underlying fractional Brownian motion at discrete times polluted by an additive noise, in the same spirit as [3]. Considering the observations $\eta W^H_{i/n} + \varepsilon^n_i$, we build an estimator based on quadratic variations with convergence rate $n^{1/(4H+2)}$ as in [3]. We also prove that this rate is minimax, using an adequate wavelet-based construction of the fractional Brownian motion in the second case, see [4]. Finally, we will discuss how this estimator can be generalised to the multiplicative noise model and to a non-parametric setting as in [1].

References:

- [1] Chong, C.; Hoffmann, M.; Liu, Y.; Rosenbaum, M.; Szymanski, G. (2022) - Statistical inference for rough volatility: Central limit theorems - arXiv:2210.01216
- [2] Chong, C.; Hoffmann, M.; Liu, Y.; Rosenbaum, M.; Szymanski, G. (2022) - Statistical inference for rough volatility: Minimax theory - arXiv:2210.01214
- [3] Fukasawa, M.; Takabatake, T.; Westphal, R. (2019) - Is Volatility Rough? - arXiv:1905.04852
- [4] Szymanski, G. (2022) - Optimal estimation of the rough Hurst parameter in additive noise - arXiv:2205.13035

Groupe de travail des thésards du LPSM

Monday November 21, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Antonio Ocello + Yazid Janati El Idrissi** (LPSM + LPSM & Télécom SudParis) *Relaxed formulation for the control of branching diffusions: Existence of an optimal control (A. Ocello) + Iterative schemes for divergence minimization (Y. Janati El Idrissi)*

We study the existence of optimal controls for branching diffusion processes. The problem considered uses rewards that can be nonlinear in the final payoff and linear in the running payout. We give a relaxed formulation, showing its equivalence with the strong problem and proving the existence of optimal controls. Using the dynamic programming principle, we prove that the maximizer can be found in the class of Markovian controls.

Yazid JANATI EL IDRISSI - Iterative schemes for divergence minimization

Traditional Variational Inference proceeds by minimizing the exclusive Kullback-Leibler divergence between the variational approximation and the target distribution. It is now widely acknowledged that the practical implementation of this procedure yields variational approximations that have lighter tails and fewer modes than the target distribution. In this talk I will review the recent progress made in this subfield by focusing on the methods that aim at optimizing other divergences like the inclusive KL and the (Rényi) alpha-divergence. Then, I will introduce a novel iterative scheme that decreases the inclusive KL geometrically fast. Its practical implementation will be discussed and supported by numerical experiments involving the computation of normalizing constants.

Groupe de travail des thésards du LPSM

Monday November 7, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Tristan Pham-Mariotti + Miguel Martinez Herrera** (LPSM) *Introduction to the Probabilistic Method (T. Pham-Mariotti) + Inference and tests for Multivariate Hawkes processes with Inhibition, application to neuroscience (M. Martinez Herrera)*

To show that there exists an element of a set that satisfies certain properties, just pick an element at random in the set and prove that with positive probability it meets the requirements. This simple idea, called the Probabilistic Method, turns out to be an extremely powerful tool, as noticed by Paul Erdős in the 1950s. I hope to illustrate through examples in different fields (combinatorics, geometry, etc.) the beauty of this idea to convince you to put it in your mathematical toolbox.

Miguel MARTINEZ HERRERA - Inference and tests for Multivariate Hawkes processes with Inhibition, application to neuroscience.

The Hawkes process is a past-dependent point process used to model the relationship of event occurrences between different phenomena. Since its appearance, it has been widely used in various fields such as finance, criminology and neuroscience. The Hawkes process was originally introduced to describe excitation interactions, which means that one event increases the chances of another occurring. However, there has been a growing interest in the opposite effect, known as inhibition. In this talk we propose a Maximum Likelihood estimation method for multivariate Hawkes processes with exponential kernel that can handle both exciting and inhibiting interactions. Parametric estimation methods in the literature are mostly adapted to non-negative interactions and most methods proposed for the inhibiting case are restricted to non-parametric frameworks. We show that the proposed estimator performs better for synthetic data than alternative methods. We also illustrate its application to a neuronal activation dataset.

Groupe de travail des thésards du LPSM

Monday October 24, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Yoan Tardy + Clément Mantoux** (LPSM + CMAP (Polytechnique)) *Post explosion model for the supercritical Keller-Segel particle system (Y. Tardy) + Modeling Longitudinal Data with Ruptures: Application to Parkinson's Disease (C. Mantoux)*

The Keller-Segel particle system consists of $N$ Brownian motions in the plane interacting through a Coulomb-type force $\theta/|x|^2$, where $x$ is the distance between two particles. Like its deterministic version, this system is subject to a phase transition: if the intensity of the attraction $\theta$ is less than 2, we expect only reflective collisions between particles, and if the intensity is greater than or equal to 2, we expect sticky collisions to occur, which means that once the collision occurs, the particles are glued together forever; we call this explosion. Although this behaviour is well understood before the sticky collision (see « Collisions of the supercritical Keller-Segel particle system » by Nicolas Fournier and Y. T.), the particle system is not well defined after that time, and we will propose a post-explosion model following the idea of « Stochastic particle approximation of the Keller-Segel equation and two-dimensional generalization of Bessel processes » by Nicolas Fournier and Benjamin Jourdain.

Clément MANTOUX - Modeling Longitudinal Data with Ruptures: Application to Parkinson's Disease

Can we model the progression of neurodegenerative diseases? In longitudinal studies, we follow the development of disease symptoms in a cohort of subjects across time. This development is often decomposed into distinct progression phases, corresponding, e.g., to a degradation or to the impact of a treatment. Understanding the pace of the disease progression and its different phases is a key step in the design of monitoring systems for neurodegenerative diseases. In this talk, we present a statistical model for longitudinal trajectories with ruptures. We show that the most appropriate number of ruptures can be selected robustly in spite of strong noise and a large proportion of missing data. Finally, we use our model on cohorts of subjects affected by Parkinson's disease, and show how it helps to understand the variability in the observed data.

Organizers: Alexis Ayme, Loïc Bethencourt, Nicolas Bouchot, Pierre Marion, Miguel Martinez Herrera and Antonio Ocello

Groupe de travail des thésards du LPSM

Monday October 10, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**GTT team** *GTT start-of-year day*

Groupe de travail des thésards du LPSM

Monday July 4, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Robin Khanfir + David Lee** (LPSM) *Scaling limit of critical branching random walks on a tree (Robin KHANFIR) + The Harnack inequality for the fractional Laplacian (David LEE)*

A branching random walk can be seen as a population which reproduces while exploring an environment in a Markovian way. Here, we restrict ourselves to the case where the genealogy is described by a critical Galton-Watson tree conditioned on having $n$ vertices and where the environment is a $d$-ary tree. Moreover, the transitions are such that any child has probability $1/2$ to be on the vertex just below its parent's position and probability $1/(2d)$ to be on each of the $d$ vertices just above its parent's position. Our object of study is the subtree $R_d(n)$ explored by the population. We show that $R_d(n)$ admits a scaling limit in distribution when $n$ goes to infinity. The limit object is a random compact continuum tree called the Brownian cactus, which was introduced by N. Curien, J-F. Le Gall, and G. Miermont. (Joint work with Thomas Duquesne, Shen Lin, and Niccolò Torri.)

The Harnack inequality for the fractional Laplacian (David LEE)

In this talk, I will present the fractional Laplacian and raise the question of the existence of a Harnack inequality.

Organizers: Ludovic Arnould, Jérôme Carrand, Lucas Ducrot, Robin Khanfir, Ariane Marandon-Carlhian, Antonio Ocello

Groupe de travail des thésards du LPSM

Monday June 20, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Ariane Marandon + Armand Bernou** (LPSM) *False clustering rate control in mixture models (Ariane MARANDON) + Beyond the mean-field limit for the McKean-Vlasov particle system : Uniform in time estimates for the cumulants (Armand BERNOU)*

The clustering task consists in delivering labels to the members of a sample. For most data sets, some individuals are ambiguous and intrinsically difficult to attribute to one or another cluster. However, in practical applications, misclassifying individuals is potentially disastrous. To overcome this difficulty, the idea followed here is to classify only a part of the sample in order to obtain a small misclassification rate. This approach is well known in the supervised setting, and referred to as classification with an abstention option. The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework. The problem is formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items. New procedures are introduced and their behavior is shown to be close to the optimal one by establishing theoretical results and conducting numerical experiments.

Beyond the mean-field limit for the McKean-Vlasov particle system : Uniform in time estimates for the cumulants (Armand BERNOU)

The study of the convergence to the mean-field limits of the empirical distribution of particle systems is a well-traveled topic, with many approaches coming from both analysis and probability. One of the key questions is the one of propagation of chaos: roughly, the goal is to control an error saying “how far from independence” the system is to justify the limit equation. In this talk, in a nice, smooth setting, I will explain how to go further: if we keep the leading term of this error between the particle system and the independent one to get a more precise description of the system, we can control the new (smaller) error. The method can be iterated, in the sense that we can also keep track of its leading term and control the corresponding error and so on. Along the way, I will try to give some insights about our main tools: derivatives with respect to the measure and Glauber calculus. A very important aspect of this analysis is that our estimates for those errors are uniform in time. This is joint (and ongoing) work with Mitia Duerinckx (FNRS).

Organizers: Ludovic Arnould, Jérôme Carrand, Lucas Ducrot, Robin Khanfir, Ariane Marandon-Carlhian, Antonio Ocello

Groupe de travail des thésards du LPSM

Monday May 23, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Pierre Bras + Alexis Ayme** (LPSM) *Stochastic gradient descent and Langevin-simulated annealing algorithms (PIERRE BRAS) + Minimax rate of consistency for linear models with missing values (Alexis AYME)*

I will show how minimization problems arising for example in machine learning can be solved using stochastic gradient descent. For non-convex optimization problems, I will introduce variants of this algorithm in order to improve the optimization procedure. Langevin algorithms consist in adding white noise to the gradient descent, hoping to escape local (but not global) minima. In its simulated annealing version, the noise is gradually decreased to zero to make the algorithm asymptotically converge, sharing its heuristic with the original simulated annealing algorithm.
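The idea of adding a decreasing amount of noise to gradient descent can be sketched in a few lines. The following is a minimal illustration under assumptions, not the speaker's algorithm: the function name, the step size and in particular the noise schedule `sigma0 / sqrt(log(k + e))` are hypothetical choices made only so that the noise decays slowly to zero.

```python
import math
import random


def langevin_annealing(grad, x0, n_steps=5000, step=0.01, sigma0=1.0, seed=0):
    """Gradient descent with Gaussian noise whose scale decays to zero.

    grad   : gradient of the objective (one-dimensional here)
    sigma0 : initial noise scale; sigma0 = 0 gives plain gradient descent
    """
    rng = random.Random(seed)
    x = x0
    for k in range(1, n_steps + 1):
        # Hypothetical schedule: slow logarithmic decay of the noise level.
        sigma = sigma0 / math.sqrt(math.log(k + math.e))
        x = x - step * grad(x) + sigma * math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    # Double-well objective f(x) = (x^2 - 1)^2 + 0.3 x: local minimum near
    # x = 1, global minimum near x = -1.
    grad_f = lambda x: 4.0 * x * (x * x - 1.0) + 0.3
    print("plain gradient descent from x0 = 1:",
          langevin_annealing(grad_f, 1.0, sigma0=0.0))
    print("annealed Langevin from x0 = 1:     ",
          langevin_annealing(grad_f, 1.0, sigma0=1.0))
```

Started in the basin of the local minimum, the noiseless run stays there, while the noisy run has a chance to cross the barrier; whether it ends near the global minimum depends on the schedule and on the seed.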

Minimax rate of consistency for linear models with missing values (Alexis AYME)

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys…). In fact, the very nature of missing values usually prevents us from running standard learning algorithms. In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. Indeed, the Bayes rule can be decomposed as a sum of predictors corresponding to each missing pattern. This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes predictions impossible for current real-world datasets. First, we propose a rigorous setting to analyze a least-squares type estimator and establish a bound on the excess risk which increases exponentially in the dimension. Consequently, we leverage the missing data distribution to propose a new algorithm, and derive associated adaptive risk bounds that turn out to be minimax optimal.

Organizers: Ludovic Arnould, Jérôme Carrand, Lucas Ducrot, Robin Khanfir, Ariane Marandon-Carlhian, Antonio Ocello

Groupe de travail des thésards du LPSM

Monday May 9, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Tristan Pham-Mariotti + Nicklas Werge** (LPSM) *Boundary correlations for the Z-invariant Ising Model (Tristan PHAM-MARIOTTI) + Learning from time-dependent streaming data with stochastic algorithms (Nicklas WERGE)*

In statistical mechanics, the aim is to understand the macroscopic properties of a system knowing the microscopic forces between the components. Computations for a large number of particles are in general quite hard, but it turns out that interesting properties can sometimes be expressed in terms of matrices, making the computations feasible. In this talk, I will explain some of these links and mainly focus on the one between boundary correlations in an Ising Model and a special set of matrices called the positive orthogonal Grassmannian.

Learning from time-dependent streaming data with stochastic algorithms (Nicklas WERGE)

Many machine learning problems can be written as a minimization problem, a task that stochastic optimization methods can solve. As more and more data becomes available, optimization methods need to address high-dimensional problems with low computational costs; therefore, in recent years, first-order optimization methods have become prevalent in the literature. A common feature of these procedures is slow convergence, which motivates a great interest in accelerating existing algorithms. I will give a brief insight into stochastic optimization for high-dimensional problems in a streaming framework with time-dependent data.

Groupe de travail des thésards du LPSM

Monday April 25, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Apolline Louvet + Ludovic Arnould** (LPSM) *Limit of a Wright-Fisher model with ghosts (Apolline LOUVET) + Generalisation Properties of Interpolating Estimators and Random Forests (Ludovic ARNOULD)*

I will introduce a variant of the Wright-Fisher model featuring frequent local extinction events, which result in strong fluctuations in population sizes. This model is characterized by the use of “ghost individuals” to fill empty areas, in the spirit of the contact process from interacting particle systems theory. I will show that this model can be used to study genetic diversity in populations expanding in a fragmented and disturbed environment, and find a necessary condition on extinction probabilities for an expansion to be possible. This condition is derived from a coupling with an oriented percolation process, to which our variant of the Wright-Fisher model converges in the limiting regime considered here. This work was motivated by an ongoing project with the Muséum National d'Histoire Naturelle, which focuses on understanding to what extent urban tree bases can act as ecological corridors.

Generalisation Properties of Interpolating Estimators and Random Forests (Ludovic ARNOULD)

We will take a look at the behavior of a few machine learning estimators in the interpolating regime, i.e. when the estimator is learned by perfectly fitting the data (0 train error). In practice, this regime yields very good generalisation performance (good predictions on unseen data) despite the old statistical belief that interpolation should induce over-fitting. This regime has yet to be theoretically explored in order to precisely understand how good generalisation can still occur in the interpolation regime and whether interpolation is a key property to explain the success of these estimators. We will finally focus on the case of Random Forest estimators.

Groupe de travail des thésards du LPSM

Monday April 11, 2022, 5:30PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Antonio Ocello + Victor Roca Lucio** (LPSM) *Stochastic Target Problem (for Branching Diffusions) (Antonio OCELLO) + An algebraic ballad (Victor ROCA LUCIO)*

“This is a story about control. My control”. And more specifically an optimal stochastic target control problem. This problem consists in finding the minimal condition for which a control allows the underlying process to reach a target set at a finite terminal time. We then apply it to branching diffusions, with a special target for each of its branches. We first state a dynamic programming principle for the value function of the stochastic target problem. We then show that the value function can be characterised as the unique viscosity solution to a Hamilton-Jacobi-Bellman variational inequality.

An algebraic ballad (Victor ROCA LUCIO)

This talk is a gentle introduction to the world of algebraic structures and their relationship to probabilities. Via simple examples, we will introduce the notion of an operad and explain how these objects can encode common algebraic structures. Finally, we will show how many Hopf algebras appearing in renormalization can be constructed using operads, after the work of van der Laan and Moerdijk.

Groupe de travail des thésards du LPSM

Monday March 21, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Kimia Nadjahi + Jean-David Jacques** (LPSM) *Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections (Kimia NADJAHI) + B-series over the set of compactly supported multi-indices, (S)PDEs, and pre-Lie algebras (Jean-David JACQUES)*

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on the problem of image generation.
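For reference, the usual Monte Carlo approximation mentioned in this abstract (the baseline, not the deterministic method of the talk) can be sketched as follows. This is a minimal illustration assuming two point clouds of equal size and the order-2 distance; the function names are hypothetical.

```python
import math
import random


def random_direction(dim, rng):
    """Uniform random direction on the unit sphere of R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein_mc(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of SW_2 between two equal-size point clouds.

    For each random direction, both clouds are projected onto the line and
    the one-dimensional squared W_2 distance is obtained by matching the
    sorted projections.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)


if __name__ == "__main__":
    rng = random.Random(1)
    cloud_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    cloud_b = [[rng.gauss(1, 1) for _ in range(5)] for _ in range(100)]
    print("estimated SW_2:", sliced_wasserstein_mc(cloud_a, cloud_b))
```

The estimate fluctuates with the sampled directions, which is the source of error that a deterministic approximation avoids.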

B-series over the set of compactly supported multi-indices, (S)PDEs, and pre-Lie algebras (Jean-David JACQUES)

In this talk, I will give a short overview of some recent developments in the field of renormalization of SPDEs via regularity structures. To that aim, I will introduce the category of pre-Lie algebras and give a few concrete examples.

Groupe de travail des thésards du LPSM

Monday March 7, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Pierre Le Bris + Marine Demangeot** (LPSM) *Uniform in time propagation of chaos for the generalized Dyson Brownian motion and 1D Riesz gases (Pierre LE BRIS) + Continuous simulation of storm processes (Marine DEMANGEOT)*

We consider the case of a one-dimensional N-particle system in mean-field interaction, with a singular and repulsive interaction, and we wish to understand the limit, as N goes to infinity, of the empirical measure of the system. After an introduction containing the motivation and some “usual” methods used to tackle this sort of problem, we describe a rather short proof that only relies on the well-posedness of the SDE, and that in particular requires no study of the nonlinear limit PDE.

Continuous simulation of storm processes (Marine DEMANGEOT)

Spatial extreme value theory helps model and predict the frequency of extreme events in a spatial context, for instance extreme precipitation, extreme temperatures or high concentrations of pollution in the air. In this presentation, we focus on storm processes, which constitute prototype models for spatial extremes. They are classically simulated on a finite number of points within a given domain. We propose a new algorithm that makes it possible to perform such a task everywhere, not just anywhere, in continuous domains like hyperrectangles or balls, in arbitrary dimension. This consists in generating basic ingredients that can subsequently be used to assign a value at any and every point of the simulation field. Therefore, the resolution of a single simulation can be refined indefinitely; this is particularly appropriate to investigate the geometrical properties of storm processes. Particular attention is paid to efficiency: by introducing and exploiting the notion of domain of influence of each storm, the running time is considerably reduced. Besides, most parts of the algorithm are designed to be parallelizable.

Groupe de travail des thésards du LPSM

Monday February 21, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Matthieu Dolbeault + Camila Fernandez** (LPSM) *Random sampling for weighted least-squares approximation (Matthieu DOLBEAULT) + Time-to-Event Analysis (Camila FERNANDEZ)*

We investigate the problem of approximating a function in L^2 with a polynomial of degree N, using only evaluations at M chosen points, with M of the order of N. A first approach, based on weighted least-squares at i.i.d. random points, provides a near-best approximation thanks to a matrix concentration inequality, but requires M of order N log(N). To reduce the sample size while preserving the quality of approximation, we will need a recent result on sums of rank-one matrices, which answers a conjecture from quantum physics dating back to 1959.

Time-to-Event Analysis (Camila FERNANDEZ)

Time-to-event analysis is a branch of statistics that has grown in popularity over the last decades due to its many different application fields such as predictive maintenance, customer churn prediction, population lifetime estimation, etc. In this presentation, we review and compare the performance of widely used prediction methods for time-to-event analysis. These consist of semi-parametric and parametric statistical models and machine learning approaches. The comparison is carried out on three different datasets and using two different scores (the integrated Brier score and the concordance index). Moreover, we show how aggregation processes, which surprisingly have not yet been much studied in time-to-event analysis, can improve the prediction accuracy. Finally, we present simulation results to complete the comparison between the two scores while varying the number of samples and the censored data percentage to show their impact.

Groupe de travail des thésards du LPSM

Monday February 7, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Emilien Bodiot + Luca Brusa** (LPSM) *Two-dimensional Gaussian Markov Processes, an operadic approach (Emilien BODIOT) + A tempered Expectation-Maximization algorithm for discrete latent variable models (Luca BRUSA)*

When working with one-dimensional Markov Processes, right and left eigenvectors of the transfer matrix provide a good notion of invariant boundary conditions. Unfortunately, from what we can tell, such a tool does not exist for two-dimensional Markov Processes. Recent work by Damien Simon fills this gap using a higher algebraic approach based on the theory of Operads. In this talk, we will present the algebraic objects that arise when considering this new formalism and how they define invariant boundary conditions. We will discuss the case of Gaussian Markov processes.

A tempered Expectation-Maximization algorithm for discrete latent variable models (Luca BRUSA)

Although maximum likelihood estimation of many discrete latent variable models can be performed using the Expectation-Maximization (EM) algorithm, a well-known drawback of this estimation method is related to the multimodality of the log-likelihood function. The consequence is that the estimation algorithm could converge to one of the local maxima, not corresponding to the global maximum. We propose a Tempered EM (T-EM) algorithm, which is able to explore the parameter space adequately. It consists in rescaling the objective function depending on a parameter known as temperature, which controls the prominence of global and local maxima. By properly tuning the sequence of temperature values, the target function is gradually attracted toward the global maximum, escaping local sub-optimal solutions. We rely on an accurate Monte Carlo simulation study to compare the proposal with the standard EM algorithm, evaluating both the ability to hit the global maximum and the computational time of the proposed algorithm. We conclude that the proposal outperforms the standard EM algorithm, improving the chance of reaching the global maximum in the overwhelming majority of considered cases.
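To give an idea of how a temperature can flatten or sharpen the quantities used by EM, the sketch below tempers the E-step posterior probabilities by dividing the log joint probabilities by a temperature T. This is one common form of tempering, in the style of deterministic-annealing EM; it is an assumption for illustration and may differ from the rescaling actually used in T-EM. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def tempered_posterior(log_joint, temperature):
    """Posterior over latent classes with log-probabilities divided by T.

    log_joint[i, k] is log p(x_i, z_i = k); T = 1 gives the usual E-step,
    while T > 1 flattens the posterior toward the uniform distribution.
    """
    scaled = log_joint / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(scaled)
    return post / post.sum(axis=1, keepdims=True)


# Toy joint probabilities for 2 observations and 2 latent classes.
log_joint = np.log(np.array([[0.08, 0.02],
                             [0.01, 0.09]]))
standard = tempered_posterior(log_joint, 1.0)    # ordinary E-step
flattened = tempered_posterior(log_joint, 10.0)  # high temperature
```

In a full algorithm these tempered posteriors would replace the usual ones in the M-step, with the temperature following a schedule that ends at T = 1.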

Groupe de travail des thésards du LPSM

Monday January 24, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Nicolas Bouchot + Eddy Ella Mintsa** (LPSM) *Self-attracting polymer in homogeneous and random environment (Nicolas BOUCHOT) + Plug-in classification procedures for diffusion paths (Eddy ELLA MINTSA)*

A polymer can be modelled as a random walk (RW) whose trajectories are weighted by a density e^(H) where H is a Hamiltonian. Studying the partition function (expectation of e^(H) under the RW law) is a way to obtain information about typical configurations for the polymer.

In this talk, I will detail the study of a Hamiltonian that penalises trajectories whose range (set of visited sites) is large (hence self-attracting), leading to a competition between the diffusivity of the RW and this “self-attraction”. I will prove scaling limits for the fluctuations around an optimal range size and for the centre of the range. Afterwards, I will consider the polymer to be in a Gaussian environment and will reward trajectories visiting sites with greater associated Gaussian variables. I will prove scaling limits in probability for the extrema of the range and fluctuations around these optimal positions, holding for almost all realisations of the environment.

Plug-in classification procedures for diffusion paths (Eddy ELLA MINTSA)

Recent advances in modern technology have generated labeled data, recorded at high frequency, that can be modelled as functional data. This work focuses on the multiclass classification problem for functional data modelled by a stochastic differential equation. The drift function depends on the label of the class Y ∈ {1, …, K}, K ∈ N \ {0, 1}. An observation is a solution X = (X_t)_{t ∈ [0,1]} of the following time-homogeneous stochastic differential equation: dX_t = b_Y(X_t) dt + σ(X_t) dW_t, X_0 = 0, t ∈ [0, 1], with unknown drift functions (b_k)_{k=1,…,K} and unknown diffusion coefficient σ. Furthermore, we assume that the law p = (p_1, …, p_K) of the label Y is unknown. From a learning sample D_N = {(X^i, Y_i)}_{i ∈ [1,N]} that consists of N independent copies of the pair (X, Y), our aim is to build an implementable plug-in nonparametric classification procedure and derive upper bounds of its excess risk over Hölder spaces.

Few works have investigated the classification of functional data in the stochastic differential equation framework. In [3], it is done from a parametric point of view with a known diffusion coefficient. Here, we deal with a more challenging problem: the drift function is nonparametric and, in addition, the diffusion coefficient and the distribution p = (p_1, …, p_K) are unknown.

Our classification procedure relies on the nonparametric estimation of the functions b_k, k = 1, …, K, and σ^2 by minimizing a least-squares contrast over a spline basis (as in [1] for the estimation of the drift function). We establish the consistency of the resulting empirical classifier as a function of N, the size of the learning sample, and n ∈ N*, the number of discrete observations for each path. We obtain rates of convergence under mild assumptions. These computations rely here on stochastic calculus, in particular on fine estimates of the transition densities. Finally, the obtained empirical classifier is implemented and successfully evaluated on simulated data.
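To make the data model of this abstract concrete, the sketch below simulates labelled diffusion paths on [0, 1] with an Euler–Maruyama scheme. The two drift functions, the constant diffusion coefficient, the class probabilities and all function names are hypothetical choices for illustration only; the estimation and classification steps of the talk are not reproduced.

```python
import numpy as np


def simulate_labelled_paths(n_paths, n_steps, drifts, sigma, probs, rng):
    """Euler-Maruyama paths of dX_t = b_Y(X_t) dt + sigma(X_t) dW_t, X_0 = 0.

    Labels are coded 0, ..., K-1 here (the abstract uses 1, ..., K).
    """
    dt = 1.0 / n_steps
    labels = rng.choice(len(drifts), size=n_paths, p=probs)
    paths = np.zeros((n_paths, n_steps + 1))  # column 0 holds X_0 = 0
    for k in range(n_steps):
        x = paths[:, k]
        drift = np.empty(n_paths)
        for label, b in enumerate(drifts):
            mask = labels == label
            drift[mask] = b(x[mask])  # class-specific drift b_Y
        dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
        paths[:, k + 1] = x + drift * dt + sigma(x) * dw
    return paths, labels


rng = np.random.default_rng(0)
drifts = [lambda x: -x, lambda x: 1.0 - x]   # hypothetical b_1 and b_2
sigma = lambda x: np.ones_like(x)            # hypothetical constant sigma
paths, labels = simulate_labelled_paths(500, 100, drifts, sigma, [0.5, 0.5], rng)
```

Each row of `paths` is one discretely observed path with n = 100 steps, and `labels` plays the role of Y in the learning sample.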

Groupe de travail des thésards du LPSM

Monday January 10, 2022, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Arthur Blanc-Renaudie + Félix Cheysson** (LPSM) *Stick breaking constructions: study of D-trees and ICRT (Arthur BLANC-RENAUDIE) + Evolution of groups at risk of death from Covid-19 using hospital data (Félix CHEYSSON)*

A tree is a trunk with large branches, small branches, even smaller branches… and (outside of winter) a few leaves. I will present two algorithms, based on this principle, that have theoretical applications to the additive coalescent, Galton–Watson trees, and the configuration model. I will also explain several methods to study those algorithms, which I use to prove scaling limits, bound the heights, prove compactness, and compute some fractal dimensions.

Evolution of groups at risk of death from Covid-19 using hospital data (Félix CHEYSSON)

The Health Data Warehouse (Entrepôt de Santé, EDS) of the AP-HP (Assistance Publique - Hôpitaux de Paris) is collecting and enriching information related to the hospitalisation of patients for whom a positive diagnosis of Covid-19 has been established (via PCR analysis or a lung x-ray). In order to improve patient management, it is essential to identify the risk factors of the disease and to determine whether they evolve with the different waves of the pandemic. In this context, we focus on the estimation of death rates for the groups at higher risk of death from Covid-19, using binary classification trees built from the CART algorithm (Breiman et al., 1984).

To be able to study the temporal evolution of death rates amongst these groups and thus adapt their healthcare, we propose a hypothesis test to compare CART trees and detect changes in the death rates. We show that a bootstrap version of this test has good empirical properties, and illustrate it with numerical experiments and an application to the first wave of the pandemic. Finally, we present some theoretical insight into the distributional properties of the test statistic for our proposed hypothesis test, using results derived from the theory of U-statistics.

#### Year 2021

Groupe de travail des thésards du LPSM

Tuesday December 14, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Thibault Randrianarisoa + Alexandra Lefebvre** (LPSM) *Optional Pólya trees: posterior rates and uncertainty quantification (Thibault Randrianarisoa) + Exact inference in probabilistic graphical models and extensions over polynomials. Application to genetics and segmentation (Alexandra Lefebvre)*

We consider statistical inference in the density estimation model using a tree-based Bayesian approach, with Optional Pólya trees as prior distribution. We derive near-optimal convergence rates for corresponding posterior distributions with respect to the supremum norm. For broad classes of Hölder-smooth densities, we show that the method automatically adapts to the unknown Hölder regularity parameter. We consider the question of uncertainty quantification by providing mathematical guarantees for credible sets from the obtained posterior distributions, leading to near-optimal uncertainty quantification for the density function, as well as related functionals such as the cumulative distribution function. The results are illustrated through a brief simulation study.

Exact inference in probabilistic graphical models and extensions over polynomials. Application to genetics and segmentation (Alexandra Lefebvre)

Probabilistic graphical models play a central role for reasoning in complex systems involving latent variables. They provide a graphical representation of the dependency structure in a joint distribution. In this talk, we start with an introduction to the sum-product algorithm, which exploits the dependency structure to reduce the time complexity of exact inference from exponential in the number of variables to exponential in the treewidth of a triangulation of the graph. Such computations are thus rendered tractable in graphs of reasonable treewidth. We will continue with extensions of the algorithm over polynomial quantities, with two illustrations: 1) computing the derivatives of the likelihood in parametric Bayesian networks; 2) relaxing the constraint on the prior distribution of the number of segments in sequence segmentation.

Groupe de travail des thésards du LPSM

Tuesday November 30, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Iqraa Meah** (LPSM) *Online multiple testing with super-uniformity reward (Iqraa Meah)*

Valid online inference is an important problem in contemporary multiple testing research, to which various solutions have been proposed recently. It is well-known that these methods can suffer from a significant loss of power if the null p-values are conservative. This occurs frequently, for instance whenever discrete tests are performed. To reduce conservatism, we introduce the method of super-uniformity reward (SURE). This approach works by incorporating information about the individual null cumulative distribution functions (or upper bounds of them), which we assume to be available. Our approach yields several new “rewarded” procedures that theoretically control online error criteria based either on the family-wise error rate (FWER) or the marginal false discovery rate (mFDR). We prove that the rewarded procedures uniformly improve upon the non-rewarded ones, and illustrate their performance for simulated and real data.

Groupe de travail des thésards du LPSM

Monday November 22, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Antoine Heranval + Lucas Broux** (LPSM) *Application of Generalized Pareto Regression Trees to the cost prediction of floods in France (Antoine HERANVAL) + The Sewing Lemma for all gamma > 0 (Lucas BROUX)*

In this work we use Generalized Pareto Regression Trees for the cost prediction of floods in France on a real dataset. The aim of this study is to improve the cost prediction of a flood event, shortly after its occurrence, for the entire French market. Indeed, following a natural catastrophe, it can be difficult to evaluate the scale and the cost of an event. To do so, we use GPD Regression Trees to focus on extreme events and to gain further insight into the heterogeneity of the severity of these events. Thanks to a partnership with the French Federation of Insurance (FFA), mainly with one of its dedicated technical bodies, the association of French insurance undertakings for natural risk knowledge and reduction (Mission Risques Naturels, MRN), we have access to a large volume of events. These events represent all the flood events that have been recognized as natural catastrophes in France over the past 20 years. The cost of these events comes from the claims reported by insurance companies.

The Sewing Lemma for all gamma > 0 (Lucas BROUX)

Introduced by Massimiliano Gubinelli in 2004, the Sewing Lemma is a fundamental result in the theory of Rough Paths.

In this talk, I will present this lemma in its original form, along with a recently obtained version in the regime 0 < gamma ≤ 1. I will also briefly introduce the theory of Rough Paths and explain how this new version of the Sewing Lemma can provide an elegant solution to the so-called “extension problem” of extending any Hölder path to a Rough Path.

Groupe de travail des thésards du LPSM

Monday November 15, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Pierre Marion + Sergi Burniol Clotet** (LPSM) *Framing RNN as a kernel method : A neural ODE approach (Pierre MARION) + Horospheres and horocyclic flows in nonpositive curvature (Sergi BURNIOL CLOTET)*

We study the behavior of a class of neural networks called recurrent neural networks (RNNs). Building on their interpretation as a discretization of a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of an RNN can be viewed as a linear function of a specific feature set of the input, known as the signature. This connection allows us to frame an RNN as a linear method in a suitable Hilbert space. As a consequence, we obtain theoretical guarantees of generalization and stability. Our results are illustrated on simulated datasets.
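The discretization viewpoint mentioned above can be sketched as follows: a residual recurrence with step 1/T behaves like an Euler scheme for an ordinary differential equation driven by the input, so refining the time grid changes the final hidden state only slightly. The specific update rule, the `tanh` activation, the random weights and the input path are assumptions made for this illustration and are not taken from the talk.

```python
import numpy as np


def residual_rnn(inputs, W, U, b):
    """Run h_{k+1} = h_k + (1/T) * tanh(W h_k + U x_{k+1} + b) from h_0 = 0."""
    T = len(inputs)
    h = np.zeros(W.shape[0])
    for x in inputs:
        # One Euler step of size 1/T for dh/dt = tanh(W h + U x(t) + b).
        h = h + np.tanh(W @ h + U @ x + b) / T
    return h


def sample_input(T):
    """Sample the (assumed) input path x(t) = (t, t^2) at T grid points."""
    t = np.linspace(0.0, 1.0, T + 1)[1:]
    return np.stack([t, t**2], axis=1)


rng = np.random.default_rng(0)
d, p = 4, 2                               # hidden and input dimensions
W = 0.1 * rng.normal(size=(d, d))         # hypothetical weights
U = 0.5 * rng.normal(size=(d, p))
b = np.zeros(d)

h_coarse = residual_rnn(sample_input(100), W, U, b)
h_fine = residual_rnn(sample_input(200), W, U, b)
gap = np.linalg.norm(h_coarse - h_fine)   # small if both track the same ODE
```

The signature features and the kernel interpretation discussed in the abstract are beyond the scope of this sketch.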

Horospheres and horocyclic flows in nonpositive curvature (Sergi BURNIOL CLOTET)

I will present geodesic flows on rank 1 nonpositively curved manifolds, which are examples of weakly hyperbolic systems of geometric origin. Then I will explain two results obtained in my thesis which generalize what was known in the strong hyperbolic case. The first is the equidistribution of horospheres under the action of the geodesic flow, and the second is the unique ergodicity of the horocyclic flow of a certain class of surfaces.

Groupe de travail des thésards du LPSM

Monday October 18, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Miguel Martinez Herrera + Loïc Bethencourt** (LPSM) *Inference and tests for Multivariate Hawkes processes with Inhibition, application to neuroscience and genomics (Miguel MARTINEZ HERRERA) + Stable limit theorems for additive functionals of one-dimensional diffusion processes (Loïc BETHENCOURT)*

Since its origin in the study of earthquakes, the Hawkes process has been used to model interactions where the occurrence of an event has a direct impact on the phenomenon itself. The well-studied “excitation” increases the event rate each time we observe a point, whereas the “inhibition”, which is the main interest of this subject, considers the opposite effect. The main goal is to obtain a parametric estimation method for multidimensional processes, for which it is important to consider both probabilistic and statistical approaches. Such methods would provide a better understanding of the inner workings of interacting phenomena, such as neural activity networks.

Stable limit theorems for additive functionals of one-dimensional diffusion processes (Loïc BETHENCOURT)

We consider a positive recurrent one-dimensional diffusion process with continuous coefficients and we establish stable central limit theorems for a certain type of additive functionals of this diffusion. In other words, we find some explicit conditions on the additive functional so that its fluctuations behave like some alpha-stable process in large time, for alpha in (0,2].

Groupe de travail des thésards du LPSM

Monday June 14, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Yoan Tardy + Francesco Bonacina** (LPSM) *Collisions of the supercritical Keller-Segel particle system (Yoan Tardy) + Influenza Decline During COVID-19 Pandemic: a Global Analysis Leveraging Classification and Regression Trees (Francesco Bonacina)*

We study a particle system naturally associated to the 2-dimensional Keller-Segel equation. It consists of N Brownian particles in the plane, interacting through a binary attraction in 1/r, where r stands for the distance between two particles. We will discuss two cases, the subcritical and the supercritical, which correspond to an attraction factor less than and greater than 2, respectively. In particular, we will see first that in the subcritical case there are only collisions between pairs of particles, which are not an obstacle to properly defining the solution in the classical sense, and secondly that in the supercritical case there is an explosion due to a collision between several particles, which we will study precisely.

Influenza Decline During COVID-19 Pandemic: a Global Analysis Leveraging Classification and Regression Trees (Francesco Bonacina)

The COVID-19 pandemic has caused a profound shock to the ecology of infectious diseases. In particular, some studies highlighted that the circulation of influenza dramatically reduced in specific countries after the emergence of COVID-19. They also pointed out that the phenomenon could be associated with the non-pharmaceutical interventions (NPIs) applied by governments to control the pandemic. Here we address the problem at the global scale, analyzing the FluNet influenza public repository for the periods before (2015-19) and during (2020-21) the COVID-19 pandemic. First, we map the space-time variation of influenza and we find that the percentage of positive tests decreased globally by 98.6%, but with very heterogeneous patterns across countries and seasons. Then, we use Random Forests and Classification And Regression Trees to link the variation of influenza incidence with several covariates such as COVID-19 incidence, strictness of NPIs, change in human mobility, demography, season and geographical region.

Organizers: Emilien Bodiot, Lucas Broux, Gloria Buritica, David Lee, Thibault Randrianarisoa, Yoan Tardy

Groupe de travail des thésards du LPSM

Monday May 31, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Sébastien Farkas + Toyomou Matsuda** (LPSM) *Regression trees algorithms tailored for Generalized Pareto Distribution estimation and application to Cyber risk quantification (Sébastien Farkas) + Parabolic Anderson models with singular potentials (Toyomou Matsuda)*

With the rise of the cyber insurance market, there is a need for better quantification of the economic impact of this risk and its rapid evolution. We particularly focus on severe/extreme claims, by combining a Generalized Pareto modeling – legitimate from Extreme Value Theory – and a regression tree approach. We will introduce the methodology, discuss some hypotheses, comment on simulation results and explore some perspectives.

Parabolic Anderson models with singular potentials (Toyomou Matsuda)

The aim of this talk is twofold. Firstly, we will review some important facts about the parabolic Anderson model (PAM). The PAM is a heat equation with a random potential and it exhibits interesting properties such as intermittency. We will also review Schrödinger operators with random potentials, called Anderson Hamiltonians, focusing on their connections with PAMs.

Secondly, we will discuss PAMs and Anderson Hamiltonians with singular random potentials. In particular, we will discuss my ongoing work with Willem van Zuijlen to construct Anderson Hamiltonians with such singular potentials and to study asymptotics of the corresponding PAMs.

Organizers: Emilien Bodiot, Lucas Broux, Gloria Buritica, David Lee, Thibault Randrianarisoa, Yoan Tardy

Groupe de travail des thésards du LPSM

Monday May 17, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Carlo Bellingeri** (TU Berlin) *From Ordinary Differential Equations to Rough Differential Equations (Carlo Bellingeri)*

Starting from deterministic concepts of ordinary differential equations, we will introduce the concepts of rough differential equation “à la Davie” and geometric rough paths. These notions allow us to formulate a stochastic differential equation without having to resort to Itô's calculus and have served as an inspiration for the study of much more sophisticated equations in recent years. In conclusion, we will discuss possible extensions of these concepts to non-commutative probabilities.

Organizers: Emilien Bodiot, Lucas Broux, Gloria Buritica, David Lee, Thibault Randrianarisoa, Yoan Tardy

Groupe de travail des thésards du LPSM

Monday May 3, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Henri Elad Altman + Pierre Marion** (Imperial College London / LPSM) *Scaling limits of additive functionals of mixed fractional Brownian motions (Henri Elad Altman) + “Who plays Gandalf in LOTR?” - Natural Language Processing on structured data (Pierre Marion)*

In this talk we will present new methods to analyse the long-time behaviour of additive functionals of stochastic processes. As an application, we obtain scaling limit results for additive functionals of mixed fractional Brownian motions. This is based on joint work with Khoa Lê (TU Berlin).

“Who plays Gandalf in LOTR?” - Natural Language Processing on structured data (Pierre Marion)

Most recent approaches to language tasks (translation, text generation, etc.) model text as a stream of tokens, and learn a meaningful representation of these tokens via mechanisms like attention. However, some tasks require handling structured data, such as graphs or tables. One such example is knowledge graphs, which are a common way to encode real-world facts. The largest public knowledge graph is Wikidata (90M nodes, 1.2B edges). Finding information in these graphs, for instance to answer questions phrased in natural language, is a challenging task, which benefits from having a more structured approach to language modeling (semantic parsing, context embedding). The talk will give an overview of these different notions, and present recent results on the task of question answering grounded in Wikidata.

Groupe de travail des thésards du LPSM

Monday April 19, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**William Da Silva + Aude Sportisse** (LPSM) *Hamburgers, cheeseburgers and Brownian excursions (William Da Silva) + Debiasing Averaged Stochastic Gradient Descent to handle missing values (Aude Sportisse)*

We give an elementary introduction to a bijection, due to Scott Sheffield, between loop-decorated planar maps and hamburger-cheeseburger inventory trajectories. Going through the looking glass, one may use these burger walks to study the convergence of loop-decorated random planar map models (this is the so-called peanosphere topology). In the scaling limit, the hamburger and cheeseburger walks encoding a particular model of random planar maps reveal intriguing connections between planar maps and Brownian excursions, which are reminiscent of the mating-of-trees story for space-filling explorations of Liouville quantum gravity surfaces. If time allows, I will describe a branching process arising when slicing half-planar Brownian excursions at heights.

Debiasing Averaged Stochastic Gradient Descent to handle missing values (Aude Sportisse)

The stochastic gradient algorithm is a key ingredient of many machine learning methods, particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit of requiring no modeling of the data distribution and of accounting for heterogeneous missing proportions. In both streaming and finite-sample settings, we prove that this algorithm achieves a convergence rate of O(1/n) at iteration n, the same as without missing values. We show the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including those collected from medical registers.

Groupe de travail des thésards du LPSM

Monday March 22, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Joseph De Vilmarest + Helena Kremp** (LPSM + Freie Universität Berlin) *Adaptive Forecasting by Kalman Filter, Application to Electricity Consumption During Lockdown (Joseph de Vilmarest) + A weak solution concept for singular SDEs (Helena Kremp)*

As electricity storage capacities are still negligible compared to the need, electricity production must always balance consumption; forecasting the load is therefore a crucial task. During the coronavirus crisis, the electricity consumption patterns of most countries changed abruptly and the average load dropped. This highlights the need for adaptive forecasting methods, taking into account the recent observations in an online manner. We consider here generalized additive models, which have demonstrated their efficiency in predicting the electricity load, and we present an adaptive version based on Kalman Filtering. We apply the proposed method to the French electricity load and we test it before, during and after the lockdown of March 2020.
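The idea of adapting a forecasting model online with a Kalman filter can be sketched on a plain linear regression whose coefficients follow a random walk. This is a simplified stand-in for the generalized additive models of the talk; the state-space model, the noise variances `q` and `sigma2`, the function name and the synthetic data with an abrupt change are all assumptions made for illustration.

```python
import numpy as np


def kalman_regression(X, y, q, sigma2):
    """Online forecasts for y_t = x_t . theta_t + noise, theta_t a random walk.

    q      -- variance of each coordinate of the random-walk increments
    sigma2 -- variance of the observation noise
    """
    n, d = X.shape
    theta = np.zeros(d)   # current coefficient estimate
    P = np.eye(d)         # current estimate covariance
    preds = np.empty(n)
    for t in range(n):
        x = X[t]
        P = P + q * np.eye(d)          # prediction step (random walk)
        preds[t] = x @ theta           # forecast made before observing y_t
        s = x @ P @ x + sigma2         # innovation variance
        k = P @ x / s                  # Kalman gain
        theta = theta + k * (y[t] - preds[t])
        P = P - np.outer(k, x @ P)     # covariance update
    return preds, theta


# Synthetic data whose coefficients change abruptly halfway through.
rng = np.random.default_rng(0)
n, d = 400, 2
X = rng.normal(size=(n, d))
true_theta = np.where(np.arange(n)[:, None] < 200, [1.0, -1.0], [3.0, 0.5])
y = np.sum(X * true_theta, axis=1) + 0.1 * rng.normal(size=n)
preds, theta = kalman_regression(X, y, q=1e-2, sigma2=1e-2)
```

A larger `q` lets the coefficients adapt faster after a break, at the price of noisier estimates in stable periods.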

A weak solution concept for singular SDEs (Helena Kremp)

Since the works by Delarue, Diel (2016) and Cannizzaro, Chouk (2018) (in the Brownian noise setting), and our previous work (in the Lévy noise case), the existence and uniqueness of solutions to the martingale problem associated to multidimensional SDEs with additive α-stable Lévy noise for α in (1,2] and rough Besov drift of regularity β > (2-2α)/3 are known. Motivated by the equivalence of probabilistic weak solutions to SDEs with bounded, measurable drift and solutions to the martingale problem, we define a (non-canonical) weak solution concept for singular Lévy diffusions, proving moreover equivalence to martingale solutions in both the Young regime β > (1-α)/2 and the rough regime β > (2-2α)/3. This turns out to be highly non-trivial in the rough case and forces us to define certain rough stochastic sewing integrals. In particular, we show that the canonical weak solution concept (introduced also by Athreya, Butkovski, Mytnik (2018) in the Young case), which is well-posed in the Young case, yields non-uniqueness of solutions in the rough case.

This is ongoing work together with Nicolas Perkowski.

Groupe de travail des thésards du LPSM

Monday March 8, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**David Lee** (LPSM) *A Functional Calculus via the Extension technique*

In the pioneering work of [Caffarelli and Silvestre, Comm. Par. Diff. Equ. 32 (2007)] the following problem was raised: Which type of linear operators can be realized by the Dirichlet-to-Neumann operator for a Dirichlet problem on the halfspace? Even though the above is an analysis problem, the tools of excursion theory become very useful in this situation.

In this talk, I will present a probabilistic solution to the above problem. This work was done jointly with Daniel Hauer, from the University of Sydney.

Groupe de travail des thésards du LPSM

Monday February 22, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Aaraona Rakotoarivony + Ludovic Arnould** (LPSM) *Modigliani Miller Theorem, a stochastic control approach (Aaraona Rakotoarivony) + Analyzing the tree-layer structure of Deep Forests (Ludovic Arnould)*

In their seminal paper, Modigliani and Miller showed that in a perfect market the value of a company is independent of its capital structure or its dividend policy. In this talk, we are going to investigate how this result is modified when we introduce friction in the capital market. We adopt a stochastic control approach in which the manager of a company is able to act on the cash process through a dividend, capital, or debt emission.

Analyzing the tree-layer structure of Deep Forests (Ludovic Arnould)

Random forests on the one hand, and neural networks on the other hand, have met great success in the machine learning community for their predictive performance. Combinations of both have been proposed in the literature, notably leading to the so-called deep forests (DF). We investigate the mechanisms at work in DF and outline that DF architecture can generally be simplified into simpler and more computationally efficient shallow forest networks. Despite some instability, the latter may outperform standard predictive tree-based methods. In order to precisely quantify the improvement achieved by these light network configurations over standard tree learners, we theoretically study the performance of a shallow tree network made of two layers, each one composed of a single centered tree. We provide tight theoretical lower and upper bounds on its excess risk. These theoretical results show the interest of tree-network architectures for well-structured data provided that the first layer, acting as a data encoder, is rich enough.

Groupe de travail des thésards du LPSM

Monday February 1, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Eva Lawrence + Armand Bernou** (Sorbonne Université) *Entropy maximisation problems for multidimensional functional reconstruction (Eva Lawrence) + Asymptotic Behavior of Markov Processes: a Dive into the Subgeometric Case (Armand Bernou)*

In the framework of non-parametric reconstruction problems, we are interested in the reconstruction of a multidimensional function f over a compact set U, such that f satisfies a certain number of very general integral constraints. We propose to solve this reconstruction problem by setting it in the frame of the γ-entropy maximisation problem under constraints.

That is, we define a convex function γ from R^p to R_+ with some good properties and we are interested in the maximisation under constraints of the quantity I_γ(f) = - ∫_U γ(f) dP.

We explain that this problem can be linked to another one that deals with signed measures F.

Such problems have been studied in the case of a single function or a single measure reconstruction.

We propose to study the more general case of the reconstruction of a function or a measure with values in R^p.

Asymptotic Behavior of Markov Processes: a Dive into the Subgeometric Case (Armand Bernou)

In this talk, I will quickly introduce the notions required to discuss the stability structure of Markov processes on general state space. Those generalise the usual ideas of recurrence, positive recurrence and aperiodicity from the countable state space theory. I will then present some methods, based on the existence of Lyapunov functionals for the stochastic generator, which allow one to obtain the convergence towards the invariant measure of the Markov process at a subgeometric rate. This case differs from the geometric one in several ways and, if time permits, I will briefly discuss the main differences.

Groupe de travail des thésards du LPSM

Monday January 4, 2021, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Bo Ning** (LPSM) *Spike and slab Bayesian sparse principal component analysis (Bo Ning)*

Sparse principal component analysis (PCA) is a popular tool for dimension reduction of high-dimensional data. Recently, there have been some works on Bayesian sparse PCA. However, those studies are mostly theoretical. There is a lack of efficient computational algorithms that are available for practical use. In this talk, I will propose a new method for Bayesian sparse PCA. In addition to studying the posterior contraction rate of this method, I will introduce a PX-CAVI algorithm based on variational inference. The PX-CAVI algorithm applies the parameter expansion to the principal components to avoid dealing with the orthogonal constraints between eigenvectors directly. This algorithm is fast and accurate. Our simulation studies showed that the PX-CAVI algorithm outperforms the existing penalized methods and an EM algorithm that uses a continuous spike and slab prior. The R code of this algorithm is available online.

#### Year 2020

Groupe de travail des thésards du LPSM

Monday December 14, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Jean-David Jacques** (LPSM) *Rota-Baxter algebras, Bohnenblust-Spitzer identity and application to probability (Jean-David Jacques)*

In 1955, Frank Spitzer studied some combinatorial relations involving the maximum of some truncations of a series of real numbers. This combinatorial study led to a useful relation for characteristic functions of real random walks. Later, Glen Baxter obtained the same relation by introducing an operator over the space of characteristic functions, the so-called “Baxter operator”. Finally, a complete algebraic study by the famous combinatorialist Gian-Carlo Rota led to what we nowadays call a “Rota-Baxter algebra”.

Groupe de travail des thésards du LPSM

Monday December 7, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Joseph De Vilmarest** (LPSM) *Adaptive Forecasting by Kalman Filter, Application to Electricity Consumption During Lockdown (Joseph de Vilmarest)*

As electricity storage capacities are still negligible compared to demand, electricity production must always balance consumption, so forecasting the load is a crucial task. During the coronavirus crisis, the electricity consumption patterns of most countries changed abruptly and the average load dropped. This highlights the need for adaptive forecasting methods that take recent observations into account in an online manner. We consider here generalized additive models, which have demonstrated their efficiency in predicting the electricity load, and we present an adaptive version based on Kalman filtering. We apply the proposed method to the French electricity load and test it before, during and after the lockdown of March 2020.

Groupe de travail des thésards du LPSM

Monday November 30, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Lucas Iziquel** (LPSM) *Tree-like random metric spaces seen as fixed points of distributional equations (Lucas Iziquel)*

When studying the scaling limits of some models of random trees or graphs, we obtain random compact metric spaces. From the study of the Continuum Random Tree (CRT), a classical example of such random spaces, we will use its self-similarity property - a well-chosen subspace of the CRT has the same distribution as a rescaled copy of the entire CRT itself - to see the CRT as a fixed-point of a particular equation. From there we will introduce a framework to study this kind of distributional fixed-point equations, the existence and uniqueness of the fixed-points, and the possible convergence towards these fixed-points. Moreover, some geometric properties of the fixed-point can be deduced directly from the equation, for instance its almost sure fractal dimensions.

Image: A realization of the 1.07-stable looptree from the website of Igor Kortchemski

Groupe de travail des thésards du LPSM

Monday November 23, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Vasiliki Velona** (Universitat Pompeu Fabra and Universitat Politècnica de Catalunya / Visiting LPSM) *The broadcasting problem (Vasiliki Velona)*

Consider a large rooted tree, where the root-vertex of the tree has a random bit value assigned. Every other vertex has the same bit value as its parent with probability 1-q and the opposite value with probability q, where q ∈ [0,1]. The broadcasting problem consists in estimating the value of the root bit, upon observing the unlabelled tree graph and the bit values associated with a subset of the vertices. I will discuss such results on various tree classes and, in particular, on random recursive trees created either by the uniform attachment model or by the linear preferential attachment model.
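
For readers who want to experiment, the broadcasting process described in this abstract can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: it uses the uniform attachment model, it observes all bits rather than a subset, and the majority vote is a naive baseline chosen here for simplicity, not the estimator studied in the talk. All function names are hypothetical.

```python
import random


def random_recursive_tree(n, rng):
    """Uniform attachment: vertex i >= 1 picks its parent uniformly in 0..i-1.

    Returns a list `parents` with parents[0] = None (vertex 0 is the root).
    """
    return [None] + [rng.randrange(i) for i in range(1, n)]


def broadcast(parents, q, rng):
    """Give the root a uniform random bit, then propagate it down the tree.

    Each vertex copies its parent's bit with probability 1 - q
    and takes the opposite bit with probability q.
    """
    bits = [rng.randrange(2)]
    for v in range(1, len(parents)):
        parent_bit = bits[parents[v]]  # parents[v] < v, so already assigned
        flipped = rng.random() < q
        bits.append(parent_bit ^ int(flipped))
    return bits


def majority_vote(bits):
    """Naive root-bit estimate (assumption: all bits observed; ties give 1)."""
    return int(2 * sum(bits) >= len(bits))


if __name__ == "__main__":
    rng = random.Random(2020)
    parents = random_recursive_tree(1000, rng)
    bits = broadcast(parents, q=0.1, rng=rng)
    print("root bit:", bits[0], "majority estimate:", majority_vote(bits))
```

Varying `q` and the tree size gives a rough feel for when the root bit remains recoverable from the observed values.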

Groupe de travail des thésards du LPSM

Monday November 16, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Robin Khanfir** (LPSM) *The range of branching random walks on Galton-Watson trees (Robin Khanfir)*

The scaling limit of critical Galton-Watson trees conditioned to be large is a universal random compact metric space, called the Continuum Random Tree (CRT). For supercritical Galton-Watson trees, finding such a scaling limit is not possible because they are infinite, and hence not compact, with positive probability. One can overcome this difficulty by randomly choosing a subtree, of large fixed size, of an infinite Galton-Watson tree. To do so, we consider a random walk on the infinite tree and look at the set of vertices visited by the walk before time n, which we call the range or the trace of the walk. In this case, there is indeed a scaling limit, which consists of several CRTs glued together at their roots. An interesting question then arises: what happens if we replace the random walk by a branching random walk? In other words, we index our walk by a critical Galton-Watson tree conditioned to be large instead of by linear time.

Groupe de travail des thésards du LPSM

Monday November 9, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Linus Bleistein** (ENS Ulm, LPSM) *Wasserstein-GANs and the Signature Transform (Linus Bleistein)*

Wasserstein-GANs (W-GANs), a class of generative adversarial models based on neural networks and the Wasserstein distance, have recently drawn a lot of attention because of their high generative power for images. However, little has been done regarding time series generation. We introduce the signature transform, a non-linear transformation that allows for simple representations of complex, high-dimensional paths, and combine it with W-GANs in order to generate time series.

Groupe de travail des thésards du LPSM

Monday October 26, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Nicklas Werge** (LPSM) *AdaVol: An Adaptive Recursive Volatility Prediction Method (Nicklas Werge)*

Quasi-Maximum Likelihood (QML) procedures are theoretically appealing and widely used for statistical inference. While there are extensive references on QML estimation in batch settings, QML estimation in streaming settings has attracted little attention until recently. An investigation of the convergence properties of the QML procedure in a general conditionally heteroscedastic time series model is conducted, and the classical batch optimization routines are extended to the framework of streaming and large-scale problems. An adaptive recursive estimation routine for GARCH models named AdaVol is presented. The AdaVol procedure relies on stochastic approximations combined with the technique of Variance Targeting Estimation (VTE). This recursive method is computationally efficient, while VTE alleviates some convergence difficulties encountered by the usual QML estimation due to a lack of convexity. Empirical results demonstrate a favorable trade-off between AdaVol’s stability and its ability to adapt to time-varying estimates for real-life data.

Groupe de travail des thésards du LPSM

Monday October 19, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Isao Sauzedde** (LPSM) *Covariant Young's integration (Isao Sauzedde)*

We will present a new point of view on Young's integral which uses no approximation of the path. It will give us a slight extension of it, with the additional property of being stable under smooth deformations of the plane. We will then go from Young to stochastic integration, and finally define the integral of some irregular random 1-forms along Brownian motion. The main role will be played by the winding of curves around points.

Groupe de travail des thésards du LPSM

Monday July 6, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Gloria Buritica** (LPSM) *Clustering of extreme events (Gloria Buritica)*

The occurrence of an extreme event usually triggers a sequence of extreme events in a short period. In practice, this phenomenon entails severe consequences when the risk model does not account for the probability of time-clustering of extremes. For example, many floods occur after extreme rainfall has been recorded on consecutive days. Similarly, stock returns crash for several days before returning to their usual dynamics. In the setting of regularly varying stationary time series, the extremal index is a parameter that, in most cases, fully describes the time-clustering of extremes for univariate time series. The multivariate setting draws on these results and considers time-clusters as data from a portion of time with the supremum norm exceeding a high threshold. We study a generalisation of this notion considering time-blocks of data with L^p norm above a high threshold. This new definition allows for capturing extreme behavior from more time-blocks than before.

Organizers: F. Bechtold, W. Da Silva, A. Fermanian, S. Has, Y. Yu

Groupe de travail des thésards du LPSM

Wednesday January 1, 2020, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Past speakers** (LPSM) *Past GTT talks, 2020*

KMT coupling for random walk bridges - Xuan Wu (June 15, 2020)

Smoothing of Bayesian forest estimators in density estimation - Thibault Randrianarisoa (June 8, 2020)

An averaging (path-by-path) approach to regularisation by noise for ODEs - Lucio Galeati (June 1, 2020)

High Regularity Invariant Measures in PDEs - Mickaël Latocca (May 25, 2020)

Wasserstein Random Forests and Applications in Heterogeneous Treatment Effects - Qiming Du (May 18, 2020)

An approach to analyze the tail of the distribution of heterogeneous data and application to insurance - Sébastien Farkas (May 11, 2020)

Pathwise Regularisation of McKean–Vlasov Equations - Avi Mayorcas (May 4, 2020)

On a class of completely random measures (CRMs) and its role in Bayesian analysis - Riccardo Passeggeri (April 27, 2020)

An introduction to statistical properties of expanding maps - Malo Jézéquel (April 20, 2020)

Introduction on post hoc inference (online) - Marie Perrot-Dockès (April 6, 2020)

Learning with signatures (online) - Adeline Fermanian (March 30, 2020)

Introduction on post hoc inference (cancelled!) - Marie Perrot-Dockès (March 9, 2020)

Towards a better understanding of Wasserstein GANs - Ugo Tanielian (February 24, 2020)

Quantum Computing and Applications to Machine Learning - Jonas Landman (February 10, 2020)

Polymer Pinning Model - Order of the phase transition in the inhomogeneous model - Alexandre Legrand (February 3, 2020)

A kernel-based consensual regression aggregation method - Sothea Has (February 3, 2020)

A Bayesian hierarchical model for traffic flow and waiting time prediction in carpooling lines - Panayotis Papoutsis (January 27, 2020)

On weighted sampling without replacement - Othmane Safsafi (January 20, 2020)

Coupling methods for the convergence rate of Markov processes - Armand Bernou (January 13, 2020)

Change-point analysis of copula models - Karen A. Vásquez Vivas (January 6, 2020)

Organizers: F. Bechtold, W. Da Silva, A. Fermanian, S. Has, Y. Yu

#### Year 2019

Groupe de travail des thésards du LPSM

Tuesday January 1, 2019, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Past speakers** (LPSM) *Past GTT talks, 2019*

Growth-fragmentation in Planar Brownian excursions - William Da Silva (December 9, 2019)

The Calderón Problem through the eyes of a probabilist - David Lee (December 2, 2019)

Statistical inference for a partially observed interacting system of Hawkes processes - Chenguang Liu (November 25, 2019)

Long time dynamics for interacting oscillators on graphs - Fabio Coppini (November 18, 2019)

Heat kernel on the infinite percolation cluster - Chenlin Gu (November 4, 2019)

Modified Runge-Kutta methods for pathwise approximations of SDEs - F. Bechtold (October 28, 2019)

Informative missing data - Aude Sportisse (October 21, 2019)

Random lifts and cutoff phenomenon - Guillaume Conchon-Kerjan (October 14, 2019)

Les EDPS semi-linéaires à diffusion non-bornée - Florian Bechtold (June 24, 2019)

Belief propagation in Bayesian networks and Markov chains and extensions with polynomials - Alexandra Lefebvre (June 17, 2019)

Algèbres de quasi-battage, mouvement Brownien et chemins rugueux - Carlo Bellingeri (June 3, 2019)

Chaînes d'oscillateurs et modèles à champ moyen - Alejandro Fernandez Montero (May 27, 2019)

Construction séquentielle d'arbre couvrant minimal - Othmane Safsafi (May 20, 2019)

A mathematical model on black market - Chenlin Gu (May 13, 2019)

Rough paths, signature and statistical learning - Adeline Fermanian (May 6, 2019)

Annulation de la constante isopérimétrique ancrée de percolation en p_c - Barbara Dembin (April 15, 2019)

On the diffusion of eigenvectors of random matrices - Lucas Benigni (April 8, 2019)

Impact of tree choice in metagenomics differential abundance studies - Antoine Bichat (March 25, 2019)

Quelques propriétés sur les ICRT - Arthur Blanc-Renaudie (March 18, 2019)

Concentration inequalities for functions of independent random variables - Antoine Marchina (March 11, 2019)

K-means algorithm with Bregman divergences and constructing predictive models based on this algorithm - Sothea Has (March 4, 2019)

Penalized likelihood methods applied to age-period-cohort analysis - Vivien Goepp (February 25, 2019)

Optimal control and dynamic programming - Enzo Miller (February 18, 2019)

Universality for critical kinetically constrained models - Ivailo Hartarsky (February 11, 2019)

An introduction to Extreme Value Theory - Nicolas Meyer (February 4, 2019)

Windings of Brownian motion - Isao Sauzedde (January 27, 2019)

Modèle de Poland-Scheraga pour la dénaturation de l'ADN - Alexandre Legrand (January 21, 2019)

Spectral techniques in matrix completion - Simon Coste (January 14, 2019)

Introduction of high-dimensional interpretable machine learning models and their applications - Simon Bussy (January 7, 2019)

Organizers: A. Lefebvre, N. Meyer, O. Safsafi, T. Touati

#### Year 2018

Groupe de travail des thésards du LPSM

Monday January 1, 2018, 5PM, Jussieu salle Paul Lévy (209) couloir 16-26 (2ème étage)

**Past speakers** (LPSM) *Past GTT talks, 2018*

The Hitchhiker’s Guide to the Galaxy of Financial Risk Management: Risk Measures and Procyclicality - Marcel Bräutigam (December 3, 2018)

Scaling limits of a random graph: critical configuration model with power-law degrees - Guillaume Conchon-Kerjan (November 26, 2018)

Temps d’infection dans le modèle de Duarte : le rôle des barrières d’énergie - Laure Marêché (November 19, 2018)

Numerical consistent estimates in the multivariate linear mixed-effects model and application to the malaria infection study - Eric Adjakossa (November 12, 2018)

How to free the boundary - Clément Cosco (November 5, 2018)

A brief introduction to Sequential Monte Carlo - Qiming Du (October 22, 2018)

Records of the Fractional Brownian Motion - Assaf Shapira (October 15, 2018)

Local convergence for random permutations: the case of uniform pattern-avoiding permutations - Jacopo Borga (June 27, 2018)

Triangle, Étoile, Intégrabilité - Paul Melotti (June 11, 2018)

Introduction à la théorie des jeux à champs moyens (ou Mean Field Games - MFG) - Ziad Kobeissi (June 4, 2018)

Random obstacle problems, and integration by parts formulae for the laws of Bessel bridges - Henri Elad-Altman (May 28, 2018)

Bayesian inference of causality in gene regulatory networks - Flaminia Zane (May 23, 2018)

Régularité de l'exposant de Liapounov d'un produit de matrices aléatoires - Benjamin Havret (April 23, 2018)

Loi asymptotique de l'estimateur des moindres carrés pour les modèles linéaires avec erreurs dépendantes - Emmanuel Caron (April 9, 2018)

Un graphe aléatoire pour modéliser la spéciation - François Bienvenu (March 26, 2018)

Théorèmes limites pour des fonctionnelles de clusters d'extrêmes de processus et champs aléatoires faiblement dépendants - José-Gregorio Gomez (March 20, 2018)

Contrôle Stochastique et Apprentissage Statistique : exemple de la gestion middle-out d'un portefeuille - Alexis Bismuth (March 19, 2018)

Arbres, marches et laminations aléatoires - Paul Thévenin (March 12, 2018)

Cutoff of sparse Markov chains - Guillaume Conchon-Kerjan (March 5, 2018)

Formule d'Itô avec les structures de régularité - Carlo Bellingeri (February 26, 2018)

Théorie de l’estimation par partition dépendante des données et Rules Induction Partitioning Estimator - Vincent Margot (February 20, 2018)

Cardinal minimal d'une surface de coupure dans une percolation de premier passage surcritique - Barbara Dembin (February 19, 2018)

Peignes et coalescents échangeables - Félix Foutel-Rodier (February 12, 2018)

Organizers: C. Cosco, S. Coste, L. Marêché, P. Melotti, N. Meyer, B. Dembin, G. Conchon-Kerjan, F. Coppini, O. Safsafi