Thematic team Statistics, data, algorithms

## Statistics seminar

#### Day, time and place

Tuesday at 09:30, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

### Next talks

Statistics seminar

Tuesday April 30, 2024, 9:30AM, Jussieu en salle 15-16.201

**Spencer Frei** (UC Davis) *To be announced.*

Statistics seminar

Tuesday May 14, 2024, 9:30AM, Jussieu en salle 15-16.201

**Rafaël Pinot** (LPSM Sorbonne Université) *To be announced.*

### Previous talks

#### Year 2024

Statistics seminar

Tuesday April 16, 2024, 9:30AM, Jussieu en salle 15-16.201

**Borjan Geshkovski** (Inria) *Une perspective mathématique sur les Transformers*

Statistics seminar

Tuesday April 2, 2024, 9:30AM, Jussieu en salle 15-16.201

**Anne-Claire Haury** (Google/LPSM Sorbonne Université) *The Vehicle Routing Problem*

**The Vehicle Routing Problem: definition, solutions and (sometimes statistical) challenges**

In this (casual!) talk, I'll introduce vehicle routing problems (VRP), how we model and solve them, and some of the challenges we encounter. The VRP is originally an operations research problem, but in practice it is often stochastic. I'll try to highlight some of the statistical problems that arise.

A completely informal (and probably wrong) definition of vehicle routing problems: any problem that involves vehicles (they don't need wheels; they can walk or swim, anything works as long as they move) that need to perform tasks (of any kind, from pickups and deliveries to services) in predefined places. These problems are very often constrained in different ways (time, capacity, precedence, security, legal requirements…), which of course makes them a bit more fun to solve.

Many problems can be modeled using this framework, including but not limited to: parcel delivery, food/immediate delivery, pickups, B-to-B pickups and deliveries, ride sharing, field service management…

Some of these, when they are simple and small, can be solved exactly, but almost any real-world problem has either a set of features or a size that makes exact solutions impossible. In most cases we have to turn to heuristics, albeit clever and fast ones. There are still many open areas of research, and I'll tell you more about them!

About me: I'm a new PAST at LPSM. Initially a statistician (PhD in 2012 with JP Vert on feature selection), I changed course 7 years ago to become a software engineer in the Operations Research (OR) team at Google in Paris. I've worked at Google for 10 years, and in the OR team for 7. I'll tell you how I got here in the lab and how I ended up at Google in the first place. You may have heard of OR-tools, an open-source library for OR problems, often used as a benchmark in research papers.
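To make the idea of a routing heuristic concrete, here is a minimal sketch of a greedy nearest-neighbor construction for a capacitated VRP. It is only an illustration and is not taken from the talk or from OR-tools: the function names `nearest_neighbor_routes` and `total_distance`, the Euclidean distances, and the single capacity constraint are all assumptions chosen for the example.

```python
import math


def nearest_neighbor_routes(depot, stops, demands, capacity):
    """Greedy construction heuristic for a capacitated VRP (illustrative only).

    depot:    (x, y) coordinates where every vehicle starts and ends.
    stops:    list of (x, y) task locations.
    demands:  load required at each stop (same order as `stops`).
    capacity: maximum load one vehicle can carry.
    Returns a list of routes, each a list of stop indices in visiting order.
    """
    if any(d > capacity for d in demands):
        raise ValueError("a single demand exceeds the vehicle capacity")

    unvisited = set(range(len(stops)))
    routes = []
    while unvisited:
        # Start a new vehicle at the depot with an empty load.
        position, load, route = depot, 0, []
        while True:
            # Stops that still fit in the remaining capacity.
            feasible = [i for i in sorted(unvisited)
                        if load + demands[i] <= capacity]
            if not feasible:
                break
            # Greedy step: go to the closest feasible stop.
            nxt = min(feasible, key=lambda i: math.dist(position, stops[i]))
            route.append(nxt)
            unvisited.remove(nxt)
            load += demands[nxt]
            position = stops[nxt]
        routes.append(route)
    return routes


def total_distance(depot, stops, routes):
    """Total length of all routes, each starting and ending at the depot."""
    total = 0.0
    for route in routes:
        path = [depot] + [stops[i] for i in route] + [depot]
        total += sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return total


if __name__ == "__main__":
    depot = (0, 0)
    stops = [(1, 0), (2, 0), (0, 5), (0, 6)]
    routes = nearest_neighbor_routes(depot, stops, [1, 1, 1, 1], capacity=2)
    print(routes, total_distance(depot, stops, routes))
```

A greedy construction like this is fast but can be far from optimal. In practice it would typically serve as a starting point for local search, and real instances add time windows, precedences and stochastic travel times that this sketch ignores.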

Statistics seminar

Tuesday March 12, 2024, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Guillem Rigaill** (INRAE) *Online multivariate changepoint detection: leveraging links with computational geometry*

The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but their straightforward implementation becomes impractical online. We develop two online algorithms that exactly calculate the likelihood ratio test for a single changepoint in p-dimensional data streams modeled with a distribution from the natural exponential family. These algorithms leverage connections with computational geometry. Our first algorithm is straightforward and empirically quasi-linear. The second is more complex but provably quasi-linear: $O(n (\log n)^{p+1})$ for n data points in dimension p. Through simulations, we illustrate that they are fast and allow us to process millions of points within a matter of minutes up to p = 5.

In this presentation, I will first highlight how we establish the connection between changepoint models and geometry using a functionalization and relaxation argument. Then, I will explain how we derive our algorithms and theoretically bound their expected complexity.

This is joint work with Liudmila Pishchagina, Gaetano Romano, Paul Fearnhead and Vincent Runge (https://arxiv.org/abs/2311.01174).
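For readers unfamiliar with the likelihood ratio test mentioned in the abstract, the following sketch computes it naively for one assumed special case: Gaussian observations with identity covariance and a single change in mean. This is the straightforward baseline, not the authors' geometric algorithms. The function name `gaussian_lr_changepoint` is hypothetical. Under this model, twice the log-likelihood ratio for a change after observation $\tau$ is $\|S_\tau\|^2/\tau + \|S_n - S_\tau\|^2/(n-\tau) - \|S_n\|^2/n$, where $S_t$ is the sum of the first $t$ observations.

```python
def gaussian_lr_changepoint(points):
    """Naive likelihood ratio scan for a single change in mean.

    Assumes Gaussian data with identity covariance (an assumption made for
    this illustration). `points` is a sequence of p-dimensional observations.
    Returns (statistic, tau): the maximum of twice the log-likelihood ratio
    and the number of observations in the first segment at that maximum.
    """
    rows = [tuple(float(v) for v in row) for row in points]
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two observations")
    p = len(rows[0])

    # S_n: sum of all observations, coordinate by coordinate.
    total = [sum(row[j] for row in rows) for j in range(p)]
    baseline = sum(s * s for s in total) / n

    left = [0.0] * p  # running sum S_tau of the first tau observations
    best_stat, best_tau = float("-inf"), None
    for tau in range(1, n):
        for j in range(p):
            left[j] += rows[tau - 1][j]
        left_term = sum(s * s for s in left) / tau
        right_term = sum((t - s) ** 2 for t, s in zip(total, left)) / (n - tau)
        stat = left_term + right_term - baseline
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_stat, best_tau


if __name__ == "__main__":
    stream = [[0.0], [0.0], [0.0], [0.0], [4.0], [4.0], [4.0], [4.0]]
    print(gaussian_lr_changepoint(stream))
```

Each call scans all candidate change locations, so it costs time proportional to n times p. Re-running it after every new observation therefore costs quadratic time overall, which is the kind of cost that makes the straightforward implementation impractical online.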

Statistics seminar

Tuesday February 27, 2024, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Eugène Ndiaye** (Apple) *From Conformal Predictions to Confidence Regions*

Statistics seminar

Tuesday January 30, 2024, 9:30AM, Jussieu en salle 15-16.201

**Ester Mariucci** (Université Versailles Saint Quentin) *Nonparametric density estimation for the small jumps of Lévy processes*

This is joint work with Céline Duval and Taher Jalal.

Statistics seminar

Tuesday January 23, 2024, 9:30AM, Jussieu en salle 15-16.201

**Chiara Amorino** (University of Luxembourg) *Locally differentially private drift parameter estimation for iid paths of diffusion processes*

Statistics seminar

Tuesday January 16, 2024, 9:30AM, Jussieu en salle 15-16.201

**Sayantan Banerjee** (Indian Institute of Management Indore) *Precision Matrix Estimation under the Horseshoe-like Prior–Penalty Dual*

Computationally efficient EM and MCMC algorithms are developed respectively for the penalized likelihood and fully Bayesian estimation problems. In numerical experiments, the horseshoe-based approaches echo their superior theoretical properties by comprehensively outperforming the competing methods. A protein–protein interaction network estimation in B-cell lymphoma is considered to validate the proposed methodology.

#### Year 2023

Statistics seminar

Friday December 8, 2023, 9:30AM, Jussieu en salle 16-26.209

**Pierre Alquier** (ESSEC) *Robust estimation and regression with MMD*

In the second part of this talk, I will discuss the extension of this method to the estimation of conditional distributions, which makes it possible to use MMD estimators in various regression models. In contrast to mean embeddings, very technical conditions are required for the existence of a conditional mean embedding that allows an estimator to be defined. In most papers, these conditions are assumed but rarely checked. We prove that, in most generalized linear regression models, these conditions can be met, at the cost of more restrictions on the kernel choice.

This is based on joint work with Badr-Eddine Chérief-Abdellatif (CNRS, Paris), Mathieu Gerber (University of Bristol), Daniele Durante (Bocconi University), Sirio Legramanti (University of Bergamo), Jean-David Fermanian (ENSAE Paris), Alexis Derumigny (TU Delft) and Geoffrey Wolfer (RIKEN-AIP, Tokyo).

Statistics seminar

Tuesday November 21, 2023, 9:30AM, Jussieu en salle 15-16.201

**Deborah Sulem** (Barcelona School of Economics / Universitat Pompeu Fabra) *Bayesian inference for multivariate event data with dependence*

Statistics seminar

Tuesday November 7, 2023, 9:30AM, Jussieu en salle 15-16.201

**Alberto Suarez** (Universidad Autónoma de Madrid) *The arrow of time: at the intersection of thermodynamics, machine learning, and causality*

Statistics seminar

Tuesday October 10, 2023, 9:30AM, Jussieu en salle 15-16.201

**Paul Escande** *On the Concentration of the Minimizers of Empirical Risks*

Instead of deriving guarantees on the usual estimation error, we will explore concentration inequalities on the distance between the sets of minimizers of the risks. We will argue that for a broad spectrum of estimation problems, there exists a regime where optimal concentration rates can be proven. The bounds will be showcased on a selection of estimation problems such as barycenters on metric space with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.

Statistics seminar

Thursday September 28, 2023, 9:30AM, Jussieu en salle 15-25.102

**Ruth Heller** (Tel-Aviv University) *Simultaneous Directional Inference*

The relevant paper is arXiv:2301.01653. Joint work with Aldo Solari.

Statistics seminar

Tuesday May 30, 2023, 9:30AM, Jussieu en salle 15-16.201

**Michael Arbel** (INRIA) *Non-Convex Bilevel Games with Critical Point Selection Maps*

Statistics seminar

Thursday May 25, 2023, 9:30AM, Jussieu en salle 15-16.201

**Jeffrey Näf** (INRIA Montpellier) *Distributional Random Forest: Heterogeneity Adjustment and Multivariate Distributional Regression*

Statistics seminar

Tuesday May 23, 2023, 9:30AM, Jussieu en salle 15-16.201

**Evguenii Chzhen** (Orsay) *Demographic parity constraint for algorithmic fairness: a statistical perspective*

This talk is based on a sequence of joint works with Ch. Denis, S. Gaucher, M. Hebiri, L. Oneto, M. Pontil, and N. Schreuder.

Statistics seminar

Tuesday May 9, 2023, 9:30AM, Jussieu en salle 15-16.201

**Charlotte Dion-Blanc** (Sorbonne Université) *Classification multi-classes, pour des trajectoires issues de processus de diffusions*

Statistics seminar

Tuesday April 11, 2023, 9:30AM, Sophie Germain en salle 1013

**Tabea Rebafka** (Sorbonne Université) *Model-based graph clustering with an application to ecological networks*

Statistics seminar

Tuesday March 28, 2023, 9:30AM, Jussieu en salle 15-16.201

**David Rossell** (Universitat Pompeu Fabra) *Statistical inference with external information: high-dimensional data integration*

Statistics seminar

Tuesday March 21, 2023, 9:30AM, Sophie Germain en salle 1013

**Cécile Durot** (Université Paris Nanterre) *To be announced.*

Statistics seminar

Thursday March 9, 2023, 9:30AM, Jussieu en salle 16-26.209

**Pierre Wolinski** (INRIA) *Gaussian Pre-Activations in Neural Networks: Myth or Reality?*

Statistics seminar

Thursday February 9, 2023, 9:30AM, Sophie Germain en salle 1016

**Vincent Divol** (CEREMADE) *Estimation d'applications de transport optimal dans des espaces fonctionnels généraux*

Statistics seminar

Tuesday January 24, 2023, 9:30AM, Jussieu en salle 15-16.201

**Laure Sansonnet** (INRAE MIA Paris-Saclay) *Sélection de variables dans des modèles linéaires (généralisés) multivariés avec dépendance*

The first part is joint work with Julien Chiquet, Céline Lévy-Leduc and Marie Perrot-Dockès, and the second part is joint work with Marina Gomtsyan, Céline Lévy-Leduc and Sarah Ouadah.

#### Year 2022

Statistics seminar

Tuesday December 6, 2022, 9:30AM, Jussieu en salle 15-16.201

**Vianney Perchet** (ENSAE) *An algorithmic solution to the Blotto game using multi-marginal couplings*

Statistics seminar

Tuesday November 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Morgane Austern** (Harvard University) *To split or not to split that is the question: From cross validation to debiased machine learning.*

Statistics seminar

Tuesday November 8, 2022, 9:30AM, Jussieu en salle 15-16.201

**Arshak Minasyan** (CREST-ENSAE) *All-In-One Robust Estimator of sub-Gaussian Mean*

Statistics seminar

Thursday October 20, 2022, 11AM, Jussieu en salle 15-16.201

**Misha Belkin** (University of California) *Neural networks, wide and deep, singular kernels and Bayes optimality*

Statistics seminar

Tuesday October 11, 2022, 9:30AM, Jussieu en salle 15-16.201 et retransmission

**Yifan Cui** (Zhejiang University) *Instrumental Variable Approaches To Individualized Treatment Regimes Under A Counterfactual World*

Statistics seminar

Tuesday September 27, 2022, 9:30AM, Jussieu en salle 15-16.201

**Emilie Kaufmann** (CNRS) *Exploration non paramétrique dans les modèles de bandits*

Statistics seminar

Tuesday May 31, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Elsa Cazelles** (IRIT) *A novel notion of barycenter for probability distributions based on optimal weak mass transport*

Statistics seminar

Tuesday May 10, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Guillaume Lecué** (CREST) *A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.*

Statistics seminar

Tuesday April 19, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Clément Marteau** (Université Lyon 1) *Supermix : régularisation parcimonieuse pour des modèles de mélange*

Statistics seminar

Tuesday April 5, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Fabrice Grela** (Université de Nantes) *Minimax detection and localisation of an abrupt change in a Poisson process*

Statistics seminar

Tuesday March 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Aymeric Dieuleveut** (Polytechnique) *Federated Learning and optimization: from a gentle introduction to recent results*

Refs:

Mainly: Differentially Private Federated Learning on Heterogeneous Data, M. Noble, A. Bellet, A. Dieuleveut, AISTATS 2022.

Preserved central model for faster bidirectional compression in distributed settings, C. Philippenko, A. Dieuleveut, NeurIPS 2021.

If time allows it (unlikely): Federated Expectation Maximization with heterogeneity mitigation and variance reduction, A. Dieuleveut, G. Fort, E. Moulines, G. Robin, NeurIPS 2021.

Statistics seminar

Tuesday March 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Lihua Lei** (Stanford University) *Testing for outliers with conformal p-values*

Statistics seminar

Tuesday February 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Élisabeth Gassiat** *Deconvolution with unknown noise distribution*

Statistics seminar

Tuesday January 25, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Nicolas Verzelen** (Université de Montpellier) *Optimal ranking in crowd-sourcing problem*

This talk is based on ongoing joint work with Alexandra Carpentier and Emmanuel Pilliat.

#### Year 2021

Statistics seminar

Tuesday December 14, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Julie Delon** (Université de Paris) *Some perspectives on stochastic models for Bayesian image restoration*

Statistics seminar

Tuesday November 30, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Frédéric Chazal** (INRIA) *A framework to differentiate persistent homology with applications in Machine Learning and Statistics*

However, the approaches proposed in the literature are usually anchored to a specific application and/or topological construction, and do not come with theoretical guarantees.

In this talk, we will study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows one to define and compute gradients for persistence-based functions in a very simple way. As an application, we also provide a simple, explicit and sufficient condition for the convergence of stochastic subgradient methods for such functions. If time permits, as another application, we will also show how this framework, combined with standard geometric measure theory arguments, leads to results on the statistical behavior of persistence diagrams of filtrations built on top of random point clouds.

Statistics seminar

Tuesday November 23, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Yannick Baraud** (Université de Luxembourg) *Comment construire des lois a posteriori robustes à partir de tests ?*

Statistics seminar

Tuesday November 9, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

**Alessandro Rudi** (INRIA) *PSD models for Non-convex optimization and beyond*

Statistics seminar

Tuesday October 19, 2021, 9:30AM, Sophie Germain en salle 1013

**Antoine Marchina** (Université de Paris) *Concentration inequalities for suprema of unbounded empirical processes*

Statistics seminar

Tuesday October 5, 2021, 9:30AM, Jussieu en salle 15-16.201

**Judith Rousseau** (Oxford) *Semiparametric and nonparametric Bayesian inference in hidden Markov models*

Joint work with D. Moss (Oxford).