Statistics seminar

Seminar

Thematic team Statistics, data, algorithms

Day, hour and place

Tuesday at 10:45, Sophie Germain, room 1016 / Jussieu, room 15-16 201

Contact(s)

Ismael Castillo
Rafael Pinot
Etienne Roquain
Maxime Sangnier


Year 2025

Statistics seminar
Tuesday June 24, 2025, 10:45AM, Jussieu, room 15-16 201
Emilia Siviero (Università Ca' Foscari) Statistical learning for spatial and climate data: Hawkes processes and extreme values

In the first part of this talk, we focus on spatio-temporal Hawkes processes. Many spatio-temporal datasets, notably in sociology and epidemiology, exhibit self-exciting dynamics that Hawkes processes can model accurately and efficiently. To meet the challenges posed by large data volumes, we propose a fast and flexible parametric inference method for estimating the parameters of the intensity function. Our statistical approach relies on three key ingredients: (1) kernel functions with finite support, (2) discretization of the spatio-temporal domain, and (3) efficient, possibly approximate, precomputations. We present numerical experiments on synthetic and real spatio-temporal data (from seismology and criminology). In the second part, we present ongoing work on the aggregation of distribution tails in multi-model ensemble outputs for bias correction. Accurately anticipating future changes in temperature and precipitation, for example, is essential for climate impact assessments. Distribution-based aggregation, such as the recently proposed alpha-pooling approach, helps overcome potential biases by statistically combining and weighting the different outputs. In this work, we focus on the distribution tail and use models from extreme value theory to assess how well aggregation methods for climate model outputs reproduce the distribution of the observed tails.

Statistics seminar
Tuesday June 10, 2025, 10:45AM, Jussieu, room 15-16 201
Toby Dylan Hocking (Université de Sherbrooke) To be announced.

Statistics seminar
Tuesday May 27, 2025, 10:45AM, Sophie Germain, room 1013
Charles Truong (Centre Borelli, ENS Paris-Saclay) Change Point Detection in Hadamard Spaces by Alternating Minimization

Time series analysis of non-Euclidean data is highly challenging and crucial for many real-world applications. We address the problem of detecting multiple changes in time series within these complex data spaces. Hadamard spaces, which encompass important data spaces like positive semidefinite matrices, certain Wasserstein spaces, and hyperbolic spaces, provide the right general framework to address this complexity. We propose a computationally efficient two-step iterative optimization algorithm called HOP (Hadamard Optimal Partitioning) that detects changes in the sequence of so-called Fréchet means. Under mild conditions, the proposed method consistently estimates the change point locations. HOP is highly versatile, accommodating structural assumptions such as cyclic patterns and epidemic settings, making it unique in the literature. We validate its performance in synthetic and real-world scenarios, including applications in human gait analysis using EMG data with low SNR and behavioral analysis of animal motion.

Kostic, A., Runge, V., & Truong, C. (2025). Change Point Detection in Hadamard Spaces by Alternating Minimization. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS).

Statistics seminar
Thursday May 22, 2025, 10:45AM, Jussieu, room 15-16 201
Sophie Langer (Ruhr-Universität Bochum) Statistical Breakthroughs and Novel Perspectives in Deep Learning Theory

In recent years, deep learning has emerged as a transformative field, with its theory involving several disciplines such as approximation theory, statistics and optimization. In this talk we delve into key theoretical breakthroughs, with a particular focus on statistical results. We critically question prevailing frameworks and identify their key limitations. Central to the discussion is a novel statistical framework for image analysis that reinterprets images not as high-dimensional entities, but as structured objects shaped by geometric deformations such as translations, rotations, and scalings. Within this framework, classification is reframed as the task of learning uninformative deformations, leading to convergence rates with more favorable trade-offs between input dimension and sample size. This geometric-statistical perspective not only provides new guarantees for approximation and convergence in deep learning-based image classification but also prompts a rethinking of theoretical approaches for broader prediction problems. In the final part of the talk, we examine the expressive power of ReLU networks in comparison to networks with Heaviside activation functions. While ReLU-based models have become standard in deep learning theory, Heaviside networks offer a compelling alternative that aligns more closely with biologically inspired architectures. We conclude by outlining future research directions and reflecting on the role of theory in the field.

This talk is based on joint work with Juntong Chen, Insung Kong and Johannes Schmidt-Hieber.

Statistics seminar
Tuesday May 6, 2025, 10:45AM, Jussieu, room 15-16 201
Ariane Marandon (Turing Institute) Quantifying predictive uncertainty with conformal prediction: introduction and application to survival analysis

Conformal prediction (CP) is an uncertainty quantification method that converts the predictions of arbitrary ML algorithms into prediction sets guaranteed to contain the outcome with a user-specified probability. Crucially, this approach comes with finite-sample guarantees that rely on no distributional assumption beyond the data points being independent and identically distributed.

I will start with a high-level introduction to CP in the context of regression/classification tasks. I will then focus on the specific context of survival time prediction, which is a fundamental task in healthcare. A major challenge in this setting is the censoring of survival data; I will introduce and discuss a new approach extending conformal prediction to this task.
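
As a rough illustration of the regression case described above, here is a minimal sketch of split conformal prediction, one standard variant of CP. It assumes a model has already been fitted and that `calibration_scores` holds the absolute residuals on a held-out calibration set; the function names are hypothetical, and this is not the survival-analysis method of the talk.

```python
import math
import numpy as np

def split_conformal_radius(calibration_scores, alpha):
    """Conformal quantile of held-out scores for miscoverage level alpha."""
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = len(scores)
    # Rank of the finite-sample corrected quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this level: the set is unbounded.
        return math.inf
    return float(scores[k - 1])

def prediction_interval(point_prediction, calibration_scores, alpha):
    """Symmetric interval around a point prediction (hypothetical helper)."""
    q = split_conformal_radius(calibration_scores, alpha)
    return point_prediction - q, point_prediction + q
```

With i.i.d. (more generally, exchangeable) data, an interval built this way contains the new response with probability at least 1 - alpha, whatever the underlying model.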

Statistics seminar
Tuesday April 8, 2025, 10:45AM, Sophie Germain, room 1016
Stephan Clémençon (Télécom Paris) To be announced.

Statistics seminar
Tuesday March 25, 2025, 10:45AM, Jussieu, room 15-16 201
Nicolas Marie (Université Paris Nanterre) Least squares estimation from copies of the solution of a differential equation driven by fractional Brownian motion

This talk will give a brief overview of estimation in stochastic differential equations (SDEs) from copies of the solution, and will then focus on one particular problem: least squares (LS) estimation of the drift function of an SDE driven by fractional Brownian motion with Hurst parameter $H > 1/2$. A parametric LS estimator and a nonparametric projection LS estimator will be presented. When $H\neq 1/2$, the (pathwise) solution of the SDE is not a semimartingale, and the natural extension of the Itô integral involved in the definition of the estimators, the Skorokhod integral, is not computable. For the parametric estimator, the statistical properties of a computable approximation, defined as the fixed point of a map built from the well-known relationship between the pathwise integral and the Skorokhod integral, will be presented.

Statistics seminar
Tuesday March 18, 2025, 10:45AM, Sophie Germain, room 1016
Antonio Silveti-Falls (CentraleSupélec) Training Deep Learning Models with Norm-Constrained LMOs

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training without any reliance on Adam (up to 3B parameters). The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision.
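
To make the notion of an LMO over a norm-ball concrete, the sketch below gives the closed-form oracles for two textbook cases, the ℓ∞ ball and the ℓ2 ball, and one plain way of using an oracle as an update direction on an unconstrained problem. The function names and the simple step `x + step_size * lmo(g)` are assumptions made for illustration; the stochastic algorithms and the norm choice proposed in the talk are not reproduced here.

```python
import numpy as np

def lmo_linf(g, radius):
    """argmin of <g, s> over the l-infinity ball of the given radius."""
    return -radius * np.sign(g)

def lmo_l2(g, radius):
    """argmin of <g, s> over the Euclidean ball of the given radius."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)  # any feasible point is optimal when g = 0
    return -radius * g / norm

def lmo_step(x, g, lmo, radius, step_size):
    """Move from x along the oracle's output for the (stochastic) gradient g."""
    return x + step_size * lmo(g, radius)
```

With the ℓ∞ ball the step reduces to a sign-based update, and with the ℓ2 ball to a normalized gradient step, which hints at how a single LMO-based rule can cover several familiar methods.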

Statistics seminar
Tuesday March 4, 2025, 10:45AM, Jussieu, room 15-16 201
Anne Ruiz-Gazen (TSE) An Overview of Outlier Detection Methods in Statistics

Outlier detection is a fundamental problem in statistics and data analysis. In this talk, I will begin by defining what an outlier is—and what it is not—before discussing the key properties expected from outlier detection methods. I will then present the main families of methods, including model-based approaches, distance-based methods, dimension reduction techniques, and robust estimations, highlighting some of their strengths and limitations. I will also provide a brief overview of recent developments in machine learning and computer science related to outlier detection. The choice of method strongly depends on the context and data type. I will illustrate this with examples from my own work on multivariate, compositional, functional, spatial, and survey data, emphasizing in particular dimension reduction techniques.

Statistics seminar
Tuesday February 11, 2025, 10:45AM, Jussieu, room 15-16 201
Sibylle Marcotte (ENS) Conservation laws for gradient flows

Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of the optimization initialization. This “implicit bias” is believed to be responsible for some favorable properties of the trained models and could explain their good generalization properties. In this work, we present the definition and properties of “conservation laws”, which define quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss. Then we explain how to find the exact number of independent conservation laws via Lie algebra computations. This procedure recovers the conservation laws already known for linear and ReLU neural networks for Euclidean gradient flows, and proves that there are no other laws. We identify new laws for certain flows with momentum and/or non-Euclidean geometries. Joint work with Gabriel Peyré and Rémi Gribonval.

Statistics seminar
Tuesday January 28, 2025, 10:45AM, Jussieu, room 15-16 201
Alexandre Vérine (ENS) Quality and Diversity in Generative Models through the lens of f-divergences

Generative modeling is a fundamental tool of machine learning for creating realistic samples from complex data distributions, with models like GANs, VAEs, Normalizing Flows, and Diffusion models. However, balancing sample quality and diversity remains a challenge, making precision (sample quality) and recall (sample diversity) critical metrics. This work unifies precision and recall within the framework of f-divergences, providing a cohesive evaluation system and introducing a tractable estimation method to optimize their trade-off during model training. Additionally, an optimal rejection sampling technique is proposed to enhance both metrics under computational constraints. Experimental validation on datasets such as MNIST, CIFAR-10, and ImageNet demonstrates the effectiveness of these methods in improving generative model performance.

Statistics seminar
Tuesday January 7, 2025, 10:45AM, Jussieu, room 15-16 201
Anna Simoni (CNRS, CREST) Panel data models with randomly generated groups: Bayesian inference and density forecasts

We consider a dynamic panel data model that accounts for a latent group structure across individuals which is constant over time. Unlike the previous literature, we adopt a structural model that assumes that the individual effects are generated from a finite mixture with an unknown number of components and unknown parameters for each component. We first establish identification of this model. Then, we specify a prior for the number of components, the parameters of the mixture as well as for the coefficients of the dynamic and exogenous covariates. This extends the mixture of finite mixtures model to panel data settings. We establish asymptotic frequentist properties for the posterior of the parameters of interest as well as for the number of components. A Monte Carlo exercise illustrates finite sample properties.

Year 2024

Statistics seminar
Tuesday December 17, 2024, 10:45AM, Sophie Germain, room 1016
Vincent Brault (LJK) Segmentation of the “Parsimonious Oscillatory Model of Handwriting” and application to the detection of dysgraphic children

Mastering handwriting remains essential for successful integration into society, but it relies on a long learning process. Handwriting disorders, known as dysgraphia, can therefore have serious consequences from early childhood to adulthood. In France, these disorders are generally detected using the Brave Handwriting Kinder test (or BHK; see Hamstra-Bletz et al. (1987) and its French adaptation by Charles et al. (2004)), in which children write for 5 minutes and a psychomotor specialist evaluates the text against 13 criteria. One drawback of this procedure is that it is long and tedious, and that some children may go undiagnosed.

To work around this problem, one avenue explored in Yunjiao Lu's postdoc is to rely on the Parsimonious Oscillatory Model of Handwriting (or POMH model; see André et al. (2014)), which assumes that handwriting results from two orthogonal oscillators made up of piecewise constant functions. By finding the times at which the functions change value, the authors reconstruct the traces drawn by the children. In her postdoc, Yunjiao Lu shows that the estimated number and locations of change points in these functions affect the reconstruction and appear to vary with handwriting quality (see Lu et al. (2022)); in particular, she tries to estimate the influence of the filtering parameters on the ability to help predict a dysgraphia diagnosis.

In this talk, we will study another avenue for estimating change point locations. After presenting the problem, we will show that the POMH model can be viewed as a segmentation model in which dynamic programming can be used to estimate the change point locations. We will also prove that the particular form of the model makes the maximum likelihood estimator consistent for the locations and, more importantly, for the number of change points. We will conclude with a study of this model applied to dysgraphia detection.
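
For readers unfamiliar with segmentation by dynamic programming, here is a minimal sketch of the classical approach on a generic piecewise-constant signal. It assumes a squared-error (Gaussian) cost and a known number of segments; the function name is hypothetical, and the POMH-specific likelihood studied in the talk is not reproduced.

```python
import numpy as np

def segment_dp(y, n_segments):
    """Change point locations minimizing the within-segment squared error."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give the cost of any segment y[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[k, j]: minimal cost of splitting y[:j] into k segments.
    best = np.full((n_segments + 1, n + 1), np.inf)
    last_start = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    last_start[k, j] = i
    # Backtrack the start index of each segment, from the last to the first.
    starts = []
    j = n
    for k in range(n_segments, 0, -1):
        j = last_start[k, j]
        starts.append(j)
    return sorted(starts[:-1])  # drop the start of the first segment (index 0)
```

For example, `segment_dp([0, 0, 0, 5, 5, 5, 1, 1], 3)` places the change points at indices 3 and 6. The triple loop costs O(K n²) operations, which is fine for a sketch but slow for long recordings.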

Statistics seminar
Tuesday November 26, 2024, 10:45AM, Jussieu, room 15-16.201
Patrick Tardivel (Université de Bourgogne) The solution path of the SLOPE estimator (“Sorted L One Penalized Estimation”)

The SLOPE estimator is distinctive in that it has null components (sparsity) and components equal in absolute value (clustering). The number of clusters depends on the regularization parameter of the estimator. This parameter can be chosen as a trade-off to obtain an estimator that is both interpretable (by selecting a small number of clusters) and accurate (with a small prediction error). Finding such a trade-off requires computing the solution path, that is, the function mapping the regularization parameter to the SLOPE estimator. In this talk I will discuss some theoretical results on the SLOPE solution path, introduce a numerical method for computing this path, and illustrate the method on a real dataset.
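
The abstract does not spell out the penalty. As a reminder, SLOPE replaces the ℓ1 penalty of the lasso with the sorted ℓ1 norm, which pairs the largest weight with the largest coefficient in absolute value. The sketch below only evaluates this penalty; the function name is hypothetical, and the path algorithm of the talk is not reproduced.

```python
import numpy as np

def sorted_l1_norm(b, lam):
    """Sorted L1 norm: sum_i lam[i] * |b|_(i), with |b|_(1) >= |b|_(2) >= ..."""
    b = np.asarray(b, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if b.shape != lam.shape:
        raise ValueError("b and lam must have the same shape")
    if np.any(lam < 0) or np.any(np.diff(lam) > 0):
        raise ValueError("lam must be non-negative and non-increasing")
    abs_sorted = np.sort(np.abs(b))[::-1]  # absolute values, largest first
    return float(np.dot(lam, abs_sorted))
```

With all weights equal, the penalty reduces to a multiple of the usual ℓ1 norm; strictly decreasing weights are what encourage coefficients to tie in absolute value.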

Statistics seminar
Tuesday October 15, 2024, 9:30AM, Sophie Germain, room 1016
Waiss Azizian (LJK) What is the long-run distribution of stochastic gradient descent? A large deviations analysis

In this work, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is “dominated” by a component of local minimizers which is visited exponentially more often.

This is joint work with Franck Iutzeler, Jérôme Malick and Panayotis Mertikopoulos.

Statistics seminar
Tuesday October 1, 2024, 9:30AM, Jussieu, room 15-16.201
Marc Hoffmann (CEREMADE) On the estimation of a multidimensional diffusion

While the estimation of the coefficients of a scalar diffusion seemed well understood (minimax, adaptive, Bayesian nonparametric) in the early 2000s, and while the statistics of semimartingales turned decisively towards statistical finance, recent years have seen the problem of estimating the drift vector field and the diffusion matrix of a multivariate diffusion process resurface, notably under the influence of questions from ML and Bayesian inverse problems.

In this talk, based on joint work with Chiara Amorino, Claudia Strauch and also Kolyan Ray, we show that if one settles for a classical nonparametric theoretical program (L^2 loss, adaptive minimax à la Lepski, but not much more), then it is possible to obtain fairly general results that improve on what is known in arbitrary dimension, in several directions: (i) long-time observations with an arbitrarily slow discretization step; (ii) reflection of the process at the boundary of a domain, though not necessarily; (iii) situations where the diffusion may degenerate, which makes it possible to include position-velocity models; (iv) in some cases (conductivity, fast schemes), Bayesian contraction rates.

The approach is always more or less the same: for upper bounds, construct a model equivalence through a martingale regression scheme and decouple the concentration properties of the martingale noise from the “filling rate” of the space by the “design” (often poorly known, or at least hard to estimate); for lower bounds, perturbative methods using a little Malliavin calculus; and for the finer Bayesian results, small-time expansions of the heat kernel for a “good” geometry.

Statistics seminar
Tuesday June 18, 2024, 9:30AM, Jussieu, room 15-16.201
Olga Klopp (ESSEC) Denoising over network with application to partially observed epidemics

We introduce a novel approach to predict epidemic spread over networks using total variation (TV) denoising, a signal processing technique. The study proves the consistency of TV denoising with Bernoulli noise, extending existing bounds from the Gaussian noise literature. The methodology is further extended to handle incomplete observations, showcasing its effectiveness. We show that applying a 1-bit total variation denoiser improves the prediction accuracy of virus spread dynamics on networks.

Statistics seminar
Tuesday June 4, 2024, 9:30AM, Jussieu, room 15-16.201
Rémi Boutin (LPSM) The Deep Latent Position Topic Model for network with text data analysis

In many digital interactions, users share textual content with each other. This may be naturally represented by a network with the nodes characterising the individuals and the edges corresponding to the texts. To understand these heterogeneous and complex data structures, it is necessary both to cluster nodes into homogeneous groups and to render a comprehensible visualisation of the data. To address both issues, we introduce Deep-LPTM, a model-based clustering strategy relying on a variational graph auto-encoder approach as well as a neural topic model, which makes it possible to build a joint representation of the nodes and edges in two embedding spaces. The parameters are inferred using variational inference. We also propose an original model selection criterion, called IC2L, specifically designed to choose models with relevant clustering and visualisation properties. An extensive benchmark study on synthetic data as well as the analysis of the emails of the Enron company are provided to illustrate the ability of Deep-LPTM to cluster the nodes and obtain a meaningful visualisation of the graph structure.

Statistics seminar
Tuesday May 14, 2024, 9:30AM, Jussieu, room 15-16.201
Rafaël Pinot (LPSM Sorbonne Université) A Small Tutorial on Byzantine-Robustness

The vast amount of data collected every day, combined with the increasing complexity of machine learning models, has led to the emergence of distributed learning schemes. In the now classical Federated learning architecture, the learning procedure consists of multiple data owners (or clients) collaborating to build a global model with the help of a central entity (the server), typically using a distributed variant of SGD. Nevertheless, this algorithm is vulnerable to “misbehaving” clients that could (either intentionally or inadvertently) sabotage the learning by sending arbitrarily bad gradients to the server. These clients are commonly referred to as Byzantine and can model very versatile behaviors, ranging from crashing machines in a datacenter to colluding bots attempting to bias the outcome of a poll on the internet. The purpose of this talk is to present a short introduction to the emerging topic of Byzantine-Robustness. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some Byzantine clients. We will take the time to present the setting and review some recent results as well as open problems in the community.
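
To see why plain averaging on the server is fragile, the sketch below compares it with the coordinate-wise median, one well-known robust aggregation rule. It is only an illustration under that assumption: the function names are hypothetical, and the talk may cover other aggregators.

```python
import numpy as np

def mean_aggregate(gradients):
    """Standard server-side averaging of client gradients (one row per client)."""
    return np.mean(gradients, axis=0)

def coordinatewise_median(gradients):
    """Robust alternative: take the median of each coordinate across clients."""
    return np.median(gradients, axis=0)

# Four honest clients send similar gradients; one Byzantine client sends garbage.
honest = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]])
byzantine = np.array([[1e6, -1e6]])
received = np.vstack([honest, byzantine])

averaged = mean_aggregate(received)          # dragged far from the honest gradients
robust = coordinatewise_median(received)     # stays at the honest values [1.0, 2.0]
```

A single arbitrarily bad vector is enough to move the average anywhere, whereas the median is unaffected as long as Byzantine clients remain a minority.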

Statistics seminar
Tuesday April 30, 2024, 9:30AM, Jussieu, room 15-16.201
Spencer Frei (UC Davis) Learning linear models in-context with transformers

Attention-based neural network sequence models such as transformers have the capacity to act as supervised learning algorithms: They can take as input a sequence of labeled examples and output predictions for unlabeled test examples. Indeed, recent work by Garg et al. has shown that when training GPT2 architectures over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares. Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of in-context learning of linear predictors for a transformer with a single linear self-attention layer trained by gradient flow. We show that despite the non-convexity of the underlying optimization problem, gradient flow with a random initialization finds a global minimum of the objective function. Moreover, when given a prompt of labeled examples from a new linear prediction task, the trained transformer achieves small prediction error on unlabeled test examples. We further characterize the behavior of the trained transformer under distribution shifts. Talk based on joint work with Ruiqi Zhang and Peter Bartlett.

Bio: Spencer Frei is an Assistant Professor of Statistics at UC Davis. His research is on the foundations of deep learning, including topics related to large language models, benign overfitting, and implicit regularization. Prior to joining UC Davis he was a postdoctoral fellow at UC Berkeley hosted by Peter Bartlett and Bin Yu and received his Ph.D in Statistics at UCLA. He was a co-organizer of a tutorial at NeurIPS 2023 on benign overfitting and of the 2022 Deep Learning Theory Workshop and Summer School at the Simons Institute for the Theory of Computing.

Statistics seminar
Tuesday April 16, 2024, 9:30AM, Jussieu, room 15-16.201
Borjan Geshkovski (Inria) A mathematical perspective on Transformers

The Transformer is a deep neural network architecture introduced in 2017 that has proved very popular in natural language processing. We will see that this architecture can be modeled quite naturally as a system of interacting particles on the unit sphere with a very particular nonlinearity and that, for certain parameter choices, it is even a gradient flow for a little-studied interaction energy. We will study the long-time convergence of these dynamics (which amounts to studying the representations learned by a Transformer across its successive layers) given an arbitrary initial configuration. Connections will be made with well-established mathematical topics such as Wasserstein gradient flows, combinatorial geometry, and control.

Statistics seminar
Tuesday April 2, 2024, 9:30AM, Jussieu, room 15-16.201
Anne-Claire Haury (Google/LPSM Sorbonne Université) The Vehicle Routing Problem

The Vehicle Routing Problem: definition, solutions and (sometimes statistical) challenges

In this (casual!) talk, I'll introduce vehicle routing problems (VRP), how we model and solve them, and some of the challenges we encounter. The VRP is originally an operations research problem, but in practice it is often stochastic. I'll try to highlight some of the statistical problems that arise.

A completely informal (and probably wrong) definition of vehicle routing problems: any problem that contains vehicles (they don't have to have wheels, they can walk, ski or swim, anything works as long as they're moving) that need to perform tasks (any kind works, from pickups and deliveries to services) in predefined places. These problems are very often constrained in different ways (time, capacity, precedence, security, legal requirements…), which sometimes makes them easier, often harder, and in any case more fun to solve.

Many problems can be modeled using this framework, including but not limited to: parcel delivery, food/immediate delivery, pickups (waste, bulky items), B-to-B pickups and deliveries, ride sharing, field service management (technical interventions at customers' homes)…

Some of these, when they are simple and small, may be solved exactly, but almost any real-world problem has either a set of features or a size that makes exact solutions impossible. In most cases we have to turn to heuristics, albeit clever and fast ones. The problem also keeps raising new questions, and there are many open areas of research. I'll tell you more about them!

About me: I'm a new PAST at LPSM. Initially a statistician (PhD in 2012 with JP Vert on feature selection), I changed course 7 years ago to become a software engineer in the Operations Research (OR) team at Google in Paris. I've worked at Google for 10 years, and in the OR team for 7. I'll tell you how I got here in the lab, how I ended up at Google in the first place, and how we solve constrained optimization problems in our OR team. You may have heard of OR-tools, an open-source library for OR problems, often used as a benchmark in research papers.

Statistics seminar
Tuesday March 12, 2024, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Guillem Rigaill (INRAE) Online multivariate changepoint detection: leveraging links with computational geometry

In recent years, many methods have been proposed for detecting one or multiple changepoints offline or online in data streams. The reason for such keen interest in changepoint detection methods lies in their importance for various real-world applications, including bioinformatics, climate and oceanography, econometrics, and finance.

The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but their straightforward implementation becomes impractical online. We develop two online algorithms that exactly calculate the likelihood ratio test for a single changepoint in p-dimensional data streams modeled with a distribution from the natural exponential family. These algorithms leverage connections with computational geometry. Our first algorithm is straightforward and empirically quasi-linear. The second is more complex but provably quasi-linear: $O(n \log(n)^{p+1})$ for n data points in dimension p. Through simulations, we illustrate that they are fast and allow us to process millions of points within a matter of minutes up to p = 5.

In this presentation, I will first highlight how we establish the connection between changepoint models and geometry using a functionalization and relaxation argument. Then, I will explain how we derive our algorithms and theoretically bound their expected complexity.

This is joint work with Liudmila Pishchagina, Gaetano Romano, Paul Fearnhead and Vincent Runge: https://arxiv.org/abs/2311.01174
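As a point of reference for the computational challenge described above, here is a minimal sketch of the brute-force likelihood ratio statistic for a single change in mean. It assumes unit-variance Gaussian data in dimension p (one illustrative member of the natural exponential family) and is not the algorithm of the talk: it rescans every candidate split each time it is called, which is what makes a naive online use costly. The function name `glr_single_change` and the simulated data are hypothetical.

```python
import numpy as np

def glr_single_change(x):
    """Brute-force likelihood ratio statistic for one change in mean.

    Assumes rows of x are independent N(mu, I_p), with mu allowed to change once.
    Returns (statistic, best_split); best_split = k places the change
    between x[k-1] and x[k].
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    csum = np.cumsum(x, axis=0)      # csum[k-1] = sum of the first k rows
    total = csum[-1]
    k = np.arange(1, n)              # candidate split positions 1..n-1
    left_sum = csum[:-1]             # sums of x[:k]
    right_sum = total - left_sum     # sums of x[k:]
    # 2 * log-likelihood ratio = |S_left|^2/k + |S_right|^2/(n-k) - |S|^2/n
    stat = (
        (left_sum ** 2).sum(axis=1) / k
        + (right_sum ** 2).sum(axis=1) / (n - k)
        - (total ** 2).sum() / n
    )
    best = int(np.argmax(stat))
    return float(stat[best]), int(k[best])

# Illustrative data: the mean shifts from 0 to 3 after 200 points, p = 2.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, (200, 2)),
                       rng.normal(3.0, 1.0, (200, 2))])
stat, split = glr_single_change(data)
```

Each call costs O(np) thanks to the cumulative sums, so recomputing it after every new observation costs O(n^2 p) over a stream of length n.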

Statistics seminar
Tuesday February 27, 2024, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Eugène Ndiaye (Apple) From Conformal Predictions to Confidence Regions

If you predict a label y of a new object with ŷ, how confident are you that "y = ŷ"? The conformal prediction method provides an elegant framework for answering such a question by establishing a confidence set for an unobserved response of a feature vector based on previous similar observations of responses and features. This is performed without assumptions about the distribution of the data. While providing strong coverage guarantees, computing conformal prediction sets requires adjusting a predictive model to an augmented dataset considering all possible values that the unobserved response can take, and proceeding to select the most likely ones. For a regression problem where y is a continuous variable, it typically requires an infinite number of model fits, which is usually infeasible. By assuming a little more regularity in the underlying prediction models, I will describe some of the techniques that make the calculations feasible. Along similar lines, it can be assumed that we are working with a parametric model that explains the relation between input and output variables. Consequently, a natural question arises as to whether a confidence interval on the ground truth parameter of the model can be constructed, also without assumptions on the distribution of the data. In this presentation, I will provide some preliminary results and discuss remaining open questions.
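To make the "augmented dataset" step concrete, here is a minimal sketch of full conformal prediction restricted to a finite grid of candidate responses. It assumes a least-squares linear model and absolute residuals as conformity scores, and the grid itself is only an approximation of "all possible values". The function name `full_conformal_grid` and the simulated data are hypothetical, and this is not one of the techniques of the talk.

```python
import numpy as np

def full_conformal_grid(X, y, x_new, candidates, alpha=0.1):
    """Approximate full conformal prediction set on a finite grid.

    For each candidate value z, refit least squares on the data augmented
    with (x_new, z) and keep z if its conformity p-value exceeds alpha.
    """
    X_aug = np.vstack([X, x_new])
    kept = []
    for z in candidates:
        y_aug = np.append(y, z)
        beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
        scores = np.abs(y_aug - X_aug @ beta)   # absolute residuals
        # p-value: fraction of points at least as nonconforming as the new one
        p_value = np.mean(scores >= scores[-1])
        if p_value > alpha:
            kept.append(z)
    return np.array(kept)

# Illustrative data: y = 1 + 2 x + noise, new point at x = 0.5.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
x_new = np.array([1.0, 0.5])
candidates = np.linspace(-5.0, 10.0, 301)
conf_set = full_conformal_grid(X, y, x_new, candidates, alpha=0.1)
```

The loop performs one model fit per candidate, which is exactly the cost that becomes prohibitive when the response is continuous and the grid must be fine.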

Statistics seminar
Tuesday January 30, 2024, 9:30AM, Jussieu en salle 15-16.201
Ester Mariucci (Université Versailles Saint Quentin) Nonparametric density estimation for the small jumps of Lévy processes

In this talk we consider the problem of estimating the density of the small jumps of a Lévy process, from discrete observations of one trajectory. We discuss results both from low and high frequency observations, for Lévy processes possibly of infinite variation. We propose an adaptive estimator obtained via a spectral approach which achieves the minimax rate with respect to the L2 loss in the low frequency case. In a high frequency setting, the rate we obtain depends on the sampling scheme and on the Blumenthal-Getoor index of the process. Finally, we discuss the optimality of these results.

This is joint work with Céline Duval and Taher Jalal.

Statistics seminar
Tuesday January 23, 2024, 9:30AM, Jussieu en salle 15-16.201
Chiara Amorino (University of Luxembourg) Locally differentially private drift parameter estimation for iid paths of diffusion processes

The problem of drift parameter estimation is addressed for $N$ discretely observed iid SDEs, considering the additional constraint that only privatized data can be published and used for inference. The concept of local differential privacy is formally introduced for a system of stochastic differential equations. The aim is to estimate the drift parameter by proposing a contrast function based on a pseudo-likelihood approach. A suitably scaled Laplace noise is incorporated to satisfy the privacy requirement. Our main results consist of deriving explicit conditions on the privacy level for which the associated estimator is proven to be consistent. This holds true as the discretization step approaches zero and the number of processes $N$ tends to infinity. The talk is based on joint work with A. Gloter and H. Halconruy.

Statistics seminar
Tuesday January 16, 2024, 9:30AM, Jussieu en salle 15-16.201
Sayantan Banerjee (Indian Institute of Management Indore) Precision Matrix Estimation under the Horseshoe-like Prior–Penalty Dual

Precision matrix estimation in a multivariate Gaussian model is fundamental to network estimation. Although there exist both Bayesian and frequentist approaches to this, it is difficult to obtain good Bayesian and frequentist properties under the same prior–penalty dual. To bridge this gap, our contribution is a novel prior–penalty dual that closely approximates the graphical horseshoe prior and penalty, and performs well in both Bayesian and frequentist senses. A chief difficulty with the graphical horseshoe prior is a lack of closed form expression of the density function, which we overcome in this article. In terms of theory, we establish posterior convergence rate of the precision matrix that matches the convergence rate of the frequentist graphical lasso estimator, in addition to the frequentist consistency of the MAP estimator at the same rate. In addition, our results also provide theoretical justifications for previously developed approaches that have been unexplored so far, e.g. for the graphical horseshoe prior.

Computationally efficient EM and MCMC algorithms are developed respectively for the penalized likelihood and fully Bayesian estimation problems. In numerical experiments, the horseshoe-based approaches echo their superior theoretical properties by comprehensively outperforming the competing methods. A protein–protein interaction network estimation in B-cell lymphoma is considered to validate the proposed methodology.

Year 2023

Statistics seminar
Friday December 8, 2023, 9:30AM, Jussieu en salle 16-26-209
Pierre Alquier (ESSEC) Robust estimation and regression with MMD

Maximum likelihood estimation (MLE) enjoys strong optimality properties for statistical estimation, under strong assumptions. However, when these assumptions are not satisfied, MLE can be extremely unreliable. In this talk, we will explore alternative estimators based on the minimization of well-chosen distances. In particular, we will see that the Maximum Mean Discrepancy (MMD, based on suitable kernels) leads to estimation procedures that are consistent without any assumption on the model or on the data-generating process. This leads to strong robustness properties in practice, and this method was already used in complex models with promising results: estimation of SDE coefficients, copulas, data compression, generative models in AI…
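A small sketch may help illustrate the robustness claim. It assumes the one-dimensional location model N(θ, 1) and a Gaussian kernel with bandwidth γ, for which the expected kernel has a closed form, and it minimizes the MMD by grid search. The function name `mmd_location_estimate`, the contamination scheme and the bandwidth are illustrative assumptions, not the setting of the talk.

```python
import numpy as np

def mmd_location_estimate(y, gamma=1.0, grid=None):
    """Minimum-MMD estimate of theta in the model N(theta, 1).

    Uses the Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 gamma^2)).
    For this model, the squared MMD between N(theta, 1) and the empirical
    distribution of y depends on theta only through the cross term, so
    minimizing it amounts to maximizing
        sum_i exp(-(y_i - theta)^2 / (2 (gamma^2 + 1))).
    """
    y = np.asarray(y, dtype=float)
    if grid is None:
        grid = np.linspace(y.min(), y.max(), 2001)
    s2 = gamma ** 2 + 1.0
    objective = np.exp(-(y[None, :] - grid[:, None]) ** 2 / (2 * s2)).sum(axis=1)
    return float(grid[np.argmax(objective)])

# Illustrative contaminated sample: 90 draws from N(0, 1) and 10 outliers at 20.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=90)
outliers = np.full(10, 20.0)
sample = np.concatenate([clean, outliers])
theta_mle = sample.mean()                    # MLE of the mean, pulled by outliers
theta_mmd = mmd_location_estimate(sample)    # stays close to the bulk of the data
```

In this toy example the outliers contribute almost nothing to the kernel objective near the bulk of the data, whereas each of them shifts the sample mean.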

In the second part of this talk, I will discuss the extension of this method to the estimation of conditional distributions, which allows the use of MMD-estimators in various regression models. In contrast to mean embeddings, very technical conditions are required for the existence of a conditional mean embedding that allows defining an estimator. In most papers, these conditions are assumed but rarely checked. We prove that, in most generalized linear regression models, these conditions can be met, at the cost of more restrictions on the kernel choice.

This is based on joint works with: Badr-Eddine Chérief-Abdellatif (CNRS, Paris), Mathieu Gerber (University of Bristol), Daniele Durante (Bocconi University), Sirio Legramanti (University of Bergamo), Jean-David Fermanian (ENSAE Paris), Alexis Derumigny (TU Delft), Geoffrey Wolfer (RIKEN-AIP, Tokyo).

Statistics seminar
Tuesday November 21, 2023, 9:30AM, Jussieu en salle 15-16.201
Deborah Sulem (Barcelona School of Economics / Universitat Pompeu Fabra) Bayesian inference for multivariate event data with dependence

Multivariate sequences of events, such as earthquakes, financial transactions, crimes, and neuron activations, can be modelled by temporal point processes. In the Hawkes process model, the probability of occurrence of future events depends on the past of the process, which makes it possible to account for dependencies in the data. This model is particularly popular for modelling interactive phenomena such as disease propagation and brain functional connectivity. In this presentation we consider the nonlinear multivariate Hawkes model, which accounts for excitation and inhibition between interacting entities, estimated with Bayesian nonparametric methods. We will first show that we can provide asymptotic guarantees on such methods, under mild assumptions on the prior distribution and the model. Then, we propose a variational framework to compute approximations of the posterior distribution, for moderately large to large dimensional data. We also derive similar asymptotic guarantees for this class of approximations, and design an efficient parallelised variational inference algorithm that leverages sparsity patterns in the dependency structure. Our algorithm is organised in two steps; in the first one, we infer a dependency graph that allows us to reduce the dimensionality of the problem.
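For readers unfamiliar with the model, the following sketch evaluates the conditional intensity of a nonlinear multivariate Hawkes process. It assumes exponential interaction kernels with a common decay rate and a positive-part link function, which are common illustrative choices and not necessarily those of the talk. The function name `hawkes_intensity` and all parameter values are hypothetical.

```python
import numpy as np

def hawkes_intensity(t, events, nu, alpha, beta):
    """Intensity of a nonlinear multivariate Hawkes process at time t.

    events: list of K arrays of event times, one per dimension.
    nu:     baseline parameters, shape (K,).
    alpha:  interaction amplitudes, shape (K, K); alpha[l, k] > 0 means
            dimension l excites dimension k, alpha[l, k] < 0 inhibits it.
    beta:   decay rate of the exponential kernel (assumed common).
    The link is the positive part, so the intensity stays nonnegative.
    """
    nu = np.asarray(nu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    pre = nu.copy()
    for l, times in enumerate(events):
        past = np.asarray(times, dtype=float)
        past = past[past < t]                 # only strictly past events count
        # summed influence of dimension l's past events on every dimension k
        pre += alpha[l] * np.exp(-beta * (t - past)).sum()
    return np.maximum(pre, 0.0)

# Two dimensions: an event in dimension 0 excites itself and inhibits dimension 1.
events = [np.array([1.0]), np.array([])]
nu = np.array([0.5, 0.5])
alpha = np.array([[1.0, -2.0],
                  [0.0, 0.0]])
lam_before = hawkes_intensity(1.0, events, nu, alpha, beta=1.0)
lam_after = hawkes_intensity(2.0, events, nu, alpha, beta=1.0)
```

With a linear link, negative amplitudes could drive the intensity below zero, which is why inhibition calls for a nonlinear link.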

Statistics seminar
Tuesday November 7, 2023, 9:30AM, Jussieu en salle 15-16.201
Alberto Suarez (Universidad Autónoma de Madrid) The arrow of time: at the intersection of thermodynamics, machine learning, and causality

The arrow of time refers to the asymmetry in the evolution of physical systems. It is characterized by the second law of thermodynamics. This statistical law states that, in an isolated system, entropy cannot decrease with time and is constant if and only if all processes are reversible. Since microscopic dynamics are reversible, time’s arrow is an emergent property that is apparent only at the meso- and macroscopic levels, both of which involve loss of detail. Machine learning by automatic induction is also an asymmetric dynamical process in which the identification of patterns involves some information loss. The asymmetry of time plays a role also in causal inference: causes precede effects. Finally, causal explanations, which are ubiquitous in human reasoning, are key to rendering machine learning models interpretable. In this talk we will review recent work around these ideas to uncover relations between thermodynamics, machine learning, and causal inference that could provide fruitful insights into the emergence of meaning from raw data.

Statistics seminar
Tuesday October 10, 2023, 9:30AM, Jussieu en salle 15-16.201
Paul Escande On the Concentration of the Minimizers of Empirical Risks

Obtaining guarantees on the convergence of the minimizers of empirical risks to the ones of the true risk is a fundamental matter in statistical learning.

Instead of deriving guarantees on the usual estimation error, we will explore concentration inequalities on the distance between the sets of minimizers of the risks. We will argue that for a broad spectrum of estimation problems, there exists a regime where optimal concentration rates can be proven. The bounds will be showcased on a selection of estimation problems such as barycenters on metric space with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.

Statistics seminar
Thursday September 28, 2023, 9:30AM, Jussieu en salle 15-25.102
Ruth Heller (Tel-Aviv University) Simultaneous Directional Inference

We consider the problem of inference on the signs of n > 1 parameters. We aim to provide 1 − α post-hoc confidence bounds on the number of positive and negative (or non-positive) parameters. The guarantee is simultaneous, for all subsets of parameters. Our suggestion is as follows: start by using the data to select the direction of the hypothesis test for each parameter; then, adjust the p-values of the one-sided hypotheses for the selection, and use the adjusted p-values for simultaneous inference on the selected n one-sided hypotheses. The adjustment is straightforward assuming that the p-values of one-sided hypotheses have densities with monotone likelihood ratio, and are mutually independent. We show that the bounds we provide are tighter (often by a great margin) than existing alternatives, and that they can be obtained in at most polynomial time. We demonstrate the usefulness of our simultaneous post-hoc bounds in the evaluation of treatment effects across studies or subgroups. Specifically, we provide a tight lower bound on the number of studies which are beneficial, as well as on the number of studies which are harmful (or non-beneficial), and in addition conclude on the effect direction of individual studies, while guaranteeing that the probability of at least one wrong inference is at most 0.05.

The relevant paper is arXiv:2301.01653. Joint work with Aldo Solari.

Statistics seminar
Tuesday May 30, 2023, 9:30AM, Jussieu en salle 15-16.201
Michael Arbel (INRIA) Non-Convex Bilevel Games with Critical Point Selection Maps

Bilevel optimization problems involve two nested objectives, where an upper-level objective depends on a solution to a lower-level problem. When the latter is non-convex, multiple critical points may be present, leading to an ambiguous definition of the problem. In this paper, we introduce a key ingredient for resolving this ambiguity through the concept of a selection map which allows one to choose a particular solution to the lower-level problem. Using such maps, we define a class of hierarchical games between two agents that resolve the ambiguity in bilevel problems. This new class of games requires introducing new analytical tools in Morse theory to characterize their evolution. In particular, we study the differentiability of the selection, an essential property when analyzing gradient-based algorithms for solving these games. We show that many existing algorithms for bilevel optimization, such as unrolled optimization, solve these games up to approximation errors due to finite computational power. Our analysis allows introducing a simple correction to these algorithms for removing the errors.

Statistics seminar
Thursday May 25, 2023, 9:30AM, Jussieu en salle 15-16.201
Jeffrey Näf (INRIA Montpellier) Distributional Random Forest: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data, which can also be used for targets other than the original mean estimation. We propose a novel forest construction for multivariate responses based on their joint conditional distribution, called the Distributional Random Forest (DRF). It uses a new splitting criterion based on the MMD distributional metric, which is suitable for detecting heterogeneity in multivariate distributions. The induced weights define an estimate of the full conditional distribution, which in turn can be used for arbitrary and potentially complicated targets of interest.

Statistics seminar
Tuesday May 23, 2023, 9:30AM, Jussieu en salle 15-16.201
Evguenii Chzhen (Orsay) Demographic parity constraint for algorithmic fairness: a statistical perspective

In this talk I will give a brief introduction to the recently emerged field of algorithmic fairness and advocate for a statistical study of the problem. To support my claims, I will focus on the Demographic Parity fairness constraint, describing various connections to classical statistical theory, optimal transport, and conformal prediction literature. In particular, I will present the form of an optimal prediction function under this constraint in both regression and classification. Then, I will describe a learning procedure which is supported by statistical guarantees under no or mild assumptions on the underlying data distribution.

This talk is based on a sequence of joint works with Ch. Denis, S. Gaucher, M. Hebiri, L. Oneto, M. Pontil, and N. Schreuder.

Statistics seminar
Tuesday May 9, 2023, 9:30AM, Jussieu en salle 15-16.201
Charlotte Dion-Blanc (Sorbonne Université) Multiclass classification for trajectories generated by diffusion processes

In this talk I will present the multiclass classification problem when the data are assumed to come from a stochastic differential equation model, which differs from class to class, and are observed over a short time horizon. I will study in particular the case where the classes are discriminated by the drift coefficient of the equation. We will see convergence rates for a plug-in classifier based on nonparametric estimators of the unknown coefficients.

Statistics seminar
Tuesday April 11, 2023, 9:30AM, Sophie Germain en salle 1013
Tabea Rebafka (Sorbonne Université) Model-based graph clustering with an application to ecological networks

We consider the problem of clustering multiple networks into groups of networks with similar topology. A statistical model-based approach based on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion. This is done by a hierarchical agglomerative algorithm, that starts from singleton clusters and successively merges clusters of networks. As such, a sequence of nested clusterings is computed that can be represented by a dendrogram providing valuable insights on the data. We present results of our method obtained for a collection of foodwebs in ecology. We illustrate that the method provides relevant clusterings and that the estimated model parameters are highly interpretable and useful in practice.

Statistics seminar
Tuesday March 28, 2023, 9:30AM, Jussieu en salle 15-16.201
David Rossell (Universitat Pompeu Fabra) Statistical inference with external information: high-dimensional data integration

Statistical inference when there are many parameters is a well-studied problem. For example, there are fundamental limits in what types of signals one may learn from data, e.g. given by minimal sample sizes, signal strengths or sparsity conditions. There are many applied problems however where, besides the data directly being analyzed, one has access to external data that one intuitively thinks may help improve inference. Examples include data integration and high-dimensional causal inference methods, where formally incorporating external information is the default and has shown significant practical benefits. We will discuss some of these situations, showcasing the use of graphical models related to COVID-19 evolution and causal inference methods for gender salary gaps, and provide a theoretical analysis in a simplified Gaussian sequence model setting. The latter shows that, by integrating external information, one may push the theoretical limits of what's possible to learn from data, providing a theoretical justification for this popular applied practice. We will also discuss some practical modelling and computational considerations in formulating Bayesian data analysis methods that provide informative and accurate inference, yet remain computationally tractable.

Statistics seminar
Tuesday March 21, 2023, 9:30AM, Sophie Germain en salle 1013
Cécile Durot (Université Paris Nanterre) To be announced.

Statistics seminar
Thursday March 9, 2023, 9:30AM, Jussieu en salle 16-26.209
Pierre Wolinski (INRIA) Gaussian Pre-Activations in Neural Networks: Myth or Reality?

The study of feature propagation at initialization in neural networks lies at the root of numerous initialization designs. An assumption very commonly made in the field states that the pre-activations are Gaussian. Although this convenient Gaussian hypothesis can be justified when the number of neurons per layer tends to infinity, it is challenged by both theoretical and experimental works for finite-width neural networks. Our major contribution is to construct a family of pairs of activation functions and initialization distributions that ensure that the pre-activations remain Gaussian throughout the network's depth, even in narrow neural networks. In the process, we discover a set of constraints that a neural network should fulfill to ensure Gaussian pre-activations. Additionally, we provide a critical review of the claims of the Edge of Chaos line of works and build an exact Edge of Chaos analysis. We also propose a unified view on pre-activations propagation, encompassing the framework of several well-known initialization procedures. Finally, our work provides a principled framework for answering the much-debated question: is it desirable to initialize the training of a neural network whose pre-activations are ensured to be Gaussian?

Statistics seminar
Thursday February 9, 2023, 9:30AM, Sophie Germain en salle 1016
Vincent Divol (CEREMADE) Estimation of optimal transport maps in general function spaces

We consider the problem of estimating an optimal transport map between a (fixed) source distribution P and an unknown target distribution Q, based on a sample from Q. This problem has recently gained popularity with new applications in machine learning, such as generative models. Until now, estimation rates have been known only in a small number of cases (for example, when P and Q have densities bounded above and below and the transport map belongs to a Hölder space), which are rarely satisfied in practice. We present a methodology for obtaining estimation rates for the optimal transport map under general assumptions, based on optimizing the dual formulation of the empirical transport problem. As an example, we give convergence rates in the case where P is Gaussian and the transport map is given by a two-layer neural network with an arbitrarily large number of neurons. Joint work with Aram-Alexandre Pooladian and Jonathan Niles-Weed.

Statistics seminar
Tuesday January 24, 2023, 9:30AM, Jussieu en salle 15-16.201
Laure Sansonnet (INRAE MIA Paris-Saclay) Variable selection in multivariate (generalized) linear models with dependence

In this talk, we address the problem of variable selection in two modelling frameworks: (i) a multivariate linear model that accounts for possible dependence between the responses and (ii) a multivariate GLARMA model. In the first part, we present a variable selection procedure for multivariate linear models that accounts for possible dependence between the responses. It consists in first estimating the covariance matrix of the responses, which must satisfy certain assumptions, and then using this estimator in a Lasso criterion to obtain a sparse estimator of the coefficient matrix. We illustrate, theoretically and numerically, the good performance of this method, called MultiVarSel. In the second part, after introducing multivariate GLARMA (Generalized Linear Autoregressive Moving Average) models, which can model discrete-valued time series, we propose a new, efficient approach to variable selection in these models. It consists in iteratively combining two steps: estimation of the ARMA coefficients, and variable selection in the coefficients of the GLM part using regularized methods. The good performance of this approach, called MultiGlarmaVarSel, is illustrated on synthetic data and on RNA-Seq data on seed germination (in collaboration with Christophe Bailly and Loïc Rajjou).

The first part is joint work with Julien Chiquet, Céline Lévy-Leduc and Marie Perrot-Dockès, and the second part is joint work with Marina Gomtsyan, Céline Lévy-Leduc and Sarah Ouadah.

Year 2022

Statistics seminar
Tuesday December 6, 2022, 9:30AM, Jussieu en salle 15-16.201
Vianney Perchet (ENSAE) An algorithmic solution to the Blotto game using multi-marginal couplings

We describe an efficient algorithm to compute solutions for the general two-player Blotto game on n battlefields with heterogeneous values. While explicit constructions for such solutions have been limited to specific, largely symmetric or homogeneous, setups, this algorithmic resolution covers the most general situation to date: value-asymmetric game with asymmetric budget. The proposed algorithm rests on recent theoretical advances regarding Sinkhorn iterations for matrix and tensor scaling. An important case which had been out of reach of previous attempts is that of heterogeneous but symmetric battlefield values with asymmetric budget. In this case, the Blotto game is constant-sum so optimal solutions exist, and our algorithm samples from an ε-optimal solution in time $\tilde{O}(n^2 + \varepsilon^{-4})$, independently of budgets and battlefield values. In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an ε-Nash equilibrium with similar complexity but where implicit constants depend on various parameters of the game such as battlefield values.
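The Sinkhorn iterations mentioned above can be sketched in their simplest form, matrix scaling: rescale the rows and columns of a positive matrix in turn until it has prescribed marginals. This is only the classical two-marginal building block under illustrative assumptions (a random positive matrix and uniform marginals), not the multi-marginal Blotto algorithm itself. The function name `sinkhorn_scaling` is hypothetical.

```python
import numpy as np

def sinkhorn_scaling(K, r, c, n_iter=500):
    """Scale a positive matrix K to have row sums r and column sums c.

    Returns P = diag(u) K diag(v). Requires r.sum() == c.sum().
    """
    K = np.asarray(K, dtype=float)
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(n_iter):
        u = r / (K @ v)       # rescale rows to match r
        v = c / (K.T @ u)     # rescale columns to match c
    return u[:, None] * K * v[None, :]

# Illustrative input: a 4 x 5 positive matrix scaled to uniform marginals.
rng = np.random.default_rng(0)
K = rng.uniform(0.5, 1.5, size=(4, 5))
r = np.full(4, 1 / 4)
c = np.full(5, 1 / 5)
P = sinkhorn_scaling(K, r, c)
```

The fixed iteration count keeps the sketch short; a practical implementation would stop once the marginal errors fall below a tolerance.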

Statistics seminar
Tuesday November 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Morgane Austern (Harvard University) To split or not to split that is the question: From cross validation to debiased machine learning.

Data splitting is a ubiquitous method in statistics with examples ranging from cross validation to cross-fitting. However, despite its prevalence, theoretical guidance regarding its use is still lacking. In this talk we will explore two examples and establish an asymptotic theory for it. In the first part of this talk, we study the cross-validation method, a ubiquitous method for risk estimation, and establish its asymptotic properties for a large class of models and with an arbitrary number of folds. Under stability conditions, we establish a central limit theorem and Berry-Esseen bounds for the cross-validated risk, which enable us to compute asymptotically accurate confidence intervals. Using our results, we study the statistical speed-up offered by cross validation compared to a train-test split procedure. We reveal some surprising behavior of the cross-validated risk and establish the statistically optimal choice for the number of folds. In the second part of this talk, we study the role of cross fitting in the generalized method of moments with moments that also depend on some auxiliary functions. Recent lines of work show how one can use generic machine learning estimators for these auxiliary problems, while maintaining asymptotic normality and root-n consistency of the target parameter of interest. The literature typically requires that these auxiliary problems are fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, then sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes.
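For concreteness, here is a minimal sketch of the K-fold cross-validated risk for least-squares regression with squared loss. The standard error it returns naively treats the held-out losses as independent, and this is an assumption of the sketch: whether and when an interval built this way is asymptotically valid is precisely the kind of question the talk addresses, and the variance estimator of the talk is not implemented here. The function name `kfold_cv_risk` and the simulated data are hypothetical.

```python
import numpy as np

def kfold_cv_risk(X, y, n_folds=5, seed=0):
    """K-fold cross-validated squared-error risk of least squares.

    Returns (risk, naive_se), where naive_se treats the per-point
    held-out losses as if they were independent (an assumption).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    losses = np.empty(len(y))
    for fold in folds:
        train = np.setdiff1d(idx, fold)        # all indices outside the fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        losses[fold] = (y[fold] - X[fold] @ beta) ** 2
    risk = losses.mean()
    naive_se = losses.std(ddof=1) / np.sqrt(len(y))
    return float(risk), float(naive_se)

# Illustrative data: linear model with unit noise variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
risk, se = kfold_cv_risk(X, y, n_folds=5)
```

Every observation is used once for evaluation and K − 1 times for training, which is the sample re-use that distinguishes cross validation from a single train-test split.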

Statistics seminar
Tuesday November 8, 2022, 9:30AM, Jussieu en salle 15-16.201
Arshak Minasyan (CREST-ENSAE) All-In-One Robust Estimator of sub-Gaussian Mean

We propose a robust-to-outliers estimator of the mean of a multivariate Gaussian distribution that enjoys the following properties: polynomial computational complexity, high breakdown point, orthogonal and geometric invariance, minimax rate optimality (up to a logarithmic factor) and asymptotic efficiency. The non-asymptotic risk bound for the expected error of the proposed estimator is dimension-free and involves only the effective rank of the covariance matrix. Moreover, we show that the obtained results also hold with high probability and can be extended to the cases of unknown rate of contamination or unknown covariance matrix. In the end, I will also discuss the topic of sparse robust mean estimation in the same framework of adversarial contamination.

Statistics seminar
Thursday October 20, 2022, 11AM, Jussieu en salle 15-16.201
Misha Belkin (University of California) Neural networks, wide and deep, singular kernels and Bayes optimality

Wide and deep neural networks are used in many important practical settings. In this talk I will discuss some aspects of width and depth related to optimization and generalization. I will first discuss what happens when neural networks become infinitely wide, giving a general result for the transition to linearity (i.e., showing that neural networks become linear functions of parameters) for a broad class of wide neural networks corresponding to directed graphs.

I will then proceed to the question of depth, showing equivalence between infinitely wide and deep fully connected networks trained with gradient descent and Nadaraya-Watson predictors based on certain singular kernels. Using this connection we show that for certain activation functions these wide and deep networks are (asymptotically) optimal for classification but, interestingly, never for regression. Based on joint work with Chaoyue Liu, Adit Radhakrishnan, Caroline Uhler and Libin Zhu.
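As an illustration of what a Nadaraya-Watson predictor with a singular kernel looks like, here is a minimal sketch using the power kernel K(u) = ||u||^(-a). This particular kernel and the exponent a are assumptions chosen for illustration, since the abstract only refers to "certain singular kernels". The function name `singular_kernel_nw` and the toy data are hypothetical.

```python
import numpy as np

def singular_kernel_nw(X_train, y_train, x, a=2.0):
    """Nadaraya-Watson prediction with the singular kernel K(u) = |u|^(-a)."""
    d = np.linalg.norm(X_train - x, axis=1)
    if np.any(d == 0):
        # the kernel blows up at training points, so the predictor interpolates
        return float(y_train[d == 0].mean())
    w = d ** (-a)
    return float(w @ y_train / w.sum())

# Toy classification data with labels in {-1, +1}.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([-1.0, 1.0, -1.0])
at_train = singular_kernel_nw(X_train, y_train, np.array([1.0]))
near_train = singular_kernel_nw(X_train, y_train, np.array([1.0 + 1e-6]))
midway = singular_kernel_nw(X_train, y_train, np.array([0.5]))
```

Because the weights diverge near each training point, the predictor fits the training labels exactly while still being a weighted average elsewhere.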

Statistics seminar
Tuesday October 11, 2022, 9:30AM, Jussieu en salle 15-16.201 et retransmission
Yifan Cui (Zhejiang University) Instrumental Variable Approaches To Individualized Treatment Regimes Under A Counterfactual World

There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a robust classification-based instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish the identification of both value functions for a given regime and optimal regimes with the aid of a binary instrumental variable when no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. In addition, we propose to identify and estimate optimal treatment regimes among those who would comply with the assigned treatment under a monotonicity assumption. Furthermore, we consider the problem of individualized treatment regimes under sign and partial identification. In the former case, i) we provide a necessary and sufficient identification condition of optimal treatment regimes with an instrumental variable; ii) we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. In the latter case, we establish a formal link between individualized decision making under partial identification and classical decision theory under uncertainty through a unified lower bound perspective.

Statistics seminar
Tuesday September 27, 2022, 9:30AM, Jussieu en salle 15-16.201
Emilie Kaufmann (CNRS) Nonparametric exploration in bandit models

In a bandit model, an agent sequentially selects "arms", which are probability distributions initially unknown to the agent, with the goal of maximizing the sum of the samples obtained, which are viewed as rewards. The most popular bandit algorithms are based on the construction of confidence intervals or on sampling from a posterior distribution, but they can achieve optimal performance only with prior knowledge of the family of distributions of the arms. In this talk we will present alternative approaches based on resampling the history of each arm. Such algorithms can prove more robust in two senses. We will see that they can be optimal for different classes of distributions, and that they can easily be adapted to situations where the performance criterion is not tied to the agent's mean reward but takes a risk measure into account.

Statistics seminar
Tuesday May 31, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Elsa Cazelles (IRIT) A novel notion of barycenter for probability distributions based on optimal weak mass transport

We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass. We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter is illustrated on several examples.

Statistics seminar
Tuesday May 10, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Guillaume Lecué (CREST) A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.

Practitioners have observed that some deep learning models generalize well even with a perfect fit to noisy training data [1,2]. Since then many theoretical works have revealed some facets of this phenomenon [3,4,5], known as benign overfitting. In particular, in the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat{\boldsymbol{\beta}}$ has received a lot of attention [3,4,6] since it was proved to be consistent even though it perfectly fits noisy data, under some condition on the covariance matrix $\Sigma$ of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [6]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [4]: $\hat{\boldsymbol{\beta}}$ can be written as a sum of a ridge estimator $\hat{\boldsymbol{\beta}}_{1:k}$ and an overfitting component $\hat{\boldsymbol{\beta}}_{k+1:p}$, which follows a decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^\perp V_{k+1:p}$ into the space $V_{1:k}$ spanned by the top $k$ eigenvectors of $\Sigma$ and the space $V_{k+1:p}$ spanned by the last $p-k$ ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension appearing naturally in our geometrical viewpoint coincides with the effective rank from [3,6] and is the key tool to handle the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens. (Joint work with Zong Shang.)

[1] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA, 116(32):15849–15854, 2019.

[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64(3):107–115, 2021.

[3] Peter L. Bartlett, Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117(48):30063–30070, 2020.

[4] Peter L. Bartlett, Andreas Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. To appear in Acta Numerica, 2021.

[5] Mikhail Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. To appear in Acta Numerica, 2021.

[6] Alexander Tsigler and Peter L. Bartlett. Benign overfitting in ridge regression. 2021.
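
As a small illustration of the estimator discussed in the abstract above, the following sketch computes a minimum $\ell_2$-norm interpolant on toy data. It is not taken from the talk: it assumes the rows of the design matrix are linearly independent, so that the estimator has the closed form $\hat{\boldsymbol{\beta}} = X^\top (X X^\top)^{-1} y$, and the helper names `solve` and `min_norm_interpolant` as well as the data are hypothetical. Exact rational arithmetic is used so that the interpolation property can be checked without rounding.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A z = b by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Normalise the pivot row, then eliminate the column elsewhere.
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def min_norm_interpolant(X, y):
    """Return X^T (X X^T)^{-1} y, assuming the rows of X are linearly independent."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]  # X X^T
    z = solve(gram, y)
    p = len(X[0])
    return [sum(z[i] * X[i][j] for i in range(len(X))) for j in range(p)]


# Hypothetical toy data: n = 2 observations, p = 3 features (more features than samples).
X = [[Fraction(1), Fraction(0), Fraction(1)],
     [Fraction(0), Fraction(1), Fraction(1)]]
y = [Fraction(1), Fraction(2)]

beta_hat = min_norm_interpolant(X, y)
fitted = [sum(x * b for x, b in zip(row, beta_hat)) for row in X]

# Another interpolating vector, used only to compare squared norms.
other = [Fraction(1), Fraction(2), Fraction(0)]
sq_norm = lambda v: sum(c * c for c in v)
```

Here `fitted` equals `y`, so the estimator interpolates the data, and `sq_norm(beta_hat)` is smaller than `sq_norm(other)` even though `other` also fits the data perfectly.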

Statistics seminar
Tuesday April 19, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Clément Marteau (Université Lyon 1) Supermix: sparse regularization for mixture models

This talk addresses the estimation of a discrete probability measure $\mu_0$ involved in a mixture model. Using recent results on $\ell_1$ regularization over the space of measures, we will consider a convex optimization problem for estimating $\mu_0$ without resorting to a grid. Handling this optimization problem requires the introduction of a dual certificate. We will then discuss the statistical properties of the resulting estimator, focusing in particular on the Gaussian case.

Statistics seminar
Tuesday April 5, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Fabrice Grela (Université de Nantes) Minimax detection and localisation of an abrupt change in a Poisson process

Considering a Poisson process observed on a bounded, fixed interval, we are interested in the problem of detecting an abrupt change in its distribution, characterized by a jump in its intensity. Formulating this as an off-line change-point problem, we address two questions: detecting a change-point and estimating the location of the jump. This study aims at proposing a non-asymptotic minimax testing set-up, first to construct a minimax and adaptive detection procedure and then to give a minimax study of a multiple testing procedure designed to simultaneously detect and localise a change-point.

Statistics seminar
Tuesday March 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Aymeric Dieuleveut (Polytechnique) Federated Learning and optimization: from a gentle introduction to recent results

In this presentation, I will present some results on optimization in the context of federated learning. I will summarise the main challenges and the type of results people have been interested in, and dive into some more recent results on tradeoffs between (bidirectional) compression, communication, privacy and user-heterogeneity. The presentation will be based on recent work with Constantin Philippenko, Maxence Noble, Aurélien Bellet.

Refs:
Mainly:
Differentially Private Federated Learning on Heterogeneous Data, M Noble, A Bellet, A Dieuleveut, AISTATS 2022
Preserved central model for faster bidirectional compression in distributed settings, C Philippenko, A Dieuleveut, NeurIPS 2021
If time allows it (unlikely):
Federated Expectation Maximization with heterogeneity mitigation and variance reduction, A Dieuleveut, G Fort, E Moulines, G Robin, NeurIPS 2021

Statistics seminar
Tuesday March 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Lihua Lei (Stanford University) Testing for outliers with conformal p-values

We study the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework that yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
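
To make the objects in the abstract above concrete, here is a minimal sketch of marginal conformal p-values computed from a calibration set of nonconformity scores. It is not the speaker's code: the scores are hypothetical, larger scores are assumed to mean "more outlying", and the Benjamini-Hochberg procedure is used here as a standard false discovery rate procedure (the abstract does not name one). The conditionally valid p-values introduced in the talk are not reproduced.

```python
def conformal_p_values(calibration_scores, test_scores):
    """Marginal conformal p-values: (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calibration_scores)
    return [
        (1 + sum(1 for s in calibration_scores if s >= t)) / (n + 1)
        for t in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the Benjamini-Hochberg procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose ordered p-value falls below its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical nonconformity scores of 9 held-out inliers (the calibration set).
calibration = [0.1, 0.4, 0.2, 0.3, 0.5, 0.25, 0.35, 0.15, 0.45]
# Hypothetical scores of 4 new test points; two of them are far above the inlier range.
test = [0.3, 0.9, 0.05, 0.95]

p = conformal_p_values(calibration, test)
outliers = benjamini_hochberg(p, alpha=0.25)
```

With these toy scores, the two test points whose scores exceed every calibration score receive the smallest possible p-value, 1/(n+1) = 0.1, and are the ones flagged in `outliers`. All test points share the same calibration set, which is the source of the mutual dependence mentioned in the abstract.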

Statistics seminar
Tuesday February 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Élisabeth Gassiat Deconvolution with unknown noise distribution

I consider the deconvolution problem in the case where no information is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on observations of the corrupted signal. I will prove the identifiability of the model up to translation when the signal has a Laplace transform with an exponential growth $\rho$ smaller than 2 and when it can be decomposed into two dependent components, so that the identifiability theorem can be used for sequences of dependent data or for sequences of iid multidimensional data. In the case of iid multidimensional data, I will propose an adaptive estimator of the density of the signal and provide rates of convergence. The resulting rate is known to be minimax when $\rho = 1$.

Statistics seminar
Tuesday January 25, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Nicolas Verzelen (Université de Montpellier) Optimal ranking in crowd-sourcing problem

Consider a crowd-sourcing problem where we have n experts and d tasks. The average ability of each expert for each task is stored in an unknown matrix M, from which we have incomplete and noisy observations. We make no (semi)parametric assumptions, but assume that both experts and tasks can be perfectly ordered, so that if an expert A is better than an expert B, the ability of A is higher than that of B for all tasks, and the same holds for the tasks. This implies that the matrix M is bi-isotonic up to permutations of its rows and columns. We focus on the problem of recovering the optimal ranking of the experts in $\ell_2$ norm, when the ordering of the tasks is known to the statistician. In other words, we aim at estimating the suitable permutation of the rows of M while the permutation of the columns is known. We provide a minimax-optimal and computationally feasible method for this problem, based on hierarchical clustering, PCA, change-point detection, and exchange of information among the clusters. We prove in particular, in the case where d > n, that the problem of estimating the expert ranking is significantly easier than the problem of estimating the matrix M.

This talk is based on ongoing joint work with Alexandra Carpentier and Emmanuel Pilliat.

Year 2021

Statistics seminar
Tuesday December 14, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Julie Delon (Université de Paris) Some perspectives on stochastic models for Bayesian image restoration

Random image models are central for solving inverse problems in imaging. In a Bayesian formalism, these models can be used as priors or regularisers and combined with an explicit likelihood function to define posterior distributions. Most of the time, these posterior distributions are used to derive Maximum A Posteriori (MAP) estimators, leading to optimization problems that may be convex or not, but are well studied and understood. Sampling schemes can also be used to explore these posterior distributions, to derive Minimum Mean Square Error (MMSE) estimators, quantify uncertainty or perform other advanced inferences. While research on inverse problems has focused for many years on explicit image models (either directly in the image space, or in a transformed space), an important trend nowadays is to use implicit image models encoded by neural networks. This opens the way to restoration algorithms that exploit more powerful and accurate prior models for natural images but raises novel challenges and questions on the corresponding posterior distributions and their resulting estimators. The goal of this presentation is to provide some perspectives and present recent developments on these questions.

Statistics seminar
Tuesday November 30, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Frédéric Chazal (INRIA) A framework to differentiate persistent homology with applications in Machine Learning and Statistics

Understanding the differentiable structure of persistent homology and solving optimization tasks based on functions and losses with a topological flavor is a very active, growing field of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning.

However, the approaches proposed in the literature are usually anchored to a specific application and/or topological construction, and do not come with theoretical guarantees.

In this talk, we will study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows one to define and compute gradients for persistence-based functions in a very simple way. As an application, we also provide a simple, explicit and sufficient condition for convergence of stochastic subgradient methods for such functions. If time permits, as another application, we will also show how this framework combined with standard geometric measure theory arguments leads to results on the statistical behavior of persistence diagrams of filtrations built on top of random point clouds.

Statistics seminar
Tuesday November 23, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Yannick Baraud (Université de Luxembourg) How to build robust posterior distributions from tests?

Classical Bayesian estimators, like those built from the likelihood, have good estimation properties when the statistical model is exact and perfectly accounts for the distribution of the data. Otherwise, when the model is only approximate, these estimators can become terribly poor, and a single observation that is aberrant with respect to the model is sometimes enough for this to happen. We will show how to remedy this instability problem by proposing, in the Bayesian framework, a new posterior distribution built from suitable robust tests. We will see how this approach yields estimators that are both optimal when the model is exact and stable under a slight modelling error.

Statistics seminar
Tuesday November 9, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Alessandro Rudi (INRIA) PSD models for Non-convex optimization and beyond

In this talk we present a rather flexible and expressive model for non-negative functions. We will show direct applications in probability representation and non-convex optimization. In particular, the model allows us to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.

Statistics seminar
Tuesday October 19, 2021, 9:30AM, Sophie Germain en salle 1013
Antoine Marchina (Université de Paris) Concentration inequalities for suprema of unbounded empirical processes

In this talk, we will provide new concentration inequalities for suprema of (possibly) non-centered and unbounded empirical processes associated with independent and identically distributed random variables. In particular, we establish Fuk-Nagaev type inequalities with the optimal constant in the moderate deviation bandwidth. We will also explain the use of these results in statistical applications (ongoing research).

Statistics seminar
Tuesday October 5, 2021, 9:30AM, Jussieu en salle 15-16.201
Judith Rousseau (Oxford) Semiparametric and nonparametric Bayesian inference in hidden Markov models

In this work we are interested in inference in hidden Markov models with finite state space and nonparametric emission distributions. Since the seminal paper of Gassiat et al. (2016), it is known that in such models the transition matrix $Q$ and the emission distributions $F_1, \dots, F_K$ are identifiable, up to label switching. We propose an (almost) Bayesian method to simultaneously estimate $Q$ at the rate $\sqrt{n}$ and the emission distributions at the usual nonparametric rates. To do so, we first consider a prior $\pi_1$ on $Q$ and $F_1, \dots, F_K$ which leads to a marginal posterior distribution on $Q$ satisfying the Bernstein-von Mises property, and thus to an efficient estimator of $Q$. We then combine the marginal posterior on $Q$ with another posterior distribution on the emission distributions, following the cut-posterior approach, to obtain a posterior which also concentrates around the emission distributions at the minimax rates. In addition, an important intermediate result of our work is an inversion inequality which allows us to upper bound the $L_1$ norms between the emission densities by the $L_1$ norms between marginal densities of 3 consecutive observations.

Joint work with D. Moss (Oxford).