#### Day, hour and place

Tuesday at 09:30, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201

#### Contact(s)

Statistics seminar
Tuesday December 6, 2022, 9:30AM, Jussieu en salle 15-16.201
Vianney Perchet (ENSAE) To be announced.

#### Talks calendar

To add the talks calendar to your agenda, subscribe to this calendar by using this link.

#### Year 2022

Statistics seminar
Tuesday November 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Morgane Austern (Harvard University) To split or not to split that is the question: From cross validation to debiased machine learning.

Data splitting is an ubiquitous method in statistics with examples ranging from cross validation to cross-fitting. However, despite its prevalence, theoretical guidance regarding its use is still lacking. In this talk we will explore two examples and establish an asymptotic theory for it. In the first part of this talk, we study the cross-validation method, a ubiquitous method for risk estimation, and establish its asymptotic properties for a large class of models and with an arbitrary number of folds. Under stability conditions, we establish a central limit theorem and Berry-Esseen bounds for the cross-validated risk, which enable us to compute asymptotically accurate confidence intervals. Using our results, we study the statistical speed-up offered by cross validation compared to a train-test split procedure. We reveal some surprising behavior of the cross-validated risk and establish the statistically optimal choice for the number of folds. In the second part of this talk, we study the role of cross fitting in the generalized method of moments with moments that also depend on some auxiliary functions. Recent lines of work show how one can use generic machine learning estimators for these auxiliary problems, while maintaining asymptotic normality and root-n consistency of the target parameter of interest. The literature typically requires that these auxiliary problems are fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, then sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes.

Statistics seminar
Tuesday November 8, 2022, 9:30AM, Jussieu en salle 15-16.201
Arshak Minasyan (CREST-ENSAE) All-In-One Robust Estimator of sub-Gaussian Mean

We propose a robust-to-outliers estimator of the mean of a multivariate Gaussian distribution that enjoys the following properties: polynomial computational complexity, high breakdown point, orthogonal and geometric invariance, minimax rate optimality (up to logarithmic factor) and asymptotical efficiency. Non-asymptotic risk bound for the expected error of the proposed estimator is dimension-free and involves only the effective rank of the covariance matrix. Moreover, we show that the obtained results also hold with high probability and can be extended to the cases of unknown rate of contamination or unknown covariance matrix. In the end, I will also discuss the topic of sparse robust mean estimation in the same framework of adversarial contamination.

Statistics seminar
Thursday October 20, 2022, 11AM, Jussieu en salle 15-16.201
Misha Belkin (University of California) Neural networks, wide and deep, singular kernels and Bayes optimality

Wide and deep neural networks are used in many important practical setting.</span>In this talk I will discuss some aspects of width and depth related to optimization and generalization. I will first discuss what happens when neural networks become infinitely wide, giving a general result for the transition to linearity (i.e., showing that neural networks become linear functions of parameters) for a broad class of wide neural networks corresponding to directed graphs.<br><br>I will then proceed to the question of depth, showing equivalence between infinitely wide and deep fully connected networks trained with gradient descent and Nadaraya-Watson predictors based on certain singular kernels. Using this connection we show that for certain activation functions these wide and deep networks are (asymptotically) optimal for classification but, interestingly, never for regression. Based on joint work with Chaoyue Liu, Adit Radhakrishnan, Caroline Uhler and Libin Zhu.

Statistics seminar
Tuesday October 11, 2022, 9:30AM, Jussieu en salle 15-16.201 et retransmission
Yifan Cui (Zhejiang University) Instrumental Variable Approaches To Individualized Treatment Regimes Under A Counterfactual World

There is fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a robust classification-based instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish the identification of both value functions for a given regime and optimal regimes with the aid of a binary instrumental variable when no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. In addition, we propose to identify and estimate optimal treatment regimes among those who would comply with the assigned treatment under a monotonicity assumption. Furthermore, we consider the problem of individualized treatment regimes under the sign and partial identification. In the former case, i) we provide a necessary and sufficient identification condition of optimal treatment regimes with an instrumental variable; ii) we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the compiler average treatment effect itself being identified. In the latter case, we establish a formal link between individualized decision making under partial identification and classical decision theory under uncertainty through a unified lower bound perspective.

Statistics seminar
Tuesday September 27, 2022, 9:30AM, Jussieu en salle 15-16.201
Emilie Kaufmann (CNRS) Exploration non paramétrique dans les modèles de bandits

Dans un modèle de bandit, un agent sélectionne de manière séquentielle des “bras”, qui sont des lois de probabilité initialement inconnues de l’agent, dans le but de maximiser la somme des échantillons obtenus, qui sont vus comme des récompenses. Les algorithmes de bandits les plus populaires sont basés sur la construction d’intervalles de confiance ou l’échantillonnage d’une loi a posteriori, mais ne peuvent atteindre des performances optimales qu’en ayant des connaissances a priori sur la famille de distributions des bras. Dans cet exposé nous allons présenter des approches alternatives basées sur du ré-échantillonnage de l’historique de chaque bras. De tels algorithmes peuvent s’avérer plus robustes en deux sens. Nous verrons qu’ils peuvent être optimaux pour différentes classes de distributions, et être aisément adaptés à des situations où le critère de performance n’est pas lié à la récompense moyenne de l’agent, mais prend en compte une mesure de risque.

Statistics seminar
Tuesday May 31, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Elsa Cazelles (IRIT) A novel notion of barycenter for probability distributions based on optimal weak mass transport

We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass. We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does) weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter is illustrated on several examples.

Statistics seminar
Tuesday May 10, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Guillaume Lecué (CREST) A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.

Practitioners have observed that some deep learning models generalize well even with a perfect fit to noisy training data [1,2]. Since then many theoretical works have revealed some facets of this phenomenon [3,4,5] known as benign overfitting. In particular, in the linear regression model, the minimum l_2-norm interpolant estimator \hat\bbeta has received a lot of attention [3,4,6] since it was proved to be consistent even though it perfectly fits noisy data under some condition on the covariance matrix \Sigma of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [6]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [4]: \hat\bbeta can be written as a sum of a ridge estimator \hat\bbeta_{1:k} and an overfitting component \hat\bbeta_{k+1:p} which follows a decomposition of the features space \bR^p=V_{1:k}\oplus^\perp V_{k+1:p} into the space V_{1:k} spanned by the top k eigenvectors of \Sigma and the one V_{k+1:p} spanned by the p-k last ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretsky-Milman theorem and isomorphic and restricted isomorphic properties. In particular, the Dvoretsky dimension appearing naturally in our geometrical viewpoint coincides with the effective rank from [3,6] and is the key tool to handle the behavior of the design matrix restricted to the sub-space V_{k+1:p} where overfitting happens. (Joint work with Zong Shang).

[1] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA, 116(32):15849–15854, 2019.

[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64(3):107–115, 2021.

[3] Peter L. Bartlett, Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117(48):30063–30070, 2020.

[4] Peter L. Bartlett, Andreas Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. To appear in Acta Numerica, 2021.

[5] Mikhail Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. To appear in Acta Numerica, 2021.

[6] Alexander Tsigler and Peter L. Bartlett. Benign overfitting in ridge regression. 2021.

Statistics seminar
Tuesday April 19, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Clément Marteau (Université Lyon 1) Supermix : régularisation parcimonieuse pour des modèles de mélange

Cet exposé s'intéresse à l'estimation d'une mesure de probabilité discrète $\mu_0$ impliquée dans un modèle de mélange. Utilisant des résultats récents en régularisation l1 sur l'espace des mesures, nous considérerons un problème d'optimisation convexe pour l'estimation de $\mu_0$ sans faire appel à l'utilisation d'une grille. Le traitement de ce problème d'optimisation nécessite l'introduction d'un certificat dual. Nous discuterons ensuite les propriétés statistiques de l'estimateur obtenu en s'intéressant en particulier au cas gaussien.

Statistics seminar
Tuesday April 5, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Fabrice Grela (Université de Nantes) Minimax detection and localisation of an abrupt change in a Poisson process

Considering a Poisson process observed on a bounded, fixed interval, we are interested in the problem of detecting an abrupt change in its distribution, characterized by a jump in its intensity. Formulated as an off-line change-point problem, we address two questions : the one of detecting a change-point and the one of estimating the jump location of such change-point. This study aims at proposing a non-asymptotic minimax testing set-up, first to construct a minimax and adaptive detection procedure and then to give a minimax study of a multiple testing procedure designed for simultaneously detect and localise a change-point.

Statistics seminar
Tuesday March 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Aymeric Dieuleveut (Polytechnique) Federated Learning and optimization: from a gentle introduction to recent results

In this presentation, I will present some results on optimization in the context of federated learning. I will summarise the main challenges and the type of results people have been interested in, and dive into some more recent results on tradeoffs between (bidirectional) compression, communication, privacy and user-heterogeneity. The presentation will be based on recent work with Constantin Philippenko, Maxence Noble, Aurélien Bellet.

Refs:Mainly: Differentially Private Federated Learning on Heterogeneous Data, M Noble, A Bellet, A Dieuleveut, Aistats 2022, Link Preserved central model for faster bidirectional compression in distributed settings C Philippenko, A Dieuleveut, Neurips 2021 LinkIf time allows it (unlikely): Federated Expectation Maximization with heterogeneity mitigation and variance reduction, A Dieuleveut, G Fort, E Moulines, G Robin, Neurips 2021 Link

Statistics seminar
Tuesday March 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Lihua Lei (Stanford University) Testing for outliers with conformal p-values

We study the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework that yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.

Statistics seminar
Tuesday February 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Élisabeth Gassiat Deconvolution with unknown noise distribution

I consider the deconvolution problem in the case where no information is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on observations of the corrupted signal. I will prove the identifiability of the model up to translation when the signal has a Laplace transform with an exponential growth $\rho$ smaller than 2 and when it can be decomposed into two dependent components, so that the identifiability theorem can be used for sequences of dependent data or for sequences of iid multidimensional data. In the case of iid multidimensional data, I will propose an adaptive estimator of the density of the signal and provide rates of convergence. This rate of convergence is known to be minimax when ρ = 1.

Statistics seminar
Tuesday January 25, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Nicolas Verzelen (Université de Montpellier) Optimal ranking in crowd-sourcing problem

Consider a crowd sourcing problem where we have n experts and d tasks. The average ability of each expert for each task is stored in an unknown matrix M, from which we have incomplete and noise observations. We make no (semi) parametric assumptions, but assume that both experts and tasks can be perfectly ordered: so that if an expert A is better than an expert B, the ability of A is higher than that of B for all tasks - and that the same holds for the tasks. This implies that if the matrix M, up to permutations of its rows and columns, is bi-isotonic. We focus on the problem of recovering the optimal ranking of the experts in l2 norm, when the ordering of the tasks is known to the statistician. In other words, we aim at estimating the suitable permutation of the rows of M while the permutation of the columns is known. We provide a minimax-optimal and computationally feasible method for this problem, based on hierarchical clustering, PCA, change-point detection, and exchange of informations among the clusters. We prove in particular - in the case where d > n - that the problem of estimating the expert ranking is significantly easier than the problem of estimating the matrix M.

This talk is based on a joint ongoing work with Alexandra Carpentier and Emmanuel Pilliat.

#### Year 2021

Statistics seminar
Tuesday December 14, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Julie Delon (Université de Paris) Some perspectives on stochastic models for Bayesian image restoration

Random image models are central for solving inverse problems in imaging. In a Bayesian formalism, these models can be used as priors or regularisers and combined to an explicit likelihood function to define posterior distributions. Most of the time, these posterior distributions are used to derive Maximum A Posteriori (MAP) estimators, leading to optimization problems that may be convex or not, but are well studied and understood. Sampling schemes can also be used to explore these posterior distributions, to derive Minimum Mean Square Error (MMSE) estimators, quantify uncertainty or perform other advanced inferences. While research on inverse problems has focused for many years on explicit image models (either directly in the image space, or in a transformed space), an important trend nowadays is to use implicit image models encoded by neural networks. This opens the way to restoration algorithms that exploit more powerful and accurate prior models for natural images but raises novel challenges and questions on the corresponding posterior distributions and their resulting estimators. The goal of this presentation is to provide some perspectives and present recent developments on these questions.

Statistics seminar
Tuesday November 30, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Frédéric Chazal (INRIA) A framework to differentiate persistent homology with applications in Machine Learning and Statistics

Understanding the differentiable structure of persistent homology and solving optimization tasks based on functions and losses with a topological flavor is a very active, growing field of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning.

However, the approaches proposed in the literature are usually

anchored to a specific application and/or topological construction, and do not come with theoretical guarantees.

In this talk, we will study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows to define and compute gradients for persistence-based functions in a very simple way. As an application, we also provide a simple, explicit and sufficient condition for convergence of stochastic subgradient methods for such functions. If time permits, as another application, we will also show how this framework combined with standard geometric measure theory arguments leads to results on the statistical behavior of persistence diagrams of filtrations built on top of random point clouds.

Statistics seminar
Tuesday November 23, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Yannick Baraud (Université de Luxembourg) Comment construire des lois a posteriori robustes à partir de tests ?

Les estimateurs bayésiens classiques, tout comme ceux bâtis à partir de la vraisemblance, ont de bonnes qualités d’estimation lorsque le modèle statistique est exact et rend parfaitement compte de la loi des données. Dans le cas contraire, lorsque le modèle n’est qu’approximatif, ces estimateurs peuvent devenir terriblement mauvais et il suffit parfois d’une seule donnée aberrante au regard du modèle utilisé pour que cela advienne. Nous montrerons comment remédier à ce problème d’instabilité en proposant dans le cadre bayésien une nouvelle loi a posteriori construite à partir de tests robustes convenables. Nous verrons comment cette approche fournit des estimateurs à la fois optimaux lorsque le modèle est exact et stables à une légère erreur de modélisation.

Statistics seminar
Tuesday November 9, 2021, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Alessandro Rudi (INRIA) PSD models for Non-convex optimization and beyond

In this talk we present a rather flexible and expressive model for non-negative functions. We will show direct applications in probability representation and non-convex optimization. In particular, the model allows to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.

Statistics seminar
Tuesday October 19, 2021, 9:30AM, Sophie Germain en salle 1013
Antoine Marchina (Université de Paris) Concentration inequalities for suprema of unbounded empirical processes

In this talk, we will provide new concentration inequalities for suprema of (possibly) non-centered and unbounded empirical processes associated with independent and identically distributed random variables. In particular, we establish Fuk-Nagaev type inequalities with the optimal constant in the moderate deviation bandwidth. We will also explain the use of these results in statistical applications (ongoing research)

Statistics seminar
Tuesday October 5, 2021, 9:30AM, Jussieu en salle 15-16.201
Judith Rousseau (Oxford) Semiparametric and nonparametric Bayesian inference in hidden Markov models

In this work we are interested in inference in Hidden Markov models with infinite state space and nonparametric emission distributions. Since the seminal paper of Gassiat et al. (2016), it is known that in such models the transition matrix Q and the emission distributions F1; … ; FK are identi fiable, up to label switching. We propose an (almost) Bayesian method to simultaneously estimate Q at the rate sqrt(n) and the emission distributions at the usual nonparametric rates. To do so, we first consider a prior pi1 on Q and F1; … ; Fk which leads to a posterior marginal distribution on Q which veri fies the Bernstein von mises property and thus to an estimator of Q which is efficient. We then combine the marginal posterior on Q with an other posterior distribution on the emission distributions, following the cut-posterior approach, to obtain a posterior which also concentrates around the emission distributions at the minimax rates. In addition an important intermediate result of our work is an inversion inequality which allows to upper bound the L1 norms between the emission densities by the L1 norms between marginal densities of 3 consecutive observations.

Joint work with D. Moss (Oxford).