Statistics seminar
Tuesday December 6, 2022, 9:30AM, Jussieu en salle 15-16.201
Vianney Perchet (ENSAE) An algorithmic solution to the Blotto game using multi-marginal couplings
We describe an efficient algorithm to compute solutions for the general two-player Blotto game on n battlefields with heterogeneous values. While explicit constructions for such solutions have been limited to specific, largely symmetric or homogeneous, setups, this algorithmic resolution covers the most general situation to date: value-asymmetric game with asymmetric budget. The proposed algorithm rests on recent theoretical advances regarding Sinkhorn iterations for matrix and tensor scaling. An important case which had been out of reach of previous attempts is that of heterogeneous but symmetric battlefield values with asymmetric budget. In this case, the Blotto game is constant-sum so optimal solutions exist, and our algorithm samples from an ε-optimal solution in time Õ(n² + ε⁻⁴), independently of budgets and battlefield values. In the case of asymmetric values, where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an ε-Nash equilibrium with similar complexity, but where implicit constants depend on various parameters of the game such as battlefield values.
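As an illustration of the Sinkhorn scaling step underlying the algorithm, the following Python sketch runs Sinkhorn iterations in the simplest matrix (two-marginal) case; the cost matrix, marginals and regularization level are illustrative, and this is not the authors' multi-marginal implementation.

import numpy as np

def sinkhorn(C, r, c, eps=0.1, n_iter=500):
    """Scale K = exp(-C/eps) into a coupling with row marginals r and column marginals c."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((5, 5))                      # cost matrix
r = np.full(5, 0.2)                         # target row marginal
c = np.full(5, 0.2)                         # target column marginal
P = sinkhorn(C, r, c)
print(P.sum(axis=1), P.sum(axis=0))         # both close to the targets after scaling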
Statistics seminar
Tuesday November 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Morgane Austern (Harvard University) To split or not to split that is the question: From cross validation to debiased machine learning.
Data splitting is a ubiquitous method in statistics, with examples ranging from cross-validation to cross-fitting. However, despite its prevalence, theoretical guidance regarding its use is still lacking. In this talk we will explore two examples and establish an asymptotic theory for each.
In the first part of this talk, we study cross-validation, a ubiquitous method for risk estimation, and establish its asymptotic properties for a large class of models and with an arbitrary number of folds. Under stability conditions, we establish a central limit theorem and Berry-Esseen bounds for the cross-validated risk, which enable us to compute asymptotically accurate confidence intervals. Using our results, we study the statistical speed-up offered by cross-validation compared to a train-test split procedure. We reveal some surprising behavior of the cross-validated risk and establish the statistically optimal choice for the number of folds.
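For concreteness, here is a minimal sketch of the quantity studied in this first part: a K-fold cross-validated risk together with a naive normal-approximation confidence interval. The model, the 10 folds and the crude variance estimate are illustrative choices, not the estimators analyzed in the talk.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

losses = np.empty(n)
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = Ridge(alpha=1.0).fit(X[train], y[train])
    losses[test] = (y[test] - model.predict(X[test])) ** 2   # held-out squared errors

cv_risk = losses.mean()
half_width = 1.96 * losses.std(ddof=1) / np.sqrt(n)          # naive normal-approximation interval
print(f"CV risk: {cv_risk:.3f} +/- {half_width:.3f}")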
In the second part of this talk, we study the role of cross fitting in the generalized method of moments with moments that also depend on some auxiliary functions. Recent lines of work show how one can use generic machine learning estimators for these auxiliary problems, while maintaining asymptotic normality and root-n consistency of the target parameter of interest. The literature typically requires that these auxiliary problems are fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, then sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes.
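The second part can be illustrated by the standard cross-fitted estimator of a partially linear model, where the auxiliary regressions are fitted out-of-fold; the talk asks when such splitting can be dispensed with. A toy sketch, with an illustrative data-generating process and learners:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
D = np.sin(X[:, 0]) + 0.5 * rng.normal(size=n)          # treatment, confounded through X
Y = 2.0 * D + np.cos(X[:, 1]) + rng.normal(size=n)      # true target parameter is 2.0

# Cross-fitted (out-of-fold) nuisance estimates of E[Y|X] and E[D|X]
m_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)
g_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, D, cv=5)

# Orthogonal moment: regress the residual of Y on the residual of D
theta = np.sum((D - g_hat) * (Y - m_hat)) / np.sum((D - g_hat) ** 2)
print(f"estimated target parameter: {theta:.3f}")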
Statistics seminar
Tuesday November 8, 2022, 9:30AM, Jussieu en salle 15-16.201
Arshak Minasyan (CREST-ENSAE) All-In-One Robust Estimator of sub-Gaussian Mean
We propose a robust-to-outliers estimator of the mean of a multivariate Gaussian distribution that enjoys the following properties: polynomial computational complexity, high breakdown point, orthogonal and geometric invariance, minimax rate optimality (up to a logarithmic factor) and asymptotic efficiency. The non-asymptotic risk bound for the expected error of the proposed estimator is dimension-free and involves only the effective rank of the covariance matrix. Moreover, we show that the obtained results also hold with high probability and can be extended to the cases of unknown rate of contamination or unknown covariance matrix. Finally, I will also discuss the topic of sparse robust mean estimation in the same framework of adversarial contamination.
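To fix ideas on the contamination setting, here is a toy sketch of a classical robust baseline, the geometric median computed by Weiszfeld iterations, under a crude outlier model; this is only an illustration of the framework, not the estimator proposed in the talk.

import numpy as np

def geometric_median(X, n_iter=200, tol=1e-7):
    """Weiszfeld iterations for the geometric median of the rows of X."""
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(X - mu, axis=1), tol)
        w = 1.0 / dist
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

rng = np.random.default_rng(3)
n, d, n_out = 500, 20, 50
X = rng.normal(size=(n, d))                 # inliers from N(0, I)
X[:n_out] += 20.0                           # 10% of the sample is shifted far away
print(np.linalg.norm(X.mean(axis=0)))       # the empirical mean is pulled towards the outliers
print(np.linalg.norm(geometric_median(X)))  # the geometric median stays much closer to 0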
Statistics seminar
Thursday October 20, 2022, 11AM, Jussieu en salle 15-16.201
Misha Belkin (University of California) Neural networks, wide and deep, singular kernels and Bayes optimality
Wide and deep neural networks are used in many important practical settings. In this talk I will discuss some aspects of width and depth related to optimization and generalization. I will first discuss what happens when neural networks become infinitely wide, giving a general result for the transition to linearity (i.e., showing that neural networks become linear functions of parameters) for a broad class of wide neural networks corresponding to directed graphs.

I will then proceed to the question of depth, showing equivalence between infinitely wide and deep fully connected networks trained with gradient descent and Nadaraya-Watson predictors based on certain singular kernels. Using this connection we show that for certain activation functions these wide and deep networks are (asymptotically) optimal for classification but, interestingly, never for regression.
Based on joint work with Chaoyue Liu, Adit Radhakrishnan, Caroline Uhler and Libin Zhu.
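A small numerical check of the transition to linearity mentioned above, for a two-layer network in the 1/sqrt(width) scaling: the gap between the network output and its first-order Taylor expansion in the parameters, for a unit-norm parameter perturbation, shrinks as the width grows. The architecture and scaling below are a minimal illustrative choice, not the general setting of the talk.

import numpy as np

rng = np.random.default_rng(4)
d = 10
x = rng.normal(size=d)

def value_and_grad(W, v, x):
    """Two-layer network f = v^T tanh(Wx) / sqrt(m) and its gradient in (W, v)."""
    m = v.size
    h = np.tanh(W @ x)
    f = (v @ h) / np.sqrt(m)
    grad_v = h / np.sqrt(m)
    grad_W = (v * (1.0 - h ** 2))[:, None] * x[None, :] / np.sqrt(m)
    return f, grad_W, grad_v

for m in (10, 100, 1000, 10000):
    W, v = rng.normal(size=(m, d)), rng.normal(size=m)
    dW, dv = rng.normal(size=(m, d)), rng.normal(size=m)
    scale = 1.0 / np.sqrt((dW ** 2).sum() + (dv ** 2).sum())   # unit-norm parameter perturbation
    dW, dv = scale * dW, scale * dv
    f0, gW, gv = value_and_grad(W, v, x)
    f1, _, _ = value_and_grad(W + dW, v + dv, x)
    taylor = f0 + (gW * dW).sum() + gv @ dv
    print(m, abs(f1 - taylor))          # the gap to the linearization shrinks as width grows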
Statistics seminar
Tuesday October 11, 2022, 9:30AM, Jussieu en salle 15-16.201 et retransmission
Yifan Cui (Zhejiang University) Instrumental Variable Approaches To Individualized Treatment Regimes Under A Counterfactual World
There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a robust classification-based instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish the identification of both value functions for a given regime and optimal regimes with the aid of a binary instrumental variable when the assumption of no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. In addition, we propose to identify and estimate optimal treatment regimes among those who would comply with the assigned treatment under a monotonicity assumption. Furthermore, we consider the problem of individualized treatment regimes under sign identification and under partial identification. In the former case, i) we provide a necessary and sufficient identification condition for optimal treatment regimes with an instrumental variable; ii) we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. In the latter case, we establish a formal link between individualized decision making under partial identification and classical decision theory under uncertainty through a unified lower bound perspective.
Statistics seminar
Tuesday September 27, 2022, 9:30AM, Jussieu en salle 15-16.201
Emilie Kaufmann (CNRS) Nonparametric exploration in bandit models
In a bandit model, an agent sequentially selects "arms", which are probability distributions initially unknown to the agent, with the aim of maximizing the sum of the collected samples, which are viewed as rewards. The most popular bandit algorithms are based on building confidence intervals or on sampling from a posterior distribution, but they can only achieve optimal performance with prior knowledge of the family of distributions of the arms. In this talk we present alternative approaches based on resampling the history of each arm. Such algorithms can prove more robust in two senses. We will see that they can be optimal for several classes of distributions, and that they can easily be adapted to situations where the performance criterion is not tied to the agent's average reward but instead takes a risk measure into account.
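As a toy illustration of the resampling idea, the sketch below runs a two-armed Bernoulli bandit where, at each round, every arm's history (seeded with one optimistic pseudo-reward) is resampled with replacement and the arm with the largest resampled mean is pulled. This is a generic bootstrap index, not one of the specific algorithms discussed in the talk.

import numpy as np

rng = np.random.default_rng(5)
means = [0.4, 0.6]                          # unknown Bernoulli means of the two arms
history = [[1.0], [1.0]]                    # optimistic pseudo-reward keeps exploration alive
pulls = [0, 0]

for t in range(2000):
    # bootstrap index: mean of a with-replacement resample of each arm's history
    index = [np.mean(rng.choice(h, size=len(h), replace=True)) for h in history]
    arm = int(np.argmax(index))
    reward = float(rng.random() < means[arm])
    history[arm].append(reward)
    pulls[arm] += 1

print("pulls per arm:", pulls)              # the better arm (index 1) should dominate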
Statistics seminar
Tuesday May 31, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Elsa Cazelles (IRIT) A novel notion of barycenter for probability distributions based on optimal weak mass transport
We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass. We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter is illustrated on several examples.
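For contrast with the geometric averaging mentioned above, the classic one-dimensional Wasserstein barycenter of empirical distributions can be obtained by averaging quantile functions; the sketch below illustrates that geometric averaging (it is not the weak barycenter algorithm of the talk).

import numpy as np

rng = np.random.default_rng(6)
samples = [rng.normal(loc=mu, scale=1.0, size=1000) for mu in (-3.0, 0.0, 3.0)]

# In 1D, the Wasserstein-2 barycenter averages the quantile functions of the inputs.
levels = np.linspace(0.0, 1.0, 501)[1:-1]               # quantile levels in (0, 1)
quantiles = np.array([np.quantile(s, levels) for s in samples])
barycenter_quantiles = quantiles.mean(axis=0)

print(barycenter_quantiles[len(levels) // 2])           # median of the barycenter, close to 0
print(np.diff(barycenter_quantiles).min() >= 0)         # the quantile function is nondecreasing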
Statistics seminar
Tuesday May 10, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Guillaume Lecué (CREST) A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.
Practitioners have observed that some deep learning models generalize well even with a perfect fit to noisy training data [1,2]. Since then many theoretical works have revealed some facets of this phenomenon [3,4,5], known as benign overfitting. In particular, in the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat\beta$ has received a lot of attention [3,4,6] since it was proved to be consistent even though it perfectly fits noisy data, under some condition on the covariance matrix $\Sigma$ of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [6]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [4]: $\hat\beta$ can be written as a sum of a ridge estimator $\hat\beta_{1:k}$ and an overfitting component $\hat\beta_{k+1:p}$, following a decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^\perp V_{k+1:p}$ into the space $V_{1:k}$ spanned by the top $k$ eigenvectors of $\Sigma$ and the space $V_{k+1:p}$ spanned by the $p-k$ last ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension appearing naturally in our geometrical viewpoint coincides with the effective rank from [3,6] and is the key tool to handle the behavior of the design matrix restricted to the sub-space $V_{k+1:p}$, where overfitting happens. (Joint work with Zong Shang).
[1] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA, 116(32):15849–15854, 2019.
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64(3):107–115, 2021.
[3] Peter L. Bartlett, Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117(48):30063–30070, 2020.
[4] Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. To appear in Acta Numerica, 2021.
[5] Mikhail Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. To appear in Acta Numerica, 2021.
[6] Alexander Tsigler and Peter L. Bartlett. Benign overfitting in ridge regression. 2021.
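To illustrate the object studied above, here is a toy numerical sketch of the minimum $\ell_2$-norm interpolant $\hat\beta = X^\dagger y$ in an overparameterized Gaussian linear model with a spiked covariance; the spectrum, dimensions and noise level are illustrative choices, not the regimes of the talk.

import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 2000
# Spiked covariance: a few large eigenvalues followed by a flat, small tail
eigvals = np.concatenate([np.full(5, 50.0), np.full(p - 5, 0.01)])
X = rng.normal(size=(n, p)) * np.sqrt(eigvals)           # rows have covariance diag(eigvals)
beta_star = np.zeros(p)
beta_star[:5] = 1.0
y = X @ beta_star + rng.normal(size=n)

beta_hat = np.linalg.pinv(X) @ y                         # minimum l2-norm interpolant
print(np.max(np.abs(X @ beta_hat - y)))                  # ~0: perfect fit of the noisy data
X_test = rng.normal(size=(1000, p)) * np.sqrt(eigvals)
print(np.mean((X_test @ (beta_hat - beta_star)) ** 2))   # excess risk stays small despite interpolation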
Statistics seminar
Tuesday April 19, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Clément Marteau (Université Lyon 1) Supermix: sparse regularization for mixture models
This talk is concerned with the estimation of a discrete probability measure $\mu_0$ involved in a mixture model. Using recent results on l1 regularization over the space of measures, we consider a convex optimization problem for estimating $\mu_0$ without resorting to a grid. Handling this optimization problem requires introducing a dual certificate. We then discuss the statistical properties of the resulting estimator, focusing in particular on the Gaussian case.
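As a toy, grid-based counterpart of the l1-regularized problem (the approach of the talk is precisely grid-free), the sketch below fits nonnegative weights on a grid of candidate Gaussian locations to the empirical CDF by projected gradient descent; the grid, penalty level and solver are illustrative.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
data = np.sort(np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 350)]))

grid = np.linspace(-6, 6, 61)                         # candidate Gaussian locations
A = norm.cdf(data[:, None] - grid[None, :])           # CDF of N(loc, 1) at each data point
b = (np.arange(1, data.size + 1) - 0.5) / data.size   # empirical CDF at the sorted data
lam, n = 0.01, data.size

w = np.zeros(grid.size)
step = n / np.linalg.norm(A, 2) ** 2                  # 1 / Lipschitz constant of the gradient
for _ in range(5000):
    grad = A.T @ (A @ w - b) / n + lam                # gradient of ||Aw-b||^2/(2n) + lam*sum(w)
    w = np.maximum(w - step * grad, 0.0)              # projection onto the nonnegative orthant

print(grid[w > 0.02], w[w > 0.02])                    # mass should concentrate near -2 and 2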
Statistics seminar
Tuesday April 5, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Fabrice Grela (Université de Nantes) Minimax detection and localisation of an abrupt change in a Poisson process
Considering a Poisson process observed on a bounded, fixed interval, we are interested in the problem of detecting an abrupt change in its distribution, characterized by a jump in its intensity. Formulated as an off-line change-point problem, we address two questions: that of detecting a change-point and that of estimating the location of its jump. This study aims at proposing a non-asymptotic minimax testing set-up, first to construct a minimax and adaptive detection procedure, and then to give a minimax study of a multiple-testing procedure designed to simultaneously detect and localise a change-point.
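The off-line setting can be illustrated by a simple likelihood-ratio scan over candidate change points for a Poisson process with piecewise-constant intensity; this is only a toy detector and localiser, not the minimax procedures of the talk, and the intensities below are illustrative.

import numpy as np

rng = np.random.default_rng(9)
tau, lam0, lam1 = 0.6, 30.0, 80.0                           # true change point and intensities
n0 = rng.poisson(lam0 * tau)
n1 = rng.poisson(lam1 * (1 - tau))
events = np.sort(np.concatenate([rng.uniform(0, tau, n0), rng.uniform(tau, 1, n1)]))
n = events.size

def loglik(n_left, n_right, t):
    """Profile log-likelihood of a piecewise-constant intensity with a jump at t."""
    left = n_left * np.log(n_left / t) - n_left if n_left else 0.0
    right = n_right * np.log(n_right / (1 - t)) - n_right if n_right else 0.0
    return left + right

null = n * np.log(n) - n                                    # constant-intensity log-likelihood
grid = np.linspace(0.01, 0.99, 99)                          # candidate change points
n_left = np.searchsorted(events, grid)
stats = [2 * (loglik(nl, n - nl, t) - null) for nl, t in zip(n_left, grid)]
t_hat = grid[int(np.argmax(stats))]
print(f"scan statistic {max(stats):.1f}, estimated change point {t_hat:.2f} (true value {tau})")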
Statistics seminar
Tuesday March 22, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Aymeric Dieuleveut (Polytechnique) Federated Learning and optimization: from a gentle introduction to recent results
In this presentation, I will present some results on optimization in the context of federated learning. I will summarise the main challenges and the type of results people have been interested in, and dive into some more recent results on trade-offs between (bidirectional) compression, communication, privacy and user heterogeneity. The presentation will be based on recent work with Constantin Philippenko, Maxence Noble, and Aurélien Bellet.
References (mainly):
Differentially Private Federated Learning on Heterogeneous Data, M. Noble, A. Bellet, A. Dieuleveut, AISTATS 2022.
Preserved central model for faster bidirectional compression in distributed settings, C. Philippenko, A. Dieuleveut, NeurIPS 2021.
If time allows it (unlikely):
Federated Expectation Maximization with heterogeneity mitigation and variance reduction, A. Dieuleveut, G. Fort, E. Moulines, G. Robin, NeurIPS 2021.
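A toy simulation in the spirit of the topic: federated averaging on heterogeneous least-squares clients with top-k compression of the uplink updates. This is a crude stand-in for the (bidirectional) compression schemes studied in the papers above; all parameters are illustrative.

import numpy as np

rng = np.random.default_rng(10)
n_clients, n_local, d = 10, 100, 20
# Heterogeneous clients: each local optimum is a perturbation of a common parameter
optima = [np.ones(d) + 0.3 * rng.normal(size=d) for _ in range(n_clients)]
data = []
for w_star in optima:
    X = rng.normal(size=(n_local, d))
    data.append((X, X @ w_star + 0.1 * rng.normal(size=n_local)))

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

w = np.zeros(d)
lr, k = 0.1, 5
for _ in range(100):                               # communication rounds
    updates = []
    for X, y in data:
        w_local = w.copy()
        for _ in range(5):                         # local gradient steps
            w_local -= lr * X.T @ (X @ w_local - y) / n_local
        updates.append(top_k(w_local - w, k))      # compressed uplink message
    w = w + np.mean(updates, axis=0)               # server averages the compressed updates

print(np.linalg.norm(w - np.mean(optima, axis=0)))  # distance to the average of client optima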
Statistics seminar
Tuesday March 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Lihua Lei (Stanford University) Testing for outliers with conformal p-values
We study the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework that yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
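A minimal sketch of the marginal conformal p-values described above, followed by Benjamini-Hochberg, using a plain distance-to-the-mean score on synthetic data; the score, splits and levels are illustrative, not those of the talk.

import numpy as np

rng = np.random.default_rng(11)
d = 5
train = rng.normal(size=(1000, d))                      # reference (inlier) data
calib = rng.normal(size=(500, d))                       # calibration split
test = np.vstack([rng.normal(size=(90, d)),             # 90 inliers ...
                  rng.normal(loc=3.0, size=(10, d))])   # ... and 10 outliers

center = train.mean(axis=0)

def score(X):
    """Nonconformity score: distance to the training mean."""
    return np.linalg.norm(X - center, axis=1)

calib_scores, test_scores = score(calib), score(test)
# Marginal conformal p-value: (1 + #{calibration scores >= test score}) / (n_calib + 1)
pvals = (1 + (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (calib_scores.size + 1)

# Benjamini-Hochberg at level 0.1
alpha, m = 0.1, pvals.size
order = np.argsort(pvals)
below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
rejected = order[: below.max() + 1] if below.size else np.array([], dtype=int)
print("declared outliers:", np.sort(rejected))          # should be mostly indices 90-99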
Statistics seminar
Tuesday February 8, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Élisabeth Gassiat Deconvolution with unknown noise distribution
I consider the deconvolution problem in the case where no information is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on observations of the corrupted signal. I will prove the identifiability of the model up to translation when the signal has a Laplace transform with an exponential growth ρ smaller than 2 and when it can be decomposed into two dependent components, so that the identifiability theorem can be used for sequences of dependent data or for sequences of iid multidimensional data. In the case of iid multidimensional data, I will propose an adaptive estimator of the density of the signal and provide rates of convergence. This rate of convergence is known to be minimax when ρ = 1.
Statistics seminar
Tuesday January 25, 2022, 9:30AM, Sophie Germain en salle 1013 / Jussieu en salle 15-16.201
Nicolas Verzelen (Université de Montpellier) Optimal ranking in crowd-sourcing problem
Consider a crowd-sourcing problem where we have n experts and d tasks. The average ability of each expert for each task is stored in an unknown matrix M, from which we have incomplete and noisy observations. We make no (semi)parametric assumptions, but assume that both experts and tasks can be perfectly ordered: if an expert A is better than an expert B, then the ability of A is higher than that of B for all tasks, and the same holds for the tasks. This implies that the matrix M is, up to permutations of its rows and columns, bi-isotonic.
We focus on the problem of recovering the optimal ranking of the experts in l2 norm, when the ordering of the tasks is known to the statistician. In other words, we aim at estimating the suitable permutation of the rows of M while the permutation of the columns is known. We provide a minimax-optimal and computationally feasible method for this problem, based on hierarchical clustering, PCA, change-point detection, and exchange of information among the clusters. We prove in particular, in the case where d > n, that the problem of estimating the expert ranking is significantly easier than the problem of estimating the matrix M.
This talk is based on a joint ongoing work with Alexandra Carpentier and Emmanuel Pilliat.
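As a point of reference for the setting, the sketch below generates a noisy, partially observed bi-isotonic matrix with permuted rows and ranks the experts by their observed row means; this naive baseline is not the minimax procedure described above, and all dimensions and noise levels are illustrative.

import numpy as np

rng = np.random.default_rng(12)
n, d = 50, 200
# Bi-isotonic ability matrix: sorting the columns and then the rows keeps both orderings
M = np.sort(np.sort(rng.random((n, d)), axis=0), axis=1)
perm = rng.permutation(n)                               # unknown ordering of the experts
observed = M[perm] + 0.3 * rng.normal(size=(n, d))      # noisy observations
observed[rng.random((n, d)) < 0.5] = np.nan             # half of the entries are missing

row_means = np.nanmean(observed, axis=1)                # naive score of each expert
est_rank = np.empty(n, dtype=int)
est_rank[np.argsort(row_means)] = np.arange(n)          # estimated rank of each expert
print(np.mean((est_rank - perm) ** 2))                  # squared rank error of the naive ranking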