Day, hour and place

Tuesday at 09:30, Sophie Germain, room 1013 / Jussieu, room 15-16.201


Year 2022

Statistics seminar
Tuesday May 31, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Elsa Cazelles (IRIT) A novel notion of barycenter for probability distributions based on optimal weak mass transport

We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass. We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classical optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter is illustrated on several examples.

Statistics seminar
Tuesday May 10, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Guillaume Lecué (CREST) A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.

Practitioners have observed that some deep learning models generalize well even with a perfect fit to noisy training data [1,2]. Since then, many theoretical works have revealed some facets of this phenomenon [3,4,5], known as benign overfitting. In particular, in the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat{\boldsymbol{\beta}}$ has received a lot of attention [3,4,6] since it was proved to be consistent even though it perfectly fits noisy data, under some condition on the covariance matrix $\Sigma$ of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [6]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [4]: $\hat{\boldsymbol{\beta}}$ can be written as a sum of a ridge estimator $\hat{\boldsymbol{\beta}}_{1:k}$ and an overfitting component $\hat{\boldsymbol{\beta}}_{k+1:p}$, which follows a decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^\perp V_{k+1:p}$ into the space $V_{1:k}$ spanned by the top $k$ eigenvectors of $\Sigma$ and the space $V_{k+1:p}$ spanned by the last $p-k$ ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension appearing naturally in our geometrical viewpoint coincides with the effective rank from [3,6] and is the key tool to handle the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens. (Joint work with Zong Shang.)

[1] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA, 116(32):15849–15854, 2019.

[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64(3):107–115, 2021.

[3] Peter L. Bartlett, Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117(48):30063–30070, 2020.

[4] Peter L. Bartlett, Andreas Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. To appear in Acta Numerica, 2021.

[5] Mikhail Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. To appear in Acta Numerica, 2021.

[6] Alexander Tsigler and Peter L. Bartlett. Benign overfitting in ridge regression. 2021.
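As a small illustration of the estimator discussed in this abstract, here is a minimal sketch of the minimum $\ell_2$-norm interpolant in an over-parameterized linear model. The pseudo-inverse formula is the standard one; the dimensions, the isotropic Gaussian design, the noise level and the function name `min_norm_interpolant` are assumptions chosen only for illustration and are not taken from the talk.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y.

    Assumes n < p and that X has full row rank, so an exact
    interpolant exists; the pseudo-inverse then selects the
    solution with the smallest Euclidean norm.
    """
    return np.linalg.pinv(X) @ y


# Illustrative (assumed) setup: n = 20 noisy observations, p = 100 features.
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:3] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)

beta_hat = min_norm_interpolant(X, y)

# Any other interpolant differs from beta_hat by a vector in the null
# space of X, which is orthogonal to beta_hat and so increases the norm.
z = rng.standard_normal(p)
null_component = z - np.linalg.pinv(X) @ (X @ z)
other_interpolant = beta_hat + null_component

print("fit residual:", np.linalg.norm(X @ beta_hat - y))
print("norm of beta_hat:", np.linalg.norm(beta_hat))
print("norm of another interpolant:", np.linalg.norm(other_interpolant))
```

The sketch only checks the two defining properties (perfect fit to noisy data and minimal norm among interpolants); it says nothing about the convergence rates or the covariance conditions studied in the talk.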

Statistics seminar
Tuesday April 19, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Clément Marteau (Université Lyon 1) Supermix: sparse regularization for mixture models

This talk addresses the estimation of a discrete probability measure $\mu_0$ involved in a mixture model. Using recent results on $\ell_1$ regularization over the space of measures, we consider a convex optimization problem for estimating $\mu_0$ without resorting to a grid. Handling this optimization problem requires the introduction of a dual certificate. We then discuss the statistical properties of the resulting estimator, focusing in particular on the Gaussian case.

Statistics seminar
Tuesday April 5, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Fabrice Grela (Université de Nantes) Minimax detection and localisation of an abrupt change in a Poisson process

Considering a Poisson process observed on a bounded, fixed interval, we are interested in the problem of detecting an abrupt change in its distribution, characterized by a jump in its intensity. Formulating this as an off-line change-point problem, we address two questions: detecting a change-point and estimating the location of the jump. This study aims at proposing a non-asymptotic minimax testing set-up, first to construct a minimax and adaptive detection procedure, and then to give a minimax study of a multiple testing procedure designed to simultaneously detect and localise a change-point.

Statistics seminar
Tuesday March 22, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Aymeric Dieuleveut (Polytechnique) Federated Learning and optimization: from a gentle introduction to recent results

In this presentation, I will present some results on optimization in the context of federated learning. I will summarise the main challenges and the type of results people have been interested in, and dive into some more recent results on tradeoffs between (bidirectional) compression, communication, privacy and user-heterogeneity. The presentation will be based on recent work with Constantin Philippenko, Maxence Noble and Aurélien Bellet.

Refs:

Mainly:
Differentially Private Federated Learning on Heterogeneous Data, M. Noble, A. Bellet, A. Dieuleveut, AISTATS 2022.
Preserved central model for faster bidirectional compression in distributed settings, C. Philippenko, A. Dieuleveut, NeurIPS 2021.

If time allows it (unlikely):
Federated Expectation Maximization with heterogeneity mitigation and variance reduction, A. Dieuleveut, G. Fort, E. Moulines, G. Robin, NeurIPS 2021.

Statistics seminar
Tuesday March 8, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Lihua Lei (Stanford University) Testing for outliers with conformal p-values

We study the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework that yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
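To make the starting point of this abstract concrete, the following is a minimal sketch of the standard marginal conformal p-value, followed by the Benjamini-Hochberg procedure for false discovery rate control. It is not the conditionally calibrated method introduced in the talk. The convention that a larger score means "more outlying", the function names and the toy scores are assumptions made for illustration.

```python
import numpy as np


def conformal_pvalues(calib_scores, test_scores):
    """Marginal conformal p-values for outlier detection.

    Assumes larger scores indicate more outlying points and that the
    calibration scores come from held-out inliers.
    p_j = (1 + #{i : calib_i >= test_j}) / (n + 1).
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(..., side="left") counts calibration scores strictly
    # below each test score, so n minus that is the count of scores >= it.
    n_greater_equal = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_greater_equal) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean mask of hypotheses rejected by Benjamini-Hochberg at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Toy (assumed) scores: four calibration inliers and three test points.
pvals = conformal_pvalues([0.1, 0.2, 0.3, 0.4], [0.25, 5.0, 0.0])
print(pvals)
print(benjamini_hochberg([0.01, 0.5, 0.02], alpha=0.1))
```

With only four calibration points the smallest attainable p-value is 1/5, which shows why the calibration set must be large enough for any rejection to be possible at small levels.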

Statistics seminar
Tuesday February 8, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Élisabeth Gassiat Deconvolution with unknown noise distribution

I consider the deconvolution problem in the case where no information is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on observations of the corrupted signal. I will prove the identifiability of the model up to translation when the signal has a Laplace transform with an exponential growth $\rho$ smaller than 2 and when it can be decomposed into two dependent components, so that the identifiability theorem can be used for sequences of dependent data or for sequences of iid multidimensional data. In the case of iid multidimensional data, I will propose an adaptive estimator of the density of the signal and provide rates of convergence. The rate obtained is known to be minimax when $\rho = 1$.

Statistics seminar
Tuesday January 25, 2022, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Nicolas Verzelen (Université de Montpellier) Optimal ranking in crowd-sourcing problem

Consider a crowd-sourcing problem where we have n experts and d tasks. The average ability of each expert for each task is stored in an unknown matrix M, from which we have incomplete and noisy observations. We make no (semi-)parametric assumptions, but assume that both experts and tasks can be perfectly ordered, so that if an expert A is better than an expert B, the ability of A is higher than that of B for all tasks, and that the same holds for the tasks. This implies that the matrix M, up to permutations of its rows and columns, is bi-isotonic. We focus on the problem of recovering the optimal ranking of the experts in $\ell_2$ norm, when the ordering of the tasks is known to the statistician. In other words, we aim at estimating the suitable permutation of the rows of M while the permutation of the columns is known. We provide a minimax-optimal and computationally feasible method for this problem, based on hierarchical clustering, PCA, change-point detection, and exchange of information among the clusters. We prove in particular, in the case where d > n, that the problem of estimating the expert ranking is significantly easier than the problem of estimating the matrix M.

This talk is based on ongoing joint work with Alexandra Carpentier and Emmanuel Pilliat.

Year 2021

Statistics seminar
Tuesday December 14, 2021, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Julie Delon (Université de Paris) Some perspectives on stochastic models for Bayesian image restoration

Random image models are central for solving inverse problems in imaging. In a Bayesian formalism, these models can be used as priors or regularisers and combined with an explicit likelihood function to define posterior distributions. Most of the time, these posterior distributions are used to derive Maximum A Posteriori (MAP) estimators, leading to optimization problems that may be convex or not, but are well studied and understood. Sampling schemes can also be used to explore these posterior distributions, to derive Minimum Mean Square Error (MMSE) estimators, quantify uncertainty or perform other advanced inferences. While research on inverse problems has focused for many years on explicit image models (either directly in the image space, or in a transformed space), an important trend nowadays is to use implicit image models encoded by neural networks. This opens the way to restoration algorithms that exploit more powerful and accurate prior models for natural images, but raises novel challenges and questions on the corresponding posterior distributions and their resulting estimators. The goal of this presentation is to provide some perspectives and present recent developments on these questions.

Statistics seminar
Tuesday November 30, 2021, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Frédéric Chazal (INRIA) A framework to differentiate persistent homology with applications in Machine Learning and Statistics

Understanding the differentiable structure of persistent homology and solving optimization tasks based on functions and losses with a topological flavor is a very active, growing field of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning.

However, the approaches proposed in the literature are usually anchored to a specific application and/or topological construction, and do not come with theoretical guarantees.

In this talk, we will study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows one to define and compute gradients for persistence-based functions in a very simple way. As an application, we also provide a simple, explicit and sufficient condition for convergence of stochastic subgradient methods for such functions. If time permits, as another application, we will also show how this framework, combined with standard geometric measure theory arguments, leads to results on the statistical behavior of persistence diagrams of filtrations built on top of random point clouds.

Statistics seminar
Tuesday November 23, 2021, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Yannick Baraud (Université de Luxembourg) How to build robust posterior distributions from tests?

Classical Bayesian estimators, like those built from the likelihood, have good estimation properties when the statistical model is exact and perfectly accounts for the distribution of the data. Otherwise, when the model is only approximate, these estimators can become terribly bad, and a single outlier with respect to the model in use is sometimes enough for this to happen. We will show how to remedy this instability problem by proposing, in the Bayesian framework, a new posterior distribution built from suitable robust tests. We will see how this approach yields estimators that are both optimal when the model is exact and stable under a slight modelling error.

Statistics seminar
Tuesday November 9, 2021, 9:30AM, Sophie Germain, room 1013 / Jussieu, room 15-16.201
Alessandro Rudi (INRIA) PSD models for Non-convex optimization and beyond

In this talk we present a rather flexible and expressive model for non-negative functions. We will show direct applications in probability representation and non-convex optimization. In particular, the model allows us to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally, we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.

Statistics seminar
Tuesday October 19, 2021, 9:30AM, Sophie Germain, room 1013
Antoine Marchina (Université de Paris) Concentration inequalities for suprema of unbounded empirical processes

In this talk, we will provide new concentration inequalities for suprema of (possibly) non-centered and unbounded empirical processes associated with independent and identically distributed random variables. In particular, we establish Fuk-Nagaev type inequalities with the optimal constant in the moderate deviation bandwidth. We will also explain the use of these results in statistical applications (ongoing research).

Statistics seminar
Tuesday October 5, 2021, 9:30AM, Jussieu, room 15-16.201
Judith Rousseau (Oxford) Semiparametric and nonparametric Bayesian inference in hidden Markov models

In this work we are interested in inference in hidden Markov models with finite state space and nonparametric emission distributions. Since the seminal paper of Gassiat et al. (2016), it is known that in such models the transition matrix $Q$ and the emission distributions $F_1, \dots, F_K$ are identifiable, up to label switching. We propose an (almost) Bayesian method to simultaneously estimate $Q$ at the rate $\sqrt{n}$ and the emission distributions at the usual nonparametric rates. To do so, we first consider a prior $\pi_1$ on $Q$ and $F_1, \dots, F_K$ which leads to a marginal posterior distribution on $Q$ that satisfies the Bernstein-von Mises property, and thus to an estimator of $Q$ which is efficient. We then combine the marginal posterior on $Q$ with another posterior distribution on the emission distributions, following the cut-posterior approach, to obtain a posterior which also concentrates around the emission distributions at the minimax rates. In addition, an important intermediate result of our work is an inversion inequality which allows us to upper bound the $L_1$ norms between the emission densities by the $L_1$ norms between marginal densities of 3 consecutive observations.

Joint work with D. Moss (Oxford).