Université Paris 6 Pierre et Marie Curie, Université Paris 7 Denis Diderot, CNRS U.M.R. 7599 Probabilités et Modèles Aléatoires

Model selection for Gaussian regression with random design

Author(s):

MSC classification code(s):

• 62J02 General nonlinear regression
• 62G07 Curve estimation (nonparametric regression, density estimation, etc.)

Abstract: This paper is about Gaussian regression with random design, where the observations are i.i.d. It is known from Le Cam (1973, 1975, 1986) that the rate of convergence of optimal estimators is closely connected to the metric structure of the parameter space with respect to the Hellinger distance. In particular, this metric structure essentially determines the risk when the loss function is a power of the Hellinger distance. For random design regression, the loss function typically used is the squared $\Bbb{L}_2$-distance between the estimator and the parameter. If the parameter space is bounded with respect to the $\Bbb{L}_\infty$-norm, both distances are equivalent. Without this assumption, there may be a large distortion between the two distances, resulting in some unusual rates of convergence for the squared $\Bbb{L}_2$-risk, as noticed by Baraud (2002). We first explain this phenomenon and then show that using the Hellinger distance instead of the $\Bbb{L}_2$-distance allows one to recover the usual rates and to perform model selection in great generality. An extension to the $\Bbb{L}_2$-risk is given under a boundedness assumption similar to the one in Wegkamp (2003).

Keywords: Random design regression; model selection; Hellinger distance; minimax risk; Besov spaces

Date: 2002-12-19

Preprint number: PMA-783

PDF file: PMA-783.pdf
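A small numerical sketch can make the abstract's contrast between the two losses concrete. It assumes (these choices are illustrative, not taken from the paper) a design X uniform on [0, 1], a known noise level σ, and the convention h² = 1 - ∫√(pq) for the squared Hellinger distance, under which two Gaussians N(a, σ²) and N(b, σ²) have affinity exp(-(a - b)²/(8σ²)). The helper names (`sq_l2`, `sq_hellinger`, `hellinger_bounds`, `spike`) are hypothetical, and integrals over the design are approximated by a midpoint rule.

```python
import math
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def _grid(n: int) -> List[float]:
    """Midpoints of n equal cells of [0, 1] (design law assumed Uniform[0, 1])."""
    return [(i + 0.5) / n for i in range(n)]


def sq_l2(f: Func, g: Func, n: int = 10_000) -> float:
    """Squared L2 distance: integral over [0, 1] of (f - g)^2, midpoint rule."""
    return sum((f(x) - g(x)) ** 2 for x in _grid(n)) / n


def sq_hellinger(f: Func, g: Func, sigma: float, n: int = 10_000) -> float:
    """Squared Hellinger distance (h^2 = 1 - int sqrt(p q)) between the laws of
    (X, f(X) + sigma*eps) and (X, g(X) + sigma*eps). Conditionally on X = x,
    the Gaussian affinity is exp(-(f(x) - g(x))^2 / (8 sigma^2))."""
    return sum(
        1.0 - math.exp(-((f(x) - g(x)) ** 2) / (8.0 * sigma**2)) for x in _grid(n)
    ) / n


def hellinger_bounds(f: Func, g: Func, sigma: float, n: int = 10_000) -> Tuple[float, float]:
    """If (f - g)^2 <= D on the grid, set c = D / (8 sigma^2). Concavity of
    t -> 1 - exp(-t) gives (1 - e^{-c}) / c * t <= 1 - e^{-t} <= t on [0, c],
    so H^2 lies between the two values returned (a multiple of L2^2)."""
    d_max = max((f(x) - g(x)) ** 2 for x in _grid(n))
    c = d_max / (8.0 * sigma**2)
    upper = sq_l2(f, g, n) / (8.0 * sigma**2)
    lower = upper * (-math.expm1(-c) / c) if c > 0 else upper
    return lower, upper


def spike(height: float, width: float) -> Func:
    """Hypothetical unbounded-family example: f = height * 1_[0, width)."""
    return lambda x: height if x < width else 0.0


def zero(x: float) -> float:
    return 0.0


if __name__ == "__main__":
    sigma, width = 1.0, 0.01
    for height in (1.0, 10.0, 100.0):
        f = spike(height, width)
        print(f"M={height:6.1f}  L2^2={sq_l2(f, zero):10.4f}  "
              f"H^2={sq_hellinger(f, zero, sigma):.6f}")
```

With f = M·1_[0, δ) and g = 0, the squared L2 distance equals M²δ and grows without bound in M, while the squared Hellinger distance equals δ(1 - exp(-M²/(8σ²))) and never exceeds δ: on this L∞-unbounded family the two distances are badly distorted. When sup|f - g| is bounded, `hellinger_bounds` brackets H² between two constant multiples of the squared L2 distance, which is the equivalence mentioned in the abstract.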