Université Paris 6
Pierre et Marie Curie
Université Paris 7
Denis Diderot

CNRS U.M.R. 7599
``Probabilités et Modèles Aléatoires''

Aggregated estimators and empirical complexity for least square regression


Code(s) de Classification MSC:

Résumé: Numerous empirical results have shown that combining regression procedures can be a very efficient method. This work provides PAC bounds for the $L^2$ generalization error of such methods. The interest of these bounds are twofold. First, it gives for any aggregating procedure a bound for the expected risk depending on the empirical risk and the empirical complexity measured by the Kullback-Leibler divergence between the aggregating distribution $\rhoz$ and a prior distribution $\pi$ and by the empirical mean of the variance of the regression functions under the probability $\rhoz$. Secondly, by structural risk minimization, we derive an aggregating procedure which takes advantage of the unknown properties of the best mixture $\tildf$: when the best convex combination $\tildf$ of $d$ regression functions belongs to the $d$ initial functions (i.e. when combining does not make the bias decrease), the convergence rate is of order $(\log d) / N$.%\frac{\log d}{N}$. In the worst case, our combining procedure achieves a convergence rate of order $\sqrt{(\log d) / N}$ %$\sqrt{\frac{\log d}{N}}$ which is known to be optimal in a uniform sense when $d > \sqrt{N}$ (see [9,13]). In support vector machines, we have to solve a $N$-dimensional linearly constrained quadratic problem. In our regression procedure, we have a $N$-dimensional unconstrained minimization problem. As in AdaBoost, the mixture posterior distribution tends to favor functions which disagree with the mixture on mispredicted points. Our algorithm is tested on artificial classification data (which have been also used for testing other boosting methods, such as AdaBoost).

Mots Clés: Nonparametric regression ; deviation inequalities ; adaptive estimator ; oracle inequalities ; Boosting

Date: 2003-03-11

Prépublication numéro: PMA-805

Postscript file : PMA-805.ps

Revised version : PMA-805Bis.pdf (April 2004)