Université Paris 6 Pierre et Marie Curie | Université Paris 7 Denis Diderot

CNRS U.M.R. 7599 "Probabilités et Modèles Aléatoires"

**Author(s):**

**MSC Classification Code(s):**

- 62H30 Classification and discrimination; cluster analysis [See also 68T10]
- 94A17 Measures of information, entropy
- 68Q32 Computational learning theory [See also 68T05]

**Abstract:** The common way to understand and improve classification rules is to prove
bounds on the generalization error.
Here we provide localized, data-based PAC-bounds for the difference
between the risks of any two randomized estimators.
From these bounds we derive two types of algorithms: the first
uses combinatorial techniques and is related to compression schemes,
whereas the second involves Gibbs estimators.
We also recover some results of Vapnik-Chervonenkis
theory and improve them by taking into account the variance term measured by the
pseudo-distance $(f_1, f_2) \mapsto \mathbb{P}[f_1(X) \neq f_2(X)]$.
Finally, we present different ways of localizing the results in order to
improve the bounds and make them less dependent on the choice of the prior.
For some classes of functions (such as VC-classes), this leads to gaining a
$\log N$ factor without using the chaining technique (see [1] for more details).
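To make the pseudo-distance concrete, here is a minimal sketch (not from the paper) of how $\mathbb{P}[f_1(X) \neq f_2(X)]$ can be estimated empirically from an i.i.d. sample; the two threshold classifiers below are hypothetical illustrations.

```python
import random

def empirical_pseudo_distance(f1, f2, sample):
    """Fraction of sample points on which the two classifiers disagree,
    i.e. an empirical estimate of P[f1(X) != f2(X)]."""
    return sum(f1(x) != f2(x) for x in sample) / len(sample)

# Hypothetical classifiers: two threshold rules on the real line.
f1 = lambda x: int(x > 0.0)
f2 = lambda x: int(x > 0.5)

# X uniform on [-1, 1]: the rules disagree exactly when 0 < x <= 0.5,
# so the true pseudo-distance is 0.25.
random.seed(0)
sample = [random.uniform(-1, 1) for _ in range(10000)]
d = empirical_pseudo_distance(f1, f2, sample)
print(d)
```

With 10,000 samples the estimate concentrates near the true value 0.25, which is the kind of variance-sensitive quantity the bounds above exploit.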

**Keywords:** *Statistical learning theory ; compression schemes ; Gibbs classifiers ;
error bounds ; adaptive estimator ; oracle inequalities ; VC-theory*

**Date:** 2004-04-09

**Preprint number:** *PMA-905*

**Revised version:** *PMA-905Bis.pdf* (April 2004, renamed "A better variance control for PAC-Bayesian classification")