Stochastic Models for the Inference of Life Evolution


SMILE is an interdisciplinary research group gathering probabilists, statisticians, bioinformaticians and biologists.
SMILE is affiliated with the Stochastics and Biology group of the LPSM (Laboratory of Probability, Statistics and Modeling) at Sorbonne Université (formerly Université Pierre et Marie Curie Paris 06).
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at the CIRB.

Recent contributions of the SMILE group related to SARS-CoV-2 and COVID-19.


SMILE is hosted at Collège de France, in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (RER B stations Luxembourg or Saint-Michel).
Our offices are rooms 107, 121 and 122 on the first floor of Building B1 (ask us for the code). Building B1 faces you as you come out of the passage hall behind Champollion's statue.


You can reach us by email (amaury.lambert - at - …) or (smile - at - …).

Spotlight



Coagulation-transport equations and the nested coalescents

The nested Kingman coalescent describes the dynamics of particles (called genes) contained in larger components (called species), where pairs of species coalesce at constant rate and pairs of genes coalesce at constant rate provided they lie within the same species. We prove that, starting from $rn$ species, the empirical distribution of species masses (numbers of genes divided by $n$) at time $t/n$ converges as $n\to\infty$ to a solution of the deterministic coagulation-transport equation
$$ \partial_t d \ = \ \partial_x ( \psi d ) \ + \ a(t)\left(d\star d - d \right), $$
where $\psi(x) = cx^2$, $\star$ denotes convolution and $a(t)= 1/(t+\delta)$ with $\delta=2/r$. The most interesting case, $\delta =0$, corresponds to an infinite initial number of species. This equation describes the evolution of the distribution of species of mass $x$, where pairs of species can coalesce and each species' mass evolves according to $\dot x = -\psi(x)$. We provide two natural probabilistic solutions of this integro-partial differential equation (IPDE) and address the case $\delta=0$ in detail. The first solution is expressed in terms of a branching particle system in which particles carry masses behaving as independent continuous-state branching processes. The second one is the law of the solution to the McKean-Vlasov equation
$$ dx_t \ = \ - \psi(x_t) \,dt \ + \ v_t\,\Delta J_t, $$
where $J$ is an inhomogeneous Poisson process with rate $1/(t+\delta)$ and $(v_t;\, t\geq0)$ is a family of independent random variables such that $\mathcal L(v_t) = \mathcal L(x_t)$. We show that this equation has a unique solution, and we construct this solution with the help of a marked Brownian coalescent point process. When $\psi(x)=x^\gamma$, we show the existence of a self-similar solution of the IPDE which, when $\gamma=2$, is related to the speed of coming down from infinity of the nested Kingman coalescent.
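To build some intuition for the McKean-Vlasov representation, here is a minimal sketch of a generic mean-field particle approximation of the law of $x_t$. Between jumps each mass follows $\dot x = -cx^2$ exactly (over a step $\Delta t$, $x \mapsto x/(1+cx\,\Delta t)$), and at the jump times of $J$ a particle adds the mass of a uniformly chosen particle, which stands in for a draw of $v_t$ from $\mathcal L(x_t)$. This is not the construction used in the paper (which relies on a marked Brownian coalescent point process). It assumes $\psi(x)=cx^2$, $\delta>0$ (so that the jump rate $1/(t+\delta)$ is finite at $t=0$) and a hypothetical initial law in which every species starts with mass `x0`. The function names and default parameters are illustrative only.

```python
import numpy as np


def flow_quadratic(x, c, dt):
    """Exact flow of dx/dt = -c * x**2 over a time step dt (for x >= 0)."""
    return x / (1.0 + c * x * dt)


def simulate_mckean_vlasov(n_particles=10_000, c=1.0, delta=1.0, t_max=2.0,
                           dt=1e-3, x0=1.0, seed=0):
    """Hypothetical particle sketch of dx_t = -c x_t^2 dt + v_t dJ_t.

    J jumps at rate 1/(t + delta); v_t is drawn from the current empirical
    distribution of the particles instead of the exact law L(x_t).
    Assumption: every particle starts at mass x0 (illustrative initial law).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, x0, dtype=float)
    t = 0.0
    n_steps = int(round(t_max / dt))
    for _ in range(n_steps):
        # Transport step: each mass decreases along dx/dt = -psi(x).
        x = flow_quadratic(x, c, dt)
        # P(J jumps in (t, t + dt]) = 1 - exp(-int_t^{t+dt} ds / (s + delta)).
        p_jump = dt / (t + dt + delta)
        jumps = rng.random(n_particles) < p_jump
        # Coagulation step: each jumping particle adds the (pre-jump) mass of a
        # uniformly chosen particle; self-picks are negligible for large N.
        partners = rng.integers(0, n_particles, size=int(jumps.sum()))
        x[jumps] = x[jumps] + x[partners]
        t += dt
    return x
```

For example, `np.histogram(simulate_mckean_vlasov(delta=1.0), bins=50)` gives a rough picture of the mass distribution at $t = t_{\max}$ for $r = 2$. Decreasing `dt` and increasing `n_particles` should bring this histogram closer to the solution of the IPDE, under the usual propagation-of-chaos heuristic.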



The genomic view of diversification

Evolutionary relationships between species are traditionally represented as a tree, called the species tree. Reconstructing the species tree from molecular data is hindered by frequent conflicts between gene genealogies. A standard way of dealing with this issue is to postulate a unique species tree in which disagreements between gene trees are explained by incomplete lineage sorting (ILS) caused by random coalescences of gene lineages inside the edges of the species tree. This paradigm, known as the multispecies coalescent (MSC), is routinely violated by the ubiquitous gene flow revealed by empirical studies, which produces topological incongruences between gene trees that ILS alone cannot explain. Here we argue that this paradigm should be revised in favor of a view that acknowledges the importance of gene flow and in which gene histories shape the species tree rather than the opposite. We propose a new, plastic framework for modeling the joint evolution of gene and species lineages that relaxes the hierarchy between the species tree and gene trees. As an illustration, we implement this framework in a mathematical model based on coalescent theory, called the genomic diversification (GD) model, with four parameters tuning replication, genetic differentiation, gene flow and reproductive isolation. We use it to evaluate the amount of gene flow in two empirical datasets, and find that in both, gene tree distributions are better explained by the best-fitting GD model than by the best-fitting MSC model. This work should pave the way for approaches to diversification that use the richer signal contained in genomic evolutionary histories rather than in the species tree alone.
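Under the MSC alone, ILS produces discordant gene trees, but in a constrained pattern. For three species with species tree ((A,B),C), one sampled gene copy per species and an internal branch of length $T$ in coalescent units, the standard result is that the gene tree matches the species tree with probability $1-\tfrac{2}{3}e^{-T}$, while each of the two discordant topologies has probability $\tfrac{1}{3}e^{-T}$. The minimal sketch below simply evaluates these formulas; it is a textbook MSC calculation, not part of the GD model, and the function name is illustrative.

```python
import math


def msc_triplet_probabilities(internal_branch):
    """Gene-tree topology probabilities for species tree ((A,B),C) under the
    multispecies coalescent (one gene copy per species, no gene flow).

    internal_branch: length T of the internal edge, in coalescent units.
    Returns (P[((a,b),c)], P[((a,c),b)], P[((b,c),a)]).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # Lineages a and b fail to coalesce along the internal edge.
    no_coalescence = math.exp(-internal_branch)
    # Above the root, each of the three pairs is equally likely to coalesce first.
    discordant = no_coalescence / 3.0
    concordant = 1.0 - 2.0 * discordant  # = 1 - (2/3) exp(-T)
    return concordant, discordant, discordant
```

Because the two discordant topologies are equally likely under ILS alone, a marked excess of one of them in a set of gene trees is commonly read as a sign of gene flow, one source of the incongruences that the MSC cannot account for.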


