Stochastic Models for the Inference of Life Evolution


SMILE is an interdisciplinary research group gathering probabilists, statisticians, bioinformaticians and biologists.
SMILE is affiliated with the Stochastics and Biology group of LPSM (Lab of Probability, Statistics and Modeling) at Sorbonne Université (formerly Université Pierre et Marie Curie Paris 06).
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at CIRB.


SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (RER B stations Luxembourg or Saint-Michel).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 faces you as you exit the passageway behind Champollion's statue.


You can reach us by email (amaury.lambert - at - or (smile - at -

Light on



Coagulation-transport equations and the nested coalescents

The nested Kingman coalescent describes the dynamics of particles (called genes) contained in larger components (called species), where pairs of species coalesce at constant rate and pairs of genes coalesce at constant rate provided they lie within the same species. We prove that, starting from \(rn\) species, the empirical distribution of species masses (numbers of genes divided by \(n\)) at time \(t/n\) converges as \(n\to\infty\) to a solution of the deterministic coagulation-transport equation $$ \partial_t d \ = \ \partial_x ( \psi d ) \ + \ a(t)\left(d\star d - d \right), $$ where \(\psi(x) = cx^2\), \(\star\) denotes convolution and \(a(t)= 1/(t+\delta)\) with \(\delta=2/r\). The most interesting case, \(\delta =0\), corresponds to an infinite initial number of species. This equation describes the evolution of the distribution of species of mass \(x\), where pairs of species can coalesce and each species' mass evolves like \(\dot x = -\psi(x)\). We provide two natural probabilistic solutions of this integro-partial differential equation (IPDE) and address the case \(\delta=0\) in detail. The first solution is expressed in terms of a branching particle system where particles carry masses behaving as independent continuous-state branching processes. The second one is the law of the solution to the following McKean-Vlasov equation $$ dx_t \ = \ - \psi(x_t) \,dt \ + \ v_t\,\Delta J_t $$ where \(J\) is an inhomogeneous Poisson process with rate \(1/(t+\delta)\) and \((v_t; t\geq0)\) is a sequence of independent random variables such that \(\mathcal L(v_t) = \mathcal L(x_t)\). We show that this equation has a unique solution, and we construct it with the help of a marked Brownian coalescent point process. When \(\psi(x)=x^\gamma\), we show the existence of a self-similar solution for the PDE which, when \(\gamma=2\), relates to the speed of coming down from infinity of the nested Kingman coalescent.
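The McKean-Vlasov equation above can be illustrated with a standard mean-field particle approximation. The Python sketch below uses \(N\) particles. Each particle follows \(\dot x = -cx^2\) exactly between events. At each jump of a superposed Poisson clock of rate \(N/(t+\delta)\), one particle chosen at random adds the current mass of another, and that mass stands in for the independent copy \(v_t\). This is not the authors' construction, which relies on a marked Brownian coalescent point process. The function names, the constant initial mass `x0` and the restriction \(\delta>0\) are assumptions made for illustration; the restriction is needed because the rate \(1/t\) is not integrable at \(t=0\) when \(\delta=0\).

```python
import math
import random


def next_event_time(s, n, delta, rng):
    """Sample the next jump after time s of a Poisson process with rate n/(t+delta).

    The integrated rate from s to T is n*log((T+delta)/(s+delta)); setting it
    equal to an Exp(1) draw and solving for T gives an exact sample.
    """
    e = rng.expovariate(1.0)
    return (s + delta) * math.exp(e / n) - delta


def flow(x, c, dt):
    """Exact solution of dx/dt = -c*x**2 after a time dt, for x >= 0."""
    return x / (1.0 + c * x * dt)


def simulate_mean_field(n, x0, c, delta, t_end, seed=0):
    """Hypothetical mean-field approximation of dx = -c x^2 dt + v_t (jump of J).

    Returns the n particle masses at time t_end and the number of jumps.
    Requires n >= 2 and delta > 0.
    """
    if n < 2 or delta <= 0:
        raise ValueError("need n >= 2 and delta > 0")
    rng = random.Random(seed)
    xs = [float(x0)] * n
    t, jumps = 0.0, 0
    while True:
        t_next = next_event_time(t, n, delta, rng)
        if t_next > t_end:
            # No further jump before t_end: only transport every mass.
            return [flow(x, c, t_end - t) for x in xs], jumps
        xs = [flow(x, c, t_next - t) for x in xs]
        # Particle i jumps; the mass of a distinct particle j plays the
        # role of the independent copy v_t ~ L(x_t).
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        xs[i] += xs[j]
        jumps += 1
        t = t_next
```

The rate integrates in closed form, so event times are sampled exactly rather than on a time grid. For example, `simulate_mean_field(200, 1.0, 0.5, 1.0, 1.0)` runs over \([0,1]\) with \(\delta=1\), where the expected number of jumps is \(200\log 2\approx 139\).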



The impact of selection, gene conversion, and biased sampling on the assessment of microbial demography

Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments about the detrimental impact of recombination, and especially of natural selection, on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further misled demographic inference, producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed when inferring demographic changes solely from reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analysis of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present.
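The two summaries discussed above can be sketched in Python. The first function computes an unfolded SFS from a 0/1 alignment, assuming the ancestral state is known so that "1" marks the derived allele. The second computes the classic skyline estimate, which gives \(\binom{k}{2}\,\tau_k\) for each interval of length \(\tau_k\) during which \(k\) lineages coexist. This assumes contemporaneous samples and a per-pair coalescence rate \(1/N_e\) in the tree's time units. Both are textbook versions, not necessarily the variants (for example Bayesian skylines) used in the studies reviewed, and the function names are hypothetical.

```python
def unfolded_sfs(haplotypes):
    """Unfolded site frequency spectrum from equal-length 0/1 strings.

    Assumes '1' marks the derived allele. Entry k-1 of the result counts
    the sites where exactly k of the n sequences carry it (k = 1..n-1).
    """
    n = len(haplotypes)
    counts = [0] * (n + 1)
    for site in zip(*haplotypes):  # iterate over alignment columns
        counts[sum(int(a) for a in site)] += 1
    return counts[1:n]  # drop monomorphic classes k = 0 and k = n


def classic_skyline(coalescence_times):
    """Classic skyline steps from the n-1 coalescence times of n
    contemporaneous samples (times measured backwards from the present).

    Returns (start, end, estimate) per interval, where the estimate of the
    effective population size is C(k, 2) * interval length.
    """
    times = sorted(coalescence_times)
    n = len(times) + 1
    steps, prev = [], 0.0
    for idx, t in enumerate(times):
        k = n - idx  # lineages present during (prev, t]
        steps.append((prev, t, k * (k - 1) / 2 * (t - prev)))
        prev = t
    return steps
```

A single genealogy gives one noisy estimate per interval. Under the standard coalescent the waiting time with \(k\) lineages has mean \(N_e/\binom{k}{2}\), so a constant-size population yields skyline steps that fluctuate around a flat line.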



How Ecology and Landscape Dynamics Shape Phylogenetic Trees

Whether biotic or abiotic factors are the dominant drivers of clade diversification is a long-standing question in evolutionary biology. The ubiquitous patterns of phylogenetic imbalance and branching slowdown have been taken as supporting the role of ecological niche filling and spatial heterogeneity in ecological features, and thus of biotic processes, in diversification. However, a proper theoretical assessment of the relative roles of biotic and abiotic factors in macroevolution requires models that integrate both types of factors, and such models have been lacking. In this study, we use an individual-based model to investigate the temporal patterns of diversification driven by ecological speciation in a stochastically fluctuating geographic landscape. The model generates phylogenies whose shape evolves as the clade ages. Stabilization of tree shape often occurs after ecological saturation, revealing species turnover caused by competition and demographic stochasticity. In the initial phase of diversification (allopatric radiation into an empty landscape), trees tend to be unbalanced and branching slows down. As diversification proceeds further due to landscape dynamics, balance and branching tempo may increase and become positive. Three main conclusions follow. First, the phylogenies of ecologically saturated clades do not always exhibit branching slowdown. Branching slowdown requires that competition be wide or heterogeneous across the landscape, or that the characteristics of landscape dynamics vary geographically. Conversely, branching acceleration is predicted under narrow competition or frequent local catastrophes. Second, ecological heterogeneity does not necessarily cause phylogenies to be unbalanced: short time in geographical isolation or frequent local catastrophes may lead to balanced trees despite spatial heterogeneity. Conversely, unbalanced trees can emerge without spatial heterogeneity, notably if competition is wide. Third, short isolation time causes a radically different and quite robust pattern of phylogenies that are balanced and yet exhibit branching slowdown. In conclusion, biotic factors have a strong and diverse influence on the shape of phylogenies of ecologically saturating clades and create the evolutionary template in which branching slowdown and tree imbalance may occur. However, the contingency of landscape dynamics and resource distribution can cause wide variation in branching tempo and tree balance. Finally, considerable variation in tree shape among simulation replicates calls for caution when interpreting variation in the shape of real phylogenies.
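Tree balance and branching tempo are usually quantified by summary statistics. The Python sketch below uses two common ones, under the assumption that something similar is meant here; the abstract does not name the statistics the study used.

- The Colless index sums \(|L-R|\) over internal nodes, where \(L\) and \(R\) are the tip counts of the two subtrees. It is 0 for a perfectly balanced tree and grows with imbalance.
- The Pybus-Harvey \(\gamma\) statistic is computed from internode intervals. Negative values indicate nodes concentrated near the root, that is, branching slowdown.

Trees are represented as hypothetical nested 2-tuples. The intervals \(g_2,\dots,g_n\) are ordered from the root towards the tips, where \(g_k\) is the time during which \(k\) lineages exist.

```python
import math


def colless(tree):
    """Return (number of tips, Colless index) for a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf
    left, right = tree
    nl, cl = colless(left)
    nr, cr = colless(right)
    return nl + nr, cl + cr + abs(nl - nr)


def gamma_statistic(intervals):
    """Pybus-Harvey gamma from intervals [g_2, ..., g_n] (root to tips), n >= 3."""
    n = len(intervals) + 1
    if n < 3:
        raise ValueError("need at least 3 tips")
    total = sum(k * g for k, g in zip(range(2, n + 1), intervals))
    cumulative, acc = 0.0, 0.0
    # acc = sum over i = 2..n-1 of (sum over k = 2..i of k * g_k)
    for k, g in zip(range(2, n), intervals[:-1]):
        cumulative += k * g
        acc += cumulative
    numerator = acc / (n - 2) - total / 2
    denominator = total * math.sqrt(1.0 / (12 * (n - 2)))
    return numerator / denominator
```

For example, the balanced tree `(("a", "b"), ("c", "d"))` has Colless index 0, while the caterpillar `((("a", "b"), "c"), "d")` has index 3. When every \(k\,g_k\) is equal (the pure-birth expectation), \(\gamma=0\). A long final interval \(g_n\) pushes \(\gamma\) below zero.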

Upcoming seminars


A brief checkup on the genome scan methods to detect local adaptation


December 17, 2019 at 10 - Collège de France


Room schedule of the Collège de France.
Collège de France intranet.