SMILE

Stochastic Models for the Inference of Life Evolution

Presentation

SMILE is an interdisciplinary research group gathering probabilists, statisticians, bio-informaticians and biologists.
SMILE is affiliated with the Stochastics and Biology group of LPSM (Lab of Probability, Statistics and Modeling) at Sorbonne Université (formerly Université Pierre et Marie Curie Paris 06).
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at CIRB.

Recent contributions of the SMILE group related to SARS-CoV-2 and COVID-19.

Directions

SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 faces you as you exit the pass-through hall located behind Champollion's statue.

Contact

You can reach us by email (amaury.lambert - at - upmc.fr) or (smile - at - listes.upmc.fr).

Spotlight

Publication

2020

From individual-based epidemic models to McKendrick-von Foerster PDEs: A guide to modeling and inferring COVID-19 dynamics


We present a unifying, tractable approach for studying the spread of viruses causing complex diseases that need to be modeled with a large number of types (infective stage, clinical state, risk factor class...). We show that, by recording for each infected individual her infection age, i.e., the time elapsed since she was infected:
1. The age distribution \$$n(t,a)\$$ of the population at time \$$t\$$ is simply described by means of a first-order, one-dimensional partial differential equation (PDE) known as the McKendrick-von Foerster equation;
2. The frequency of type \$$i\$$ at time \$$t\$$ is simply obtained by integrating the probability \$$p(a,i)\$$ of being in state \$$i\$$ at age \$$a\$$ against the age distribution \$$n(t,a)\$$.
The advantage of this approach is threefold. First, regardless of the number of types, macroscopic observables (e.g., incidence or prevalence of each type) only rely on a one-dimensional PDE "decorated" with types. This representation induces a simple methodology based on the McKendrick-von Foerster PDE with Poisson sampling to infer and forecast the epidemic. This technique is illustrated with French data of the COVID-19 epidemic.
Second, our approach generalizes and simplifies standard compartmental models using high-dimensional systems of ODEs to account for disease complexity. We show that such models can always be rewritten in our framework, thus providing a low-dimensional yet equivalent representation of these complex models.
Third, beyond the simplicity of the approach and its computational advantages, we show that our population model naturally appears as a universal scaling limit of a large class of fully stochastic individual-based epidemic models,
where the initial condition of the PDE emerges as the limiting age structure of an exponentially growing population starting from a single individual.
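The two numbered points above can be illustrated with a minimal numerical sketch. It is not the authors' code: it assumes pure transport in infection age (no removal term), a renewal boundary condition, and a grid where the age step equals the time step so that aging is an exact shift. The infectiousness profile `tau`, the two-state probability table `p` and all numerical values are hypothetical placeholders.

```python
import numpy as np

def step(n, tau, da):
    """Advance the age density by one time step dt = da."""
    # Assumed renewal boundary condition: n(t, 0) = integral of tau(a) n(t, a) da.
    new_infections = np.sum(tau * n) * da
    aged = np.empty_like(n)
    aged[1:] = n[:-1]         # transport: n(t + da, a + da) = n(t, a); the oldest cell leaves the grid
    aged[0] = new_infections  # newly infected individuals enter at infection age 0
    return aged

def type_frequencies(n, p, da):
    """Integrate p(a, i) against n(t, a); p has one row per age and one column per type."""
    return (p.T @ n) * da

da = 0.1
ages = np.arange(0.0, 60.0, da)
tau = 0.4 * np.exp(-ages / 5.0)            # hypothetical infectiousness by infection age
early = np.exp(-ages / 5.0)                # hypothetical probability of still being in state 0
p = np.column_stack([early, 1.0 - early])  # each row sums to 1 over the two states

n = np.zeros_like(ages)
n[0] = 1.0 / da                            # a single individual with infection age 0
for _ in range(300):                       # 300 steps of size da, i.e. 30 time units
    n = step(n, tau, da)

freqs = type_frequencies(n, p, da)
print(freqs)                               # number of infected individuals in each state
```

Whatever the number of states, only the one-dimensional array `n` is evolved in time; the states enter solely through the final integration against `p`.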

Publication

2018

Ranked Tree Shapes, Nonrandom Extinctions, and the Loss of Phylogenetic Diversity

Phylogenetic diversity (PD) is a measure of the evolutionary legacy of a group of species, which can be used to define conservation priorities. It has been shown that a large loss of species diversity can sometimes lead to a much smaller loss of PD, depending on the topology of the species tree and on the distribution of its branch lengths. However, the rate of decrease of PD strongly depends on the relative depths of the nodes in the tree and on the order in which species become extinct. We introduce a new, sampling-consistent, three-parameter model generating random trees with covarying topology, clade relative depths and clade relative extinction risks. This model can be seen as an extension to Aldous' one-parameter splitting model (\$$\beta\$$, which controls for tree balance) with two additional parameters: a new parameter \$$\alpha\$$ quantifying the correlation between the richness of a clade and its relative depth, and a parameter \$$\eta\$$ quantifying the correlation between the richness of a clade and its frequency (relative abundance or range), taken herein as a proxy for its overall extinction risk. We show on simulated phylogenies that loss of PD depends on the combined effect of all three parameters, \$$\beta\$$, \$$\alpha\$$ and \$$\eta\$$. In particular, PD may decrease as fast as species diversity when high extinction risks are clustered within small, old clades, corresponding to a parameter range that we term the 'thin ice zone' (\$$\beta<-1\$$ or \$$\alpha<0\$$; \$$\eta>1\$$). Moreover, when high extinction risks are clustered within large clades, the loss of PD can be higher in trees that are more balanced (\$$\beta>0\$$), in contrast to the predictions of earlier studies based on simpler models. We propose a Monte-Carlo algorithm, tested on simulated data, to infer all three parameters.
Applying it to a real dataset comprising 120 bird clades (class Aves) with known range sizes, we show that parameter estimates precisely fall close to a 'thin ice zone': the combination of their ranked tree shape and non-random extinction risks makes them prone to a sudden collapse of PD.
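To make the dependence of PD loss on which species go extinct concrete, here is a minimal sketch on a toy tree. It is not the model or the inference algorithm of the paper. It assumes the common convention that the PD of a set of species is the total length of the branches connecting them to the root; the tree, its branch lengths and the function name `phylogenetic_diversity` are hypothetical.

```python
def phylogenetic_diversity(parent, length, survivors):
    """Sum the lengths of all branches lying on a path from a surviving tip to the root."""
    kept = set()
    for tip in survivors:
        node = tip
        # Walk up to the root, stopping early if this part of the path is already counted.
        while node in parent and node not in kept:
            kept.add(node)
            node = parent[node]
    return sum(length[node] for node in kept)

# Toy tree ((A:1,B:1)X:2,C:3): each non-root node maps to its parent and to its branch length.
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
length = {"A": 1, "B": 1, "X": 2, "C": 3}

full = phylogenetic_diversity(parent, length, {"A", "B", "C"})
without_b = phylogenetic_diversity(parent, length, {"A", "C"})  # extinction inside the larger clade
without_c = phylogenetic_diversity(parent, length, {"A", "B"})  # extinction of the old, single-species clade
print(full, without_b, without_c)
```

In this toy tree, losing B removes one unit of branch length out of seven, whereas losing C, the old species-poor lineage, removes three: the same loss of species diversity gives very different losses of PD.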

Publication

2018

Coagulation-transport equations and the nested coalescents

The nested Kingman coalescent describes the dynamics of particles (called genes) contained in larger components (called species), where pairs of species coalesce at constant rate and pairs of genes coalesce at constant rate provided they lie within the same species. We prove that starting from \$$rn\$$ species, the empirical distribution of species masses (numbers of genes\$$/n\$$) at time \$$t/n\$$ converges as \$$n\to\infty\$$ to a solution of the deterministic coagulation-transport equation $$ \partial_t d \ = \ \partial_x ( \psi d ) \ + \ a(t)\left(d\star d - d \right), $$ where \$$\psi(x) = cx^2\$$, \$$\star\$$ denotes convolution and \$$a(t)= 1/(t+\delta)\$$ with \$$\delta=2/r\$$. The most interesting case, \$$\delta =0\$$, corresponds to an infinite initial number of species. This equation describes the evolution of the distribution of species of mass \$$x\$$, where pairs of species can coalesce and each species' mass evolves like \$$\dot x = -\psi(x)\$$. We provide two natural probabilistic solutions of the latter IPDE and address in detail the case when \$$\delta=0\$$. The first solution is expressed in terms of a branching particle system where particles carry masses behaving as independent continuous-state branching processes. The second one is the law of the solution to the following McKean-Vlasov equation $$ dx_t \ = \ - \psi(x_t) \,dt \ + \ v_t\,\Delta J_t $$ where \$$J\$$ is an inhomogeneous Poisson process with rate \$$1/(t+\delta)\$$ and \$$(v_t; t\geq0)\$$ is a sequence of independent random variables such that \$$\mathcal L(v_t) = \mathcal L(x_t)\$$. We show that there is a unique solution to this equation and we construct this solution with the help of a marked Brownian coalescent point process. When \$$\psi(x)=x^\gamma\$$, we show the existence of a self-similar solution for the PDE which, when \$$\gamma=2\$$, relates to the speed of coming down from infinity of the nested Kingman coalescent.
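As a short worked computation (a standard separation-of-variables argument, not taken from the paper), the transport part of the dynamics, \$$\dot x = -\psi(x)\$$ with \$$\psi(x)=cx^2\$$, can be solved explicitly for a single species of initial mass \$$x_0\$$, ignoring coalescence between species:

```latex
\dot x = -c\,x^2,\quad x(0)=x_0
\;\Longrightarrow\;
\frac{d}{dt}\!\left(\frac{1}{x}\right) = c
\;\Longrightarrow\;
x(t) = \frac{x_0}{1 + c\,x_0\,t}
\;\xrightarrow[x_0\to\infty]{}\; \frac{1}{c\,t}.
```

The limit is finite for every \$$t>0\$$ even though the initial mass is infinite, which is the deterministic counterpart of coming down from infinity. For instance, \$$c=1/2\$$ gives \$$2/t\$$, the familiar small-time behaviour of the number of blocks in the standard Kingman coalescent.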

Publication

2015

Time Reversal Dualities for some Random Forests

We consider a random forest \$$\mathcal{F}^*\$$, defined as a sequence of i.i.d. birth-death (BD) trees, each started at time 0 from a single ancestor, stopped at the first tree having survived up to a fixed time \$$T\$$. We denote by \$$\left(\xi^*_t, 0\leq t \leq T \right)\$$ the population size process associated to this forest, and we prove that if the BD trees are supercritical, then the time-reversed process \$$\left(\xi^*_{T-t}, 0 \leq t \leq T\right)\$$ has the same distribution as \$$\left(\widetilde\xi^*_t, 0 \leq t \leq T\right)\$$, the corresponding population size process of an equally defined forest \$$\widetilde{\mathcal{F}}^*\$$, but where the underlying BD trees are subcritical, obtained by swapping birth and death rates or equivalently, conditioning on ultimate extinction. We generalize this result to splitting trees (i.e. life durations of individuals are not necessarily exponential), provided that the i.i.d. lifetimes of the ancestors have a specific explicit distribution, different from that of their descendants. The results are based on an identity between the contour of these random forests truncated up to \$$T\$$ and the duality property of Lévy processes. This identity allows us to also derive other useful properties such as the distribution of the population size process conditional on the reconstructed tree of individuals alive at \$$T\$$, which has potential applications in epidemiology.

Upcoming seminars

Resources

Room schedule of the Collège de France.
Intranet of the Collège de France.