Detecting historical introgression with simcat
Existing methods for detecting introgressive branches on phylogenies face one of two limitations. Some must fit overly parameterized network models, which restricts the amount of data that can be used. Others summarize over rapid algebraic inferences from individual subtrees, which throws away signal that might exist among larger groupings.
I have developed a method, simcat, that extends algebraic methods like the D-statistic by using machine learning to detect introgression on phylogenies from genome-wide SNP data in an alignment of multiple species.
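For context on the kind of algebraic method that simcat builds on, the D-statistic compares counts of two discordant site patterns, ABBA and BABA, in a four-taxon subtree (((P1, P2), P3), O), as D = (nABBA - nBABA) / (nABBA + nBABA). The following is a minimal sketch of that calculation on toy sequences. It is not simcat's implementation, and the function names and example data are hypothetical, chosen only for illustration.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns across four aligned sequences.

    Hypothetical helper: sequences are equal-length strings ordered as
    (((P1, P2), P3), O). The outgroup allele is treated as ancestral ("A").
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip sites with missing or ambiguous bases.
        if any(base not in "ACGT" for base in (a, b, c, o)):
            continue
        # P3 must carry a derived allele ("B") relative to the outgroup.
        if c == o:
            continue
        if a == o and b == c:
            abba += 1  # P1 ancestral, P2 shares the derived allele with P3
        elif b == o and a == c:
            baba += 1  # P2 ancestral, P1 shares the derived allele with P3
    return abba, baba


def d_statistic(abba, baba):
    """Return D = (ABBA - BABA) / (ABBA + BABA), or NaN if no such sites."""
    total = abba + baba
    if total == 0:
        return float("nan")
    return (abba - baba) / total


# Toy alignment (made-up data, seven sites).
P1 = "AGACTAA"
P2 = "TCGCTNC"
P3 = "TGGCAGC"
OUT = "ACACTAA"

n_abba, n_baba = count_abba_baba(P1, P2, P3, OUT)
print(n_abba, n_baba, d_statistic(n_abba, n_baba))
```

Under the assumptions of the test, incomplete lineage sorting alone is expected to produce ABBA and BABA sites in roughly equal numbers, so a value of D far from zero suggests introgression between P3 and one of P1 or P2. Because the statistic uses only four taxa at a time, it cannot draw on signal shared across larger groupings.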
Genomic simulations with ipcoal
I wrote the software package ipcoal to facilitate simulations at the interface of population genetics and phylogenetics. It wraps the coalescent simulator msprime. ipcoal accepts parameters that are typically associated with species networks, including a tree topology, branch-specific effective population sizes, a mutation model, and admixture edges. It also accepts parameters that are typically associated with population genetic models, such as recombination maps and variation in generation time.
ipcoal can generate sequence data in full-chromosome format or as unlinked SNPs, or it can generate genealogies alone without simulating sequence data. It also has built-in tools for gene tree inference and for writing simulated data in formats suited to downstream inference.
More detail about ipcoal can be found in our Bioinformatics manuscript, and we have used it for analysis in our paper “The Multispecies Coalescent in Space and Time.” ipcoal is also the engine powering simulations for training database construction with simcat.