Laura Kubatko | Professor of Statistics and EEOB

Software available for download

SVDquartets: Singular Value Decomposition Scores for Species Quartets: This program computes scores based on the singular value decomposition of a flattening matrix of site pattern probabilities for quartets of species. The score can effectively differentiate which of the three possible quartet trees is the true tree based on SNP data. The method is described in the following papers:

Chifman, J. and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent, Bioinformatics 30(23): 3317-3324.

Chifman, J. and L. Kubatko. 2015. Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, Journal of Theoretical Biology 374: 35-47.

Swofford, D. L. and L. S. Kubatko. 2023. Species tree estimation using site pattern frequencies, in Species Tree Inference: A Guide to Methods and Applications, Chapter 4, pp. 68-88, Princeton University Press [web link].

Click here to download the software.
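
As a rough illustration of the flattening matrix mentioned above, the following Python sketch tabulates observed site pattern frequencies for one quartet split ab|cd into a 16 x 16 matrix, with rows indexed by the states of (a, b) and columns by the states of (c, d). The function name, the input layout, and the toy alignment are assumptions made for illustration; this is not the SVDquartets implementation. The program's score is then computed from the singular values of such a matrix, which would require a linear algebra library and is not shown here.

```python
from itertools import product

BASES = "ACGT"
# Map each ordered pair of nucleotides to a row/column index 0..15.
PAIR_INDEX = {pair: i for i, pair in enumerate(product(BASES, repeat=2))}


def flattening_matrix(seqs, split):
    """Return the 16 x 16 matrix of site pattern frequencies for a split.

    seqs  : dict mapping taxon name -> aligned sequence (hypothetical layout)
    split : ((a, b), (c, d)), the quartet split to flatten on
    Sites containing anything other than A, C, G, T are skipped.
    """
    (a, b), (c, d) = split
    counts = [[0] * 16 for _ in range(16)]
    n_sites = 0
    for sa, sb, sc, sd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        if any(x not in BASES for x in (sa, sb, sc, sd)):
            continue
        counts[PAIR_INDEX[(sa, sb)]][PAIR_INDEX[(sc, sd)]] += 1
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no usable sites in the alignment")
    return [[v / n_sites for v in row] for row in counts]


# Toy alignment, invented for this example only.
toy = {"A": "ACGTAC", "B": "ACGTAA", "C": "ACGAAC", "D": "ACGAAC"}
flat = flattening_matrix(toy, (("A", "B"), ("C", "D")))
```

Each of the three possible splits of a quartet gives a different flattening, so in practice the matrix would be built once per split and the resulting scores compared.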

qAge: Estimation of speciation times under the multispecies coalescent: This method provides estimates of the node times (or equivalently, the branch lengths) in a species-level phylogeny under the multispecies coalescent. The method is described in the following papers:

Peng, J., D. Swofford, and L. Kubatko. 2022. Estimation of speciation times under the multispecies coalescent, Bioinformatics 38(23): 5182-5190 [web link].

Swofford, D. L. and L. S. Kubatko. 2023. Species tree estimation using site pattern frequencies, in Species Tree Inference: A Guide to Methods and Applications, Chapter 4, pp. 68-88, Princeton University Press [web link].

The method is included in PAUP*, available here.

HyDe: Hybridization Detection: This program evaluates the hypothesis of hybrid speciation for sets of three taxa and can be used to search for hybrid species in large data sets. The method is described in the following papers:

Kubatko, L. and J. Chifman. 2019. An invariants-based method for efficient identification of hybrid speciation from large-scale genomic data, BMC Evolutionary Biology 19:112 [web link].

Blischak, P., J. Chifman, A. D. Wolfe, and L. S. Kubatko. 2018. HyDe: a Python package for genome-scale hybridization detection, Systematic Biology 67(5): 821-829 [web link].

Click here to download the program.

rapidphylo: Rapidly Estimate Phylogeny from Large Allele Frequency Data Using Root Distances Method: This R package rapidly estimates tree topology from large allele frequency data using the Root Distances Method, under a Brownian motion model. The method is described in the following paper:

Peng, J., H. Rajeevan, L. Kubatko, and A. RoyChoudhury. 2021. A fast likelihood approach for estimation of large phylogenies from allele frequency data, Molecular Phylogenetics and Evolution 161: 107142 [web link].

Click here to link to the package on CRAN.

COALGF Calculator: COALGF Calculator is a C program that computes the probability distribution of gene tree histories and gene tree topologies for a fixed three-taxon species tree under the coalescent model with gene flow between both pairs of sister populations. This program was used for all of the calculations in the paper:

Tian, Y. and L. Kubatko. 2016. Distribution of gene tree histories under the coalescent model with gene flow, Molecular Phylogenetics and Evolution 105: 177-192 [web link]; preliminary version available on bioRxiv.

Click here to download the program.

SSA: Inference of Maximum Likelihood Phylogenetic Trees Using a Stochastic Search Algorithm: This program implements the stochastic search algorithm for estimation of phylogenetic trees under the maximum likelihood criterion. It is described in the following paper for trees which satisfy the molecular clock, although it can estimate non-clocklike trees as well:

Salter, L. and D. Pearl. 2001. A Stochastic Search Strategy for Estimation of Maximum Likelihood Phylogenetic Trees, Systematic Biology 50(1): 7-17.

Click here to download the program.

Phylogenetic Utility Programs: These programs aid in summarizing the posterior distribution of phylogenetic trees produced by the program MrBayes. The first program, post_prob, tabulates the percentage of times each tree topology appears in the posterior distribution output by MrBayes, thus providing posterior probabilities for each tree topology. The second program, post_root, tabulates the percentage of times each possible root position appears in the posterior distribution output by MrBayes, thus providing posterior probabilities for each root.

Click here to download these programs.
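
The kind of tabulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the post_prob program itself: it assumes the sampled topologies have already been extracted from the MrBayes output and written in one consistent form, so that identical topologies are identical strings. The function name and the sample data are hypothetical.

```python
from collections import Counter


def topology_percentages(topologies):
    """Return {topology: percentage of samples}, most frequent first.

    topologies : iterable of topology strings, one per posterior sample,
                 assumed to be already written in a consistent form.
    """
    counts = Counter(topologies)
    total = sum(counts.values())
    return {topo: 100.0 * n / total for topo, n in counts.most_common()}


# Invented posterior sample of four trees, for illustration only.
samples = ["((A,B),C,D)", "((A,B),C,D)", "((A,C),B,D)", "((A,B),C,D)"]
percentages = topology_percentages(samples)
```

The same counting applies to root positions if each sample is reduced to a label for its root rather than its topology.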

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization): STEM-hy is a program for inferring maximum likelihood species trees from a collection of estimated gene trees under the coalescent model. It also carries out bootstrap analyses and can evaluate hybridization hypotheses in a model selection framework.

Click here to download the program.
Click here for slides from a tutorial on STEM and STEM-hy from University of Georgia, April 2012.

HybTree: A Perl Script for Estimating Hybridization and Time Scales In the Presence of Deep Coalescence: Written by David Gerard for the analyses in the following paper:

Gerard, D., H. L. Gibbs, and L. Kubatko. 2011. Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling, BMC Evolutionary Biology 11: 291 [web link].

Click here to download the scripts.

COAL: Program for computing gene tree probabilities under the coalescent process: Written by James Degnan (Ph.D., Spring 2005) - please see the COAL Website for more information and to download the program.
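
For readers unfamiliar with gene tree probabilities, the simplest case can be written out directly. For a three-taxon species tree ((A,B),C) with one sampled lineage per species and an internal branch of length T in coalescent units, the standard result is that the gene tree matching the species tree has probability 1 - (2/3)e^{-T}, and each of the two discordant topologies has probability (1/3)e^{-T}. The Python sketch below evaluates these formulas; it illustrates this special case only and is not the algorithm used by COAL, and the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t : length of the internal branch in coalescent units (t >= 0).
    Assumes one sampled lineage per species and no gene flow.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


probs = three_taxon_gene_tree_probs(1.0)
```

With T = 0 all three topologies are equally likely, and as T grows the matching topology's probability approaches 1.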

FPRNET: Program for detecting gene regulatory networks from gene expression data using fuzzy logic, probability, and regression methods: Written by Guy Brock (Ph.D., Summer 2003) - please see the FPRNET website for more information and to download the program.

Analysis of High-throughput Proteomic Data: Programs to perform the analysis in the following reference are available:

Gilchrist, M., L. Salter, and A. Wagner. 2004. A statistical framework for interpreting high-throughput proteomic datasets, Bioinformatics 20(5): 689-700.

Click here to download these programs.