Current Research Projects

I. Species Tree Inference Using Algebraic Statistics

Former MBI post-doc Julia Chifman (currently a post-doc at Wake Forest University) and I have received NSF funding to study the problem of inferring species-level phylogenies under the coalescent model using techniques from algebraic statistics. Our work to date on this problem includes establishing identifiability of arbitrary n-leaf trees under the coalescent model for SNP data arising from the standard nucleotide substitution models (e.g., the General Time Reversible model with a proportion of invariable sites and rate variation across sites, and all sub-models). We have further generalized these results to establish identifiability for data with an arbitrary number of states, thus enabling application to presence/absence data and to codon and amino acid data.

The method used to establish identifiability is based on the application of techniques from algebraic geometry to a special representation of the data in matrix form that encodes the probabilities of the patterns of sites in the DNA sequences under the combined mutation-coalescent model (see our arXiv paper for details). This representation of the data leads naturally to a method for inferring species-level phylogenies. The method works by decomposing the data into quartets of individuals for which data are available. Phylogenetic relationships within each quartet are inferred using the singular values of this special matrix, and complete phylogenies are inferred by using quartet assembly methods with the estimated quartet relationships as input (link to paper). The method has been implemented in the popular phylogenetic software package PAUP*, which is freely available here.

Species phylogeny inferred using the SVDquartets method for six subspecies of North American rattlesnake. Individuals are colored according to their species-level memberships.
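
The quartet step described above can be illustrated with a minimal sketch. It is not the PAUP* implementation. It assumes aligned nucleotide data for four individuals, and it assumes the scoring rule given in the published description of SVDquartets (a detail not stated on this page): for each of the three possible splits, the site pattern frequencies are arranged in a 16 x 16 matrix, and the score is the square root of the sum of the squared singular values beyond the tenth. The split with the smallest score is taken as the inferred quartet relationship. The function names flattening, svd_score, and best_split are hypothetical.

```python
import itertools

import numpy as np

STATES = "ACGT"
INDEX = {s: i for i, s in enumerate(STATES)}


def flattening(seqs, split):
    """Return the 16 x 16 matrix of site pattern frequencies for a split.

    seqs  : dict mapping individual name -> aligned sequence string
    split : ((a, b), (c, d)); rows index the states of (a, b),
            columns index the states of (c, d)
    """
    (a, b), (c, d) = split
    flat = np.zeros((16, 16))
    for sa, sb, sc, sd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        try:
            row = 4 * INDEX[sa] + INDEX[sb]
            col = 4 * INDEX[sc] + INDEX[sd]
        except KeyError:
            continue  # skip sites with gaps or ambiguity codes
        flat[row, col] += 1
    total = flat.sum()
    return flat / total if total > 0 else flat


def svd_score(flat, rank=10):
    """Frobenius distance from flat to the nearest matrix of the given rank.

    The cutoff rank=10 is an assumption taken from the published method.
    """
    sing = np.linalg.svd(flat, compute_uv=False)  # sorted in descending order
    return float(np.sqrt(np.sum(sing[rank:] ** 2)))


def best_split(seqs, taxa):
    """Score the three possible splits of four individuals; lowest score wins."""
    a, b, c, d = taxa
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    scores = {s: svd_score(flattening(seqs, s)) for s in splits}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: A matches B and C matches D at every site.
    pairs = list(itertools.product(STATES, repeat=2))
    first = "".join(x for x, _ in pairs)
    second = "".join(y for _, y in pairs)
    toy = {"A": first, "B": first, "C": second, "D": second}
    split, scores = best_split(toy, ("A", "B", "C", "D"))
    print(split, scores)
```

In a full analysis the inferred splits for many sampled quartets would then be passed to a quartet assembly method, which this sketch does not attempt.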

Our continuing work on this problem extends in three primary directions. First, we are exploring methods for establishing identifiability of parameters in addition to the phylogenetic tree topology, such as the timing of speciation events and the ancestral population sizes. Second, we are developing methods for inferring hybridization from large-scale genomic data sets, based on expected site pattern probabilities under these models (see Project II below). Finally, we plan to explore use of these methods for species delimitation, as this will have an impact on the African cassava project mentioned below (Project III).

II. Testing Adaptive Radiation Theory in Penstemon (Plantaginaceae)

EEOB Professor Andi Wolfe and I have recently received NSF funding to study the theory of adaptive radiation in the plant genus Penstemon. Penstemon is the largest plant genus native to North America, with 280 species (389 taxa) divided into 6 subgenera, 12 sections, and 23 subsections. Most of the species are found in the mountains of the western states, with the center of diversity in Utah, Colorado, California, and Nevada. Most species are pollinated by bees, but there are also adaptations for wasp, butterfly, and hummingbird pollination.

My role on the project involves the development of statistical methodology to understand the process of diversification in Penstemon. One primary aim is the development of techniques to detect hybridization, a process that is thought to be commonplace in Penstemon. Our approach to this problem is two-fold. First, graduate student Yuan Tian, who will be supported as a GRA on the grant, is currently developing methods for studying the distribution of gene trees under both coalescence and gene flow. Second, our recently published work that uses methods from algebraic statistics to infer species trees can be readily extended to test for hybridization.

Also associated with this project is EEOB graduate student Paul Blischak, who began working on Penstemon as an undergraduate in OSU's RUMBA program. Paul is carrying out a range of research projects, from fieldwork to lab work to methodological development, aimed at understanding the consequences of polyploidy for the evolution of this group.

Penstemon davidsonii; Mount Ashland, Oregon (photo credit: Andi Wolfe)

III. African Cassava Whitefly: Outbreak Causes and Sustainable Solutions

Recently, an international team consisting of fourteen main partnering organizations received funding from the Gates Foundation to study the whitefly (Bemisia tabaci), a collection of morphologically indistinguishable insect species that transmit viruses to plants such as cassava. Whiteflies cause destruction of cassava crops, resulting in famine across parts of Africa. Recent annual losses are estimated at more than $1.25M USD. The goal of the project is to find sustainable solutions to this problem. The project kicked off with a ceremony and group meeting in Uganda on March 2, 2015.

My role on this project involves working with former UNM student and long-time collaborator Dr. Laura Boykin, who heads the project at the University of Western Australia. Laura will be overseeing sequencing efforts to generate large-scale genomic data from whiteflies sampled throughout Africa, with the goal of understanding their diversity and division into distinct species. I will work with Laura to develop methods for species delimitation that can be used for recently diverged species when morphological characteristics give little information about species memberships. This builds on our earlier work (see this link), and will use more recent work concerning species tree inference using algebraic statistical methods.

The official project webpage is here.

More information about this project can be found here.

An excellent blog post about colleague Laura Boykin's work on this project is available here.