Evolution and Genomics immersive training opportunities

ASTRAL Tutorial

table of contents

  • expected learning outcome
  • getting started
  • step 1: estimating gene trees in PAUP*
  • steps 2 and 3: running ASTRAL
  • extra fun: comparing ASTRAL and SVDQuartets

expected learning outcome

The objective of this activity is to carry out a species-level phylogenetic analysis of multi-locus data under the coalescent model with ASTRAL-III.

getting started

The primary dataset for this tutorial is the mammal dataset of Song et al. (PNAS 109(37): 14942–14947, 2012). The data consist of 36 species of mammals and an outgroup (chicken). There are 447 loci, for a total of 1,397,456 sites. The data can be downloaded in NEXUS format from www.stat.osu.edu/~lkubatko/mammals447.nex. On the cluster, you can issue the command: wget http://www.stat.osu.edu/~lkubatko/mammals447.nex
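After downloading, it can be worth confirming that the file arrived intact. The following is a minimal sketch, assuming the file contains a standard NEXUS DIMENSIONS line; the function name nexus_dimensions is our own and is not part of any tool used in this tutorial.

```shell
# Print the first line of a NEXUS file that mentions "dimensions"
# (case-insensitive). Assumes the file has a standard
# "DIMENSIONS ntax=... nchar=...;" line; nexus_dimensions is a
# hypothetical helper name.
nexus_dimensions() {
    grep -i -m 1 'dimensions' "$1"
}
```

Calling nexus_dimensions mammals447.nex should report 37 taxa (36 mammals plus chicken) and 1397456 characters if the download matches the description above.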

We will carry out an ASTRAL analysis from start to finish, following the steps outlined in the ASTRAL presentation. This means that we will first estimate a gene tree for each locus (step 1) and then use the estimated gene trees as the input to ASTRAL.

step 1: estimating gene trees in PAUP*

  1. Our first step is to estimate gene trees. The entire dataset would require 447 gene trees, which would take some time to estimate. For our class exercise, we'll each estimate gene trees for only the first 10 genes using PAUP*. The PAUP* commands to do this can be found here: http://www.stat.osu.edu/~lkubatko/run-mammals.nex. On the cluster, you can issue the command wget http://www.stat.osu.edu/~lkubatko/run-mammals.nex (at the system prompt). Then, from within PAUP*, issue the following command: exe run-mammals.nex

  2. When the run finishes, the estimated gene trees are stored in the file "mammal.tre". These will be the input to ASTRAL.
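Before moving on, a quick check that the tree file holds the expected number of gene trees can catch an incomplete PAUP* run. This is a minimal sketch, assuming "mammal.tre" contains one Newick tree per line (we have not verified the exact format that run-mammals.nex writes); count_trees is a hypothetical helper name.

```shell
# Count the lines of a file that start with "(", i.e. lines that look
# like Newick trees. Assumes one tree per line; count_trees is a
# hypothetical helper name, not part of PAUP* or ASTRAL.
count_trees() {
    grep -c '^[[:space:]]*(' "$1"
}
```

For the 10-gene class exercise, count_trees mammal.tre would be expected to print 10. A count of 0 suggests the file is not in plain Newick format.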

steps 2 and 3: running ASTRAL

  1. To prepare to run ASTRAL on the cluster, issue the command: module load astral

  2. To run ASTRAL, from the directory that contains "mammal.tre", issue the command: astral -i mammal.tre -o speciestree_mammals447.tre

    Alternatively, we have prepared a file of previously estimated gene trees. To use this file as input to ASTRAL, first download it (wget http://www.stat.osu.edu/~lkubatko/genetrees_mammals.tre) and then run ASTRAL with the command: astral -i genetrees_mammals.tre -o speciestree_mammals447.tre

  3. The estimated species tree will be in the file "speciestree_mammals447.tre". You can visualize this tree using any standard tree viewer that reads the Newick format.
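Even without a tree viewer, the leaf labels of the output can be listed from the command line to confirm that every species made it into the tree. This is a rough sketch, assuming plain Newick output with unquoted taxon names that contain no spaces; list_taxa is a hypothetical helper name, and purely numeric labels are treated as support values and dropped.

```shell
# List the unique leaf labels in a Newick file.
# Assumes unquoted taxon names without spaces; list_taxa is a
# hypothetical helper name.
list_taxa() {
    # Break the tree at every "(", ")", "," and ";" so that each
    # label ends up on its own line.
    tr '(),;' '\n\n\n\n' < "$1" |
        sed 's/:.*//' |            # drop branch lengths after ":"
        grep -v '^[0-9.]*$' |      # drop empty lines and numeric support values
        sort -u
}
```

For the full mammal data, list_taxa speciestree_mammals447.tre | wc -l would be expected to give 37 (36 mammals plus chicken).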


extra fun: comparing ASTRAL and SVDQuartets

  1. For comparison, you can analyze these data using SVDQuartets. If you use the entire data matrix, it will take approximately 30 minutes to estimate the tree on the cluster.
  2. Think about how uncertainty is assessed for both methods. Both use the bootstrap, but they differ in the time the computations take and in the ways the computations could be parallelized. We can discuss this if there's time.
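As a starting point for the discussion of parallelization, here is one way bootstrap replicates could be prepared from a gene-tree file. It is only a sketch of locus-level resampling (gene trees drawn with replacement), which is not necessarily the bootstrap scheme intended in the presentation. It assumes one Newick tree per line and GNU shuf; make_replicates and the output file names are hypothetical.

```shell
# Write NREPS bootstrap replicates of a gene-tree file by resampling
# loci (lines) with replacement. Assumes one Newick tree per line and
# GNU shuf. Output files are named PREFIX_1.tre, PREFIX_2.tre, ...
# make_replicates is a hypothetical helper name.
make_replicates() {
    infile=$1
    nreps=$2
    prefix=$3
    n=$(wc -l < "$infile")      # number of loci = number of lines
    i=1
    while [ "$i" -le "$nreps" ]; do
        # -r samples with replacement; -n draws as many trees as loci
        shuf -r -n "$n" "$infile" > "${prefix}_${i}.tre"
        i=$((i + 1))
    done
}
```

For example, make_replicates genetrees_mammals.tre 100 rep would write rep_1.tre through rep_100.tre. Each replicate could then be analyzed with the same command form used above (astral -i rep_1.tre -o rep_1_species.tre, and so on). Because the replicates are independent of one another, they could be submitted as separate cluster jobs.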