Extra base hits: widespread empirical support for instantaneous multiple-nucleotide changes.

Alexander G Lucaci, Sadie R Wisotsky, Stephen D. Shank, Steven Weaver, Sergei L. Kosakovsky Pond

With questions, e-mail spond@temple.edu

This directory contains empirical and simulated data from our manuscript (preprint). JSON files output by the analyses can be visualized with hyphy-vision, and I have put together an Observable notebook with analysis summaries.

  1. 13 empirical alignments analyzed in Benchmark datasets (Table 2) are in NEXUS format and are available from empirical/benchmark.zip

  2. 13714 empirical alignments (BZ2 compressed tarball) from the Selectome dataset in NEXUS format are available from http://data.hyphy.org/web/busteds/empirical/selectome.tar.bz2 [88M].

  3. 9861 empirical alignments from 24 mammalian species (BZ2 compressed tarball) from the Enard and Petrov dataset in FASTA format are available from empirical/enard.tar.bz2 [61M].

  4. 11262 empirical alignments from 39 different species of birds (BZ2 compressed tarball) from the Shultz and Sackton dataset in PHYLIP format are available from empirical/shultz.tar.bz2 [78M].

  5. 814 empirical mtDNA alignments (ZIP file) from the Mannino et al. dataset in FASTA format are available from empirical/mtdna.zip [2.2M].
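
To check that a downloaded archive holds the expected number of alignments without unpacking it, something like the following Python sketch may help. The helper names `list_archive_files` and `count_by_extension` are made up for this illustration and are not part of the repository; the file extensions used inside each archive are not stated above, so the sketch simply tallies whatever it finds.

```python
import tarfile
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def list_archive_files(archive_path):
    """Return sorted names of regular files in a .tar.bz2 or .zip archive.

    Hypothetical helper: it only reads the archive index, nothing is extracted.
    """
    path = str(archive_path)
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            return sorted(i.filename for i in zf.infolist() if not i.is_dir())
    if path.endswith(".tar.bz2"):
        with tarfile.open(path, "r:bz2") as tf:
            return sorted(m.name for m in tf.getmembers() if m.isfile())
    raise ValueError(f"unsupported archive type: {path}")


def count_by_extension(names):
    """Tally file names by lower-cased extension ('' when there is none)."""
    return Counter(PurePosixPath(n).suffix.lower() for n in names)


if __name__ == "__main__":
    # Path taken from the list above; adjust to wherever the file was saved.
    names = list_archive_files("empirical/mtdna.zip")
    print(len(names), "files;", dict(count_by_extension(names)))
```

If the total differs from the count given in the list, the extension tally shows whether the extra entries are alignments or auxiliary files.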


  1. 5000 simulated datasets based on a 2-taxon tree are in simulations/2.tar.bz2 [94M] as FASTA files. These are null simulations (only 1H allowed).

  2. 2500 simulated datasets based on Latin hypercube parameter sampling and 4 empirical datasets are in simulations/msa.tar.bz2 [9.4M]. These data contain null and power simulations; the parameter values used to generate each alignment can be found in the sims-name.csv files (e.g. sims-yokoyama.csv).

  3. 1100 alignments generated from Indelible simulations (as FASTA files) are in simulations/indel.tar.bz2 [2.6M]. The indel rate is encoded in the file name, and the control file for Indelible is included in the directory.
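
The parameter files mentioned in item 2 can be inspected with the Python standard library alone. The sketch below assumes only that each file is a comma-separated table with a header row; the column names are not documented here, so the code reports them rather than relying on any particular one. `load_parameters` is a hypothetical helper, and the path in the usage lines is an assumption about where the file ends up after unpacking.

```python
import csv


def load_parameters(csv_path):
    """Read a parameter CSV and return (column_names, rows).

    Each row is a dict keyed by the header names, with values kept as strings.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # Assumed location; adjust to where sims-yokoyama.csv actually lives.
    columns, rows = load_parameters("sims-yokoyama.csv")
    print("columns:", columns)
    print("simulated alignments described:", len(rows))
```

Values are left as strings on purpose, since which columns are numeric is not stated; convert them once the header has been inspected.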