Synonymous site-to-site substitution rate variation dramatically inflates false positive rates of selection analyses: ignore at your own peril

Sadie R. Wisotsky, Sergei L. Kosakovsky Pond, Stephen D. Shank, and Spencer V. Muse

This directory contains empirical and simulated data from our MBE paper.

  1. The 11 empirical alignments analyzed in the "Analysis of reference datasets" section are in NEXUS format and are available from empirical/11-datasets.zip

  2. The 13714 empirical alignments from the Selectome dataset are in NEXUS format and are available from empirical/selectome.tar.bz2 [88M]

  3. Simulation datasets based on the 16- and 31-taxon trees are in simulations/16-seq.tar.bz2 [183M] and simulations/31-seq.tar.bz2 [412M]. Parameter values used for each simulation (coded using the filename) are described in the file simulations/sim-key.csv

  4. Simulation datasets based on genes from the Shultz and Sackton paper, used to evaluate method performance with varied distributions, are in simulations-shultz-sackton/sim.tar.bz2 [279M]. Parameter values used for each simulation (coded using the filename) are described in the file simulations-shultz-sackton/sim_settings.csv

How to simulate data under BUSTED and BUSTED[S]?

Please use the script at https://github.com/veg/hyphy-analyses/tree/master/SimulateMG94 (see https://github.com/veg/hyphy-analyses/ for installation instructions) to simulate data from a given tree (tree.nwk in these examples).

Strict null (ω=1)

10 replicates of 400 codons each, saving to data/strict_null.replicate.xx

hyphy SimulateMG94.bf --tree tree.nwk --replicates 10 \
--branch-variation constant --site-variation constant \
--omega 1.0 --sites 400 --output data/strict_null 

Branch site (BUSTED style), no synonymous site-to-site rate variation

10 replicates of 400 codons each, saving to data/bs_rel.replicate.xx. ω distribution is specified as a comma-separated set of rate, weight pairs. For example, the string below specifies the omega distribution of

ω weight
0.2 0.5
1.0 0.4
5.0 0.1

hyphy SimulateMG94.bf --replicates 10 --tree tree.nwk --model bs-rel \
--sites 400 --omegas 0.2,0.5,1.0,0.4,5.0,0.1 --output data/bs_rel
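
The flat rate, weight encoding is easy to mistype, so it can help to convert between the table and the string programmatically. The following is a minimal Python sketch, not part of HyPhy: the helper names parse_distribution and format_distribution are hypothetical, and the check that the weights sum to 1 is an assumption about what a well-formed distribution looks like rather than a documented requirement of the script.

```python
def parse_distribution(spec):
    """Split a flat 'rate,weight,rate,weight,...' string into (rate, weight) pairs."""
    values = [float(token) for token in spec.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values (rate, weight pairs)")
    pairs = list(zip(values[0::2], values[1::2]))
    # Assumption: weights are probabilities and should sum to 1.
    total_weight = sum(weight for _, weight in pairs)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1")
    return pairs


def format_distribution(pairs):
    """Join (rate, weight) pairs back into the flat comma-separated string."""
    return ",".join(f"{rate},{weight}" for rate, weight in pairs)


# The omega distribution from the table above.
omegas = parse_distribution("0.2,0.5,1.0,0.4,5.0,0.1")
for rate, weight in omegas:
    print(rate, weight)

# Rebuild the string that would be passed to --omegas.
print(format_distribution(omegas))
```

The same helpers apply unchanged to any other argument that uses the rate, weight pair encoding.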

Branch site and synonymous site-to-site rate variation (BUSTED[S])

10 replicates of 1000 codons each, saving to data/bs_rel_srv.replicate.xx. ω distribution is specified as a comma-separated set of rate, weight pairs. For example, the string below specifies the omega distribution of

ω weight
0.2 0.5
1.0 0.4
5.0 0.1

Similarly, the alphas string specifies the rate distribution for site-specific synonymous substitution rates.

For example, the string below specifies the alpha distribution of

α weight
0.1 0.4
1.0 0.4
10.0 0.2

Note that this distribution will be automatically normalized to have mean 1, in this case to

α weight
0.04098360655737705 0.4
0.4098360655737705 0.4
4.098360655737705 0.2

hyphy SimulateMG94.bf --replicates 10 --tree tree.nwk --model bs-rel-srv \
--sites 1000 --omegas 0.2,0.5,1.0,0.4,5.0,0.1 --alphas 0.1,0.4,1.0,0.4,10.0,0.2 \
--output data/bs_rel_srv
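
The mean-1 normalization of the α distribution can be reproduced by hand: the weighted mean of the example is 0.1 × 0.4 + 1.0 × 0.4 + 10.0 × 0.2 = 2.44, and each rate is divided by that value. The Python sketch below illustrates this arithmetic only. It is not HyPhy's own code, the function name normalize_to_unit_mean is hypothetical, and it assumes the weights already sum to 1.

```python
def normalize_to_unit_mean(pairs):
    """Rescale the rates of (rate, weight) pairs so the weighted mean rate is 1.

    Assumption: the weights already sum to 1, so the weighted mean is simply
    sum(rate * weight). Weights are left unchanged.
    """
    mean = sum(rate * weight for rate, weight in pairs)
    if mean <= 0:
        raise ValueError("weighted mean rate must be positive")
    return [(rate / mean, weight) for rate, weight in pairs]


# The alpha distribution from the example above.
alphas = [(0.1, 0.4), (1.0, 0.4), (10.0, 0.2)]
normalized = normalize_to_unit_mean(alphas)

for rate, weight in normalized:
    print(rate, weight)

# After rescaling, the weighted mean of the rates is 1 (up to rounding).
print(sum(rate * weight for rate, weight in normalized))
```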