B-STILL Mammalian Stasis Database
=================================

This directory contains the results of stasis analysis on mammalian genomic
alignments. The primary data is stored in a SQLite database: `stasis_data.db`.

Database Structure
------------------

The database contains two main tables:

1. `stasis_results`
   Maps the human (HG38) sequence to site-specific stasis scores.
   - `gene`: Gene symbol (e.g., A1BG).
   - `alignment_position`: 1-based position in the codon alignment.
   - `hg38_codon`: The codon in the human (HG38) sequence (gaps are stored as '---').
   - `hg38_aa`: The translated amino acid ('-' for gaps, 'X' for unknowns).
   - `ebf`: The Proximal Empirical Bayes Factor from B-STILL (the stasis score).

2. `stasis_clusters`
   Regional footprints of extreme purifying selection, identified via a
   hypergeometric scan.
   - `gene`: Gene symbol.
   - `start`: 1-based starting position of the cluster.
   - `end`: 1-based ending position of the cluster.
   - `p_value`: Family-wise error rate (FWER) controlled P-value (via 1000 permutations).
   - `k`: Number of high-confidence stasis sites (EBF >= 10.0) within the cluster.
   - `d`: Total span of the cluster in codons.

Methodology
-----------

- Stasis Scores: Extracted from B-STILL JSON results (MLE index 12). Each score
  represents the likelihood that a site is under extreme purifying selection
  (synonymous and non-synonymous rates both near zero).
- Human Sequence: Extracted from NEXUS alignments using HyPhy (extract_hg38.bf).
- Cluster Detection: Identified using a Hypergeometric Scan Statistic. A cluster
  is a region where stasis sites are significantly denser than expected by
  chance, after correcting for gene length and the total number of stasis sites
  per gene.

Scripts
-------

- `process_stasis.py`: Populates `stasis_results` for a given gene or all genes.
- `infer_stasis_clusters.py`: Original tool for interactive cluster analysis.
- `batch_infer_clusters.py`: Optimized multi-core script that populates `stasis_clusters`.

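The per-window enrichment test behind Cluster Detection (see Methodology) can be
illustrated with a hypergeometric upper-tail probability. The sketch below is not
taken from `batch_infer_clusters.py`. It assumes a gene with `n_sites` codons, of
which `n_stasis` have EBF >= 10.0, and asks how likely it is that a window of
`span` codons contains at least `k` of them if stasis sites were placed at random.
The function name and its arguments are hypothetical, and the permutation step
that produces the FWER-controlled `p_value` is not shown.

```python
from math import comb


def hypergeom_tail(n_sites, n_stasis, span, k):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    Assumed mapping to the README's terms (not confirmed by the scripts):
      n_sites  - gene length in codons (population size)
      n_stasis - stasis sites in the gene (successes in the population)
      span     - window width in codons, the `d` column (number of draws)
      k        - stasis sites observed inside the window, the `k` column
    """
    upper = min(n_stasis, span)      # a window cannot hold more than this
    total = comb(n_sites, span)      # all ways to choose `span` positions
    hits = sum(
        comb(n_stasis, i) * comb(n_sites - n_stasis, span - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Example call: a 400-codon gene with 30 stasis sites, window of 20 codons
# holding 8 of them (made-up numbers, for illustration only).
p_window = hypergeom_tail(400, 30, 20, 8)
```

A value computed this way is an uncorrected, single-window probability. It is not
directly comparable to the stored `p_value`, which accounts for scanning many
windows per gene.
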
Directory Contents
------------------

- `stasis_data.db`: The SQLite database.
- `README.txt`: This file.
- `infer_stasis_clusters.py`: Cluster inference logic.
- `batch_infer_clusters.py`: Batch processing logic.
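
As a minimal sketch of how the two tables described under Database Structure
might be queried, the following uses Python's standard `sqlite3` module. The
function names and default thresholds are illustrative assumptions, and the
column types are not documented here, so treat the returned values accordingly.
The `end` column is double-quoted because END is an SQL keyword.

```python
import sqlite3


def high_confidence_sites(conn, gene, min_ebf=10.0):
    """Return rows from stasis_results for one gene with ebf >= min_ebf."""
    query = (
        "SELECT alignment_position, hg38_codon, hg38_aa, ebf "
        "FROM stasis_results "
        "WHERE gene = ? AND ebf >= ? "
        "ORDER BY alignment_position"
    )
    return conn.execute(query, (gene, min_ebf)).fetchall()


def clusters_for_gene(conn, gene, max_p=0.05):
    """Return rows from stasis_clusters for one gene with p_value <= max_p."""
    query = (
        'SELECT start, "end", p_value, k, d '
        "FROM stasis_clusters "
        "WHERE gene = ? AND p_value <= ? "
        "ORDER BY start"
    )
    return conn.execute(query, (gene, max_p)).fetchall()


# Typical use against the database in this directory:
#
#   conn = sqlite3.connect("stasis_data.db")
#   for row in high_confidence_sites(conn, "A1BG"):
#       print(row)
#   for row in clusters_for_gene(conn, "A1BG"):
#       print(row)
#   conn.close()
```

The 0.05 cutoff in `clusters_for_gene` is only a conventional default chosen for
this example. The EBF threshold of 10.0 matches the definition of a
high-confidence stasis site used for the `k` column.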