ssr-popgen-toolkit is a modular, user-friendly R-based workflow for population genetics using microsatellite (SSR) datasets.
It provides fully automated preprocessing, clustering, ordination, STRUCTURE-format conversion, and publication-ready visualizations.
Works for any diploid species (plants, animals, fungi, insects, etc.).
- Import SSR matrices from GeneMapper, GenAlEx, or custom formats
- Automated cleaning:
  - averaging of duplicated loci
  - missing data handling (-9 → NA → column mean)
  - removal of non-variable markers
- UPGMA clustering (average linkage)
- PCoA (2D plot, scree plot, optional 3D plot)
- STRUCTURE-format conversion
- STRUCTURE barplots (K = 2–8)
- Combined multi-K visualization
- High-resolution export suitable for publication
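The automated cleaning steps listed above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code in `01_upgma_pcoa.R`: the toy matrix, the function name `clean_ssr`, and the choice to recode -9 as NA before averaging (so that the missing-data code never enters an average) are all hypothetical.

```r
# Toy SSR matrix (hypothetical values): rows = individuals, columns = loci.
# "L1" appears twice to mimic a duplicated locus; -9 marks missing data.
ssr <- data.frame(
  L1 = c(150, 152, -9, 150),
  L1 = c(152, 152, 154, 150),
  L2 = c(200, 200, 200, 200),  # non-variable marker
  L3 = c(310, -9, 314, 312),
  check.names = FALSE          # keep the duplicated column name
)
rownames(ssr) <- paste0("ind", 1:4)

# Hypothetical helper, not part of the toolkit.
clean_ssr <- function(x, missing_code = -9) {
  m <- as.matrix(x)
  m[m == missing_code] <- NA   # -9 -> NA

  # Average all columns that share a locus name, ignoring NAs.
  loci <- unique(colnames(m))
  avg <- sapply(loci, function(l) {
    rowMeans(m[, colnames(m) == l, drop = FALSE], na.rm = TRUE)
  })
  avg[is.nan(avg)] <- NA       # every copy of the locus was missing

  # NA -> column mean of the observed values.
  for (j in seq_len(ncol(avg))) {
    avg[is.na(avg[, j]), j] <- mean(avg[, j], na.rm = TRUE)
  }

  # Drop markers that show no variation across individuals.
  keep <- apply(avg, 2, function(v) length(unique(v)) > 1)
  avg[, keep, drop = FALSE]
}

cleaned <- clean_ssr(ssr)
print(cleaned)
```

In this toy input the constant locus `L2` is dropped and the missing `L3` value for `ind2` is replaced by the mean of the three observed values.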
ssr-popgen-toolkit/
│
├── scripts/
│   ├── 01_upgma_pcoa.R
│   ├── 02_convert_genealex_to_structure.R
│   └── 03_structure_plots.R
│
├── ssr_popgen_toolkit.pbs
├── LICENSE
├── README.md
└── example_data/
conda create -n ssr_popgen r-base r-dplyr r-tidyr r-stringr \
    r-ggplot2 r-ggrepel r-patchwork r-plotly -c conda-forge -c bioconda
conda activate ssr_popgen

Install ape from an R session:
install.packages("ape")

File: scripts/01_upgma_pcoa.R
Performs:
- Load SSR data
- Average duplicate loci
- Replace missing values (-9) with the column mean
- Z-score scaling
- Euclidean distance
- UPGMA dendrogram (PNG)
- PCoA 2D plot
- Scree plot
- Optional 3D PCoA
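The scaling, distance, clustering, and ordination steps above can be sketched with base R functions only. This is a minimal illustration, not the script itself: the toy matrix is invented, and `stats::cmdscale` (classical multidimensional scaling, which is equivalent to PCoA) is used here as an assumption so that the example runs without extra packages; the actual script may rely on `ape` instead.

```r
# Toy cleaned genotype matrix (hypothetical values): 6 individuals x 4 loci.
geno <- matrix(
  c(150, 200, 310, 120,
    152, 202, 310, 122,
    151, 201, 312, 121,
    160, 210, 320, 130,
    162, 212, 322, 131,
    161, 211, 321, 132),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("ind", 1:6), paste0("L", 1:4))
)

# Z-score each locus (column). Constant columns must already be removed,
# otherwise scale() would return NaN for them.
z <- scale(geno)

# Pairwise Euclidean distances between individuals.
d <- dist(z, method = "euclidean")

# UPGMA is average-linkage hierarchical clustering.
upgma <- hclust(d, method = "average")
groups <- cutree(upgma, k = 2)

# PCoA via classical multidimensional scaling of the distance matrix.
pcoa <- cmdscale(d, k = 2, eig = TRUE)
pos_eig <- pcoa$eig[pcoa$eig > 1e-8]
var_explained <- pos_eig / sum(pos_eig)  # values shown in a scree plot

# Write the dendrogram to a temporary PNG if the device is available.
if (capabilities("png")) {
  png(file.path(tempdir(), "upgma_sketch.png"),
      width = 1600, height = 1200, res = 200)
  plot(upgma, main = "UPGMA dendrogram", xlab = "", sub = "")
  dev.off()
}

print(groups)
print(round(var_explained, 3))
```

`pcoa$points` holds the coordinates for the 2D plot; raising `k` to 3 would give the axes needed for a 3D view.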
Rscript scripts/01_upgma_pcoa.R input.txt prefix

File: scripts/02_convert_genealex_to_structure.R
Converts standard GenAlEx SSR tables into STRUCTURE-compatible format.
- Converts diploid alleles
- Handles missing data (NA → -9)
- Outputs clean STRUCTURE file
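A conversion of this kind might look like the sketch below. Everything about the layout is an assumption made for illustration: the input is taken to be a data frame with `Sample` and `Pop` columns followed by two allele columns per locus, and the output uses two consecutive rows per diploid individual with no marker-name header. The function `to_structure` and the column names are hypothetical, and the real script may choose a different STRUCTURE layout.

```r
# Toy GenAlEx-like table (hypothetical values); NA marks missing alleles.
genalex <- data.frame(
  Sample = c("ind1", "ind2", "ind3"),
  Pop    = c("north", "north", "south"),
  L1_a   = c(150, 152, NA),
  L1_b   = c(152, 152, NA),
  L2_a   = c(200, 204, 202),
  L2_b   = c(204, NA, 202),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of the toolkit.
to_structure <- function(df, missing_code = -9) {
  alleles <- unname(as.matrix(df[, -(1:2)]))
  alleles[is.na(alleles)] <- missing_code  # NA -> -9

  # Odd columns hold the first allele, even columns the second.
  first  <- alleles[, seq(1, ncol(alleles), by = 2), drop = FALSE]
  second <- alleles[, seq(2, ncol(alleles), by = 2), drop = FALSE]

  # Interleave so that each individual occupies two consecutive rows.
  n <- nrow(df)
  geno <- matrix(NA_real_, nrow = 2 * n, ncol = ncol(first))
  geno[seq(1, 2 * n, by = 2), ] <- first
  geno[seq(2, 2 * n, by = 2), ] <- second
  colnames(geno) <- paste0("locus", seq_len(ncol(geno)))

  # Integer population codes in order of first appearance.
  pop_id <- as.integer(factor(df$Pop, levels = unique(df$Pop)))

  data.frame(
    Sample = rep(df$Sample, each = 2),
    Pop    = rep(pop_id, each = 2),
    geno,
    stringsAsFactors = FALSE
  )
}

structure_tab <- to_structure(genalex)
out_file <- file.path(tempdir(), "structure_input.txt")
write.table(structure_tab, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
print(structure_tab)
```

The label and population columns written here need to match whatever is declared in the STRUCTURE parameter files for the run.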
Rscript scripts/02_convert_genealex_to_structure.R genalex_input.txt structure_input.txt

File: scripts/03_structure_plots.R
Produces:
- Sorted STRUCTURE barplots
- Multi-K combined figure (K = 5 → 2)
- Publication-quality PNGs
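To illustrate what a sorted STRUCTURE barplot involves, the sketch below orders individuals by their dominant cluster and then by membership strength before drawing stacked bars. It deliberately uses base graphics rather than ggplot2 so that it runs without extra packages; the toy Q matrix, the helper `sort_q`, and the colours are assumptions, not the script's own choices.

```r
# Toy Q matrix for K = 3 (hypothetical values): rows = individuals,
# columns = clusters, each row sums to 1.
q <- matrix(
  c(0.10, 0.85, 0.05,
    0.90, 0.05, 0.05,
    0.20, 0.10, 0.70,
    0.80, 0.15, 0.05,
    0.05, 0.90, 0.05,
    0.10, 0.20, 0.70),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("ind", 1:6), paste0("C", 1:3))
)

# Hypothetical helper: group individuals by dominant cluster, strongest first.
sort_q <- function(q) {
  dominant <- max.col(q, ties.method = "first")
  strength <- q[cbind(seq_len(nrow(q)), dominant)]
  q[order(dominant, -strength), , drop = FALSE]
}

q_sorted <- sort_q(q)

# Stacked barplot, one bar per individual, written to a temporary PNG.
if (capabilities("png")) {
  png(file.path(tempdir(), "structure_K3_sketch.png"),
      width = 2400, height = 1200, res = 300)
  barplot(t(q_sorted), col = c("#1b9e77", "#d95f02", "#7570b3"),
          border = NA, space = 0, las = 2,
          ylab = "Ancestry proportion", main = "K = 3")
  dev.off()
}

print(rownames(q_sorted))
```

A multi-K figure would repeat this for each Q matrix and stack the panels, for example with `patchwork` when the plots are built in ggplot2.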
Rscript scripts/03_structure_plots.R Qmatrix.txt output_prefix

File: ssr_popgen_toolkit.pbs
qsub -v INPUT=ssr_data.txt,PREFIX=myrun,CONDA_ENV=/path/to/env ssr_popgen_toolkit.pbs

1️⃣ Export SSR peak data in GeneMapper.
2️⃣ Import into GenAlEx → generate the allele matrix.
3️⃣ Run 01_upgma_pcoa.R for clustering + PCoA.
4️⃣ Convert GenAlEx file → STRUCTURE with 02_convert_genealex_to_structure.R.
5️⃣ Run STRUCTURE (K = 1–8).
6️⃣ Upload results.zip to StructureSelector or CLUMPAK.
7️⃣ Identify optimal K using Evanno's method.
8️⃣ Visualize Q matrices using 03_structure_plots.R.
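For readers who want to see what step 7 computes, here is a minimal sketch of the ΔK statistic from Evanno et al. (2005), ΔK = mean(|L(K+1) - 2L(K) + L(K-1)|) / sd(L(K)), where L(K) is the log-likelihood reported by STRUCTURE for each replicate run. The log-likelihood values and the function name `evanno_delta_k` are invented for illustration, and pairing replicates by row is an assumption; in practice StructureSelector or CLUMPAK perform this calculation on the real results. ΔK is undefined for the smallest and largest K tested, because it needs both neighbouring values.

```r
# Toy log-likelihoods (invented values): rows = replicate runs,
# columns = K = 1..5.
lnp <- matrix(
  c(-5200, -4800, -4300, -4280, -4270,
    -5205, -4790, -4305, -4290, -4260,
    -5195, -4810, -4295, -4275, -4285),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("rep", 1:3), paste0("K", 1:5))
)

# Hypothetical helper, not part of the toolkit.
evanno_delta_k <- function(lnp) {
  inner <- 2:(ncol(lnp) - 1)  # K values that have both neighbours

  # Absolute second-order rate of change, per replicate.
  second <- abs(lnp[, inner + 1, drop = FALSE] -
                2 * lnp[, inner, drop = FALSE] +
                lnp[, inner - 1, drop = FALSE])

  # Mean over replicates, divided by the standard deviation of L(K).
  delta <- colMeans(second) / apply(lnp[, inner, drop = FALSE], 2, sd)
  names(delta) <- colnames(lnp)[inner]
  delta
}

delta_k <- evanno_delta_k(lnp)
print(round(delta_k, 2))
print(names(which.max(delta_k)))  # K with the highest delta K
```

With these toy values the peak falls at K = 3, where the likelihood rises sharply up to that point and flattens afterwards.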
If you use this toolkit, please cite:
Boutsika, A. (2025). ssr-popgen-toolkit (v1.0.0). Zenodo.
https://doi.org/10.5281/zenodo.17856732
@software{boutsika_2025_ssr_popgen_toolkit,
author = {Boutsika, Anastasia},
title = {ssr-popgen-toolkit: A Comprehensive Workflow for SSR Genotyping, UPGMA Clustering, PCoA, and STRUCTURE Analysis},
year = {2025},
publisher = {Zenodo},
version = {v1.0.0},
doi = {10.5281/zenodo.17856732},
url = {https://doi.org/10.5281/zenodo.17856732}
}

Pritchard, J.K., Stephens, M., & Donnelly, P. (2000). Genetics 155:945–959.
Evanno et al. (2005). Mol Ecol 14:2611–2620.
Kopelman et al. (2015). Mol Ecol Resour 15:1179–1191.
Francis (2017). Mol Ecol Resour 17:27–32.
Paradis et al. (2004). Bioinformatics 20:289–290.
Wickham (2016), Slowikowski (2023).
This project is released under the MIT License.
See LICENSE for details.