
manastast/ssr-popgen-toolkit


###################################################################################

SSR POPGEN TOOLKIT

A Comprehensive Workflow for SSR Genotyping, UPGMA, PCoA & STRUCTURE

###################################################################################

DOI: 10.5281/zenodo.17856732 | License: MIT

ssr-popgen-toolkit

ssr-popgen-toolkit is a modular R workflow for population genetic analysis of microsatellite (SSR) datasets.
It automates preprocessing, clustering, ordination, and conversion to STRUCTURE format, and produces publication-ready visualizations.
It works with data from any diploid species (plants, animals, fungi, insects, etc.).


FEATURES

  • Import SSR matrices from GeneMapper, GenAlEx, or custom formats
  • Automated cleaning:
    • averaging of duplicated loci
    • missing-data imputation (-9 → NA → column mean)
    • removal of non-variable markers
  • UPGMA clustering (average linkage)
  • PCoA (2D plot, scree plot, optional 3D)
  • STRUCTURE-format conversion
  • STRUCTURE barplots (K = 2–8)
  • Combined multi-K visualization
  • High-resolution export suitable for publication
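The cleaning steps can be pictured with a short base-R sketch. It is not the code in scripts/01_upgma_pcoa.R: it assumes a numeric samples-by-loci matrix with missing calls coded as -9, and it treats "duplicated loci" as columns that share a locus name, which is an interpretation rather than something this README specifies. The function name clean_ssr() is hypothetical.

```r
# Hypothetical helper sketching the cleaning steps; not the toolkit's own code.
# Assumes `geno` is a numeric matrix (rows = samples, columns = loci),
# missing calls are coded as -9, and duplicated loci share a column name.
clean_ssr <- function(geno, missing_code = -9) {
  # 1. Recode missing calls: -9 -> NA
  geno[which(geno == missing_code)] <- NA

  # 2. Average columns that share a locus name (ignoring NAs)
  loci <- unique(colnames(geno))
  avg <- matrix(NA_real_, nrow = nrow(geno), ncol = length(loci),
                dimnames = list(rownames(geno), loci))
  for (l in loci) {
    avg[, l] <- rowMeans(geno[, colnames(geno) == l, drop = FALSE], na.rm = TRUE)
  }

  # 3. Impute remaining gaps (NA or NaN) with the locus (column) mean
  for (j in seq_len(ncol(avg))) {
    miss <- is.na(avg[, j])
    avg[miss, j] <- mean(avg[, j], na.rm = TRUE)
  }

  # 4. Drop non-variable loci (zero or undefined variance)
  keep <- vapply(seq_len(ncol(avg)),
                 function(j) isTRUE(var(avg[, j]) > 0), logical(1))
  avg[, keep, drop = FALSE]
}

# Toy example: L2 is constant, L3 appears twice, three calls are missing
geno <- matrix(c(152, 154,  -9, 150,
                 152, 152, 152, 152,
                 200,  -9, 204, 202,
                 202, 206,  -9, 202),
               nrow = 4,
               dimnames = list(paste0("S", 1:4), c("L1", "L2", "L3", "L3")))
res <- clean_ssr(geno)
print(res)
```

In this toy run L2 is dropped as non-variable, the two L3 columns are merged, and the missing L1 call for S3 is filled with the L1 mean.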

DIRECTORY STRUCTURE

ssr-popgen-toolkit/
│
├── scripts/
│   ├── 01_upgma_pcoa.R
│   ├── 02_convert_genealex_to_structure.R
│   └── 03_structure_plots.R
│
├── ssr_popgen_toolkit.pbs
├── LICENSE
├── README.md
└── example_data/

INSTALLATION

Using Conda (recommended)

conda create -n ssr_popgen r-base r-dplyr r-tidyr r-stringr \
    r-ggplot2 r-ggrepel r-patchwork r-plotly -c conda-forge -c bioconda
conda activate ssr_popgen

Install ape from an R session started inside the activated environment:

install.packages("ape", repos = "https://cloud.r-project.org")
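After installation, a quick check from R can confirm that the packages named in the conda command, plus ape, are visible to the environment's R. The package list below simply mirrors the installation commands above; if the scripts turn out to need anything else, add it to the vector.

```r
# Packages from the conda command above, plus ape (installed separately)
pkgs <- c("dplyr", "tidyr", "stringr", "ggplot2", "ggrepel",
          "patchwork", "plotly", "ape")

# requireNamespace() returns FALSE (without stopping) when a package is absent
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```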

SCRIPT 1 — UPGMA + PCoA

File: scripts/01_upgma_pcoa.R

Performs:

  • Load SSR data
  • Average duplicate loci
  • Replace missing calls (-9) with locus means
  • Z-score scaling
  • Euclidean distance
  • UPGMA dendrogram (PNG)
  • PCoA 2D plot
  • Scree plot
  • Optional 3D PCoA
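The analysis core of Script 1 maps onto standard base-R functions: scale() for z-scores, dist() for Euclidean distances, hclust(method = "average") for UPGMA, and cmdscale() for classical PCoA. The sketch below is only an illustration and assumes the input is an already-cleaned numeric matrix with no missing values or constant loci; the function name upgma_pcoa() is hypothetical, and the actual script may differ in details such as plotting and output naming.

```r
# Hypothetical sketch of the Script 1 analysis core (base R only).
# Assumes `geno` is a cleaned numeric matrix: rows = samples, columns = loci,
# no NAs and no constant columns (scale() would divide by a zero SD otherwise).
upgma_pcoa <- function(geno, k = 2) {
  z <- scale(geno)                        # z-score each locus: mean 0, SD 1
  d <- dist(z, method = "euclidean")      # pairwise Euclidean distances
  tree <- hclust(d, method = "average")   # UPGMA = average-linkage clustering
  pcoa <- cmdscale(d, k = k, eig = TRUE)  # classical MDS, i.e. PCoA
  pos <- pcoa$eig[pcoa$eig > 0]           # keep positive eigenvalues
  list(tree = tree,
       coords = pcoa$points,              # sample scores on PCo1..PCok
       pct_var = 100 * pos / sum(pos))    # % variance per axis (scree plot)
}

# Toy data: 6 samples x 4 loci
set.seed(1)
geno <- matrix(rnorm(24, mean = 150, sd = 5), nrow = 6,
               dimnames = list(paste0("S", 1:6), paste0("L", 1:4)))
res <- upgma_pcoa(geno)

# Plotting (not run here):
# plot(res$tree, main = "UPGMA")
# plot(res$coords, xlab = "PCo1", ylab = "PCo2")
# barplot(res$pct_var, names.arg = seq_along(res$pct_var))
```

Because the distances are Euclidean on z-scored data, this PCoA is equivalent to a PCA of the scaled matrix; other distance choices would change both the tree and the ordination.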

Usage

Rscript scripts/01_upgma_pcoa.R input.txt prefix

SCRIPT 2 — GenAlEx → STRUCTURE

File: scripts/02_convert_genealex_to_structure.R

Converts standard GenAlEx SSR tables into STRUCTURE-compatible format.

  • Reformats diploid allele calls
  • Recodes missing data (NA → -9)
  • Writes a STRUCTURE-ready input file
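A GenAlEx codominant table stores each diploid locus as two adjacent allele columns, while STRUCTURE's default input places the two allele copies of an individual on two consecutive rows. The sketch below shows one way to do that reshaping in base R. It assumes a data frame whose first two columns are the sample ID and population, followed by allele pairs, with the GenAlEx header rows already removed and missing alleles read as NA; genalex_to_structure() is a hypothetical name, not the function used by the script.

```r
# Hypothetical sketch; assumes df columns: ID, Pop, then two allele columns per locus.
genalex_to_structure <- function(df, missing_code = -9) {
  alleles <- as.matrix(df[, -(1:2), drop = FALSE])
  if (ncol(alleles) %% 2 != 0) stop("Expected two allele columns per locus")
  alleles[is.na(alleles)] <- missing_code          # NA -> -9

  n_ind  <- nrow(alleles)
  n_loci <- ncol(alleles) / 2
  first  <- seq(1, by = 2, length.out = n_loci)    # allele-1 columns
  second <- first + 1                              # allele-2 columns

  # Two rows per individual: row 1 = allele 1, row 2 = allele 2
  geno <- matrix(NA_real_, nrow = 2 * n_ind, ncol = n_loci,
                 dimnames = list(NULL, colnames(alleles)[first]))
  geno[seq(1, by = 2, length.out = n_ind), ] <- alleles[, first, drop = FALSE]
  geno[seq(2, by = 2, length.out = n_ind), ] <- alleles[, second, drop = FALSE]

  data.frame(ID  = rep(df[[1]], each = 2),
             Pop = rep(as.integer(factor(df[[2]])), each = 2),  # integer pop codes
             geno, check.names = FALSE)
}

# Toy GenAlEx-style table: 2 individuals, 2 loci, one missing allele
df <- data.frame(ID = c("a", "b"), Pop = c(1, 2),
                 L1 = c(150, 152), L1.2 = c(154, NA),
                 L2 = c(200, 202), L2.2 = c(204, 206))
res <- genalex_to_structure(df)
print(res)

# Write without a header row unless the STRUCTURE run expects marker names:
# write.table(res, "structure_input.txt", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```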

Usage

Rscript scripts/02_convert_genealex_to_structure.R genalex_input.txt structure_input.txt

SCRIPT 3 — STRUCTURE BARPLOTS (K = 2–8)

File: scripts/03_structure_plots.R

Produces:

  • Sorted STRUCTURE barplots
  • Multi-K combined figure (K = 5 → 2)
  • Publication-quality PNGs
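Sorted barplots are commonly produced by assigning each individual to its dominant cluster and ordering individuals by that cluster, then by decreasing membership. The sketch below shows that ordering in base R for a Q matrix (rows = individuals, columns = clusters, rows summing to 1); the sort_q() name and the exact ordering rule are assumptions and may not match what 03_structure_plots.R does.

```r
# Hypothetical helper: order a Q matrix for a sorted STRUCTURE barplot.
sort_q <- function(q) {
  q <- as.matrix(q)
  major <- max.col(q, ties.method = "first")      # dominant cluster per individual
  top <- q[cbind(seq_len(nrow(q)), major)]        # its membership coefficient
  q[order(major, -top), , drop = FALSE]           # by cluster, then by membership
}

# Toy K = 2 example
q <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4,
              0.3, 0.7),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("i", 1:4), paste0("K", 1:2)))
q_sorted <- sort_q(q)
print(rownames(q_sorted))   # "i1" "i3" "i2" "i4"

# Stacked barplot, one bar per individual (not run here):
# barplot(t(q_sorted), col = c("steelblue", "orange"), border = NA, space = 0)
```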

Usage

Rscript scripts/03_structure_plots.R Qmatrix.txt output_prefix

PBS SCRIPT FOR HPC

File: ssr_popgen_toolkit.pbs

Example qsub command (variables passed with -v are exported to the job's environment):

qsub -v INPUT=ssr_data.txt,PREFIX=myrun,CONDA_ENV=/path/to/env ssr_popgen_toolkit.pbs

COMPLETE WORKFLOW OVERVIEW

1️⃣ Export SSR peak data in GeneMapper.
2️⃣ Import to GenAlEx → generate allele matrix.
3️⃣ Run 01_upgma_pcoa.R for clustering + PCoA.
4️⃣ Convert GenAlEx file → STRUCTURE with 02_convert_genealex_to_structure.R.
5️⃣ Run STRUCTURE (K = 1–8).
6️⃣ Compress the STRUCTURE output folder (e.g., results.zip) and upload it to StructureSelector or CLUMPAK.
7️⃣ Identify the optimal K with Evanno's ΔK method.
8️⃣ Visualize Q matrices using 03_structure_plots.R.
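Evanno's ΔK (step 7) can also be computed directly from the ln P(D) values of replicate STRUCTURE runs. A common formulation uses the mean likelihood L(K) at each K and its standard deviation across replicates:

ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)]

so it is defined only for K values with a neighbour on both sides (K = 2 to 7 for runs at K = 1 to 8). The sketch below assumes a data frame with one row per run and columns K and lnP; evanno_delta_k() is a hypothetical name, and published tools may differ in details.

```r
# Hypothetical sketch of Evanno's Delta K from replicate STRUCTURE runs.
# Assumes `runs` has columns K (consecutive integers) and lnP (ln P(D) per run).
evanno_delta_k <- function(runs) {
  ks <- sort(unique(runs$K))
  m <- as.vector(tapply(runs$lnP, runs$K, mean)[as.character(ks)])  # mean L(K)
  s <- as.vector(tapply(runs$lnP, runs$K, sd)[as.character(ks)])    # sd of L(K)
  inner <- seq_along(ks)[-c(1, length(ks))]  # K values with K-1 and K+1 present
  data.frame(K = ks[inner],
             deltaK = abs(m[inner + 1] - 2 * m[inner] + m[inner - 1]) / s[inner])
}

# Toy input: K = 1..4, three replicate runs each
runs <- data.frame(K = rep(1:4, each = 3),
                   lnP = c(-1000, -1002, -998,
                           -800,  -801,  -799,
                           -780,  -785,  -775,
                           -775,  -770,  -780))
dk <- evanno_delta_k(runs)
print(dk)                        # deltaK: 180 (K = 2), 3 (K = 3)
dk$K[which.max(dk$deltaK)]       # K with the largest Delta K
```

At least two replicates per K are needed; with a single run sd[L(K)] is undefined and ΔK comes out as NA.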


CITATION — SSR POPGEN TOOLKIT

If you use this toolkit, please cite:

Boutsika, A. (2025). ssr-popgen-toolkit (v1.0.0). Zenodo.
https://doi.org/10.5281/zenodo.17856732

BibTeX

@software{boutsika_2025_ssr_popgen_toolkit,
  author       = {Boutsika, Anastasia},
  title        = {ssr-popgen-toolkit: A Comprehensive Workflow for SSR Genotyping, UPGMA Clustering, PCoA, and STRUCTURE Analysis},
  year         = {2025},
  publisher    = {Zenodo},
  version      = {v1.0.0},
  doi          = {10.5281/zenodo.17856732},
  url          = {https://doi.org/10.5281/zenodo.17856732}
}

REQUIRED METHOD CITATIONS

STRUCTURE

Pritchard, J.K., Stephens, M., & Donnelly, P. (2000). Genetics 155:945–959.

Evanno ΔK

Evanno et al. (2005). Mol Ecol 14:2611–2620.

CLUMPAK

Kopelman et al. (2015). Mol Ecol Resour 15:1179–1191.

pophelper

Francis (2017). Mol Ecol Resour 17:27–32.

ape (UPGMA tree)

Paradis et al. (2004). Bioinformatics 20:289–290.

ggplot2 / ggrepel

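Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
Slowikowski, K. (2023). ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'. R package.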


LICENSE

This project is released under the MIT License.
See LICENSE for details.

About

A complete SSR population genetics toolkit for any diploid species, including data cleaning, UPGMA clustering, PCoA, STRUCTURE formatting, STRUCTURE barplot visualization, and supporting scripts for reproducible microsatellite population analyses.
