Need to write a reproducible script that:
- Downloads dbSNP data
- Parses it (the files are in JSON Lines format, so this is straightforward)
- Extracts mappings from dbSNP identifiers to RefSeq or gene identifiers, so that for SNPs like rs429358 we can automatically generate BEL graphs including:
- For mutations inside genes, equivalences between reference genes (RefSeq accessions starting with NG_) and Entrez Gene identifiers, plus HGNC when human, like
g(NG_007084.2) eq g(HGNC:APOE)
- Reference gene
g(NG_007084.2) hasVariant g(DBSNP:rs429358)
- Impact on gene
g(DBSNP:rs429358) eq g(NG_007084.2, var("g.7903T>C"))
- Reference transcript(s)
r(NM_001302688.2) hasVariant r(DBSNP:rs429358)
- Impact on transcript(s)
r(DBSNP:rs429358) eq r(NM_001302688.2, var("c.466T>C"))
- Reference protein, when available
p(NP_001289617.1) hasVariant p(DBSNP:rs429358)
- Impact on protein, when available
p(DBSNP:rs429358) eq p(NP_001289617.1, var("p.Cys156Arg"))
- Mappings from the various genomic-level RefSeq identifiers to genes in Entrez Gene or HGNC
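
A minimal sketch of the parse-and-generate steps listed above, to make the target output concrete. It assumes each JSON line has already been reduced to a hypothetical record with an `rsid` and a list of `hgvs` strings of the form `accession:change`; this is not the actual dbSNP schema, which is more deeply nested and would need its own extraction step. The names `iter_json_lines`, `hgvs_to_bel`, and `PREFIX_TO_BEL_FUNCTION` are illustrative only.

```python
import json
from typing import Iterable, Iterator, List

# Assumed mapping from RefSeq accession prefix to BEL function:
# NG_ (genomic) -> g(), NM_ (transcript) -> r(), NP_ (protein) -> p().
PREFIX_TO_BEL_FUNCTION = {"NG_": "g", "NM_": "r", "NP_": "p"}


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def hgvs_to_bel(rsid: str, hgvs: str) -> List[str]:
    """Turn one 'accession:change' string into the two BEL statements
    (hasVariant and eq) used in the examples above.

    Returns an empty list for accessions with an unrecognized prefix
    or for strings without a change part.
    """
    accession, _, change = hgvs.partition(":")
    func = PREFIX_TO_BEL_FUNCTION.get(accession[:3])
    if func is None or not change:
        return []
    return [
        f"{func}({accession}) hasVariant {func}(DBSNP:{rsid})",
        f'{func}(DBSNP:{rsid}) eq {func}({accession}, var("{change}"))',
    ]


# Hypothetical, pre-simplified input line (not the real dbSNP layout).
example_lines = [
    '{"rsid": "rs429358", "hgvs": ["NG_007084.2:g.7903T>C", '
    '"NM_001302688.2:c.466T>C", "NP_001289617.1:p.Cys156Arg"]}'
]

for record in iter_json_lines(example_lines):
    for hgvs in record["hgvs"]:
        for statement in hgvs_to_bel(record["rsid"], hgvs):
            print(statement)
```

The gene-level equivalences such as g(NG_007084.2) eq g(HGNC:APOE) are deliberately left out of this sketch, since they depend on a separate RefSeq-to-Entrez/HGNC mapping source.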