Skip to content

Wrong reference allele #412

@jodyphelan

Description

@jodyphelan

I noticed one edge case where I think the refrence allele seems to be missing one nucleotide at the end.

I've tested this using delly v1.3.3 on linux.

This can be replicated using the following bit of code:

mamba create -n test delly=1.3.3 bcftools samtools minimap2 bwa megahit
mamba activate test
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR288/051/SRR28860051/SRR28860051_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR288/051/SRR28860051/SRR28860051_2.fastq.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/195/955/GCF_000195955.2_ASM19595v2/GCF_000195955.2_ASM19595v2_genomic.fna.gz
gunzip GCF_000195955.2_ASM19595v2_genomic.fna.gz
bwa index GCF_000195955.2_ASM19595v2_genomic.fna
bwa mem -t 8 GCF_000195955.2_ASM19595v2_genomic.fna SRR28860051_* | samtools sort -@ 8 -o SRR28860051.bam -
samtools index SRR28860051.bam
delly call -g GCF_000195955.2_ASM19595v2_genomic.fna SRR28860051.bam > delly.vcf
megahit  -o megahit -1 SRR28860051_1.fastq.gz -2 SRR28860051_2.fastq.gz -t 8
minimap2 --cs=long megahit/final.contigs.fa GCF_000195955.2_ASM19595v2_genomic.fna  > asm.paf  

When we extract the deletion from the delly VCF:

bcftools view delly.vcf | grep 13055

We get the following:

NC_000962.3	1305529	DEL00000390	TGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGA	T	180	PASS	PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.3.3;END=1305652;PE=0;MAPQ=0;CT=3to5;CIPOS=-34,34;CIEND=-34,34;SRMAPQ=60;INSLEN=0;HOMLEN=34;SR=3;SRQ=0.993289;CONSENSUS=CGCGCCAGCGCACTACCACATACGCCCTGCTTGCGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGTTGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTGTGCCGCCCCCGGC;CE=1.89334;CONSBP=68	GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV	1/1:-272.96,-27.3542,0:10000:PASS:4781:2916:6929:0:0:0:0:91

Now extracting the relevant region from the assembly alignment:

grep 1278679 asm.paf | paftools.js view -f aln -   | grep CCCTGCTTGC -A 5

We get:

Ref+:      26801 ATACGCCCTGCTTGC----------------------------------------------------------------- 26815
                 |||||||||||||||
Qry+:    1305480 ATACGCCCTGCTTGCggcctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctggc 1305559

Ref+:      26816 -----------------------------------------------------------GGCCTAGCCCCGGCGACGATG 26836
                                                                            |||||||||||||||||||||
Qry+:    1305560 ctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctGGCCTAGCCCCGGCGACGATG 1305639

Ref+:      26837 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 26916
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+:    1305640 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 1305719

And with a bit of manual rearrangement we get:

Ref+:      26801 ATACGCCCTGCTTGCGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGAT------------------------------ 26815
                 ||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+:    1305480 ATACGCCCTGCTTGCggcctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctggc 1305559

Ref+:      26816 -------------------------------------------------------------------------------- 26836
                                                                            
Qry+:    1305560 ctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctGGCCTAGCCCCGGCGACGATG 1305639

Ref+:      26837               GGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 26916
                 --------------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+:    1305640 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 1305719

If I am interpreting this correctly, it is indicating that the reference/alt should be:

TGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGAT  ->  T

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions