I noticed one edge case where I think the refrence allele seems to be missing one nucleotide at the end.
I've tested this using delly v1.3.3 on linux.
This can be replicated using the following bit of code:
mamba create -n test delly=1.3.3 bcftools samtools minimap2 bwa megahit
mamba activate test
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR288/051/SRR28860051/SRR28860051_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR288/051/SRR28860051/SRR28860051_2.fastq.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/195/955/GCF_000195955.2_ASM19595v2/GCF_000195955.2_ASM19595v2_genomic.fna.gz
gunzip GCF_000195955.2_ASM19595v2_genomic.fna.gz
bwa index GCF_000195955.2_ASM19595v2_genomic.fna
bwa mem -t 8 GCF_000195955.2_ASM19595v2_genomic.fna SRR28860051_* | samtools sort -@ 8 -o SRR28860051.bam -
samtools index SRR28860051.bam
delly call -g GCF_000195955.2_ASM19595v2_genomic.fna SRR28860051.bam > delly.vcf
megahit -o megahit -1 SRR28860051_1.fastq.gz -2 SRR28860051_2.fastq.gz -t 8
minimap2 --cs=long megahit/final.contigs.fa GCF_000195955.2_ASM19595v2_genomic.fna > asm.paf
When we extract the deletion from the delly VCF:
bcftools view delly.vcf | grep 13055
We get the following:
NC_000962.3 1305529 DEL00000390 TGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGA T 180 PASS PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.3.3;END=1305652;PE=0;MAPQ=0;CT=3to5;CIPOS=-34,34;CIEND=-34,34;SRMAPQ=60;INSLEN=0;HOMLEN=34;SR=3;SRQ=0.993289;CONSENSUS=CGCGCCAGCGCACTACCACATACGCCCTGCTTGCGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGTTGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTGTGCCGCCCCCGGC;CE=1.89334;CONSBP=68 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-272.96,-27.3542,0:10000:PASS:4781:2916:6929:0:0:0:0:91
Now extracting the relevant region from the assembly alignment:
grep 1278679 asm.paf | paftools.js view -f aln - | grep CCCTGCTTGC -A 5
We get:
Ref+: 26801 ATACGCCCTGCTTGC----------------------------------------------------------------- 26815
|||||||||||||||
Qry+: 1305480 ATACGCCCTGCTTGCggcctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctggc 1305559
Ref+: 26816 -----------------------------------------------------------GGCCTAGCCCCGGCGACGATG 26836
|||||||||||||||||||||
Qry+: 1305560 ctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctGGCCTAGCCCCGGCGACGATG 1305639
Ref+: 26837 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 26916
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+: 1305640 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 1305719
And with a bit of manual rearrangement we get:
Ref+: 26801 ATACGCCCTGCTTGCGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGAT------------------------------ 26815
||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+: 1305480 ATACGCCCTGCTTGCggcctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctggc 1305559
Ref+: 26816 -------------------------------------------------------------------------------- 26836
Qry+: 1305560 ctagccccggcgacgatgccgggtcgcgggatgcggcccgttgaggagcggggcaatctGGCCTAGCCCCGGCGACGATG 1305639
Ref+: 26837 GGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 26916
--------------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qry+: 1305640 CCGGGTCGCGGGATGGGGCCCGCATGGGCTTAATAGTTGTTGCAGGAGCCGGCAACCGACTCGACAAGGCCGATGTACTG 1305719
If I am interpreting this correctly, it is indicating that the reference/alt should be:
TGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCTGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGAT -> T
I noticed one edge case where I think the refrence allele seems to be missing one nucleotide at the end.
I've tested this using delly v1.3.3 on linux.
This can be replicated using the following bit of code:
When we extract the deletion from the delly VCF:
We get the following:
Now extracting the relevant region from the assembly alignment:
We get:
And with a bit of manual rearrangement we get:
If I am interpreting this correctly, it is indicating that the reference/alt should be: