Detecting large structural variants using WGS


In this week’s post we’re taking a closer look at large (>50 bp) structural variants, including copy number variants (CNVs). Previously, we touched on CNVs and their prevalence in neurodevelopmental disorders (read the post here). We noted how whole genome sequencing (WGS) technology provides unique opportunities for detection of CNVs. We’re often asked, however, which specific types of structural variants WGS can detect. Read on to learn more, then download the one-page reference guide, which includes a graphic representation of each variant type.

Let’s start with a definition of the different types of structural variants:

Deletion: A sequence alteration where a stretch of contiguous nucleotides has been excised
Insertion: A sequence alteration where a stretch of sequence has been added between two adjacent nucleotides in the sequence
Tandem duplication: A sequence alteration where the copy number of a given region is greater than the reference sequence (copy number gain), consisting of two identical adjacent regions
Reverse tandem duplication: A sequence alteration where the copy number of a given region is greater than the reference sequence (copy number gain), consisting of two identical adjacent regions with opposite gene order
Non-tandem duplication: A sequence alteration where the copy number of a given region is greater than the reference sequence (copy number gain), consisting of two identical non-adjacent regions
Inversion: A sequence alteration where a continuous nucleotide sequence is inverted in the same position
Translocation: A region of nucleotide sequence that has translocated to a new position
Balanced translocation: A region of nucleotide sequence that has switched location with another region of nucleotide sequence located on another chromosome, without loss or gain of genetic material
Unbalanced translocation: A region of nucleotide sequence that has switched location with another region of nucleotide sequence located on another chromosome, resulting in loss or gain of genetic material
Complex rearrangement: A structural sequence alteration or rearrangement encompassing one or more genome fragments
Short tandem repeat variation: A sequence alteration where a tandem repeat is expanded or contracted with regard to a reference sequence

With few exceptions, large structural variants like these are not detectable by exome sequencing. They can, however, be detected by WGS, in large part because of the consistent read depth that WGS generates. But taking advantage of the available data requires the right algorithms. In general terms, our algorithms are centered on two distinct analysis strategies: breakpoint analysis and read depth analysis.

Breakpoint analysis

Breakpoint analysis takes advantage of two types of reads: split reads and discordant reads. Under normal circumstances, a given paired sequence read will align to a single region of the genome. But for split and discordant reads, the paired read aligns to two distinct regions of the genome with little or no overlap. In the case of split reads, the breakpoint occurs within one of the reads and can be identified to the resolution of a single base pair. In the case of discordant reads, the breakpoint occurs in the insert between the reads, resulting in an unexpected span size or inconsistent orientation. Both are indicative of structural variation.
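To make the two read types concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm used in our pipeline. The names ReadPair, classify_pair and clip_breakpoints are hypothetical, the span limits (200 to 600 bp) and minimum clip length (20 bases) are assumed values, and the span is approximated as the distance between the two mates' leftmost aligned positions. The first function flags discordant pairs by chromosome, orientation and span. The second reads a SAM-style CIGAR string and reports where a soft-clipped (split) read stops matching the reference.

```python
import re
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int      # leftmost aligned position of mate 1 (1-based)
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost aligned position of mate 2 (1-based)
    strand2: str


def classify_pair(pair, min_span=200, max_span=600):
    """Label a pair as concordant or name the reason it is discordant.

    min_span and max_span are assumed limits; in practice they would be
    derived from the library's observed insert-size distribution.
    """
    if pair.chrom1 != pair.chrom2:
        return "different_chromosomes"
    # Order the mates by position so orientation can be checked left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    # A standard paired-end library is expected to align as '+' then '-'.
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected_orientation"
    span = right_pos - left_pos  # simplification: ignores read length
    if span > max_span:
        return "span_too_large"
    if span < min_span:
        return "span_too_small"
    return "concordant"


def clip_breakpoints(pos, cigar, min_clip=20):
    """Return candidate breakpoint positions implied by soft clipping.

    A left clip places the breakpoint immediately before `pos`; a right
    clip places it at the first reference base past the aligned segment.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Only these operations consume reference bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    points = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(pos)
    if ops and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(pos + ref_len)
    return points


pair = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
print(classify_pair(pair))               # span_too_large
print(clip_breakpoints(1000, "60M40S"))  # [1060]
```

In this sketch a pair whose mates sit 4,000 bp apart is reported as having too large a span, and a read that matches for 60 bases before being clipped yields a single-base breakpoint candidate at position 1060. A real caller would require several independent reads to support the same breakpoint before reporting a variant.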


Read depth analysis

Read depth analysis takes advantage of the expectation of consistent coverage across the genome. Regions with unexpected levels of coverage, whether significantly higher (at least twice the expected depth, >=2X) or significantly lower (at most half the expected depth, <=0.5X), are indicative of structural variation.
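The idea can be sketched in a few lines of Python. This is a simplified illustration rather than our production method. The function name flag_depth_outliers is hypothetical, the genome is assumed to be pre-divided into fixed-size windows with one mean depth per window, and the median across windows is assumed to be an acceptable estimate of expected coverage. The two thresholds are the 2X and 0.5X ratios mentioned above.

```python
from statistics import median


def flag_depth_outliers(window_depths, gain_ratio=2.0, loss_ratio=0.5):
    """Return (window_index, call, ratio) for windows with unusual depth.

    window_depths: mean read depth per fixed-size window (assumed input).
    The median across windows stands in for the expected coverage.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    calls = []
    for index, depth in enumerate(window_depths):
        ratio = depth / baseline
        if ratio >= gain_ratio:
            calls.append((index, "gain", ratio))
        elif ratio <= loss_ratio:
            calls.append((index, "loss", ratio))
    return calls


# Made-up depths around 30X with one low region and one high region.
depths = [30, 31, 29, 30, 14, 15, 30, 62, 61, 30]
for index, call, ratio in flag_depth_outliers(depths):
    print(index, call, round(ratio, 2))
```

With these made-up depths, windows 4 and 5 are flagged as losses and windows 7 and 8 as gains. A real implementation would also correct for factors such as GC content and mappability, and would merge adjacent flagged windows into a single call.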


Considering these three signals (split reads, discordant reads and read depth) alongside additional lines of evidence makes it possible to detect nearly all of the above variant types as part of the Variantyx Unity™ test. Only balanced translocations are still pending and will be addressed in a future release. For quick reference, download our one-page reference guide for structural variants, which includes a graphic representation of each variant type.

In our next post we’ll address another common question: why a lower mean sequencing depth for WGS (30X) produces better coverage than a higher mean sequencing depth for exomes (100X).
