We’re often asked why whole genome sequencing (WGS) is performed at only 30X coverage while whole exome sequencing (WES) is typically performed at 100X coverage. Doesn’t 100X provide better variant coverage than 30X? If we were comparing apples to apples, the answer would logically be yes. But in reality we’re comparing apples to oranges, so the answer, less intuitively, is no. Here’s why.
For accurate analysis, every nucleotide in a DNA sample needs to be sequenced several times, for two reasons. First, multiple data points are needed to make a reliable call of which nucleotide is present at a given location. Picture a site where the reference nucleotide is a G. Now imagine that the DNA sample is heterozygous for a T variant. If we sequence the site once, we have a 50% chance of missing the T variant. But if we sequence the site multiple times, our chance of missing the T variant decreases significantly. Similarly, our chance of mistaking a stray sequencing error, such as a wrongly called A or C, for a real variant decreases significantly.
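The arithmetic behind the 50% figure can be sketched with a simple binomial model. This is an illustration only: it assumes each read samples the G or T allele independently with equal probability and ignores sequencing errors, and the function name and the `min_alt_reads` threshold are hypothetical, not taken from any particular variant caller.

```python
from math import comb

def prob_too_few_alt_reads(depth, min_alt_reads=1, alt_fraction=0.5):
    """Probability that fewer than `min_alt_reads` of `depth` reads carry the variant.

    Assumes independent reads and a fixed chance `alt_fraction` that any one
    read comes from the variant allele (0.5 for a heterozygous site).
    """
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )

# Chance of seeing no T reads at all, and of seeing fewer than 3, at several depths.
for depth in (1, 5, 10, 30):
    print(depth, prob_too_few_alt_reads(depth, 1), prob_too_few_alt_reads(depth, 3))
```

Requiring more than one supporting read, as in the second column, is a rough stand-in for guarding against stray errors: a single wrongly called base is far less likely to be repeated across several independent reads.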
Second, sequencing reads are a random, independent sample of the DNA and are not distributed evenly across the genome. As a result, some sites will be covered by more reads than average while other sites will be covered by fewer reads than average. The average read depth that is needed to ensure sufficient minimum coverage of each site in the genome is highly dependent on the method used to prepare the DNA for sequencing.
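To get a feel for how random sampling alone leaves some sites under-covered, here is a minimal sketch that treats per-site depth as Poisson-distributed around the mean. The Poisson model is an idealized assumption (real libraries are usually less uniform than this), and the minimum depth of 10 reads is a hypothetical threshold chosen purely for illustration.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        total += term
    return total

def fraction_below(min_depth, mean_depth):
    """Expected fraction of sites covered by fewer than `min_depth` reads."""
    if min_depth <= 0:
        return 0.0
    return poisson_cdf(min_depth - 1, mean_depth)

# Hypothetical threshold: a site needs at least 10 reads for a confident call.
for mean_depth in (15, 30, 100):
    print(mean_depth, fraction_below(10, mean_depth))
```

Even in this best case, the mean depth has to sit well above the minimum depth you actually need, because the mean says nothing about the low tail of the distribution.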
WES uses PCR-based protocols
DNA for WES is prepared using PCR-based protocols to selectively target the regions to be sequenced. The purified genomic DNA is typically fragmented, ligated with adapters, PCR amplified, captured and then further PCR amplified using indexed primers. Because PCR primers anneal more easily to some regions of the genome than others, some regions will be amplified many times while others will be amplified only infrequently. To limit the impact of this PCR bias, these samples are typically sequenced to an average read depth of 100X. This is the depth required to minimize the number of sites with too few data points to make a high-confidence nucleotide call.
WGS uses PCR-free protocols
DNA for WGS is prepared using PCR-free protocols. The purified genomic DNA is fragmented and ligated with indexed adapters with no amplification required. This method of preparation reduces the bias and gaps associated with PCR preparations, in turn producing largely uniform coverage of the entire genome. As a result, an average read depth of only 30X is required to ensure that there are enough data points to make a high-confidence nucleotide call at nearly all sites.
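One way to see why uneven coverage pushes the required mean depth up is to model per-site depth with a negative binomial distribution, which behaves like a Poisson with extra spread. The sketch below is an assumption made for illustration, not the model behind the 30X and 100X figures: the `size` (dispersion) values and the 10-read threshold are invented, with a large `size` standing in for near-uniform PCR-free coverage and a small `size` for PCR-biased coverage.

```python
from math import exp, lgamma, log

def neg_binom_cdf(k, mean, size):
    """P(X <= k) for a negative binomial with variance mean + mean**2 / size."""
    p = size / (size + mean)
    total = 0.0
    for i in range(k + 1):
        # Log of the probability mass at i, computed with lgamma for stability.
        log_pmf = (
            lgamma(i + size) - lgamma(i + 1) - lgamma(size)
            + size * log(p) + i * log(1 - p)
        )
        total += exp(log_pmf)
    return total

# (label, mean depth, hypothetical dispersion); smaller size = less even coverage.
scenarios = [
    ("near-uniform, 30X", 30, 1000.0),
    ("uneven, 30X", 30, 5.0),
    ("uneven, 100X", 100, 5.0),
]
for label, mean_depth, size in scenarios:
    # Fraction of sites expected to fall below 10 reads.
    print(label, neg_binom_cdf(9, mean_depth, size))
```

Under these made-up parameters, the uneven library leaves far more sites below the threshold at the same mean depth, and raising the mean depth is what pulls that fraction back down.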
To create an illustrative apples to apples variant coverage comparison, we sequenced Genome in a Bottle sample NA12878 at 75X coverage for both WES and WGS. The results for single nucleotide variants are shown in the following table.
When we calculate the percentage of false negative variants (those variants that are known to be present in the sample but were not detected) compared to the total number of variants present (FN/(TP+FN)), we get 2.17% for WES versus 0.022% for WGS. Expressed another way, WES at 75X coverage will miss roughly 2 out of every 100 variants while WGS at the same 75X coverage will miss only about 2 out of every 10,000 variants. Similarly, as shown in the table above, the sensitivity and positive predictive value are greater for WGS than for WES.
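For readers who want to reproduce these metrics from their own benchmarking counts, the formulas can be sketched as follows. The counts in the example are made up for illustration and are not the values from our table; the function name is likewise hypothetical.

```python
def variant_metrics(tp, fp, fn):
    """Compute standard benchmarking metrics from true/false positive and false negative counts."""
    return {
        # Share of known variants that were detected: TP / (TP + FN).
        "sensitivity": tp / (tp + fn),
        # Share of reported variants that are real: TP / (TP + FP).
        "positive_predictive_value": tp / (tp + fp),
        # Share of known variants that were missed: FN / (TP + FN).
        "false_negative_rate": fn / (tp + fn),
    }

# Made-up counts, NOT taken from the table in this post.
metrics = variant_metrics(tp=978, fp=10, fn=22)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the false negative rate is simply one minus the sensitivity, so the two numbers describe the same thing from opposite directions.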
The differences become less significant as WES coverage increases to 100X and WGS coverage decreases to 30X, but the overall trend holds: WGS requires significantly less mean coverage than WES for accurate variant detection and comprehensive variant coverage. This trend is even more pronounced for small indels.
Interested in knowing more about how WGS’s uniform coverage additionally enables identification of larger structural variants, including CNVs, from the same DNA sample? Read our related post here.