Sequencing in Context: GC/AT-Rich Regions
GC-rich regions can be difficult to amplify and can introduce bias into sequencing, because PCR amplification of these regions is often hampered by the formation of secondary structures such as hairpins and by their higher melting temperature. This can be seen in the figure below from Loman et al.
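The link between GC content and melting temperature can be made concrete with the Wallace rule, a rough rule of thumb for short oligonucleotides: about 2 °C per A/T base plus 4 °C per G/C base. This is a minimal sketch to illustrate the trend only. The function name `wallace_tm` is hypothetical, and real primer-design tools use nearest-neighbour thermodynamics and salt corrections instead.

```python
def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (degrees C) of a short oligo using the
    Wallace rule: Tm = 2*(A+T) + 4*(G+C). Only a rough guide for oligos of
    about 14 nt or fewer; it ignores salt and nearest-neighbour effects."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

# Two 12-mers of the same length but opposite GC content
print(wallace_tm("ATATATATATAT"))  # 24
print(wallace_tm("GCGCGCGCGCGC"))  # 48
```

Because each G/C pair contributes three hydrogen bonds versus two for A/T, the GC-rich oligo melts at roughly twice the temperature, which is part of why GC-rich templates denature and amplify less efficiently.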
Figure: Sequencing coverage drops off at both extremes of GC content, around 80% GC and 20% GC.
As seen in the P. falciparum chromosome, the fraction of the genome that is covered drops once GC content rises above 70%. Similarly, highly AT-rich (low-GC) regions also appear to reduce coverage; I look into this effect in more detail below.
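The drop in coverage at GC extremes can be quantified from a reference sequence and a per-base depth array (for example, one exported with `samtools depth -a`, which also reports zero-depth positions). The sketch below is a minimal illustration under assumptions: the window size, the 10%-wide GC bins and the `min_depth` threshold are arbitrary choices, and the function names are hypothetical rather than taken from any published pipeline.

```python
from collections import defaultdict

def gc_percent(seq: str) -> int:
    """Integer GC percentage of the A/C/G/T bases in seq (N and other codes ignored)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return 100 * gc // acgt if acgt else 0

def covered_fraction_by_gc(seq, depth, window=100, bin_width=10, min_depth=1):
    """Split seq into non-overlapping windows, assign each window to a GC bin
    (key 70 means 70-79% GC; 100% GC is folded into the top bin) and return
    the fraction of bases in each bin with depth >= min_depth."""
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    covered = defaultdict(int)
    total = defaultdict(int)
    # A trailing partial window shorter than `window` is skipped.
    for start in range(0, len(seq) - window + 1, window):
        pct = gc_percent(seq[start:start + window])
        b = min(pct // bin_width * bin_width, 100 - bin_width)
        covered[b] += sum(1 for d in depth[start:start + window] if d >= min_depth)
        total[b] += window
    return {b: covered[b] / total[b] for b in sorted(total)}

# Toy example: 200 bp of pure GC followed by 200 bp of pure AT
seq = "GC" * 100 + "AT" * 100
depth = [30] * 200 + [0] * 150 + [5] * 50
print(covered_fraction_by_gc(seq, depth))  # {0: 0.25, 90: 1.0}
```

Plotting these fractions against the bin keys gives a coverage-versus-GC curve of the kind shown in the figure; on real data you would use far more windows per bin.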
Conversely, AT-rich regions can also cause problems, as shown by Quail et al. This can be seen in the figure below for the Ion Torrent PGM when sequencing AT-rich regions. It is therefore crucial to establish whether your variant calls fall within GC-rich or AT-rich regions.
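One way to establish that context is to annotate each variant call with the GC content of the reference sequence around it. The sketch below assumes a plain reference string and 0-based variant positions (for example, VCF positions minus one). The ±50 bp flank and the 20%/70% cut-offs simply echo the figures discussed above; they are illustrative, not validated thresholds, and the function names are hypothetical.

```python
def local_gc(seq: str, pos: int, flank: int = 50) -> float:
    """GC fraction of the A/C/G/T bases within `flank` bp either side of a
    0-based position, truncated at the sequence ends."""
    start = max(0, pos - flank)
    end = min(len(seq), pos + flank + 1)
    s = seq[start:end].upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc / acgt if acgt else 0.0

def gc_context(gc: float, low: float = 0.2, high: float = 0.7) -> str:
    """Label a GC fraction as AT-rich, GC-rich or typical using illustrative cut-offs."""
    if gc <= low:
        return "AT-rich"
    if gc >= high:
        return "GC-rich"
    return "typical"

ref = "A" * 100 + "GC" * 50 + "ACGT" * 25
for pos in (10, 150, 260):  # hypothetical 0-based variant positions
    print(pos, gc_context(local_gc(ref, pos)))
# 10 AT-rich
# 150 GC-rich
# 260 typical
```

Variants labelled AT-rich or GC-rich can then be flagged for extra scrutiny, such as checking depth or confirming with an orthogonal method.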
Figure: A) Artemis genome browser screenshots illustrating the variation in sequence coverage across a selected region of P. falciparum chromosome 11, using 15x depth of randomly normalized sequence from each platform tested. A heatmap of errors, normalized by the number of mapped reads, is shown just below the GC content graph (PacBio top line, PGM middle, MiSeq bottom). B) Coverage over a region of extreme GC content, ranging from 70% to 0%.
Why is this happening?
Coverage is poor within the introns and AT-rich exonic segments of genes, with approximately 30% of the genome having no sequence coverage whatsoever. This bias was observed in libraries prepared using both enzymatic and physical shearing.
In whole-exome sequencing (WES), differences in the hybridization efficiency of sequence capture probes, possibly again attributable to GC content variation, can leave target regions with little or no coverage. Uniformity of coverage is also influenced by repetitive or low-complexity sequences, which either restrict bait design or lead to off-target capture.
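The effect of low-complexity sequence can be illustrated with a simple complexity score. One common heuristic (not necessarily what any particular bait-design tool uses) is the Shannon entropy of overlapping k-mers: repetitive sequence reuses a few k-mers and scores low. This is a minimal sketch; the function name and the k=3 default are illustrative assumptions.

```python
import math
from collections import Counter

def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (bits) of the overlapping k-mer distribution in seq.
    Repetitive or low-complexity sequence gives low values; the maximum
    possible is 2*k bits (all 4**k k-mers equally frequent)."""
    s = seq.upper()
    kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(kmer_entropy("A" * 30))                  # homopolymer: 0 bits
print(kmer_entropy("AT" * 15))                 # dinucleotide repeat: 1.0 bit
print(kmer_entropy("ACGTTGCAAGCTTACGGATC"))    # mixed sequence: about 4 bits
```

Windows scoring well below their neighbours are candidates for exclusion from bait design or for closer inspection of off-target reads.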
Validating NGS panels is challenging because many factors, including GC content, influence a sample’s coverage profile; analysis methods must therefore attempt to correct for these biases to provide adequate specificity.
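One simple way to correct for GC bias, sketched here under stated assumptions, is to rescale each window's coverage by the ratio of the sample-wide median depth to the median depth of windows with similar GC content. This is not the method of any specific panel-validation pipeline; the function name, the 5% bin width and the use of per-bin medians are illustrative choices.

```python
from collections import defaultdict
from statistics import median

def gc_normalize(window_gc, window_cov, bin_pct=5):
    """Simple GC-bias correction: rescale each window's coverage so that every
    GC bin has the same median coverage as the whole sample.
    window_gc  -- GC fraction (0-1) of each window
    window_cov -- mean read depth of each window (same order)
    Returns a list of corrected depths."""
    if len(window_gc) != len(window_cov):
        raise ValueError("window_gc and window_cov must have the same length")
    n_bins = 100 // bin_pct
    # Round to whole percent first so float noise does not shift bin edges.
    bins = [min(round(gc * 100) // bin_pct, n_bins - 1) for gc in window_gc]
    by_bin = defaultdict(list)
    for b, cov in zip(bins, window_cov):
        by_bin[b].append(cov)
    bin_median = {b: median(v) for b, v in by_bin.items()}
    global_median = median(window_cov)
    corrected = []
    for b, cov in zip(bins, window_cov):
        m = bin_median[b]
        # Leave windows untouched if their bin has zero median (nothing to scale by).
        corrected.append(cov * global_median / m if m > 0 else cov)
    return corrected

gc = [0.5, 0.5, 0.5, 0.8, 0.8, 0.8]
cov = [30, 30, 30, 10, 10, 10]  # GC-rich windows under-covered
print(gc_normalize(gc, cov))  # [20.0, 20.0, 20.0, 20.0, 20.0, 20.0]
```

This flattens the coverage-versus-GC curve but cannot recover regions that received no reads at all. More careful methods often fit a smooth curve (for example, loess) rather than using per-bin medians, which become noisy when a bin holds few windows.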