GC Skewing


Enter your sequence in the box, using FASTA format. Sequence description is optional. A sample sequence is provided.
All characters other than A, C, G, T, and N will be stripped before computing GC skew.
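
The input cleaning described above can be sketched in a few lines of Python. This is only an illustration, not the code behind this page: the function name clean_fasta is hypothetical, and converting lowercase bases to uppercase before filtering is an assumption the page does not state.

```python
def clean_fasta(text: str) -> str:
    """Return the bare sequence from FASTA-style input.

    Lines starting with '>' are treated as the optional description
    and skipped. Every character other than A, C, G, T and N is dropped.
    Assumption: lowercase bases are upper-cased before filtering.
    """
    kept = []
    for line in text.splitlines():
        if line.startswith(">"):  # optional FASTA description line
            continue
        kept.extend(ch for ch in line.upper() if ch in "ACGTN")
    return "".join(kept)
```

For example, clean_fasta(">sample\nACgt nn\n12-XA") keeps only the seven valid bases and discards the description line, the space, the digits, the hyphen and the X.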

Window size determines the number of bases over which the GC skew will be computed.
Step size determines the number of bases between consecutive windows.
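
As a rough sketch of how window size and step size interact, the following Python function lists the start position of each window. The name window_starts is hypothetical, and the sketch assumes that only full-length windows are used; the tool on this page may treat the end of the sequence differently (for example, by wrapping around a circular chromosome).

```python
def window_starts(seq_len: int, window: int, step: int) -> list[int]:
    """Start positions (0-based) of every full window along a sequence.

    Assumption: a window that would run past the end of the sequence
    is skipped rather than truncated or wrapped around.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return list(range(0, seq_len - window + 1, step))
```

With a 10-base sequence, a window of 4 and a step of 3, the windows start at positions 0, 3 and 6. A smaller step gives more, heavily overlapping windows; a step equal to the window size gives windows that do not overlap at all.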


What is GC Skewing?

If DNA were random strings of letters, you would expect about half of the G's in a genome to be on the leading strand, and the other half on the lagging strand. However, one strand of DNA often has significantly more than its share of G's (thereby causing the other strand to have significantly more than its share of C's). For example, the regions around the origin and terminus of replication in a circular chromosome often have unusually even or unusually uneven distributions of G's and C's. The unevenness, or skew, is measured in a "window," or subsequence. By sliding the window along the sequence, unusually even or unusually uneven distributions can be located.

GC skew is calculated as (G - C) / (G + C), where G is the number of G's in the window, and C is the number of C's. Before being reported and plotted, the GC skew is multiplied by w/c, where w is the window size and c is the sequence length (lowercase c, not to be confused with the C count above). This scaling factor makes the reported values less sensitive to the absolute window size and sequence length, although they still depend on the ratio between these two numbers.
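
The calculation can be illustrated with a short Python sketch. It is not the implementation used by this page. The function names are hypothetical, the parameter seq_len stands for the sequence length that the text calls c, and returning 0.0 for a window with no G's or C's is an assumption made here to avoid dividing by zero.

```python
def gc_skew(window_seq: str) -> float:
    """Raw GC skew of one window: (G - C) / (G + C)."""
    g = window_seq.count("G")
    c = window_seq.count("C")
    if g + c == 0:
        # Assumption: a window with no G or C is reported as zero skew.
        return 0.0
    return (g - c) / (g + c)


def scaled_gc_skew(window_seq: str, seq_len: int) -> float:
    """GC skew multiplied by window size / sequence length (w/c in the text)."""
    return gc_skew(window_seq) * len(window_seq) / seq_len
```

For the window "GGGC", G = 3 and C = 1, so the raw skew is (3 - 1) / (3 + 1) = 0.5. If that 4-base window comes from an 8-base sequence, the scaled value is 0.5 * 4 / 8 = 0.25.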

Interpreting GC-Skew Graphs

The sample sequence on this web page is approximately 10 kb from Mycoplasma pneumoniae. The annotated origin of replication is approximately in the center of the sequence. Experiment with different window sizes (from 100 to 3000) and step sizes (from 20 to 200) to see which combination is best for finding the origin of replication. The origin is typically associated with a change in sign of the GC-skew. However, there are usually many such changes in sign, especially for smaller window sizes. Therefore, a second measure, the cumulative skew, is used. The cumulative skew is simply the running sum of the skew values in each window. The origin of replication is associated with the global minimum of the cumulative skew (or global maximum if the lagging strand is analyzed, as in this example).
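
The cumulative skew and the search for its extreme point can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input is a list of per-window skew values that has already been computed, and the result is a window index rather than a base position.

```python
from itertools import accumulate


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of the per-window skew values."""
    return list(accumulate(skews))


def extremum_window(skews: list[float], use_max: bool = False) -> int:
    """Index of the window where the cumulative skew is lowest.

    Set use_max=True to look for the global maximum instead, as when
    the lagging strand is analyzed. Raises ValueError on an empty list.
    """
    cum = cumulative_skew(skews)
    pick = max if use_max else min
    return pick(range(len(cum)), key=cum.__getitem__)
```

For the skew values -0.5, -0.25, 0.5, 0.25 the cumulative skew is -0.5, -0.75, -0.25, 0.0, so the global minimum falls at window index 1 and the global maximum at window index 3. Multiplying the index by the step size gives the approximate start position of that window in the sequence.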

Other GC Skewing Web Sites

The following resources allow you to read and experiment further with GC Skewing.

A. Grigoriev (1998) Analyzing genomes with cumulative skew diagrams. Nucl. Acids Res. 26: 2286-2290.

MIPS (in Germany)

GraphDNA at Viral Bioinformatics Resource Center, University of Victoria, Canada

Bioinformatics course student project, in Canada

Bioinformatics course lab exercise, in Sweden