Talk:CpG site


CpG continents

I've reverted an edit from an anonymous source that claims the term "CpG continents" exists. If this term is used, I have never personally heard it or read about it, and a Google search returns nothing for it. It's amusing, but I'm pretty sure it's not real. If you do have a reference that would justify adding this information, please cite it; otherwise I'm going to assume these edits are vandalism and revert them. -Madeleine 19:18, 14 March 2007 (UTC)

References

The first reference (Jabbari and Bernardi, 2003) does not discuss the abundance of CpG methylation and certainly gives no such 70-80% figure that I can see. However, I can't find an alternative reference, so I will leave this for someone more knowledgeable to fix. Blacknightshade (talk) 16:38, 22 June 2011 (UTC)

Not sure why request for clarification was made

The usual formal definition of a CpG island is a region with at least 200 bp, and a GC percentage that is greater than 50%, [clarification needed] and with an observed-to-expected CpG ratio that is greater than 60%.

@Exercisephys: - I'd like to understand why you added the clarification-needed tag. The sentence, as written, seems clear. Are you questioning what looks like a mismatch between the 50% and 60% figures?

You more than likely know that the four DNA bases are A, T, G and C, and that A hydrogen-bonds with T on the opposite strand, and G with C. When you refer to the GC percentage of a region of DNA, you are referring to the proportion of bases in that sequence that are G or C, in no particular order. This is standard parlance in genomics. The GC content of a region has practical implications, as GC-rich regions have higher melting temperatures than AT-rich regions. This is due to how the bases stack in the helical structure and to the three hydrogen bonds in a GC base pair.

The reference ratio of greater than 60% CpG means that from the 5' end of the sequence, C is followed by a G 60% of the time.

Am0210 (talk) 01:07, 13 October 2014 (UTC)

What needs to be clarified is 50% of what, and the definition you give, Am0210, is also incorrect. The statement as it stands can be read to mean that CpG islands are where CpG sites cover more than 50% of all dinucleotide sites (wildly incorrect), or more than 50% of CpX sites (also incorrect).
A random distribution would place CpG sites at 1/16 of all dinucleotide positions, and a C would be followed by a G in 25% of occurrences by pure chance, which is roughly the distribution seen for other dinucleotides. Due to the effects of methylation, these occurrences grow rarer (on an evolutionary timescale) in DNA that is often methylated (as CpG is turned into TpG), and more common where DNA is seldom methylated. A more correct definition would imply that a CpG island is where CpG sites occur at more than 50% of what a random distribution would yield (1/16), which gives a figure closer to 3.125% of all sites, or 12.5% of all CpX sites, rather than 50%. I don't have access to the definition source, which is why I haven't clarified this myself, but as it stands the statement is vague, bordering on supremely false. -- CFCF 🍌 (email) 10:40, 13 October 2014 (UTC)
I'm sorry, I didn't read your statement or the statement in the article properly. What I think needs clarifying is the observed-to-expected ratio, which is what I explained above. I think this must have been the intent of the tagger, and I've moved the tag further down the sentence. -- CFCF 🍌 (email) 10:48, 13 October 2014 (UTC)
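
For reference, the chance figures in this exchange can be checked with a short calculation. This is only a sketch under an idealised assumption that is not from the cited source: all four bases are equally frequent and independent of their neighbours. Real genomes deviate from this.

```latex
\begin{align*}
P(\mathrm{CG}) &= P(\mathrm{C})\,P(\mathrm{G}) = \tfrac{1}{4}\cdot\tfrac{1}{4} = \tfrac{1}{16} = 6.25\% \\
P(\mathrm{G}\mid\mathrm{C}) &= \tfrac{1}{4} = 25\% \\
0.5\cdot P(\mathrm{CG}) &= 3.125\%, \qquad 0.5\cdot P(\mathrm{G}\mid\mathrm{C}) = 12.5\%
\end{align*}
```

The 3.125% and 12.5% figures quoted above correspond to half of the 1/16 chance expectation. With the 60% threshold from the quoted article sentence, the same reasoning would give 3.75% and 15% instead.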

The formulas for calculating the observed-to-expected ratio are a bit misleading. The observed value is simply the number of times "CG" occurs in the sequence. The expected value is the number of C's times the number of G's, divided by the length of the sequence. Although the math works out the same when calculating the ratio, the formulas given for the numerator and denominator are wrong when taken by themselves. -- Preceding unsigned comment added by 129.244.14.85 (talk) 20:06, 11 November 2015 (UTC)
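
The definitions in the comment above, together with the thresholds in the quoted article sentence (at least 200 bp, GC percentage above 50%, observed-to-expected ratio above 0.6), can be sketched in Python roughly as follows. The function names are made up for illustration, and the sketch assumes a plain single-strand string of A/C/G/T with no ambiguity codes; it is not taken from the article or its sources.

```python
from typing import Tuple


def cpg_observed_expected(seq: str) -> Tuple[int, float, float]:
    """Return (observed CpG count, expected CpG count, observed/expected ratio)."""
    s = seq.upper()
    n = len(s)
    # Observed: number of times "CG" occurs ("CG" cannot overlap itself,
    # so str.count gives the full count).
    observed = s.count("CG")
    # Expected: (number of C's * number of G's) / sequence length.
    expected = s.count("C") * s.count("G") / n if n else 0.0
    # Guard against sequences with no C or no G.
    ratio = observed / expected if expected else 0.0
    return observed, expected, ratio


def gc_percent(seq: str) -> float:
    """Percentage of bases that are G or C, in any order."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s) if s else 0.0


def meets_island_criteria(seq: str) -> bool:
    """Apply the three thresholds quoted on this page to one sequence."""
    _, _, ratio = cpg_observed_expected(seq)
    return len(seq) >= 200 and gc_percent(seq) > 50.0 and ratio > 0.6


if __name__ == "__main__":
    # Toy sequence: 2 C's, 3 G's, length 8, "CG" occurs twice.
    print(cpg_observed_expected("ACGTCGGA"))
```

For the toy sequence the observed count is 2 and the expected count is 2 × 3 / 8 = 0.75, which illustrates the point that the numerator and denominator are separate quantities even though only their ratio enters the island definition.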

Picture of nucleotide sequence has bad contrast

I like the idea of having a picture showing a sequence with CpG sites. However, on my screen I can't really read the Cs and Gs, since the yellow text on a white background has unfortunately poor contrast. Could someone adjust it and choose another colour, such as a lighter blue? IronicPseudonyme (talk) 08:54, 28 November 2014 (UTC)

I tried to solve this problem with figures that have better contrast (see the version of 27 January 2019). Manudouz (talk) 17:11, 24 April 2020 (UTC)

random chance

I dislike the phrase "random chance" because "chance" is inherently "random". I vote for replacing it with just "chance". -- Preceding unsigned comment added by 195.139.144.123 (talk) 08:45, 1 February 2021 (UTC)