Field of Science

ENCODE: What defines genomic function?

A new wealth of articles by the ENCODE (ENCyclopedia Of DNA Elements) consortium suggests that far more of the human genome carries out some function than previously thought, and one might conclude that very little DNA is junk:

From an introduction to the new ENCODE papers:
Collectively, the papers describe 1,640 data sets generated across 147 different cell types. Among the many important results there is one that stands out above them all: more than 80% of the human genome's components have now been assigned at least one biochemical function.
[Emphasis added.]
80%? That is a lot (see [2] for details). It doesn't throw out the idea of junk DNA, i.e., that there is DNA that has no function, but it puts the fraction of junk much closer to zero than the 90% that I have heard before. But I seriously wonder what is meant by "function". Take a look at this image[1]:

Gene regulation is a very spatial thing, which means that if you were to move a gene (i.e., the protein-coding DNA, or exons) somewhere else, then it would probably not be transcribed at the right time. So, if you were to cut out a length of DNA that doesn't have any function, then other DNA would be shifted spatially, and this might screw up proper transcription. So, DNA without function might be important as a filler. On the other hand, ENCODE includes in the 80% everything that is transcribed (i.e., DNA that is used to produce RNA), but being transcribed doesn't mean that it has a function, as defined in my book. RNA may be floating around in the cell, and may never be translated (into protein), and may not have any other (e.g., regulatory) function either. On top of that, even if it is translated and a protein is created based on that DNA, it doesn't necessarily follow that the protein does anything (it could even be detrimental to the organism), and then that surely isn't functional.
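The filler argument can be made concrete with a toy sketch. Everything in it is hypothetical: the sequences, the names `enhancer`, `gene`, and `filler`, and the idea that a single distance captures regulation are illustrative assumptions, not anything taken from the ENCODE data. It only shows that scrambling a filler region keeps the spacing between a regulatory site and a gene, while deleting the filler does not.

```python
import random

def spacing(genome: str, enhancer: str, gene: str) -> int:
    """Count the bases between the end of the enhancer and the start of the gene."""
    enh_end = genome.index(enhancer) + len(enhancer)
    return genome.index(gene, enh_end) - enh_end

def scramble(seq: str, rng: random.Random) -> str:
    """Shuffle the bases of seq: same length and composition, different order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

rng = random.Random(0)

# Hypothetical toy sequences, chosen only for illustration.
enhancer = "GGGCGGGC"
gene = "ATGCCCCCCGGCTAG"
filler = "ATTGCTTGGA" * 5  # 50 bases with no assumed sequence-specific role

original = enhancer + filler + gene
scrambled = enhancer + scramble(filler, rng) + gene
deleted = enhancer + gene

print(spacing(original, enhancer, gene))   # spacing with the filler intact
print(spacing(scrambled, enhancer, gene))  # same spacing, different filler sequence
print(spacing(deleted, enhancer, gene))    # spacing collapses once the filler is cut
```

In this picture the filler matters only through its length, so its exact sequence could be anything. Whether that counts as a "function" is exactly the question.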

To me, this is one of those moments where my understanding of how things work is challenged. If it really is true that no more than 20% of the human genome is junk (and it apparently could be a lot less than that), then I am happy to update my understanding, but I am super-skeptical that there is that little junk in the human genome. I am also not too happy with how the words junk and non-functional are used here.

[1] Joseph R. Ecker, Wendy A. Bickmore, Inês Barroso, Jonathan K. Pritchard, Yoav Gilad & Eran Segal (2012). Genomics: ENCODE explained. Nature, 489. DOI: 10.1038/489052a
[2] The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489. DOI: 10.1038/nature11247


  1. One issue, in addition to the fact that transcriptional artifacts are common, is that 'the human genome' as discussed in these papers refers to the parts of the human genome that are sequenced. Vast amounts of the repetitive dead viral BS were not sequenced and are not included in these analyses.

  2. "DNA without function might be important as a filler."

    Agreed, and on the loosest of all possible definitions of function, junk DNA that does not even fill space between a gene and its regulating site would still carry the function of physically connecting the other bits. The term function needs specification.

  3. Lorax, how much of the genome is not included?

  4. In mathematics, a function is a relation between a set of inputs and a range of outputs where each specific input is related to exactly one output.
    Example : f(x) = (x - x)/2
    Every piece of the function f is important and serves a purpose, but, the expression on the right can be simplified.
    Example: g(x) = 0
    Is Ostman complaining that since DNA might be simplified then not all of what is there has a real purpose?

  5. 1) I don't think the sharing of the word 'function' with mathematics is relevant here.
    2) I am saying that if there is DNA that could be randomized, such that it doesn't matter whether the sequence is ATTGCTTGGA or TCGTCGCTGA, then it does not serve a function the way we usually think about function (in biology).

  6. Having looked over more of the work, I am wrong that most of the repetitive DNA was left out. So I will remove that critique but stick with the artifact critique. The ENCODE people (those pushing the BS) consider every blip on the screen as purposeful and of deep biological meaning.

    ENCODE is a huge data resource, but this stuff just makes me weep for science.

  7. Thanks for the update. Also read this post by one of the main ENCODE authors on Cryptogenomicon.

