Message from the Director

Deciphering the Genome Code

Huntington F. Willard, PhD

How do you read a book when you can only recognize 1 to 2% of the words? Welcome to the human genome! With the essentially complete sequence in hand, it is becoming increasingly clear that the number of protein-coding genes in the human genome is not likely to exceed 25,000 and that the proportion of the genome devoted to encoding these genes is somewhere between 1 and 2%. So, what about the remaining 98 to 99%?

Now that all the self-congratulatory hoopla and back-slapping are over, we are left with a book of 3 billion letters, written with a digital alphabet we know (G's, A's, T's and C's) and packaged into 24 nice chapters, but written in a language whose vocabulary is incomplete and whose rules of syntax and grammar are unknown and likely unprecedented. And there's no recognizable punctuation or spacing. How do you learn to read an ancient foreign language like this? Left to right? Right to left? Sometimes both? Do we read every fifth word? Or skip entire pages in the book? What was this book's author (evolution, working night and day for a billion years or so) thinking? Or was she like Richard Dawkins' "blind watchmaker" or the apocryphal chimpanzee at the keyboard, randomly pounding out an occasional masterpiece?

Let's jump back two hundred years and consider the Rosetta Stone, discovered in Egypt in 1799. The stone was written in three different scripts, one of which was unrecognizable to scholars of the day, and it took more than 20 years before historians learned how to read the strange hieroglyphs. It was a young French scholar, Jean-François Champollion, who eventually deciphered the hieroglyphs on the strength of two important, but not obviously connected, hypotheses. The first was that the symbols could either be ideograms (representing concepts or words) or phonograms (representing sounds). What are the parallels in the human genome? We know some of the ideograms: genes, codons, exons, splicing signals, promoters. But do we also have the equivalent of phonograms, combinations of bases that have a very different meaning? Could there be an entirely different code buried in the genome, thus far overlooked or missed in the excitement over deciphering the original genetic code and finding the genes? One of Champollion's discoveries was that, depending on the context, hieroglyphs can be read right to left or top to bottom and, occasionally for effect, left to right! Again, there are clear parallels to our genome. What are the clues that tell the cell (or us) in what direction to read the code? And are there non-linear elements, perhaps quite long distances from each other along the chromosome, that are the equivalent of hieroglyphs read up and down, rather than side to side?

Champollion's second important hypothesis, critical to his eventual breaking of the code, was that the hieroglyphs he was studying were an ancient form of the later Egyptian Coptic script that he knew well. In other words, they were evolutionarily related and he could look for conserved or similar elements in the two scripts to eventually decipher the meaning behind the ancient code.

Now let's step back another 100 million years, to a time when two currently quite distinctive genomes, those of the budding yeast Saccharomyces cerevisiae and the filamentous fungus Ashbya gossypii, belonged to a common ancestral genome prior to their evolutionary divergence. How can we understand the ancient events that shaped these two genomes or hope to recognize the genomic fossils of the ancestral genome? In short, how can we learn to read these genomes? And how might this help us learn how to read our own genome?

As this month's lead article illustrates (and as visitors to Duke such as genome luminaries Eric Green, Eric Lander and Sir John Sulston have shown over and over), when it comes to genomes, there is a lot to be gained by looking beyond the human. In their April 9 Science paper describing the Ashbya sequence, the Center for Genome Technology's Fred Dietrich and colleagues achieve two milestones. First, they reveal the evolutionary significance of this organism and its relationship to yeast. This is another crucial piece of the puzzle of how genomes evolve over time, and how they add and subtract specific genes and other encoded elements. Why fungi? As with See Spot Run, we have to learn to read shorter books, with their simpler vocabularies and rules of grammar, before having a hope of tackling the entire human genome. Second, genomes are not just for pleasure reading. Dietrich and his co-authors in academia and biotech have provided a blueprint that will allow future investigators to figure out how to attack fungal pathogens, an ongoing and extremely expensive problem in worldwide agriculture.

Who will be the next Champollion? One percent down and only 99 to go. Happy reading!

Huntington F. Willard
Director