A Johns Hopkins engineer co-led a team that sequenced the genome of the world’s most widely used model plant species, Arabidopsis thaliana, at a level of detail never seen before. Until now, some regions of this genome have remained uncharted territory because of their complex structure, including the centromeres, the chromosomal regions that guide chromosomes as an organism grows rapidly from one cell to billions. Now, for the first time, researchers have revealed the secrets of Arabidopsis centromeres, shedding light on their evolution and providing insight into a paradox that has mystified scientists for decades. Their results were published on November 12 in Science.
“In this study, we solve the sequence and structure of the centromeres of the most studied plant species for genetic research: the one we use to understand the genetics of rice or corn or wheat or tomatoes and beyond,” said Michael Schatz, Bloomberg Distinguished Professor of computer science and biology at Johns Hopkins University, who co-led the study with Ian Henderson, head of the Genetic and Epigenetic Inheritance of Plants group at Cambridge University. “And while the research has focused on plants, it certainly has implications for human genetics and for understanding how human cells grow and divide with such precision.” The study’s co-first author is Michael Alonge, who recently completed his doctorate in the computer science department at the Johns Hopkins Whiting School of Engineering, working with Schatz.
Arabidopsis thaliana was adopted as a model plant because of its short generation time, small size, ease of growth, and prolific seed production by self-pollination. Its rapid life cycle and small genome make it well suited to genetics and to mapping the key genes that underlie traits of interest. This small flowering plant, often found by the roadside, has led to a multitude of discoveries, and in 2000 it became the first plant to have its genome sequenced, except for its centromeres, its telomeres (the structures at the ends of chromosomes) and a few other complex regions of the genome.
Since then, new long-read sequencing technologies have advanced, allowing researchers to read the genome in pieces of more than 100,000 nucleotides, instead of pieces of 100 to 200 nucleotides. This is thanks to the introduction of nanopore sequencing, which measures an electric current as nucleic acids pass through a protein nanopore, a hollow structure inserted into a membrane. As DNA passes through the nanopore, different nucleic acid bases change the current in different ways. The resulting electrical signal is then decoded to give the specific DNA sequence.
These data, combined with algorithmic advances in assembling the reads, mean that solving the “genomic puzzle” is suddenly possible in ways that were not possible before. Critically, this also means that it is now possible to probe the genetic makeup of the centromere, which had previously proven to be a dead end because of its difficult structure.
“It’s fantastic to be able to see into centromeres for the first time and to use that view to understand their unusual evolutionary patterns,” said Henderson.
For decades, researchers have attempted to understand the paradox of how and why centromeric DNA evolves with extraordinary speed while remaining stable enough to do its job during cell division. In contrast, other ancient parts of the cell tend to evolve very slowly. This study, by revealing the genetic and epigenetic topography of Arabidopsis centromeres, marks a radical change in our understanding of this paradox.
“The amazing thing is that all higher organisms use this process, including the 10 trillion cells in your body, as well as the quadrillions of cells of other plant and animal species. Even though the function of the centromeres has been established and maintained for billions of years, the DNA sequence of the centromeres is extremely variable; it is in fact one of the most variable parts of any genome,” Schatz explains.
The study’s “maps” provide new information about the “repeat ecosystem” found in the centromere, revealing an architecture of repeat networks, which has implications for their evolution and more. The authors’ model reveals that the centromeres evolve via cycles of duplication and diversification of sequences. The research team plans to use these maps as a basis for understanding how and why centromeres evolve so rapidly.