The human genome is finally finished!


When the human genome was first deemed “complete” in 2000, the news was greeted with international fanfare. The two rival groups vying to finish the genome first – one a large government-led consortium, the other an underdog private company – agreed to declare a joint success. They shook hands at the White House. Bill Clinton presided. Tony Blair joined by satellite from London. “We are at an extraordinary moment in scientific history,” one eminent scientist said when those genomes were published. “It’s like we’ve climbed to the top of the Himalayas.”

But in reality, the human genome was not complete. Neither group had reached the true peak. As even contemporary coverage acknowledged, this version was more of a rough draft, riddled with long stretches where the DNA sequence was still fuzzy or missing. The private company quickly pivoted and ended its human genome project, although scientists in the public consortium persevered. In 2003, with less pomp but still plenty of headlines, the human genome was once again declared complete.

But in fact, the human genome was still not complete. Even the revised draft was missing about 8 percent of the genome. These were the most difficult regions to sequence, full of repetitive letters that were simply impossible to read with the technology of the time.

Finally, in May, a separate group of scientists quietly posted online a preprint describing what can be considered the first truly complete human genome – a readout of all 3.055 billion letters across 23 human chromosomes. The group, led by relatively young researchers, came together over Slack from around the world to finish the task abandoned 20 years ago. There was no big announcement from the White House this time around, no talk of Himalayan summits; the paper itself is still under review for official publication in a journal. But the lack of pomp belies what a feat this is: To complete the human genome, these scientists had to figure out how to map its most mysterious and neglected repeating regions, which can now finally get their scientific due.

“I see this as a landmark,” says Steven Henikoff, a molecular biologist at the Fred Hutchinson Cancer Research Center, who was not involved in the project. Henikoff studies one of those enigmatic, difficult-to-sequence regions that previous human genome projects had given up on: the centromeres, which are the slightly pinched midpoints of each chromosome. Chromosomes, of which humans have 23 pairs, each consist of a long, continuous stretch of DNA that can be condensed into a rod shape; DNA at the centromere is particularly dense.

On five human chromosomes, the centromere is not in the middle but very close to one end, dividing the chromosome into a long arm and a very short arm. These short arms are also packed with repeats that had never been fully sequenced until now. Centromeres, short arms, and other types of repeating regions made up most of the 238 million letters that the new group ultimately added or corrected in the human genome.

The repeat-rich segments of the human genome usually do not contain genes, which is one reason they have long been overlooked. Geneticists have largely focused on genes because their function is obvious and simple: a gene encodes a protein. (A big surprise of the early versions of the human genome was how little of our DNA actually codes for proteins – only about 1 percent. The role of the remaining 99 percent is only now becoming clearer.) Indeed, there have been indications that these repeat-rich regions also play an important role in how genes are expressed and passed on, and abnormalities in them have been linked to cancer and aging. The consortium also found 79 new genes hidden among the repeats. With a map of these repeating regions finally in hand, scientists can probe their function more carefully.

The effort to complete the genome was “entirely grassroots,” says Adam Phillippy, a computational biologist at the National Institutes of Health who co-leads the Telomere-to-Telomere (T2T) consortium that completed the genome. (Telomeres are the regions at the ends of chromosomes, so telomere to telomere means “end to end.”) Phillippy and Karen Miga, a geneticist at UC Santa Cruz, decided to form the consortium in 2018, after a call in which they realized that they shared an ambition to complete the human genome.

“I’m in love with the repeats,” says Miga, who came to the project as a biologist trying to figure out what those repeats do. Phillippy, a computer scientist by training, provided the technical know-how. Traditional sequencing technologies shatter DNA into small pieces, and computer algorithms have to put them back together like puzzle pieces. The problem is that pieces from the repeated regions all look the same. Now, two new “long-read” sequencing technologies, called PacBio HiFi and Oxford Nanopore, allow scientists to read longer portions of the genome. These sequencers still can’t handle pieces big enough to span an entire centromere or a short arm, but at least the algorithms have bigger puzzle pieces to put together.
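The puzzle-piece problem can be made concrete with a toy example. The Python sketch below is only an illustration, not the consortium’s method: the made-up “genome,” the repeat unit, and the helper names `count_occurrences` and `ambiguous_fraction` are all assumptions invented here. It measures what share of error-free reads of a given length match more than one place in the sequence and so cannot be placed with certainty.

```python
def count_occurrences(genome: str, read: str) -> int:
    """Count positions (overlaps allowed) where `read` matches `genome` exactly."""
    last_start = len(genome) - len(read)
    return sum(genome.startswith(read, i) for i in range(last_start + 1))


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Share of all possible reads of this length that match more than one position."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    ambiguous = sum(1 for r in reads if count_occurrences(genome, r) > 1)
    return ambiguous / len(reads)


# Hypothetical toy sequence: unique flank, one short unit copied six times, unique flank.
LEFT = "ACGTTGCA"
REPEAT_UNIT = "GATTACA"
RIGHT = "CCGGTATC"
genome = LEFT + REPEAT_UNIT * 6 + RIGHT

for length in (5, 20, 50):
    share = ambiguous_fraction(genome, length)
    print(f"read length {length:>2}: {share:.0%} of reads fit in more than one place")
```

In this toy, a 50-letter read is long enough to reach from the repeat into a unique flank, so every such read has exactly one home. That is a simplification: as noted above, real reads still cannot span a whole centromere or short arm, and real reads also contain errors that this sketch ignores.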

The role of centromeric sequences, like that of many other repeating regions, is not yet fully understood, but they are classically known to be key to cell division. When a cell divides in two, a protein spindle attaches to the centromeres, pulling the chromosomes apart to ensure that each cell gets the correct number. When this goes wrong in eggs or sperm, babies can be born with chromosomal abnormalities such as Down syndrome or Turner syndrome. When it goes wrong in other parts of the body, we can end up with blood cells, for example, that have too many or too few chromosomes. It’s a hallmark of aging: it’s not uncommon for men over 70 to lose Y chromosomes in their blood cells. In one of two companion papers uploaded with the full genome, the T2T consortium showed that Oxford Nanopore’s long-read technology can also be used to map exactly where the protein spindle attaches to the centromere. Examining the sequences in these regions could provide new clues about chromosomal abnormalities.

Equally mysterious are the repeat-rich short arms of chromosomes. They certainly play a role in the cellular machinery that translates genes into proteins, and knowing their sequences could shed more light on this function. Brian McStay, a biologist at the National University of Ireland in Galway, likens the complete genome to a “parts list” for chromosomes that allows scientists to try removing building blocks one by one. “By knowing what this parts list is, we can say, ‘This is exactly what our chromosome looks like,’” says McStay. “‘Let’s take this out and see what the impact is on the function of this chromosome.’”

As impressive as the technical feat of sequencing a complete human genome is, scientists have told me that a single genome is just a snapshot. It will be much more interesting to see how these repeating regions change over time, from person to person and from species to species. “What happens in cancer? What is happening in development? What happens if you compare the offspring to the parents?” says Henikoff. The consortium has proven that these repetitive regions can be sequenced with new long-read technologies. Now those technologies can be applied to more genomes, allowing scientists to compare them to each other.

Indeed, Miga says the ultimate dream is for every genome that scientists sequence to be complete end to end, telomere to telomere. But first, the group has a more immediate goal in mind. If you wanted to fault the new genome for being “incomplete,” you could point out that it has only one set of 23 chromosomes, whereas normal human cells have 23 pairs. To simplify the task, the group used cells from a particular type of tumor that develops from an abnormal fertilized egg and ends up with only 23 unique chromosomes. The team will need to use different cells, with 23 pairs of chromosomes, to complete what is called a “diploid” genome.

“The next big step would be routine diploid genomes,” says Shilpa Garg, a geneticist at the University of Copenhagen, Denmark. Garg has used PacBio HiFi to quickly assemble human genomes – minus some tricky regions such as centromeres – at the rate of a few per day. (In comparison, she says, assembling genomes from older sequencing technology takes up to three weeks.) This speed could also help in clinical settings, allowing physicians to diagnose patients more routinely using genome sequencing. Truly whole genome sequencing, repeating regions and all, is getting easier and faster. Soon, another complete human genome will not be news at all.
