Cerebras Announces New NLP Model Training Capability


SUNNYVALE, Calif., August 31, 2022 – Cerebras Systems, a pioneer in artificial intelligence (AI) computational acceleration, today launched another industry-first capability. Customers can now quickly train Transformer-like natural language AI models with sequences 20 times longer than what is possible with traditional computer hardware. This new capability should lead to breakthroughs in natural language processing (NLP). By providing much more context for understanding a given word, phrase, or DNA strand, the long sequence length capability gives NLP models a much finer understanding and better predictive accuracy.

“Earlier this year, the Cerebras CS-2 set the record for training the largest natural language processing (NLP) models with up to 20 billion parameters on a single device,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “We now allow our customers to train with longer sequences on the largest NLP models. This delivers previously unobtainable precision, opening up a new world of innovation and possibilities through AI and deep learning.”

Language is context-specific. This is why word-for-word translation with a dictionary fails – without context, the meaning of words is often vague. In language, a word is best understood in the context of the surrounding words, which provide cues to its meaning. This is also true in AI. Long sequence lengths allow an NLP model to understand a given word in an increasingly larger context.

Imagine hearing the phrase “to be or not to be” without context, using only a dictionary. Then imagine understanding it in the context of Act III, Scene 1 of Hamlet. Then imagine you had a larger context and could understand it in the context of the whole play – or better yet, in the context of all Shakespearean literature. As the context in which understanding occurs expands, the precision of understanding also expands. By greatly expanding the context (the sequence of words in which the target word is understood), Cerebras enables NLP models to demonstrate a more sophisticated understanding of language. A larger and more sophisticated context improves the accuracy of understanding in AI.

While many industries will benefit from this new capability, Cerebras’ pharmaceutical and life science customers are particularly excited about the implications for their drug discovery efforts. DNA is the language of life, and DNA analysis has been a particularly powerful application of large language models.

“Machine learning at GSK involves taking complex data sets generated at scale and answering very difficult biological questions,” said Kim Branson, senior vice president and global head of AI and machine learning at GSK. “The long sequence length capability allows us to look at a particular gene in the context of tens of thousands of surrounding genes. We know that surrounding genes impact gene expression, but we’ve never been able to explore that within AI.”

The proliferation of NLP has been propelled by the exceptional performance of Transformer-like networks such as BERT and GPT. However, these models are extremely computationally intensive. Even when trained on massive clusters of graphics processing units (GPUs), these models today can only process sequences up to about 2,500 tokens in length. Tokens can be words in a document, amino acids in a protein, or base pairs on a chromosome. But an eight-page document could easily exceed 8,000 words, meaning an AI model attempting to summarize a long document would lack a full understanding of the topic. The unique Cerebras wafer-scale architecture overcomes this fundamental limitation and enables sequences up to a hitherto impossible length of 50,000 tokens.
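The figures in the paragraph above can be checked with a little arithmetic. The following is a minimal sketch, not anything published by Cerebras: it assumes one token per word (real tokenizers usually produce more tokens than words) and assumes standard full self-attention, in which every token is scored against every other token. The release does not state which attention variant is used. The helper names `windows_needed` and `attention_pairs` are hypothetical.

```python
# Figures quoted in the release.
GPU_MAX_TOKENS = 2_500   # "about 2,500 tokens" on GPU clusters
CS2_MAX_TOKENS = 50_000  # sequence length quoted for the Cerebras system
DOC_WORDS = 8_000        # the eight-page document example


def windows_needed(doc_tokens: int, max_tokens: int) -> int:
    """Separate windows needed to cover a document (ceiling division).

    Assumption: one token per word, which understates real token counts.
    """
    return -(-doc_tokens // max_tokens)


def attention_pairs(seq_len: int) -> int:
    """Token pairs scored per sequence, assuming full self-attention."""
    return seq_len ** 2


# How much longer the quoted sequence length is.
length_ratio = CS2_MAX_TOKENS // GPU_MAX_TOKENS

# How many more token pairs full self-attention would score per sequence.
pair_ratio = attention_pairs(CS2_MAX_TOKENS) // attention_pairs(GPU_MAX_TOKENS)

print(length_ratio)                               # 20
print(windows_needed(DOC_WORDS, GPU_MAX_TOKENS))  # 4
print(windows_needed(DOC_WORDS, CS2_MAX_TOKENS))  # 1
print(pair_ratio)                                 # 400
```

Under these assumptions, the 8,000-word document has to be split into four windows at a 2,500-token limit but fits in a single 50,000-token sequence, and a 20-fold longer sequence implies roughly 400 times as many token pairs per sequence.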

This innovation opens up previously unexplored frontiers of deep learning. Even within traditional language processing, there are many tasks where this kind of extended context is important. Recent work has shown that for tasks such as evaluating ICU patient discharge data and analyzing legal documents, a model needs to see the entire document to understand it. These documents can contain tens of thousands of words. The potential applications beyond language are even more exciting. For example, research has shown that protein structures are highly dependent on long-range interactions between building blocks, and that models trained with longer sequence lengths are likely to perform better. Now that the Cerebras CS-2 system makes long-sequence training not only possible, but easy, researchers are sure to discover many more applications and solve problems previously thought to be intractable.

Training large models with massive data sets and long sequence lengths is an area where the Cerebras CS-2 system, powered by the Wafer-Scale Engine (WSE-2), excels. The WSE-2 is the largest processor ever built. It is 56 times larger than the largest GPU, has 2.55 trillion more transistors, and has 100 times as many compute cores. This scale means the WSE-2 has both the memory to store the largest layer calculations for the largest models and the computing power to process such huge calculations quickly. In contrast, similar workloads on GPUs need to be parallelized across hundreds or thousands of nodes to train a model in a reasonable amount of time. This type of GPU infrastructure requires specialized expertise and valuable engineering time to set up. Meanwhile, the Cerebras CS-2 system can perform similar workloads at the touch of a button, eliminating complexity while shortening the time to results.

For more information on Cerebras systems, please visit the Cerebras blog.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computing systems, designed with the sole purpose of accelerating AI and changing the future of AI work forever. Our flagship product, the CS-2 system, is powered by the world’s largest processor, the 850,000-core Cerebras WSE-2, and enables customers to accelerate their deep learning work by orders of magnitude over graphics processing units.

Source: Cerebras Systems

