Scientists working on today’s most pressing problems often face enormous obstacles in collecting the data they need to begin their research.
Enter Ramkumar Hariharan, data scientist and computational biologist at Northeastern University in Seattle. Hariharan, a scientist and engineer, currently focuses his research on an emerging field called geroscience, or “the study of aging as it relates to age-related disease.” Lately, Hariharan has been trying to understand why some cancer patients respond better than others to certain types of immunotherapies.
Doing so requires a lot of information about the patients themselves, the specific forms of cancer and the drugs used to treat them. Naturally, that means a lot of data to process, drawn from various sources. All of this information requires sorting or cleaning, scraping (exporting data from one source or program to another), and “derivation” (combining or processing raw data into new information).
“The first part is to build AI systems and pipelines,” says Hariharan. “And why do we do it? We want to solve scientific problems.”
Hariharan and a team of Northeastern researchers received a grant to build an “end-to-end autoML pipeline” to help predict patient response to cancer immunotherapies. The automated machine learning (autoML) model uses so-called “deep learning,” a form of artificial intelligence modeled on human decision-making, to help researchers sift through huge amounts of raw data.
Specifically, the researchers are looking to see if they can prospectively identify patients who would benefit most from these different treatments and, in doing so, isolate the individual factors that make patients more or less responsive to them. These may include factors such as the patient’s age, physical characteristics, and general state of health, among others.
The goal is to look for patterns in the available data (i.e. data accessible in the published literature and other public databases) that help researchers build a clinical picture of how patients might fare during treatment.
To be as specific as possible, researchers need more than just a patient’s age, gender, and health; they need other more specific data points, such as the cellular composition of cancerous tumors and molecular measurements that provide insight into gene activity or expression.
A problem for researchers looking to retrieve this specialized data is that much of it is so-called domain-specific knowledge, meaning it is overseen by experts (here, medical and health professionals) and locked away in disparate and poorly organized databases. Another challenge is the extensive hand coding required to accurately calibrate many existing machine learning models.
This is where autoML comes in. Unlike traditional machine learning models that require trained experts to manually change the parameters of an algorithm, autoML is an approach in which the system is built to learn how to optimize its dozens of “hyperparameters and push buttons” on its own, Hariharan says.
“The autoML pipeline supports two things: first, you are much less dependent on domain experts, and second, your machine learning workflow is dramatically accelerated,” he says. “You don’t need to create additional derived data and append it to existing data, because it can identify relevant new derived data on its own.”
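To make the idea of automatic tuning concrete, here is a minimal sketch in Python of one common approach, random search over a single hyperparameter. It is not the team’s pipeline, which the article does not describe in detail: the synthetic data, the nearest-neighbor classifier and every function name below are assumptions chosen purely for illustration.

```python
import random


def make_data(n, rng):
    """Hypothetical synthetic data: one feature in [0, 1); label is 1 when it exceeds 0.5."""
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def accuracy(train, valid, k):
    """Fraction of validation points that a k-nearest-neighbor vote labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in valid)
    return correct / len(valid)


def random_search(train, valid, candidates, n_trials, rng):
    """Try randomly chosen values of k and keep the one with the best validation accuracy."""
    best_k, best_score = None, -1.0
    for _ in range(n_trials):
        k = rng.choice(candidates)
        score = accuracy(train, valid, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


rng = random.Random(0)  # fixed seed so repeated runs give the same result
train = make_data(200, rng)
valid = make_data(100, rng)
candidates = [1, 3, 5, 7, 9, 11]  # odd values avoid tied votes

best_k, best_score = random_search(train, valid, candidates, n_trials=10, rng=rng)
print(f"best k = {best_k}, validation accuracy = {best_score:.2f}")
```

The loop stands in for the expert who would otherwise adjust the setting by hand: the program proposes a value, scores it on held-out data and keeps the best one. Real autoML systems search many settings at once and typically use smarter strategies than random sampling.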
Hariharan’s team recently finished building the autoML pipeline and is now refining the system and measuring its performance against classical and practical models. Funding of $50,000 for the project comes from Northeastern’s Institute for Experiential AI. Rohit Gandikota, Alekya Kasturi, Shreyangi Prasad and Ayesha Mathur, all of Northeastern, contributed to the research.
Hariharan says the complicated data project has been spurred by developments in geroscience and a broader shift in how scientists understand aging. As you age, your bodily functions begin to slow down. “Things are starting to fall apart,” Hariharan says. This in turn predisposes a person to a host of age-related diseases.
“Your likelihood of getting cancer increases dramatically,” Hariharan says. “Yes, young people have cancer, but they are more like the outlier cases.” And age is not the only factor that causes cancer, Alzheimer’s disease or cardiovascular disease.
It also depends on your genetic heritage, he says, and the “epigenetic marks” that “lie on top of your DNA.” These marks are chemical changes to the letters in DNA, Hariharan says, that can offer clues to how we age. Diet and lifestyle, long believed to influence how quickly we age, can also have an impact on the formation of these marks.
“There are so many ways to measure your biological age,” he says. “Looking at the epigenomic model is one way to do that.”
Other so-called biomarkers of aging vary and can include, for example, the speed at which a person walks, their grip strength and blood measurements, such as how they respond to glucose. As scientists’ understanding of the mechanisms of aging evolves, more potential data points are emerging as variables and determinants of health, Hariharan says.
Machine learning, he says, will be the key to unlocking this data.
“We want to build AI-powered computational tools to find more reproducible ways to measure biological aging,” Hariharan says. “We haven’t started this research yet, but we will be launching it soon.”