As part of the battle to develop a vaccine for COVID-19, figuring out how the virus changes as it spreads is a challenge. It requires a lot of data and a lot of computing power, as every infected patient adds millions more data points to an already huge database. Yet each bit provides additional information that could point researchers towards the locations of transmission events – if it can be analyzed in a timely manner.
Hamilton’s McMaster University has partnered with the Ontario Vector Institute and Sunnybrook Health Sciences in Toronto on a tool to make that tracking easier. The COVID-19 Genotyping Tool (CGT) uses big data analytics to help researchers worldwide track changes in the virus’s genetic structure as it moves from person to person, providing clues that help them determine where it came from, project where it’s headed, and judge whether it’s becoming more infectious.
The donation of a CAD $375,000 Cisco high-performance Unified Computing System (UCS) by the Cisco Foundation is bolstering the effort.
“A key element of this research is the COVID-19 Genotyping Tool (CGT), an artificial intelligence/machine learning analytics platform that allows researchers, hospitals and public agencies around the globe to upload their COVID-19 data and contextualize it with available sources in the public domain,” explained Rob Barton, distinguished systems engineer at Cisco Canada, in a blog post. “Using AI dimensionality reduction techniques such as UMAP, the CGT is able to identify small differences in the virus genome, allowing it to be classified and compared against other known strains.”
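To make the idea of comparing a sample against known strains concrete, here is a minimal Python sketch. It is not the CGT's method and does not use UMAP: it simply counts base-by-base mismatches between short, already-aligned sequences and reports the closest reference. The sequences, the strain names, and the helper functions `differences` and `closest_strain` are all invented for illustration.

```python
def differences(sample, reference):
    """List (position, reference_base, sample_base) for every mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def closest_strain(sample, references):
    """Return the reference name with the fewest mismatches, plus those mismatches."""
    best = min(references, key=lambda name: len(differences(sample, references[name])))
    return best, differences(sample, references[best])


# Toy data, made up for this example (real genomes run to roughly 30,000 bases).
REFERENCES = {
    "strain_A": "ATGGCTAGCTAGGATCCA",
    "strain_B": "ATGGCTAGTTAGGATCGA",
}
sample = "ATAGCTAGTTAGGATCGA"

name, changes = closest_strain(sample, REFERENCES)
print(name, changes)
```

A real pipeline would work on full genomes from many thousands of patients, which is where dimensionality reduction and substantial computing power come in.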
But these analyses are not getting any easier as time goes on, noted Dr. Andrew McArthur, associate professor of biochemistry and biomedical science at McMaster and past Cisco Chair in Bioinformatics, in an interview.
“The biggest challenge in genomic data is that capacity is increasing exponentially,” he said. “And during a pandemic when you suddenly want to sequence every positive patient, you’re constantly redlining.”
With cloud compute costs spiralling, the effort was not sustainable on a number of fronts. Relying on outside cloud services also raised concerns about privacy and security.
“Because some of the data that comes to us is associated with patient information, we needed to have something rock solid in-house so we could protect privacy,” he said. Those factors made Cisco’s donation highly valuable.
“It was a very critical donation, short and long term. It solves a lot of long-term problems as well,” McArthur noted. “We’re expecting after COVID or during COVID we’re going to see a lot of drug-resistant bacterial infections using a lot of antibiotics to keep people alive, and that complex biology, how a virus and a human and a bacterial community interact to make people sick, generates huge amounts of data as we begin with others to get ready for that world and build towards it. And now we actually have to do it. It’s going to be important for now, medium-term, and long term.
“The funding of the Cisco device was really for that big picture – there was a gap between the people who look at the global scale and the people who are on the front lines in your neighbourhood doing the sequencing work,” he went on. “We wanted to take the global scale data, precompute it, and do some really smart machine learning, so people can take their local data and quickly get it in there. And that’s really what that machine was doing the bulk of.”