New machine learning tool discovers mutational signature linking bladder cancer to smoking

Researchers at the University of California at San Diego have discovered for the first time a pattern of DNA mutations that links bladder cancer to tobacco smoking. The discovery was made possible thanks to a powerful new machine learning tool the team developed to find patterns of mutations caused by carcinogens and other DNA-altering processes.

The work, published September 23 in Cell Genomics, could help researchers identify which environmental factors, such as exposure to tobacco smoke and UV radiation, cause cancer in certain patients.

Each of these environmental exposures changes DNA in a unique way, generating a specific pattern of mutations called a mutational signature. If a signature is found in the DNA of a patient’s cancer cells, the cancer can be traced back to the exposure that created that signature. Knowing which mutational signatures are present can also lead to more customized treatments for a patient’s specific cancer.

In this study, researchers found a mutational signature in bladder cancer DNA that is linked to tobacco smoking. The finding is significant because a mutational signature of tobacco smoking had previously been detected in lung cancer, but not in bladder cancer.

“There is strong epidemiological evidence that bladder cancer is associated with tobacco smoking. We even see a specific mutational signature in other tissues – such as the mouth, esophagus and lungs – that are directly exposed to cancer-causing substances in tobacco. The fact that we didn’t find this signature in the bladder was strange.”


Ludmil Alexandrov, senior study author, professor of bioengineering and cellular and molecular medicine, UC San Diego

Alexandrov and colleagues now show that there is a mutational signature of tobacco smoking in bladder cancer, and it is different from the signature found in lung cancer. In addition, they show that this signature is also found in normal bladder tissues of tobacco smokers who have not developed bladder cancer. The signature was not found in the bladder tissues of non-smokers.

“What this signature tells us is that certain mutations in your DNA are due to exposure to tobacco smoke,” said co-first author Marcos Diaz-Gay, a postdoctoral researcher in Alexandrov’s lab. “It doesn’t necessarily mean you have cancer, but the more you smoke, the more mutations build up in your cells and the more you are at risk of developing cancer.”

Powered by next-generation machine learning

The researchers found the tobacco signature using a next-generation machine learning tool developed by Alexandrov’s lab. The team says it is the most advanced, automated bioinformatics tool to extract mutation signatures directly from large amounts of genetic data.

“This is a powerful machine learning approach to recognize and separate mutation patterns from genomic data,” says Alexandrov. “It takes those patterns and deciphers them so we can see what the mutation signatures are and match them with their meaning.”

He compared the machine learning approach to picking out individual conversations at a cocktail party.

“You have multiple groups of people around you talking, and you’re only interested in hearing certain individuals,” he said. “Our tool essentially helps you do that, but with cancer genetic data. You’ve exposed multiple people around the world to different mutagens in the environment, and some of those exposures leave marks on their genomes. This tool goes through all that data to pick out what are the processes that cause the mutations.”
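
The article does not describe the tool's internals. One common way to frame this kind of "cocktail party" separation is non-negative matrix factorization (NMF): a table of mutation counts (mutation types by samples) is approximated as the product of a signatures matrix and an exposures matrix, both non-negative. The toy sketch below uses the classic multiplicative-update rules on made-up data. The function names, the toy numbers, and the choice of two signatures are all assumptions for illustration; this is not the implementation used by Alexandrov's lab.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Sum of squared differences between V and the product W x H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (types x samples) into W (types x k) and H (k x samples)."""
    rng = random.Random(seed)
    n_types, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_types)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(k)]
    errors = [squared_error(v, w, h)]
    for _ in range(iterations):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_samples)]
             for a in range(k)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_types)]
        errors.append(squared_error(v, w, h))
    return w, h, errors

# Hypothetical toy data: 2 hidden signatures over 4 mutation types, 5 samples.
true_signatures = [[0.7, 0.0], [0.2, 0.1], [0.1, 0.3], [0.0, 0.6]]
true_exposures = [[100, 10, 50, 0, 80], [5, 90, 50, 120, 20]]
counts = matmul(true_signatures, true_exposures)

signatures, exposures, errors = nmf(counts, k=2)
print("error before:", round(errors[0], 2), "after:", round(errors[-1], 4))
```

In this framing, each column of `signatures` plays the role of one "voice" at the party, and each column of `exposures` says how loudly each voice speaks in a given sample. Real tools add many steps this sketch omits, such as choosing the number of signatures and checking that solutions are stable.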

The tool was used to analyze 23,827 human cancers. It found four mutational signatures, including the one in bladder cancer linked to tobacco smoking, that had not been detected by any other tool. The three other signatures, found in stomach, colon and liver cancers, still warrant further research to determine what processes caused them.

To show how powerful their tool is, the researchers benchmarked it against 13 existing bioinformatics tools. The tools were evaluated for their ability to extract mutational signatures from more than 80,000 synthetic cancer samples. The tool that Alexandrov’s team developed outperformed all the others. It detected 20 to 50% more true positive signatures, with five times fewer false positive signatures. It even performed well when analyzing noisy data, while the other tools failed.
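
To make the terms "true positive" and "false positive" concrete: with synthetic samples, the signatures that generated the data are known, so each extracted signature can be compared with the known ones. The sketch below counts an extracted signature as a true positive if its cosine similarity to a not-yet-matched known signature reaches a threshold. The article does not say how the benchmark scored matches, so the cosine measure, the 0.9 threshold, the greedy matching, and all names and numbers here are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_extraction(extracted, truth, threshold=0.9):
    """Greedily match extracted signatures to known ones, each used at most once."""
    unmatched = list(range(len(truth)))
    true_pos = false_pos = 0
    for sig in extracted:
        best = max(unmatched,
                   key=lambda i: cosine_similarity(sig, truth[i]),
                   default=None)
        if best is not None and cosine_similarity(sig, truth[best]) >= threshold:
            unmatched.remove(best)   # this known signature is now accounted for
            true_pos += 1
        else:
            false_pos += 1           # extracted pattern matches nothing known
    return {"true_positives": true_pos,
            "false_positives": false_pos,
            "false_negatives": len(unmatched)}

# Hypothetical example: one good recovery and one spurious, flat signature.
known = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]]
found = [[0.68, 0.22, 0.10, 0.0], [0.25, 0.25, 0.25, 0.25]]
result = score_extraction(found, known)
print(result)
```

Here the first extracted signature closely matches a known one, while the flat second one matches nothing well, so the second known signature is left undetected.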

“In bioinformatics, this is the first time such extensive benchmarking has been performed at this scale for extracting mutational signatures,” Diaz-Gay said. “It’s a huge undertaking to compare many tools in many datasets.”

Such an undertaking is also costly, Alexandrov noted. “Thanks to funding from Cancer Research UK, we were able to conduct this technical, comprehensive evaluation, which is not often done.”

Creating a more user-friendly and personalized tool

The team’s ultimate goal is to create a web-based tool that more researchers can use and, as a result, profile more patients.

“Right now, this tool requires bioinformatics expertise to use it,” Alexandrov said. “What we want is to create an easy-to-use version on the web where researchers can just enter a patient’s mutations, and it immediately gives you the set of mutation signatures and what processes caused them.”

“Our idea for the future is to use this tool to analyze patients on an individual level,” said Diaz-Gay.

Source:

University of California – San Diego

Journal reference:

Islam, S. M. A., et al. (2022) Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. doi.org/10.1016/j.xgen.2022.100179.
