AI, and deep learning in particular, has not been well appreciated by biologists because it is often considered a “black box” system. In recent years, however, researchers have made several efforts to open this black box by relating a model's outputs back to its inputs, thereby “explaining” the model.
Layer-wise relevance propagation (LRP) is one such method for explaining deep neural network models (Bach et al., 2015). It was originally developed for image classification models, where it calculates a relevance score for each input feature (pixel) to identify the pixels most important to the model's prediction. This article walks through how AI Dynamics applied this method to a biological problem: cancer drug target discovery. We applied LRP to a cancer classification model built on gene expression data to identify a set of genes that are important for classifying cancer patients.
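To make the idea concrete, below is a minimal pure-Python sketch of one common LRP variant, the epsilon rule described by Bach et al. (2015), applied to a small fully connected ReLU network. The article does not say which LRP rule was used, so the choice of rule, the `eps` value, the helper names (`dense`, `lrp_epsilon`, `explain`), and the two-layer toy network standing in for the real gene expression model are all our assumptions.

```python
# Sketch of the LRP epsilon rule for a small fully connected ReLU network.
# Assumption: toy weights, biases and inputs, not the article's trained model.


def dense(x, weights, biases):
    """Pre-activations z_j = sum_i x_i * w_ij + b_j (weights[i][j]: input i -> output j)."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
            for j in range(len(biases))]


def relu(z):
    return [max(0.0, v) for v in z]


def lrp_epsilon(a, weights, biases, r_out, eps=1e-6):
    """Redistribute output relevance r_out onto the layer inputs a:
    R_i = sum_j a_i * w_ij / (z_j + eps * sign(z_j)) * R_j."""
    z = dense(a, weights, biases)
    r_in = []
    for i, a_i in enumerate(a):
        total = 0.0
        for j, r_j in enumerate(r_out):
            stabilized = z[j] + (eps if z[j] >= 0 else -eps)  # avoid division by ~0
            total += a_i * weights[i][j] / stabilized * r_j
        r_in.append(total)
    return r_in


def explain(x, layers, eps=1e-6):
    """Forward pass, then propagate the predicted class score back to the inputs.
    layers: list of (weights, biases); ReLU on every layer except the last."""
    activations = [x]
    for k, (w, b) in enumerate(layers):
        z = dense(activations[-1], w, b)
        activations.append(z if k == len(layers) - 1 else relu(z))
    logits = activations[-1]
    target = max(range(len(logits)), key=lambda j: logits[j])
    # Only the predicted class's score is propagated backward.
    relevance = [logits[j] if j == target else 0.0 for j in range(len(logits))]
    for k in reversed(range(len(layers))):
        w, b = layers[k]
        relevance = lrp_epsilon(activations[k], w, b, relevance, eps)
    return target, relevance


# Toy network: 2 input "genes" -> 2 hidden ReLU units -> 2 output classes.
layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [1.0, 0.5]], [0.0, 0.0]),
]
target, relevance = explain([1.0, 2.0], layers)
print(target, [round(r, 3) for r in relevance])  # 0 [0.0, 3.0]
```

Because the biases are zero, the input relevances sum (up to epsilon) to the predicted class score of 3.0, which illustrates LRP's relevance conservation property. In a gene expression model, each input would be one gene, so the returned list would hold one relevance score per gene for a given patient.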
We started with a squamous cancer dataset from The Cancer Genome Atlas (TCGA) project, a joint nationwide effort led by the National Institutes of Health. The project generated multiple types of publicly accessible omics data from over 20,000 samples across 33 cancer types. First, we clustered biopsy gene expression (transcriptomic) data from squamous cancer patients using k-means clustering to find a natural separation of the patients based on the expression of cancer-related genes. We then constructed a multilayer perceptron network model to classify those clusters and applied LRP to identify the genes that contributed most to (that is, were highly relevant to) the model. Finally, we compared genes with high relevance scores with the genes identified by conventional differential gene expression analysis (Fig. 1).
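The clustering step can be sketched with Lloyd's algorithm for k-means. This is an illustration only: the article does not report the number of clusters, the preprocessing, or the implementation used. We assume k = 2 (matching the model's two-neuron final layer), toy expression values in place of TCGA data, and hypothetical helpers `kmeans` and `nearest`; real transcriptomic data would normally be log-transformed and standardized before clustering.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(sample, centroids):
    """Index of the centroid closest to this patient's expression profile."""
    return min(range(len(centroids)), key=lambda c: squared_distance(sample, centroids[c]))


def kmeans(samples, k, n_iter=100, seed=0):
    """Lloyd's algorithm. samples: one list of gene expression values per patient.
    Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen patients.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [nearest(s, centroids) for s in samples]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned patients.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: 6 patients x 3 genes with two obvious expression patterns.
expression = [
    [1.0, 0.9, 5.0], [1.2, 1.1, 5.2], [0.8, 1.0, 4.9],
    [5.0, 4.8, 1.0], [5.2, 5.1, 0.9], [4.9, 5.0, 1.1],
]
labels, centroids = kmeans(expression, k=2)
print(labels)
```

Which integer each group receives depends on the random initialization, but the first three and last three patients end up in separate clusters. The resulting labels would then serve as the class targets for the multilayer perceptron.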
The model consisted of hidden layers of 256, 128, and 64 neurons, followed by an output layer of 2 neurons. The model achieved a validation accuracy of 98% and an area under the ROC curve (AUC) of 0.98. This high performance is expected, because the cluster labels were derived from the same data used to train the model. We then examined the LRP score of each gene in each patient (Fig. 2). Some of the genes identified by conventional differential expression analysis (marked as “DE” in Fig. 2) showed high LRP scores, indicating that the LRP results support some genes previously considered important. In addition, some genes that the conventional method did not identify also showed high LRP scores. Together with the genes identified by the conventional method, these genes help explain the model's classifications, suggesting a possible relationship to these patients' cancer biology.
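Once per-patient LRP scores are available (one score per gene per patient, as in Fig. 2), one simple way to shortlist genes and compare them with a differential expression (DE) result is sketched below. The article does not describe how scores were aggregated or thresholded, so averaging absolute scores across patients, the `top_n` cutoff, the helpers `rank_genes` and `compare_with_de`, and the gene names `GENE_A` to `GENE_D` are all hypothetical choices for illustration.

```python
def rank_genes(relevance, gene_names, top_n):
    """relevance[p][g] is the LRP score of gene g for patient p.
    Ranks genes by mean absolute relevance across patients (an assumed aggregation)."""
    n_patients = len(relevance)
    mean_abs = {
        gene: sum(abs(row[g]) for row in relevance) / n_patients
        for g, gene in enumerate(gene_names)
    }
    return sorted(gene_names, key=lambda gene: mean_abs[gene], reverse=True)[:top_n]


def compare_with_de(lrp_genes, de_genes):
    """Split the LRP shortlist into genes also found by DE and genes found only by LRP."""
    lrp, de = set(lrp_genes), set(de_genes)
    return {
        "shared": sorted(lrp & de),    # supported by both methods
        "lrp_only": sorted(lrp - de),  # candidates missed by the DE analysis
    }


# Toy LRP scores: 3 patients x 4 hypothetical genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
scores = [
    [0.9, 0.1, 0.7, 0.05],
    [0.8, -0.2, 0.6, 0.0],
    [1.0, 0.0, 0.8, 0.1],
]
top = rank_genes(scores, genes, top_n=2)
result = compare_with_de(top, de_genes=["GENE_A", "GENE_D"])
print(top)     # ['GENE_A', 'GENE_C']
print(result)  # {'shared': ['GENE_A'], 'lrp_only': ['GENE_C']}
```

Here `GENE_A` plays the role of a gene flagged by both methods and `GENE_C` a gene that only LRP highlights. Absolute values are used because epsilon-rule relevance scores can be negative; a signed average would instead favor genes that consistently push toward the predicted class.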
It is very encouraging to see a set of new genes identified by applying LRP to explain a deep learning model. We still need to test whether these genes are important in other datasets, confirm their mechanisms, and identify drug targets in cell lines and/or animal models. Even so, our approach provides new insights for generating testable biological hypotheses, which we expect will eventually help develop tailored treatments for patient groups.
Bach S et al., (2015) On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS One. 10(7):e0130140.
Campbell JD et al., (2018) Genomic, pathway network, and immunologic features distinguishing squamous carcinomas. Cell Rep. 23(1):194-212.e6.