Interpreting neural networks for biological sequences by learning stochastic masks
