Machine learning solves RNA puzzles

RNA molecules fold into complex three-dimensional shapes that are difficult to determine experimentally or predict computationally. Understanding these structures may aid in the discovery of drugs for currently untreatable diseases. Townshend et al. introduced a machine-learning method that significantly improves prediction of RNA structures (see the Perspective by Weeks). Most other recent advances in deep learning have required a tremendous amount of data for training. The fact that this method succeeds given very little training data suggests that related methods could address unsolved problems in many fields where data are scarce. —DJ

Abstract

RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.
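
The idea of a scoring function that takes only atomic coordinates as input, and whose output does not depend on how the molecule happens to be oriented in space, can be illustrated with a small Python sketch. This is not the ARES network and makes no claim about its architecture. The pairwise-distance score, the function names `toy_score`, `rotate_z`, and `translate`, and the example coordinates are all hypothetical and chosen only for illustration. The sketch checks that rotating and shifting a set of atoms leaves the score unchanged.

```python
import math


def toy_score(coords):
    """Hypothetical stand-in for a learned scorer (NOT the ARES model).

    The score depends on the coordinates only through pairwise
    interatomic distances, so rigid rotations and translations of the
    whole structure cannot change it.
    """
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])  # Euclidean distance
            total += math.exp(-d)  # arbitrary smooth function of distance
    return total


def rotate_z(coords, angle):
    """Rotate every atom by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every atom by the same (dx, dy, dz) vector."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


# Made-up coordinates for four atoms (illustrative values only).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.3), (0.2, 2.1, -1.0)]

# Apply a rigid motion: rotate, then translate.
moved = translate(rotate_z(atoms, 0.7), (3.0, -2.0, 5.0))

# The two scores agree up to floating-point rounding.
print(abs(toy_score(atoms) - toy_score(moved)) < 1e-9)
```

A scorer built this way never has to learn that two orientations of the same structure are equivalent, which is one plausible reason symmetry-aware designs can cope with small training sets. A learned model would replace the fixed `exp(-d)` term with trainable functions.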

Supplementary Materials

This PDF file includes:

Materials and Methods
Figs. S1 to S9
Tables S1 to S5
References (37–65)

Other Supplementary Material for this manuscript includes the following:

MDAR Reproducibility Checklist

References and Notes

1
T. R. Cech, J. A. Steitz, The noncoding RNA revolution-trashing old rules to forge new ones. Cell 157, 77–94 (2014).
2
K. D. Warner, C. E. Hajdin, K. M. Weeks, Principles for targeting RNA with drug-like small molecules. Nat. Rev. Drug Discov. 17, 547–558 (2018).
3
A. Churkin, M. D. Retwitzer, V. Reinharz, Y. Ponty, J. Waldispühl, D. Barash, Design of RNAs: Comparing programs for inverse RNA folding. Brief. Bioinform. 19, 350–358 (2018).
4
ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
5
H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
6
S. Jain, D. C. Richardson, J. S. Richardson, Computational methods for RNA structure validation and improvement. Methods Enzymol. 558, 181–212 (2015).
7
D. S. Marks, L. J. Colwell, R. Sheridan, T. A. Hopf, A. Pagnani, R. Zecchina, C. Sander, Protein 3D structure computed from evolutionary sequence variation. PLOS ONE 6, e28766 (2011).
8
A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis, Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
9
H. Kamisetty, S. Ovchinnikov, D. Baker, Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. U.S.A. 110, 15674–15679 (2013).
10
F. Pucci, M. B. Zerihun, E. K. Peter, A. Schug, Evaluating DCA-based method performances for RNA contact prediction by a well-curated data set. RNA 26, 794–802 (2020).
11
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015).
12
Materials and methods are available as supplementary materials.
13
D. E. Worrall, S. J. Garbin, D. Turmukhambetov, G. J. Brostow, “Harmonic networks: Deep translation and rotation equivariance” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 7168–7177.
14
B. Anderson, T. S. Hy, R. Kondor, “Cormorant: Covariant Molecular Neural Networks” in Advances in Neural Information Processing Systems 32 (NeurIPS 2019), H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, R. Garnett, Eds. (Curran Associates, 2019), pp. 14537–14546.
15
M. Weiler, M. Geiger, M. Welling, W. Boomsma, T. S. Cohen, “3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data” in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett, Eds. (Curran Associates, 2018), pp. 10381–10392.
16
N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, P. Riley, Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds. arXiv:1802.08219 [cs.LG] (2018).
17
S. Eismann, R. J. L. Townshend, N. Thomas, M. Jagota, B. Jing, R. O. Dror, Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes. Proteins 89, 493–501 (2021).
18
R. Das, D. Baker, Automated de novo prediction of native-like RNA tertiary structures. Proc. Natl. Acad. Sci. U.S.A. 104, 14664–14669 (2007).
19
A. M. Watkins, R. Rangan, R. Das, FARFAR2: Improved De Novo Rosetta Prediction of Complex Global RNA Folds. Structure 28, 963–976.e6 (2020).
20
Z. Miao, R. W. Adamiak, M. Antczak, M. J. Boniecki, J. Bujnicki, S.-J. Chen, C. Y. Cheng, Y. Cheng, F.-C. Chou, R. Das, N. V. Dokholyan, F. Ding, C. Geniesse, Y. Jiang, A. Joshi, A. Krokhotin, M. Magnus, O. Mailhot, F. Major, T. H. Mann, P. Piątkowski, R. Pluta, M. Popenda, J. Sarzynska, L. Sun, M. Szachniuk, S. Tian, J. Wang, J. Wang, A. M. Watkins, J. Wiedemann, Y. Xiao, X. Xu, J. D. Yesselman, D. Zhang, Y. Zhang, Z. Zhang, C. Zhao, P. Zhao, Y. Zhou, T. Zok, A. Żyła, A. Ren, R. T. Batey, B. L. Golden, L. Huang, D. M. Lilley, Y. Liu, D. J. Patel, E. Westhof, RNA-Puzzles Round IV: 3D structure predictions of four ribozymes and two aptamers. RNA 26, 982–995 (2020).
21
E. Capriotti, T. Norambuena, M. A. Marti-Renom, F. Melo, All-atom knowledge-based potential for RNA structure prediction and assessment. Bioinformatics 27, 1086–1093 (2011).
22
J. Wang, Y. Zhao, C. Zhu, Y. Xiao, 3dRNAscore: A distance and torsion angle dependent evaluation function of 3D RNA structures. Nucleic Acids Res. 43, e63 (2015).
23
J. Li, W. Zhu, J. Wang, W. Li, S. Gong, J. Zhang, W. Wang, RNA3DCNN: Local and global quality assessments of RNA 3D structures using 3D deep convolutional neural networks. PLOS Comput. Biol. 14, e1006514 (2018).
24
J. Behler, M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
25
J. Xu, Distance-based protein folding powered by deep learning. Proc. Natl. Acad. Sci. U.S.A. 116, 16856–16865 (2019).
26
J. S. Smith, O. Isayev, A. E. Roitberg, ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
27
M. Ragoza, J. Hochuli, E. Idrobo, J. Sunseri, D. R. Koes, Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 57, 942–957 (2017).
28
K. Xu, Z. Wang, J. Shi, H. Li, Q. C. Zhang, “A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI Press, 2019), pp. 1230–1237.
29
F. Noé, S. Olsson, J. Köhler, H. Wu, Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science 365, eaaw1147 (2019).
30
M. AlQuraishi, End-to-End Differentiable Learning of Protein Structure. Cell Syst. 8, 292–301.e3 (2019).
31
K. M. Kutchko, A. Laederach, Transcending the prediction paradigm: Novel applications of SHAPE to RNA function and evolution. WIREs RNA 8, e1374 (2017).
32
R. J. L. Townshend, S. Eismann, A. M. Watkins, R. Rangan, M. Karelina, R. Das, R. O. Dror, Training code for ARES neural network, Version 1.0, Zenodo (2021); https://doi.org/10.5281/zenodo.5088971.
33
R. J. L. Townshend, S. Eismann, A. M. Watkins, R. Rangan, M. Karelina, R. Das, R. O. Dror, ARES-specific adaptation of E3NN, Version 1.0, Zenodo (2021); https://doi.org/10.5281/zenodo.5090151.
34
R. J. L. Townshend, S. Eismann, A. M. Watkins, R. Rangan, M. Karelina, R. Das, R. O. Dror, Auxiliary code related to the publication “Geometric Deep Learning of RNA Structure,” Version 1.0, Zenodo (2021); https://doi.org/10.5281/zenodo.5090157.
35
R. J. L. Townshend, A. M. Watkins, S. Eismann, R. Rangan, M. Karelina, R. Das, R. O. Dror, Structural data used to train, test, and characterize a new geometric deep learning RNA scoring function, Stanford Digital Repository (2020); https://doi.org/10.25740/bn398fc4306.
36
A. M. Watkins, R. Rangan, R. J. L. Townshend, S. Eismann, M. Karelina, R. O. Dror, R. Das, Structural data used to test a new geometric deep learning RNA scoring function emulating fully de novo modeling conditions, Stanford Digital Repository (2021); https://doi.org/10.25740/sq987cc0358.
37
T. S. Cohen, M. Welling, “Group equivariant convolutional networks” in Proceedings of the 33rd International Conference on Machine Learning, M. F. Balcan, K. Q. Weinberger, Eds. (PMLR, 2016), pp. 2990–2999.
38
R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, H. S. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947–951 (2000).
39
M. Reisert, H. Burkhardt, “Spherical tensor calculus for local adaptive filtering” in Tensors in Image Processing and Computer Vision, S. Aja-Fernández, R. de Luis García, D. Tao, X. Li, Eds. (Springer, 2009), pp. 153–178.
40
K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko, K.-R. Müller, “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions” in NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, U. von Luxburg, I. Guyon, S. Bengio, H. Wallach, R. Fergus, Eds. (Curran Associates, 2017), pp. 992–1002.
41
D. A. Clevert, T. Unterthiner, S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs)” in Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), Y. Bengio, Y. LeCun, Eds. (2016).
42
X. Glorot, Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh, M. Titterington, Eds. (PMLR, 2010), pp. 249–256.
43
D. P. Kingma, J. L. Ba, “Adam: A method for stochastic optimization” in Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), Y. Bengio, Y. LeCun, Eds. (2015).
44
P. J. Huber, Robust Estimation of a Location Parameter. Ann. Math. Stat. 35, 73–101 (1964).
45
Z. Miao, E. Westhof, RNA Structure: Advances and Assessment of 3D Structure Prediction. Annu. Rev. Biophys. 46, 483–503 (2017).
46
A. Sergeev, M. Del Balso, Horovod: fast and easy distributed deep learning in TensorFlow. arXiv:1802.05799 [cs.LG] (2018).
47
I. Kalvari, E. P. Nawrocki, N. Ontiveros-Palacios, J. Argasinska, K. Lamkiewicz, M. Marz, S. Griffiths-Jones, C. Toffano-Nioche, D. Gautheret, Z. Weinberg, E. Rivas, S. R. Eddy, R. D. Finn, A. Bateman, A. I. Petrov, Rfam 14: Expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
48
C. B. Do, D. A. Woods, S. Batzoglou, CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 22, e90–e98 (2006).
49
M. E. Sherlock, N. Sudarsan, S. Stav, R. R. Breaker, Tandem riboswitches form a natural Boolean logic gate to control purine metabolism in bacteria. eLife 7, e33908 (2018).
50
E. B. Porter, J. G. Marcano-Velázquez, R. T. Batey, The purine riboswitch as a model system for exploring RNA biology and chemistry. Biochim. Biophys. Acta 1839, 919–930 (2014).
51
M. J. Boniecki, G. Lach, W. K. Dawson, K. Tomala, P. Lukasz, T. Soltysinski, K. M. Rother, J. M. Bujnicki, SimRNA: A coarse-grained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 44, e63 (2016).
52
R. Das, J. Karanicolas, D. Baker, Atomic accuracy in predicting and designing noncanonical RNA structure. Nat. Methods 7, 291–294 (2010).
53
S. Chaudhury, S. Lyskov, J. J. Gray, PyRosetta: A script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26, 689–691 (2010).
54
N. B. Leontis, C. L. Zirbel, “Nonredundant 3D structure datasets for RNA knowledge extraction and benchmarking” in RNA 3D Structure Analysis and Prediction, N. Leontis, E. Westhof, Eds. (Springer, 2012), pp. 281–298.
55
X. J. Lu, W. K. Olson, 3DNA: A software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 31, 5108–5121 (2003).
56
K. Launer-Felty, C. J. Wong, J. L. Cole, Structural analysis of adenovirus VAI RNA defines the mechanism of inhibition of PKR. Biophys. J. 108, 748–757 (2015).
57
J. Zhang, A. R. Ferré-D’Amaré, Direct evaluation of tRNA aminoacylation status by the T-box riboswitch using tRNA-mRNA stacking and steric readout. Mol. Cell 55, 148–155 (2014).
58
A. V. Sherwood, F. J. Grundy, T. M. Henkin, T box riboswitches in Actinobacteria: Translational regulation via novel tRNA interactions. Proc. Natl. Acad. Sci. U.S.A. 112, 1113–1118 (2015).
59
I. V. Hood, J. M. Gordon, C. Bou-Nader, F. E. Henderson, S. Bahmanjah, J. Zhang, Crystal structure of an adenovirus virus-associated RNA. Nat. Commun. 10, 2871 (2019).
60
S. Li, Z. Su, J. Lehmann, V. Stamatopoulou, N. Giarimoglou, F. E. Henderson, L. Fan, G. D. Pintilie, K. Zhang, M. Chen, S. J. Ludtke, Y.-X. Wang, C. Stathopoulos, W. Chiu, J. Zhang, Structural basis of amino acid surveillance by higher-order tRNA-mRNA interactions. Nat. Struct. Mol. Biol. 26, 1094–1105 (2019).
61
K. C. Suddala, J. Zhang, High-affinity recognition of specific tRNAs by an mRNA anticodon-binding groove. Nat. Struct. Mol. Biol. 26, 1114–1122 (2019).
62
K. Kappel, K. Zhang, Z. Su, A. M. Watkins, W. Kladwang, S. Li, G. Pintilie, V. V. Topkar, R. Rangan, I. N. Zheludev, J. D. Yesselman, W. Chiu, R. Das, Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020).
63
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, “PyTorch: An imperative style, high-performance deep learning library” in Advances in Neural Information Processing Systems 32 (NeurIPS 2019), H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, R. Garnett, Eds. (Curran Associates, 2019), pp. 8024–8035.
64
M. Geiger, T. Smidt, A. K. Miller, W. Boomsma, B. Dice, K. Lapchevskyi, M. Weiler, M. Tyszkiewicz, S. Batzner, J. Frellsen, N. Jung, S. Sanborn, J. Rackers, M. Bailey, E3NN, GitHub (2021); https://github.com/e3nn/e3nn.
65
M. Parisien, J. A. Cruz, E. Westhof, F. Major, New metrics for comparing and assessing discrepancies between RNA 3D structures and models. RNA 15, 1875–1885 (2009).

Information & Authors

Information

Published In

Science
Volume 373 | Issue 6558
27 August 2021

Submission history

Received: 31 August 2020
Accepted: 14 July 2021
Published in print: 27 August 2021

Acknowledgments

The authors thank R. Altman, R. Betz, T. Dao, H. Hanley, S. Hollingsworth, M. Jagota, B. Jing, Y. Laloudakis, N. Latorraca, J. Paggi, A. Powers, P. Suriana, and N. Thomas for discussions and advice.

Funding: Funding was provided by National Science Foundation Graduate Research Fellowships (R.J.L.T. and R.R.); the U.S. Department of Energy, Office of Science Graduate Student Research program (R.J.L.T.); a Stanford Bio-X Bowes Fellowship (S.E.); the Army Research Office Multidisciplinary University Research Initiative program (R.D.); the U.S. Department of Energy, Office of Science, Scientific Discovery through Advanced Computing (SciDAC) program (R.O.D.); Intel (R.O.D.); a Stanford Bio-X seed grant (R.D. and R.O.D.); and National Institutes of Health grants R21CA219847 (R.D.) and R35GM122579 (R.D.).

Author contributions: R.J.L.T., A.M.W., R.D., and R.O.D. designed the research. S.E. formulated the idea of predicting RMSD from atomic coordinates and built the initial neural network. R.J.L.T., A.M.W., S.E., and M.K. performed and analyzed ARES experiments. A.M.W. generated candidate structural models, with assistance from R.R. R.J.L.T., A.M.W., R.D., and R.O.D. interpreted results. R.J.L.T., A.M.W., and R.O.D. wrote the paper, with input from all authors.

Competing interests: Stanford University has filed a provisional patent application related to this work. R.J.L.T. is the founder of Atomic AI, an artificial intelligence–driven rational design company. R.D. has received honoraria for seminars at Ribometrix and Pfizer.

Data and materials availability: Code for the ARES network and data analysis is available on Zenodo (32–34). All generated models, submitted blind predictions, and other ARES predictions that support the findings of this study are also available at the Stanford Digital Repository (35, 36). The trained ARES model is available at http://drorlab.stanford.edu/ares.html as a web server.

Authors

Affiliations

Department of Computer Science, Stanford University, Stanford, CA, USA.
Present address: Atomic AI, Menlo Park, CA, USA.
Stephan Eismann
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Applied Physics, Stanford University, Stanford, CA, USA.
Department of Biochemistry, Stanford University, Stanford, CA, USA.
Department of Biochemistry, Stanford University, Stanford, CA, USA.
Biophysics Program, Stanford University, Stanford, CA, USA.
Department of Computer Science, Stanford University, Stanford, CA, USA.
Biophysics Program, Stanford University, Stanford, CA, USA.
Department of Biochemistry, Stanford University, Stanford, CA, USA.
Department of Physics, Stanford University, Stanford, CA, USA.
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Structural Biology, Stanford University, Stanford, CA, USA.
Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA, USA.
Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA.

Notes

These authors contributed equally to this work.
*Corresponding author. Email: [email protected] (R.O.D.); [email protected] (R.D.)

Cited by
  1. Piercing the fog of the RNA structure-ome, Science 373, 964–965 (2021). doi:10.1126/science.abk1971
