Traditional machine-learning methods are inefficient at capturing chaos in nonlinear dynamical systems, especially when the time difference Δt between consecutive steps is so large that the extracted time series appears random. Here, we introduce a new recurrent architecture based on long short-term memory (LSTM) in which the cell-state-to-state propagation is tensorized, preserving the long-term memory of LSTM while enhancing the learning of short-term nonlinear complexity. We stress that the global minima of training can be reached most efficiently by our tensor structure, in which all nonlinear terms up to some polynomial order are treated explicitly and weighted equally. We systematically investigate the efficiency and generality of our architecture through theoretical analysis and experiments. In our design, we explicitly use two different many-body entanglement structures, matrix product states (MPS) and the multiscale entanglement renormalization ansatz (MERA), as physics-inspired tensor decomposition techniques. We find that MERA generally performs better than MPS, and hence conjecture that the learnability of chaos is determined not only by the number of free parameters but also by the tensor complexity, characterized by how the entanglement entropy scales with different matricizations of the tensor.
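To make "tensorizing" with an MPS decomposition concrete, the following is a minimal NumPy sketch, not the authors' cell. It assumes that a state vector h is augmented to x = [1, h], so that the P-fold product x ⊗ ... ⊗ x contains every monomial of h up to degree P, and that the weight tensor acting on this product is stored in MPS (tensor-train) form with a bond dimension r. The function names and the parameters d, P and r are hypothetical, the cores are random rather than trained, and the MERA variant is not shown.

```python
import numpy as np

def augment(h):
    """Prepend a constant 1 so that products of P copies of x contain
    every monomial of h of total degree 0..P (illustrative assumption)."""
    return np.concatenate(([1.0], h))

def random_mps_cores(d_in, d_out, order, bond, rng):
    """Hypothetical MPS cores for a weight tensor with `order` input legs
    of size d_in + 1 and one output leg of size d_out (on the last core)."""
    n = d_in + 1
    dims = [1] + [bond] * (order - 1) + [d_out]
    return [rng.normal(scale=0.5, size=(dims[k], n, dims[k + 1]))
            for k in range(order)]

def mps_contract(cores, x):
    """Contract the MPS weight tensor with x ⊗ x ⊗ ... ⊗ x core by core;
    the cost grows linearly with the polynomial order P."""
    v = np.ones(1)
    for core in cores:
        # v_a * A[a, i, b] * x_i -> v_b
        v = np.einsum("a,aib,i->b", v, core, x)
    return v

def dense_contract(cores, x):
    """Reference check: rebuild the full weight tensor, then contract it
    with P copies of x (cost exponential in P)."""
    W = cores[0]
    for core in cores[1:]:
        W = np.tensordot(W, core, axes=([-1], [0]))
    out = W[0]  # drop the dummy left bond: shape (n, ..., n, d_out)
    for _ in range(len(cores)):
        out = np.tensordot(x, out, axes=([0], [0]))
    return out

rng = np.random.default_rng(0)
d, P, r = 3, 3, 4                      # state size, polynomial order, bond dimension
cores = random_mps_cores(d, d, P, r, rng)
h = rng.normal(size=d)
x = augment(h)
h_next = np.tanh(mps_contract(cores, x))  # hypothetical nonlinear state update
print(np.allclose(mps_contract(cores, x), dense_contract(cores, x)))
print(sum(c.size for c in cores), (d + 1) ** P * d)  # MPS vs dense parameter counts
```

Both contractions compute the same polynomial map of h; the MPS form stores 128 parameters here instead of the 192 entries of the dense tensor, and the gap widens rapidly as P grows. Comparing decompositions with similar parameter counts but different internal structure (for example MPS versus MERA) is the kind of experiment the conjecture about tensor complexity refers to.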