In modern machine learning, attention computation is a fundamental task for training Transformer-based large language models such as GPT-4 and ChatGPT. In this work, we study an exponential regression problem inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression is non-convex. We study a regularized version of the exponential regression problem, which is convex, and use an approximate Newton method to solve it in input-sparsity time. Formally, in this problem, one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any of the functions $\exp$, $\cosh$, and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \operatorname{diag}(w) A x \|_2^2$. A straightforward approach is to use the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of nonzero entries in matrix $A$. Let $\omega$ denote the exponent of matrix multiplication; currently, $\omega \approx 2.373$. Let $\epsilon$ denote the accuracy parameter. In this paper, we exploit the input sparsity and propose an algorithm that uses $\log(\| x_0 - x^* \|_2 / \epsilon)$ iterations and $O(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
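As a concrete illustration of the objective and the Newton iteration described above, here is a minimal sketch for the $f = \exp$ case. It is not the input-sparsity algorithm of this paper: it forms the exact $d \times d$ Hessian and solves the Newton system densely, so each iteration costs roughly $O(n d^2 + d^{\omega})$ rather than $O(\mathrm{nnz}(A) + d^{\omega})$. The function name, tolerance, iteration cap, and demo data are illustrative choices, not from the source.

```python
import numpy as np

def newton_exp_regression(A, b, w, x0, eps=1e-8, max_iter=50):
    """Exact-Newton sketch (illustrative, not the paper's algorithm) for
    min_x 0.5*||exp(Ax) - b||_2^2 + 0.5*||diag(w) A x||_2^2."""
    x = x0.copy()
    for _ in range(max_iter):
        u = A @ x
        fu = np.exp(u)
        # Gradient: A^T [exp(u) * (exp(u) - b)] + A^T diag(w)^2 (A x)
        grad = A.T @ (fu * (fu - b)) + A.T @ (w**2 * u)
        if np.linalg.norm(grad) < eps:
            break
        # Hessian: A^T D A with diagonal
        # D = exp(u)*(exp(u) - b) + exp(u)^2 + w^2
        # (f'' * (f - b) + (f')^2 for f = exp, plus the regularizer term)
        D = fu * (fu - b) + fu**2 + w**2
        H = A.T @ (D[:, None] * A)
        x -= np.linalg.solve(H, grad)
    return x

if __name__ == "__main__":
    # Toy demo with hypothetical data; small entries keep exp(Ax) well-behaved.
    rng = np.random.default_rng(0)
    n, d = 100, 5
    A = rng.standard_normal((n, d)) * 0.1
    b = np.exp(A @ rng.standard_normal(d))
    w = np.ones(n)  # regularization weights
    x_hat = newton_exp_regression(A, b, w, np.zeros(d))
```

The gap between this sketch and the stated guarantee is exactly the Hessian step: the paper's per-iteration bound comes from replacing the exact $A^\top D A$ computation with an approximate Hessian that can be formed in input-sparsity time, while the approximate Newton analysis preserves the $\log(\|x_0 - x^*\|_2/\epsilon)$ iteration count.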