The capacity limits of fiber-optic communication systems in the nonlinear regime are not yet well understood. In this paper, we study the capacity of amplitude-modulated first-order soliton transmission, defined as the maximum of the so-called time-scaled mutual information. Such a definition allows us to directly incorporate the dependence of the soliton pulse width on its amplitude into the capacity formulation. We first consider the commonly used memoryless channel model based on the noncentral chi-squared distribution. Applying a variance-normalizing transform, we approximate this channel by a unit-variance additive white Gaussian noise (AWGN) model. Based on a numerical capacity analysis of the approximated AWGN channel, we determine a general form of capacity-approaching input distributions. These optimal distributions are discrete, comprising a mass point at zero (the off symbol) and a finite number of mass points almost uniformly distributed away from zero. Using this general form of input distributions, we derive a novel closed-form approximation of the capacity that shows a good match to the numerical results. Finally, we develop mismatch capacity bounds based on split-step simulations of the nonlinear Schrödinger equation, considering both single-soliton and soliton-sequence transmission. This relaxes the initial memoryless-channel assumption and shows the impact of both inter-soliton interaction and the Gordon–Haus effect. Our results show that inter-soliton interaction becomes increasingly significant at higher soliton amplitudes and would be the dominant impairment compared with the timing jitter induced by the Gordon–Haus effect.
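The variance-normalizing step mentioned above can be illustrated with a small numerical sketch. It is not the paper's implementation: it assumes the square-root transform (the standard variance-stabilizing choice for noncentral chi-squared variables), noise normalized to unit variance per real component, and an illustrative k = 4 degrees of freedom. The function names and parameter values are hypothetical. For a noncentral chi-squared variable Y with k degrees of freedom and noncentrality lam, Var[Y] = 2(k + 2 lam) grows with the signal level, whereas a first-order expansion gives Var[sqrt(Y)] ≈ (k + 2 lam) / (2(k + lam)), which approaches 1 for large lam.

```python
import math
import random
import statistics


def sample_noncentral_chi2(k, lam, n, rng):
    """Draw n samples of Y = sum_i (Z_i + mu_i)^2 with Z_i ~ N(0, 1).

    The whole noncentrality lam is placed on the first component, which
    yields a noncentral chi-squared variable with k degrees of freedom.
    """
    mu = math.sqrt(lam)
    samples = []
    for _ in range(n):
        y = (rng.gauss(0.0, 1.0) + mu) ** 2
        y += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))
        samples.append(y)
    return samples


def variance_before_and_after(k, lam, n=20000, seed=0):
    """Return (Var[Y], Var[sqrt(Y)]) estimated from n Monte Carlo samples."""
    rng = random.Random(seed)
    y = sample_noncentral_chi2(k, lam, n, rng)
    raw_var = statistics.pvariance(y)
    # Assumed variance-normalizing transform: the square root.
    transformed_var = statistics.pvariance([math.sqrt(v) for v in y])
    return raw_var, transformed_var


if __name__ == "__main__":
    # k = 4 and the lam values below are illustrative choices only.
    for lam in (25.0, 100.0, 400.0):
        raw, flat = variance_before_and_after(k=4, lam=lam)
        print(f"lam={lam:6.1f}  Var[Y]={raw:8.1f}  Var[sqrt(Y)]={flat:5.3f}")
```

Under these assumptions the raw variance scales roughly linearly with lam, while the variance after the transform stays close to one, which is the property that motivates replacing the signal-dependent-noise channel by a unit-variance AWGN approximation.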