Waveform inversion is, in theory, a powerful tool for reconstructing subsurface structures, but a commonly encountered problem is that an accurate source wavelet is rarely available, which can make the computation unstable or divergent. This practical problem, although sometimes ignored or even imperceptible, easily creates discrepancies in the calculated shot gathers; the resulting erroneous residuals are smeared back into the gradients and jeopardize the inverted tomograms. In any real dataset, every shot gather corresponds to its own unique source, even if some gathers can be made to look alike by data processing. To resolve this problem, we propose a collocated inversion of sources and early-arrival waveforms, in which the two submodules execute successively. The method not only reconstructs a decent source wavelet that approaches the ground truth but also produces credible background tomograms using the optimized sources. Some cycle-skipping problems are also mitigated because the method avoids trial-and-error experiments with various sources. Numerical tests on a synthetic dataset and a land dataset validate the effectiveness of the method. Restrictions on the initial sources and starting velocity models are relaxed, and the method can be extended to other applications for engineering or exploration purposes.
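
To make the source-estimation step concrete, the following is a minimal sketch of one common way to estimate a per-shot source wavelet: a frequency-domain least-squares fit between the observed gather and traces modelled with a unit-impulse source in the current velocity model. The abstract does not state which estimator is used, so this is an assumption rather than the method of this work; the function name `estimate_source_wavelet`, its parameters, and the damping term `eps` are hypothetical and chosen for illustration only.

```python
import numpy as np


def estimate_source_wavelet(observed, greens, eps=1e-8):
    """Least-squares wavelet estimate for a single shot gather (illustrative).

    observed : (n_receivers, n_samples) recorded traces of one shot.
    greens   : (n_receivers, n_samples) traces modelled with a unit-impulse
               source in the current velocity model (assumed to be given).
    eps      : relative damping that stabilises the spectral division
               (hypothetical default, not taken from the source text).
    """
    n_samples = observed.shape[1]

    # Move to the frequency domain, where convolution becomes multiplication.
    D = np.fft.rfft(observed, axis=1)
    G = np.fft.rfft(greens, axis=1)

    # For each frequency, minimise sum_r |D_r - S * G_r|^2 over the scalar S.
    numerator = np.sum(np.conj(G) * D, axis=0)
    denominator = np.sum(np.abs(G) ** 2, axis=0)
    S = numerator / (denominator + eps * denominator.max())

    # Back to the time domain: one wavelet shared by all traces of the shot.
    return np.fft.irfft(S, n=n_samples)
```

In a successive scheme of the kind described above, an estimate like this would be recomputed for each shot gather from the current velocity model and then held fixed while the early-arrival waveform residuals update the velocities. The ordering and iteration counts of the two submodules are not specified here, so they are left out of the sketch.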