This work aims to estimate the time before or after the merger event for merging galaxies from the IllustrisTNG cosmological simulation using machine learning. Images of merging galaxies were created in the $u$, $g$, $r$, and $i$ bands from IllustrisTNG. The merger times were determined from the time difference between the last simulation snapshot in which the merging galaxies were tracked as two galaxies and the first snapshot in which they were tracked as a single galaxy. This time was then refined using simple gravity simulations. These data were used to train a residual network (ResNet50), a Swin Transformer (Swin), a convolutional neural network (CNN), and an autoencoder (using a single latent neuron) to reproduce the merger time. The full latent space of the autoencoder was also studied to see whether it reproduces the merger time better than the other methods. This was done by reducing the dimensionality of the latent space using Isomap, linear discriminant analysis (LDA), neighbourhood components analysis, sparse random projection, truncated singular value decomposition, and uniform manifold approximation and projection. The CNN performed best of all the neural networks. The autoencoder performed close to the CNN, with Swin close behind the autoencoder, and ResNet50 performed worst. Of the six dimensionality reduction methods, LDA performed best. However, exploring the full latent space produced worse results than the single latent neuron of the autoencoder. For the test data set, we found a median error of 190 Myr, comparable to the time separation between snapshots in IllustrisTNG. Merger times are poorly recovered for galaxies more than $625$ Myr before a merger event and for galaxies more than $125$ Myr after one.
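The snapshot-based timing and the median-error metric described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the use of the bracket midpoint as a first guess, the sign convention (negative before the merger, positive after), and all numerical values are assumptions made for illustration only. In particular, the refinement with simple gravity simulations is not reproduced here.

```python
from statistics import median


def bracket_merger_time(snapshot_times_myr, last_two_idx, first_one_idx):
    """Bracket a merger between two snapshots (all times in Myr).

    last_two_idx:  index of the last snapshot with two tracked galaxies.
    first_one_idx: index of the first snapshot with a single tracked galaxy.
    Returns (t_lower, t_upper, t_mid); the midpoint is an ASSUMED first
    guess, standing in for the refinement described in the text.
    """
    if first_one_idx <= last_two_idx:
        raise ValueError("the single-galaxy snapshot must come later")
    t_lower = snapshot_times_myr[last_two_idx]
    t_upper = snapshot_times_myr[first_one_idx]
    return t_lower, t_upper, 0.5 * (t_lower + t_upper)


def time_relative_to_merger(image_time_myr, merger_time_myr):
    """Assumed sign convention: negative before the merger, positive after."""
    return image_time_myr - merger_time_myr


def median_absolute_error(true_times, predicted_times):
    """Median of |true - predicted|, in the same units as the inputs."""
    return median(abs(t - p) for t, p in zip(true_times, predicted_times))


# Hypothetical snapshot times in Myr (not IllustrisTNG values).
snapshots = [0.0, 150.0, 300.0, 450.0]
t_lower, t_upper, t_merge = bracket_merger_time(snapshots, 1, 2)

# Label for an image taken at the first snapshot: time before the merger.
label = time_relative_to_merger(snapshots[0], t_merge)

# Hypothetical true and predicted merger times for three images.
error = median_absolute_error([-225.0, 75.0, 225.0], [-200.0, 100.0, 100.0])
```

With a bracket like this, the uncertainty of the unrefined merger time is at most the snapshot spacing, which is why a median error comparable to that spacing is a natural benchmark.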