Coherent optical (CO) orthogonal frequency division multiplexing (OFDM) offers a scalable and flexible way to increase the transmission rate and is extremely robust to chromatic dispersion and polarization mode dispersion. Nevertheless, as in any coherent-detection OFDM system, the overall performance is limited by laser phase noise. Meanwhile, extreme learning machines (ELMs) have gained considerable attention in the machine learning community owing to their good generalization performance, extremely fast learning speed, and minimal human intervention. In this manuscript, a phase-error mitigation method for CO-OFDM systems, based on a single-hidden-layer feedforward network trained with an improved ELM algorithm, is introduced for the first time. Training proceeds in two steps. First, pilots, which are already common in OFDM-based systems, are used to mitigate laser phase noise and to correct frequency-selective impairments, so that bandwidth efficiency can be maximized. Second, a regularization parameter is included in the ELM to balance the empirical and structural risks, that is, to minimize the root-mean-square error in the test stage and, consequently, the bit error rate (BER). The operating principle of the real-complex (RC) ELM is explained analytically, and its parameters (number of hidden neurons, regularization parameter, and activation function) are then determined numerically to enhance system performance. For binary and quadrature phase-shift keying modulations, the RC-ELM outperforms the benchmark pilot-assisted equalizer and the fully-real ELM, and almost matches common phase error (CPE) compensation and the ELM defined in the complex domain (C-ELM) in terms of BER over an additive white Gaussian noise channel and for different laser oscillators.
However, both of these techniques have drawbacks: the CPE compensator reduces the transmission rate because an additional preamble is mandatory for channel estimation, while the C-ELM requires a bounded and differentiable activation function in the complex domain and cannot be trained in a semi-supervised manner. In addition, the novel ELM algorithm cannot compete with the CPE compensator and the C-ELM for 16-ary quadrature amplitude modulation. On the other hand, the novel ELM incurs a negligible computational cost compared with the C-ELM and the pilot-assisted equalizer.
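The role of the regularization parameter in pilot-based ELM training can be illustrated with a minimal sketch. It is a generic, fully real-valued regularized ELM, not the RC-ELM proposed here: the widely used ridge-type solution beta = (I/C + H^T H)^{-1} H^T T is assumed, and the function names, the tanh activation, the fixed 1 rad phase rotation, the noise level, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

def to_real(z):
    """Stack real and imaginary parts as two real-valued columns."""
    return np.column_stack([z.real, z.imag])

def train_elm(X, T, n_hidden=32, C=100.0, seed=0):
    """Fit a single-hidden-layer ELM; C is the regularization parameter (assumed value)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Regularized least squares: beta = (I/C + H^T H)^{-1} H^T T
    A = np.eye(n_hidden) / C + H.T @ H
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the trained network to new real-valued inputs."""
    return np.tanh(X @ W + b) @ beta

def qpsk(n, rng):
    """Draw n unit-energy QPSK symbols."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def run_demo(seed=0, phase=1.0, sigma=0.05, n_pilot=64, n_data=500):
    """Toy link: constant phase rotation plus AWGN (a stand-in for laser phase noise)."""
    rng = np.random.default_rng(seed)

    def channel(s):
        noise = sigma * (rng.standard_normal(s.size) + 1j * rng.standard_normal(s.size))
        return s * np.exp(1j * phase) + noise

    pilots, data = qpsk(n_pilot, rng), qpsk(n_data, rng)
    rx_pilots, rx_data = channel(pilots), channel(data)

    # Train on pilots only: received pilots in, transmitted pilots out.
    W, b, beta = train_elm(to_real(rx_pilots), to_real(pilots))
    est = predict_elm(to_real(rx_data), W, b, beta)

    # Hard decisions on the sign of each real component (one bit per component).
    ref = np.sign(to_real(data))
    ber_raw = np.mean(np.sign(to_real(rx_data)) != ref)
    ber_elm = np.mean(np.sign(est) != ref)
    return ber_raw, ber_elm

if __name__ == "__main__":
    ber_raw, ber_elm = run_demo()
    print(f"BER without equalization: {ber_raw:.3f}, with ELM: {ber_elm:.3f}")
```

In this sketch, a larger C puts more weight on the empirical (training) error and a smaller C on the structural risk; the number of hidden neurons and the activation function would be tuned numerically in the same spirit.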