“…However, the penalty can, if necessary, be made arbitrarily small by increasing the number of matrices in the set S. The computational complexity increases only marginally as more matrices M are included, which is seen by writing (31) in terms of the four integrals $\int (Y_a(f))^* \, Y_b(f + 1/T) \, e^{-j2\pi\eta T f} \, df$ for a, b ∈ {x, y} and the matrix elements of M. Then, for a given value of η, the four integrals can be calculated first, and the results are then combined linearly, weighted by the elements of each matrix M. We notice that the final result (31) bears resemblance to the estimator suggested in [6], which was not derived directly from the ML criterion. However, some differences exist, e.g., (i) [6, Eq. (1)] is for a scalar field, and no explicit algorithm is given for a polarization-multiplexed signal; (ii) while DGD and PMD are mentioned in [6], no detailed method for handling DGD is given; (iii) the spectral shift is ±1/T in [6], while it is 1/T in (31); (iv) two different cost functions are suggested in [6], while the ML approach results in one single algorithm.…”
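
The complexity argument can be illustrated with a minimal NumPy sketch. Equation (31) is not reproduced in this passage, so the sketch assumes that the quantity needed for each M is the quadratic form Y(f)^H M Y(f + 1/T), weighted by the exponential kernel and integrated over f. The function names `four_integrals` and `combine`, the uniform frequency grid, and the Riemann-sum approximation of the integral are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def four_integrals(f, Y, eta, T):
    """Approximate I[a, b] = int conj(Y_a(f)) Y_b(f + 1/T) exp(-j 2 pi eta T f) df.

    f : uniformly spaced frequency grid, shape (N,)
    Y : spectra of the two polarizations, shape (2, N), rows ordered (x, y)
    Assumes 1/T is (close to) an integer number of grid bins.
    """
    df = f[1] - f[0]
    shift = int(round(1.0 / (T * df)))      # spectral shift 1/T expressed in bins
    n = len(f) - shift                      # number of overlapping samples
    kernel = np.exp(-2j * np.pi * eta * T * f[:n])
    lo = np.conj(Y[:, :n])                  # conj(Y_a(f))
    hi = Y[:, shift:shift + n]              # Y_b(f + 1/T)
    # Riemann sum over frequency for all four (a, b) pairs at once.
    return np.einsum("af,bf,f->ab", lo, hi, kernel) * df


def combine(I, M):
    """Linear combination of the four integrals with the elements of M."""
    return np.sum(M * I)
```

For a fixed η, `four_integrals` is called once and `combine` is then evaluated for every M in S, so each additional matrix costs only four complex multiplications and three additions rather than a new integration.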