Signal processing based on the nonlinear Fourier transform (NFT) has attracted considerable attention as a promising tool for mitigating fibre nonlinearity in optical transmission. However, the mathematical complexity of NFT algorithms and their marked difference from "conventional" (Fourier-based) methods make it difficult to adapt this approach to practical applications. In this work, we demonstrate a hardware implementation of the fast direct NFT operation, which maps the optical signal onto its nonlinear Fourier spectrum, i.e. demodulates the data. The main component of the algorithm is the matrix-multiplier unit, which we implement on a field-programmable gate array (FPGA) and use to estimate the required hardware resources. To obtain the best-performing implementation within limited resources, we analyse the processing accuracy to estimate the optimal bit width. The fast NFT algorithm that we analyse is based on the FFT, which gives a complexity of $O(N \log_2^2 N)$ for a signal consisting of $N$ samples. Our analysis reveals a significant demand for DSP blocks on the board used, caused by the complex-valued matrix operations and FFTs. Nevertheless, further parallelisation of our NFT-processing implementation appears feasible and could lead to a more efficient NFT hardware realisation.
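To make the role of the matrix multiplier and of the bit-width analysis more concrete, the following minimal sketch computes the direct NFT coefficients a(ξ) and b(ξ) at a single real frequency by multiplying per-sample 2×2 complex transfer matrices. It rests on several assumptions that are not taken from this work: it uses the plain (slow) piecewise-constant transfer-matrix scheme for the focusing Zakharov-Shabat problem rather than the fast FFT-based algorithm, the names `direct_nft`, `quantize` and `frac_bits` are hypothetical, and fixed-point arithmetic is modelled simply as rounding to a given number of fractional bits, which is not a model of the actual FPGA design.

```python
import numpy as np


def quantize(x, frac_bits):
    """Round real and imaginary parts to `frac_bits` fractional bits.

    A crude fixed-point model (assumption): no saturation, no word-length limit.
    """
    scale = 2.0 ** frac_bits
    x = np.asarray(x, dtype=complex)
    return (np.round(x.real * scale) + 1j * np.round(x.imag * scale)) / scale


def direct_nft(q, h, xi, frac_bits=None):
    """Return (a, b) at real frequency `xi` for samples `q` with step `h`.

    Assumed model: v' = [[-i*xi, q], [-conj(q), i*xi]] v, with q constant
    over each step, on the window [-N*h/2, N*h/2].
    If `frac_bits` is given, matrices and state are rounded after every step.
    """
    q = np.asarray(q, dtype=complex)
    t1, t2 = -0.5 * q.size * h, 0.5 * q.size * h
    # Boundary condition at the left edge of the window.
    v = np.array([np.exp(-1j * xi * t1), 0.0], dtype=complex)
    if frac_bits is not None:
        v = quantize(v, frac_bits)
    for qn in q:
        k = np.sqrt(xi ** 2 + abs(qn) ** 2)
        c = np.cos(k * h)
        s = h if k == 0 else np.sin(k * h) / k
        # Exact matrix exponential of the step, since P^2 = -k^2 * I.
        T = np.array([[c - 1j * xi * s, qn * s],
                      [-np.conj(qn) * s, c + 1j * xi * s]])
        if frac_bits is not None:
            T = quantize(T, frac_bits)
        v = T @ v  # the complex 2x2 matrix-vector product
        if frac_bits is not None:
            v = quantize(v, frac_bits)
    a = v[0] * np.exp(1j * xi * t2)
    b = v[1] * np.exp(-1j * xi * t2)
    return a, b


if __name__ == "__main__":
    q = np.ones(64, dtype=complex)  # rectangular pulse, amplitude 1, width 2
    a_ref, _ = direct_nft(q, 1 / 32, 0.5)
    for bits in (8, 12, 16, 24):
        a_q, _ = direct_nft(q, 1 / 32, 0.5, frac_bits=bits)
        print(bits, abs(a_q - a_ref))
```

Running the script prints the deviation of a(ξ) from the double-precision result for a few fractional bit widths. This is one simple way to explore the accuracy against bit-width trade-off, and it says nothing about the resource figures of a real device.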