Eddy current testing (ECT) is an effective technique for evaluating depth of metal surface defects. However, in practice, evaluation primarily relies on the experience of an operator and is often carried out by manual inspection. In this paper, we address the challenges of automatic depth evaluation of metal surface defects by virtual of state-of-the-art deep learning (DL) techniques. The main contributions are three-fold. Firstly, a highly-integrated portable ECT device is developed, taking the advantage of an advanced field programmable gate array (Zynq-7020 system on chip) and provides fast data acquisition and in-phase/quadrature demodulation. Secondly, a dataset, termed as MDDECT, is constructed using the ECT device by human operators and made openly available. It contains 48,000 scans from 18 defects of different depths and lift-offs. Thirdly, the depth evaluation problem is formulated as a time series classification problem, and various state-of-the-art 1-d residual convolutional neural networks are trained and evaluated on the MDDECT dataset. A 38-layer 1-d ResNeXt achieves an accuracy of 93.58% in discriminating the surface defects in a stainless steel sheet with depths from 0.3 mm to 2.0 mm in the resolution of 0.1 mm. In addition, results show that the trained ResNeXt1D-38 model is immune to lift-off signals.