Context. The intermediate-mass pre-main sequence Herbig Ae/Be stars are key to understanding the differences in formation mechanisms between low- and high-mass stars. The study of the general properties of these objects is hampered by the lack of a well-defined, homogeneous sample, and because only a few sources are known, most of them discovered serendipitously.
Aims. Our goal is to identify new Herbig Ae/Be candidates to create a homogeneous and well-defined catalogue of these objects.
Methods. We have applied machine learning techniques to 4 150 983 sources with data from Gaia DR2, 2MASS, WISE, and IPHAS or VPHAS+. Several observables were chosen to identify new Herbig Ae/Be candidates based on our current knowledge of this class, which is characterised by infrared excess, photometric variability, and Hα emission lines. Classical techniques are not efficient for identifying new Herbig Ae/Be stars, mainly because of their similarity to classical Be stars, with which they share many characteristics. By focusing on disentangling these two types of objects, our algorithm has also identified new classical Be stars.
Results. We have obtained a large catalogue of 8470 new pre-main sequence candidates and another catalogue of 693 new classical Be candidates with a completeness of 78.8 ± 1.4% and 85.5 ± 1.2%, respectively. Of the catalogue of pre-main sequence candidates, at least 1361 sources are potentially new Herbig Ae/Be candidates according to their position in the Hertzsprung-Russell diagram. In this study we present the methodology used, evaluate the quality of the catalogues, and perform an analysis of their flaws and biases. For this assessment, we make use of observables that have not been accounted for by the algorithm and hence are selection-independent, such as coordinates and parallax-based distances. The catalogue of new Herbig Ae/Be stars that we present here increases the number of known objects of the class by an order of magnitude.
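The completeness values quoted in the Results can be read, in the usual classification sense, as the fraction of known members of a class that the classifier recovers. The following is a minimal sketch of how such a figure and an accompanying uncertainty might be estimated on a labelled test set. It assumes that completeness means recall and that the uncertainty comes from bootstrap resampling; the abstract does not state how the quoted errors were derived, so this is an illustration rather than the procedure used here. The function names `completeness` and `bootstrap_completeness` and the toy labels are hypothetical.

```python
import random
import statistics


def completeness(y_true, y_pred, positive=1):
    """Fraction of true members of the class that are recovered (recall)."""
    recovered = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    total = sum(1 for t in y_true if t == positive)
    if total == 0:
        raise ValueError("no sources of the positive class in the sample")
    return recovered / total


def bootstrap_completeness(y_true, y_pred, n_resamples=500, seed=0):
    """Mean and standard deviation of completeness over bootstrap resamples.

    Assumption: bootstrap resampling stands in for whatever error
    estimate the catalogue paper actually uses.
    """
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Draw n sources with replacement from the test set.
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        # Skip the (rare) resample that contains no positive source.
        if any(t == 1 for t in yt):
            values.append(completeness(yt, yp))
    return statistics.mean(values), statistics.stdev(values)


# Toy test set (hypothetical): 100 known class members, 100 other sources.
y_true = [1] * 100 + [0] * 100
# The classifier recovers 80 members and wrongly flags 5 other sources.
y_pred = [1] * 80 + [0] * 20 + [1] * 5 + [0] * 95

point = completeness(y_true, y_pred)
mean, spread = bootstrap_completeness(y_true, y_pred)
print(f"completeness = {100 * point:.1f}%")
print(f"bootstrap: {100 * mean:.1f} +/- {100 * spread:.1f}%")
```

Note that completeness says nothing about contamination: the five wrongly flagged sources in the toy sample do not change the result, which is why a separate quality assessment with selection-independent observables remains necessary.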