Coarse-grained descriptions of the collective motion of flocking systems are often derived in the macroscopic or thermodynamic limit. However, many real flocks are small (10 to 100 individuals), placing them at mesoscopic scales, where stochasticity arising from finite flock size is important. Developing mesoscopic-scale equations, typically in the form of stochastic differential equations, can be challenging even for the simplest collective motion models. Here, we use a novel data-driven equation-learning approach to construct stochastic mesoscopic descriptions of a simple self-propelled particle (SPP) model of collective motion. In our SPP model, a focal individual interacts with k randomly chosen neighbours within an interaction radius. We consider k = 1 (stochastic pairwise interactions), k = 2 (stochastic ternary interactions), and k equal to all available neighbours within the interaction radius (equivalent to Vicsek-like local averaging). The data-driven mesoscopic equations reveal that the stochastic pairwise interaction model produces a novel form of collective motion driven by a multiplicative noise term (hence termed noise-induced flocking). In contrast, higher-order interactions (k > 1), including Vicsek-like averaging, yield collective motion driven primarily by deterministic forces. We find that the relationship between the parameters of the mesoscopic equations and the population size is sensitive to the density and to the interaction radius, deviating from mean-field theoretical expectations. We provide semi-analytic arguments that may explain these deviations. In summary, our study emphasizes the importance of mesoscopic descriptions of flocking systems and demonstrates the potential of data-driven equation discovery methods for the study of complex systems.
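As an illustration only, the Python sketch below shows one way the k-neighbour interaction rule and the idea of learning a mesoscopic equation dm = f(m) dt + g(m) dW from data might be set up. Everything here is an assumption rather than the authors' implementation: the function names (`step`, `polarization`, `binned_drift_diffusion`) are hypothetical, the arena is a periodic square box, the angular noise is uniform with width `eta`, the focal individual aligns to the mean heading of its chosen neighbours only (so k = 1 reduces to copying one random neighbour), and the scalar polarization |m| stands in for the order parameter. The drift and diffusion are estimated with simple binned conditional moments, f(m) ≈ ⟨Δm | m⟩/Δt and g²(m) ≈ ⟨Δm² | m⟩/Δt (a Kramers-Moyal style estimate), which is a stand-in for, not a reproduction of, the equation-learning method used in the study.

```python
import numpy as np


def polarization(theta):
    """Scalar order parameter |m| in [0, 1]: length of the mean heading vector."""
    return float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))


def step(pos, theta, k, radius, box, speed, eta, dt, rng):
    """One synchronous update of a hypothetical k-neighbour alignment rule.

    k=None uses every neighbour within `radius` (Vicsek-like local averaging);
    k=1 copies one randomly chosen neighbour (stochastic pairwise interaction).
    Assumptions (not from the source): periodic square box of side `box`,
    uniform angular noise of width `eta`, focal heading excluded from the
    average and kept (apart from noise) when no neighbour is in range.
    """
    n = len(theta)
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)  # minimum-image convention for periodic walls
    dist2 = (d ** 2).sum(axis=-1)
    new_theta = theta.copy()
    for i in range(n):
        nbrs = np.flatnonzero((dist2[i] < radius ** 2) & (np.arange(n) != i))
        if k is not None and len(nbrs) > k:
            # interact with k neighbours drawn at random from those in range
            nbrs = rng.choice(nbrs, size=k, replace=False)
        if len(nbrs) > 0:
            # circular mean of the chosen neighbours' (old) headings
            new_theta[i] = np.arctan2(np.sin(theta[nbrs]).sum(), np.cos(theta[nbrs]).sum())
    new_theta += eta * (rng.random(n) - 0.5)
    vel = speed * np.column_stack((np.cos(new_theta), np.sin(new_theta)))
    return (pos + vel * dt) % box, new_theta


def binned_drift_diffusion(m, dt, bins=10):
    """Binned conditional-moment estimates of drift f(m) and diffusion g^2(m)."""
    m = np.asarray(m, dtype=float)
    dm = np.diff(m)
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(m[:-1], edges) - 1, 0, bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    drift = np.full(bins, np.nan)
    diffusion = np.full(bins, np.nan)
    for b in range(bins):
        sel = idx == b
        if sel.any():
            drift[b] = dm[sel].mean() / dt
            diffusion[b] = (dm[sel] ** 2).mean() / dt
    return centers, drift, diffusion


if __name__ == "__main__":
    # Arbitrary illustrative parameters: 20 individuals, pairwise (k=1) rule.
    rng = np.random.default_rng(1)
    n, box, dt = 20, 5.0, 1.0
    pos = rng.random((n, 2)) * box
    theta = rng.uniform(-np.pi, np.pi, n)
    series = []
    for _ in range(1000):
        pos, theta = step(pos, theta, 1, 1.0, box, 0.1, 1.0, dt, rng)
        series.append(polarization(theta))
    centers, drift, diffusion = binned_drift_diffusion(series, dt)
    print(np.round(drift, 3))
    print(np.round(diffusion, 3))
```

Passing `k=2` or `k=None` to `step` switches the sketch to ternary or Vicsek-like interactions. Comparing the shapes of the estimated diffusion curves across these settings is one simple way to probe whether noise (state-dependent g(m)) or drift (f(m)) dominates the ordering, though binning is much cruder than fitting explicit functional forms.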