With ever-improving Earth observation capabilities, variables such as tree health status, biomass storage, or stand structure are increasingly estimated through remote sensing. While many protocols for data acquisition and satellite data processing are well established, the still relatively novel unmanned aerial vehicles (UAVs) face several challenges during data acquisition and processing. While tree height extraction has become common practice, identifying individual trees and measuring their crowns remains challenging. We performed several flights with three different UAVs and four popular sensors over two sites with coniferous forests of various ages, at flight levels of 100–200 m above ground level (AGL), using custom settings preset by the UAV solution suppliers. In terms of individual tree identification success rate, consumer-grade RGB cameras provided more consistent results across all flight levels (84–77% for the Phantom 4), whereas for multispectral cameras the success rate decreased with increasing flight level and decreasing crown size (77–54% for the RedEdge-M). In general, RGB cameras performed best at 150 m AGL and multispectral cameras at 100 m AGL. Regarding the accuracy of the measured crown diameters, most datasets tended to overestimate crown diameters when automatic crown delineation in the lidR package was used. Only the RGB cameras yielded satisfactory results, with a mean absolute error (MAE) of 0.79–0.99 m for the Phantom 4 and 0.88–1.16 m for the Zenmuse X5S. Multispectral cameras overestimated crown diameters more than RGB cameras did, especially in the full-grown forest (MAE = 1.26–1.77 m); on the other hand, they provide spectral as well as structural information. We conclude that widely available off-the-shelf UAV solutions equipped with low-cost RGB cameras yield very satisfactory results for describing forest structure at 150 m AGL. 
When (multi)spectral information is needed, we recommend reducing the flight level to 100 m AGL to acquire sufficient structural forest information.
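The two accuracy measures summarized above, the individual tree identification success rate and the MAE of crown diameters, can be illustrated with a short sketch. It assumes that trees have already been matched between the UAV-derived and reference datasets, and it defines the success rate as matched trees divided by reference trees; this definition is an assumption and not necessarily the exact one used in this study. The diameters below are hypothetical illustration values, not data from this study. The mean signed error is included to show how a tendency to overestimate appears as a positive bias, which the MAE alone does not reveal.

```python
from statistics import mean


def success_rate(n_matched: int, n_reference: int) -> float:
    """Percentage of reference trees correctly identified (assumed definition)."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return 100.0 * n_matched / n_reference


def crown_errors(estimated, reference):
    """Return (MAE, mean signed error) for paired crown diameters in metres.

    A positive mean signed error indicates overestimation of crown diameters.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need equally long, non-empty sequences")
    diffs = [e - r for e, r in zip(estimated, reference)]
    mae = mean(abs(d) for d in diffs)  # magnitude of error, sign ignored
    bias = mean(diffs)                 # direction of error
    return mae, bias


# Hypothetical paired crown diameters (m), for illustration only
est = [4.2, 3.8, 5.1]
ref = [3.5, 3.9, 4.4]
mae, bias = crown_errors(est, ref)
print(f"MAE = {mae:.2f} m, mean error = {bias:+.2f} m")
print(f"success rate = {success_rate(77, 100):.0f} %")
```

In this toy example the MAE is 0.50 m and the mean signed error is about +0.43 m, meaning most of the error comes from overestimation, the pattern the abstract reports for automatic crown delineation.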