Laser scanning microscopy has inherent tradeoffs between imaging speed, field of view (FOV), and spatial resolution due to the limitations of sophisticated mechanical and optical setups, and deep learning networks have emerged to overcome these limitations without changing the system. Here, we demonstrate deep learning autofluorescence-harmonic microscopy (DLAM) based on self-alignment attention-guided residual-in-residual dense generative adversarial networks to close the gap between speed, FOV, and quality. Using the framework, we demonstrate label-free large-field multimodal imaging of clinicopathological tissues with enhanced spatial resolution and running time advantages. Statistical quality assessments show that the attention-guided residual dense connections minimize the persistent noise, distortions, and scanning fringes that degrade the autofluorescence-harmonic images and avoid reconstruction artifacts in the output images. With the advantages of high contrast, high fidelity, and high speed in image reconstruction, DLAM can act as a powerful tool for the noninvasive evaluation of diseases, neural activity, and embryogenesis.