Measurement noise is one of the main sources of errors in currently available quantum devices based on superconducting qubits. At the same time, the complexity of its characterization and mitigation often exhibits exponential scaling with the system size. In this work, we introduce a correlated measurement noise model that can be efficiently described and characterized, and which admits effective noise-mitigation on the level of marginal probability distributions. Noise mitigation can be performed up to some error for which we derive upper bounds. Characterization of the model is done efficiently using Diagonal Detector Overlapping Tomography – a generalization of the recently introduced Quantum Overlapping Tomography to the problem of reconstruction of readout noise with restricted locality. The procedure allows to characterize k-local measurement cross-talk on N-qubit device using O(k2klog(N)) circuits containing random combinations of X and identity gates. We perform experiments on 15 (23) qubits using IBM's (Rigetti's) devices to test both the noise model and the error-mitigation scheme, and obtain an average reduction of errors by a factor >22 (>5.5) compared to no mitigation. Interestingly, we find that correlations in the measurement noise do not correspond to the physical layout of the device. Furthermore, we study numerically the effects of readout noise on the performance of the Quantum Approximate Optimization Algorithm (QAOA). We observe in simulations that for numerous objective Hamiltonians, including random MAX-2-SAT instances and the Sherrington-Kirkpatrick model, the noise-mitigation improves the quality of the optimization. Finally, we provide arguments why in the course of QAOA optimization the estimates of the local energy (or cost) terms often behave like uncorrelated variables, which greatly reduces sampling complexity of the energy estimation compared to the pessimistic error analysis. We also show that similar effects are expected for Haar-random quantum states and states generated by shallow-depth random circuits.