Facial expression conveys nonverbal communication information to help humans better perceive physical or psychophysical situations. Accurate 3D imaging provides stable topographic changes for reading facial expression. In particular, light‐field cameras (LFCs) have high potential for constructing depth maps, thanks to a simple configuration of microlens arrays and an objective lens. Herein, machine‐learned NIR‐based LFCs (NIR‐LFCs) for facial expression reading by extracting Euclidean distances of 3D facial landmarks in pairwise fashion are reported. The NIR‐LFC contains microlens arrays with asymmetric Fabry−Perot filter and NIR bandpass filter on CMOS image sensor, fully packaged with two vertical‐cavity surface‐emitting lasers. The NIR‐LFC not only increases the image contrast by 2.1 times compared with conventional LFCs, but also reduces the reconstruction errors by up to 54%, regardless of ambient illumination conditions. A multilayer perceptron (MLP) classifies input vectors, consisting of 78 pairwise distances on the facial depth map of happiness, anger, sadness, and disgust, and also exhibits exceptional average accuracy of 0.85 (p<0.05). This LFC provides a new platform for quantitatively labeling facial expression and emotion in point‐of‐care biomedical, social perception, or human−machine interaction applications.