Soft sensors, an important paradigm for industrial intelligence, are widely used in industrial production to monitor and predict production status, including product quality, efficiently. Data-driven soft sensor methods have attracted attention but still face challenges, because industrial data are complex, with diverse characteristics, nonlinear relationships, and massive numbers of unlabeled samples. In this paper, a data-driven self-supervised long short-term memory-deep factorization machine (LSTM-DeepFM) model is proposed for industrial soft sensing, built on a framework whose main stages are pretraining and finetuning and which is designed to explore diverse industrial data characteristics. In the pretraining stage, an LSTM-Autoencoder is first pretrained in an unsupervised manner. Then, based on two self-supervised mask strategies, LSTM-Deep explores the interdependencies between features as well as the dynamic fluctuations in the time series. In the finetuning stage, relying on the pretrained representation, temporal, high-dimensional, and low-dimensional features are extracted by the LSTM, Deep, and FM components, respectively. Finally, experiments on a real-world mining dataset demonstrate that the proposed method achieves state-of-the-art performance compared with stacked autoencoder (SAE) based models, variational autoencoder (VAE) based models, semisupervised parallel DeepFM (SS-PdeepFM), and other baselines.
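To make the idea of the two self-supervised mask strategies concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify how the masks are constructed, so the sketch assumes that one strategy hides whole feature columns (to probe interdependencies between features) and the other hides whole time steps (to probe temporal dynamics). The function names `mask_features` and `mask_time_steps`, the zero fill value, and the 25% mask ratio are all hypothetical choices made for illustration.

```python
import random

def mask_features(window, ratio, rng):
    """Assumed strategy 1: zero out randomly chosen feature columns in every time step."""
    n_features = len(window[0])
    n_masked = max(1, round(ratio * n_features))
    cols = set(rng.sample(range(n_features), n_masked))
    masked = [[0.0 if j in cols else v for j, v in enumerate(row)] for row in window]
    return masked, cols

def mask_time_steps(window, ratio, rng):
    """Assumed strategy 2: zero out randomly chosen time steps across all features."""
    n_steps = len(window)
    n_masked = max(1, round(ratio * n_steps))
    rows = set(rng.sample(range(n_steps), n_masked))
    masked = [[0.0] * len(row) if i in rows else list(row) for i, row in enumerate(window)]
    return masked, rows

# Synthetic window: 20 time steps, 8 process variables (illustrative data only).
rng = random.Random(0)
window = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]

feat_input, masked_cols = mask_features(window, 0.25, rng)
time_input, masked_rows = mask_time_steps(window, 0.25, rng)
```

Under these assumptions, a pretraining objective would ask the network to reconstruct the original `window` from `feat_input` or `time_input`, typically scoring only the masked positions, so that unlabeled samples still provide a training signal.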