With advancements in sensor technology, heterogeneous data sets containing scalars, waveform signals, images, and even structured point clouds are becoming increasingly common. Statistical models built on such heterogeneous data to represent the behavior of an underlying system can be used for monitoring, control, and optimization of that system. Unfortunately, available methods focus only on scalar and curve data and do not provide a general framework for integrating different sources of data to construct a model. This paper addresses the problem of estimating a process output, measured as a scalar, curve, image, or structured point cloud, from a set of heterogeneous process variables such as scalar process settings, sensor readings, and images. We introduce a general multiple tensor-on-tensor regression (MTOT) approach in which each set of input data (predictor) and the output measurements are represented by tensors. We formulate a linear regression model between the input and output tensors and estimate the parameters by minimizing a least squares loss function. To avoid overfitting and reduce the number of parameters to be estimated, we decompose the model parameters using several bases that span the input and output spaces. We then learn the bases and their spanning coefficients while minimizing the loss function, using a block coordinate descent algorithm combined with an alternating least squares (ALS) algorithm. We show that this minimization has a closed-form solution in each iteration and can be computed very efficiently. We evaluate the performance of the proposed method through several simulations and case studies. The results show that the proposed method achieves a lower mean squared prediction error than several benchmarks from the literature.
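As a minimal illustration of the alternating estimation idea described above, the following Python sketch fits a low-rank linear regression by alternating least squares. It is not the MTOT algorithm itself: it assumes a single matrix-valued (order-2) predictor and output, replaces the tensor bases with two factor matrices, and uses synthetic data. All names (`als_low_rank_regression`, `A`, `B`, `coef_true`) are hypothetical and chosen only for this example.

```python
import numpy as np


def als_low_rank_regression(X, Y, rank, n_iter=50, seed=0):
    """Fit Y ~ X @ A @ B.T by alternating least squares.

    Illustrative only: X is (n, p), Y is (n, q), A is (p, rank), B is (q, rank).
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((Y.shape[1], rank))  # random start for the output-side factor
    losses = []
    for _ in range(n_iter):
        # Fix B and solve for A in closed form: A = (X'X)^-1 X' Y B (B'B)^-1.
        target = Y @ B @ np.linalg.inv(B.T @ B)
        A = np.linalg.lstsq(X, target, rcond=None)[0]
        # Fix A and solve for B: ordinary least squares of Y on Z = X A.
        Z = X @ A
        B = np.linalg.lstsq(Z, Y, rcond=None)[0].T
        # Record the least squares loss after both block updates.
        losses.append(float(np.sum((Y - Z @ B.T) ** 2)))
    return A, B, losses


# Synthetic data with a rank-2 coefficient matrix (assumed setup, not from the paper).
rng = np.random.default_rng(1)
n, p, q, r = 200, 6, 5, 2
coef_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
X = rng.standard_normal((n, p))
Y = X @ coef_true + 0.01 * rng.standard_normal((n, q))

A_hat, B_hat, losses = als_low_rank_regression(X, Y, rank=r)
coef_hat = A_hat @ B_hat.T
rel_err = np.linalg.norm(coef_hat - coef_true) / np.linalg.norm(coef_true)
print(f"relative coefficient error: {rel_err:.2e}")
```

Each block update here is an ordinary linear least squares problem, which is why it admits a closed-form solution, and the recorded loss cannot increase from one iteration to the next. In the tensor setting, analogous updates would be carried out over bases for each input and output mode; that generalization is not shown in this sketch.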