Many machine/deep learning artificial neural networks are trained simply to be interpolation functions that map input variables to output values interpolated from the training data in a linear/nonlinear fashion. Even when the input/output pairs of the training data are physically accurate (e.g. the results of an experiment or numerical simulation), interpolated quantities can deviate quite far from being physically accurate. Although one could project the output of a network into a physically feasible region, such a postprocess is not captured by the energy function minimized when training the network; thus, the final projected result could incorrectly deviate quite far from the training data. We propose folding any such projection or postprocess directly into the network so that the final result is correctly compared to the training data by the energy function. Although we propose a general approach, we illustrate its efficacy on a specific convolutional neural network that takes in human pose parameters (joint rotations) and outputs a prediction of vertex positions representing a triangulated cloth mesh. While the original network outputs vertex positions with erroneously high stretching and compression energies, the new network trained with our physics prior remedies these issues, producing significantly improved results.

there will be large errors in f̂(w, x_T) when compared to y_T. On the other hand, although one could create a network with many degrees of freedom in order to capture y_T = f̂(w, x_T) as accurately as desired, even exactly, f̂(w, x) could oscillate wildly and inaccurately when x is not equal to x_T, i.e. overfitting. See, e.g., [14, 15, 16, 17]. One needs to take great care when designing the network architecture in order to avoid underfitting while still allowing for enough regularization to also avoid overfitting. Likewise, the form of the energy function and the nature of the numerical optimization techniques also need careful consideration.
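The core idea of folding a projection into the training energy can be illustrated with a deliberately tiny, hypothetical sketch (not the paper's cloth network): a scalar model f(w, x) = w·x whose output must lie in a feasible set [0, 1], enforced by the projection P(v) = clip(v, 0, 1). Rather than training on f and projecting afterwards, the energy compares P(f(w, x_T)) to y_T directly, so the optimizer accounts for the postprocess. The model, projection, and gradient scheme here are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 1D illustration: model f(w, x) = w * x with feasible set
# [0, 1] enforced by the projection P(v) = clip(v, 0, 1).

def f(w, x):
    return w * x

def project(v):
    return np.clip(v, 0.0, 1.0)

def energy(w, x, y):
    # The projection is folded into the energy, so training compares the
    # *projected* output to the data, as the text proposes.
    return (project(f(w, x)) - y) ** 2

# A single training pair (x_T, y_T) with y_T inside the feasible set.
x_T, y_T = 2.0, 0.5

# Gradient descent with a central finite-difference gradient (keeps the
# sketch short). Note: clip has zero gradient outside [0, 1], so we start
# with the output inside the feasible region.
w, h, eps = 0.3, 0.1, 1e-6
for _ in range(500):
    g = (energy(w + eps, x_T, y_T) - energy(w - eps, x_T, y_T)) / (2 * eps)
    w -= h * g

print(project(f(w, x_T)))  # the projected prediction approaches y_T = 0.5
```

The zero-gradient region of the clip is exactly the kind of subtlety that makes folding a postprocess into the energy nontrivial in practice; a naive "train, then project" pipeline never sees it at all.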
Some of the most popular methods include variants of BFGS [18, 19, 20] and a number of methods based on gradient descent [21, 22] (see also [23] and the references therein), or interpreting gradient descent as a numerical approximation to an ordinary differential equation to be solved via various approaches motivated by order of accuracy [24, 25] and adaptive time-stepping [26, 27, 28, 29, 30].

Devising a network architecture with enough representative capability to alleviate underfitting while still being amenable to the regularization required to avoid overfitting, and subsequently applying numerical optimization techniques to an adequately designed energy in order to find reasonable parameters w, is quite a difficult and mostly experimental endeavour. Thus, much of the progress made by the community emanates from the laborious creation of data sets that many researchers can consider in order to design network architectures and find suitable parameters w; see e.g. [31]. This is typically driven by a community (rather than an individual or group) effort, and state-of-the-art r...
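The ODE interpretation of gradient descent mentioned above can be sketched concretely: plain gradient descent on an energy E(w) is exactly forward-Euler time-stepping of dw/dt = -∇E(w) with step size h. The quadratic energy below is a hypothetical choice, picked because the exact ODE solution w(t) = w(0)·e^{-t} is available for comparison.

```python
import numpy as np

# Gradient descent viewed as forward-Euler integration of the gradient
# flow dw/dt = -grad E(w): each update w <- w - h * grad E(w) is one
# Euler step of size h.

def grad_E(w):
    return w  # gradient of the illustrative energy E(w) = 0.5 * ||w||^2

w0 = np.array([1.0, -2.0])
w = w0.copy()
h, steps = 0.01, 100  # integrates the flow up to t = h * steps = 1.0

for _ in range(steps):
    w = w - h * grad_E(w)  # forward Euler = plain gradient descent

exact = w0 * np.exp(-1.0)  # exact gradient-flow solution at t = 1
print(w, exact)  # the Euler iterate approximates w(1) = w(0) * e^{-1}
```

Higher-order integrators and adaptive step-size control, applied to this same flow, correspond to the order-of-accuracy and adaptive time-stepping approaches cited above.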