Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy), but their excessive computation results in long training and inference times. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms unidirectionally remove weights, while others randomly or greedily explore a small subset of weights in each layer for pruning. These limitations restrict the achievable level of sparsity. In addition, many algorithms still require pre-trained dense models and thus suffer from a large memory footprint. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model. It addresses the shortcomings of previous work by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training. Experiments show that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, object detection, 3D object part segmentation, and translation. They also outperform other state-of-the-art (SOTA) methods for model sparsification. For example, a 90% sparse ResNet-50 obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving on the SOTA results of sparsification algorithms by 1.5%.

Early works on weight pruning generally follow a prune-from-dense methodology [2,3,4,5,6,7,8,9], which typically requires three phases of training: pre-train a dense model, prune it to sparse, and fine-tune it. In such methodologies, however, the pre-trained dense models consume a large amount of memory and may lead to long training times. In addition, one-shot or iterative pruning from a well-trained DNN can only remove weights; it lacks the flexibility to grow back weights that appear unimportant early in training but prove significant later.
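
To make the scheduled grow-and-prune idea concrete, the sketch below shows one plausible way to structure such a training loop in PyTorch. It is only an illustration under assumptions not stated above: layers are split into groups that take turns being dense, the prune step uses magnitude-based masking, and all names (`magnitude_mask`, `scheduled_gap`, `train_steps_fn`, `rounds`, `sparsity`) are hypothetical rather than the authors' implementation.

```python
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that keeps the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    keep_idx = weight.abs().flatten().topk(k).indices
    mask = torch.zeros(weight.numel(), device=weight.device)
    mask[keep_idx] = 1.0
    return mask.view_as(weight)


def scheduled_gap(model, layer_groups, train_steps_fn, sparsity=0.8, rounds=4):
    """Sketch of a scheduled GaP loop.

    layer_groups: list of groups, each a list of (name, parameter) pairs.
    train_steps_fn(model, masks): trains for some steps, re-applying the
        masks after each optimizer update so pruned weights stay at zero.
    """
    # Start directly from sparse masks on every layer (no pre-trained dense model).
    masks = {name: magnitude_mask(p.data, sparsity)
             for group in layer_groups for name, p in group}

    for _ in range(rounds):
        for group in layer_groups:
            # Grow: let this group become dense (all-ones mask) while the
            # other groups remain sparse.
            for name, p in group:
                masks[name] = torch.ones_like(p.data)

            # Train the whole model for some steps in this mixed dense/sparse state.
            train_steps_fn(model, masks)

            # Prune: magnitude-prune the grown group back to the target sparsity.
            for name, p in group:
                masks[name] = magnitude_mask(p.data, sparsity)
                p.data.mul_(masks[name])
    return masks
```

In this reading, the grow step restores a group's pruned positions so they can compete for importance again, and the subsequent prune step keeps only the weights that proved significant after further training, which is the flexibility that one-directional prune-from-dense pipelines lack.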