Takagi-Sugeno-Kang (TSK) fuzzy systems are very useful machine learning models for regression problems. However, to our knowledge, no existing training algorithm is both efficient and effective enough to ensure their generalization performance while also scaling to big data. Inspired by the connections between TSK fuzzy systems and neural networks, we extend three powerful neural network optimization techniques, i.e., mini-batch gradient descent (MBGD), regularization, and AdaBound, to TSK fuzzy systems, and also propose three novel techniques (DropRule, DropMF, and DropMembership) specifically for training TSK fuzzy systems. Our final algorithm, MBGD with regularization, DropRule, and AdaBound (MBGD-RDA), achieves fast convergence in training TSK fuzzy systems and superior generalization performance in testing. It can be used to train TSK fuzzy systems on datasets of any size; however, it is particularly useful for big datasets, for which no other efficient training algorithm currently exists.
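
To make the DropRule idea concrete, the following is a minimal sketch and not the reference implementation. The abstract does not define DropRule, so the sketch assumes a first-order TSK system with Gaussian membership functions and a product t-norm, and it assumes that DropRule zeroes the firing level of each rule with probability `1 - keep_prob` during training, by analogy with dropout in neural networks. The function name `tsk_output`, all parameter names, and the fallback value returned when every rule is dropped are hypothetical.

```python
import math
import random


def tsk_output(x, centers, sigmas, coeffs, biases, keep_prob=1.0, rng=random):
    """First-order TSK output for one input vector x (a list of floats).

    centers[r][d], sigmas[r][d]: Gaussian MF parameters of rule r, feature d.
    coeffs[r][d], biases[r]: linear consequent of rule r.
    keep_prob < 1 applies a DropRule-style mask (assumed form, training only).
    """
    firing, outputs = [], []
    for c, s, w, b in zip(centers, sigmas, coeffs, biases):
        # Product t-norm over Gaussian memberships, computed in log space.
        log_f = sum(-0.5 * ((xd - cd) / sd) ** 2 for xd, cd, sd in zip(x, c, s))
        f = math.exp(log_f)
        # Assumed DropRule step: a dropped rule contributes a zero firing level.
        if keep_prob < 1.0 and rng.random() >= keep_prob:
            f = 0.0
        firing.append(f)
        # Linear (first-order) consequent of this rule.
        outputs.append(b + sum(wd * xd for wd, xd in zip(w, x)))
    total = sum(firing)
    if total == 0.0:
        return 0.0  # hypothetical fallback when no rule fires
    # Firing-level-weighted average over the rules that were kept.
    return sum(f * y for f, y in zip(firing, outputs)) / total


# Toy system with two rules and two features (illustrative values only).
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
coeffs = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.5]

# Test-time output: all rules are used.
y_test = tsk_output([0.5, 0.5], centers, sigmas, coeffs, biases)

# Training-time output: rules are randomly dropped.
y_train = tsk_output([0.5, 0.5], centers, sigmas, coeffs, biases,
                     keep_prob=0.5, rng=random.Random(0))
```

With `keep_prob=1.0` the function reduces to the usual normalized TSK inference, which is how the system would be evaluated at test time under this assumption. In a full training loop, this forward pass would be evaluated on mini-batches and its parameters updated by a gradient-based optimizer with a regularization term on the consequent parameters.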