Essential to many tasks in multimedia research and development is the availability of a sufficiently large data set together with its corresponding ground truth. However, most data available for multimedia research are either too specific (e.g., data for text retrieval), too small (e.g., collections of face images), or lacking ground truth, such as millions of unprocessed images crawled from the Web for testing. While it is relatively easy to crawl and store a huge amount of data, creating the ground truth necessary to systematically train, test, evaluate, and compare the performance of various algorithms and systems remains a challenging issue. For this reason, researchers tend to put (or redirect) effort into building such corpora individually in order to carry out research on large-scale data sets. A unified, Web-scale, and distributed approach to multimedia data management is therefore urgently needed, and it would benefit the entire multimedia research community. This special issue presents and reports on the construction and analysis of large-scale multimedia data sets and resources, and provides a strong reference for researchers interested in large-scale multimedia data. In particular, it demonstrates emerging techniques and applications for large-scale multimedia data management.