With the explosive growth of processed data, data transmission over the bus between the CPU and main memory has become a bottleneck in the traditional von Neumann architecture. Moreover, popular data-intensive workloads, such as neural networks and graph computing applications, exhibit poor data locality, which substantially increases the cache miss rate. Processing such workloads burdens the entire system, since the resulting data transmission incurs long latency and high energy consumption. Processing-in-memory greatly reduces this data transmission by equipping the main memory with computation capability, alleviating the poor performance and high energy consumption caused by large data volumes and poor data locality. Processing-in-memory encompasses two different approaches. One integrates computation resources into the main memory with high-bandwidth interconnects (i.e., near-data computing). The other employs memory arrays to compute directly (i.e., computing-in-memory). Each approach has its own advantages, disadvantages, and suitable scenarios. In this survey, the birth and development of processing-in-memory are first introduced and discussed. Its techniques, ranging from hardware to microarchitecture, are then presented. Furthermore, the challenges faced by processing-in-memory are analyzed. Finally, the opportunities that processing-in-memory offers for popular applications are discussed.