In this paper, we study data acquisition and cluster analysis for review data in the era of big data. We combine the Nutch web crawler with the Hadoop distributed framework to crawl review data in a distributed manner, which addresses the slow execution of a single machine. After filtering the data and extracting feature words, we use the TF-IDF method to compute the weight of each feature word, which yields a vectorized representation of the text. The similarity between statements is then calculated with the vector space model (VSM). Next, the Canopy and K-means algorithms are run in a distributed fashion on the MapReduce framework, which greatly improves the speed and accuracy of clustering. Finally, taking the comment data of a water purifier brand as an example, we crawl the product's comments from an e-commerce platform and carry out cluster analysis. The aggregated statistics in the figure show that processing time increases gradually with the number of comments, following a power function. Relative to 6217 comments, the processing time increases by about 13% for 10858 comments, 36% for 21083 comments, 61% for 31947 comments, 96% for 52944 comments, and 145% for 83168 comments.
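The power-function description of the timing results can be checked directly from the percentages quoted above. The following is a minimal sketch, not the procedure used in the paper: it assumes the 6217-comment run is a baseline with relative time 1.0, converts each reported increase into a relative time, and fits y = a * x^b by ordinary least squares in log-log space. The function name `fit_power_law` and the variable names are illustrative only.

```python
import math

# Comment counts and relative processing times derived from the percentages
# quoted in the text. Assumption: the 6217-comment run is the baseline (1.00),
# so "+13%" becomes 1.13, "+36%" becomes 1.36, and so on.
comment_counts = [6217, 10858, 21083, 31947, 52944, 83168]
relative_times = [1.00, 1.13, 1.36, 1.61, 1.96, 2.45]


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, computed as a line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line log(y) = log(a) + b * log(x).
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


a, b = fit_power_law(comment_counts, relative_times)
print(f"fitted exponent b = {b:.3f}, coefficient a = {a:.4f}")
for n, t in zip(comment_counts, relative_times):
    print(f"{n:>6} comments: observed {t:.2f}, fitted {a * n ** b:.2f}")
```

Comparing the observed and fitted columns gives a rough sense of how well a single power function describes the six reported points. An exponent below 1 would indicate that processing time grows more slowly than the number of comments.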