Abstract
Human-induced habitat modifications severely impact the biotic components of freshwater ecosystems. In China, shallow lakes in the Yangtze River basin are facing severe habitat degradation caused by pollution, habitat loss, the disappearance of macrophytes and fishery activities. Effectively modeling fish communities on the basis of biotic and abiotic environmental descriptors would help to understand the relationships between fish and their environment, and to develop suitable conservation strategies to sustain biodiversity in these ecosystems. From 2007 to 2009, fish and their environment were investigated in 6 lakes distributed in the mid-reach of the Yangtze River basin. Based on the CPUE values of each fish species in each sampling, 117 datasets were ordinated using a self-organizing map (SOM). Fish communities were classified into three clusters of species assemblages, whose spatial and temporal distributions were displayed on the map. Seasonal changes in the fish community were more obvious in vegetated habitats than in unvegetated areas. The total CPUE, fish diversity and species richness differed significantly among the assemblages (p < 0.01). Based on the indicator value of each species in each cluster, calculated by the IndVal method, 16 species were identified as indicators: the 13 indicator species in cluster G1 are pelagic or benthopelagic fish, the only indicator species in G2 is a tolerant species (Culter dabry B.), and the two indicator species in G3 are demersal fish (Rhinogobius giurinus R. and Odontobutis obscurus T. & G.). These results agree with the contributions of the different ecological groups of fish to each assemblage in the trained SOM: pelagic and benthopelagic fish were more active in spring and winter, while demersal fish were more active in summer and autumn.
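The indicator value used above can be illustrated with a minimal sketch. It assumes the commonly used IndVal definition (specificity multiplied by fidelity, expressed as a percentage); the abstract does not state which variant was applied. The function name `indval` and the toy CPUE values are hypothetical and are not data from the study.

```python
from collections import defaultdict


def indval(abundance, groups):
    """Return {group: indicator value in percent} for a single species.

    abundance: per-sample abundance of the species (e.g. CPUE)
    groups:    per-sample cluster label, same length as abundance
    Assumes IndVal = 100 * specificity * fidelity (an assumption about
    the variant; the abstract does not spell the formula out).
    """
    if len(abundance) != len(groups):
        raise ValueError("abundance and groups must have the same length")

    by_group = defaultdict(list)
    for value, label in zip(abundance, groups):
        by_group[label].append(value)

    # Mean abundance of the species within each cluster.
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    total = sum(means.values())

    result = {}
    for g, values in by_group.items():
        # Specificity: share of the summed cluster means held by this cluster.
        specificity = means[g] / total if total > 0 else 0.0
        # Fidelity: fraction of samples in the cluster where the species occurs.
        fidelity = sum(1 for v in values if v > 0) / len(values)
        result[g] = 100.0 * specificity * fidelity
    return result


# Hypothetical toy data: one species, six samples, three clusters.
cpue = [4.0, 6.0, 0.0, 2.0, 0.0, 0.0]
clusters = ["G1", "G1", "G1", "G2", "G2", "G3"]
scores = indval(cpue, clusters)
```

In this toy case the species scores highest in G1, where it is both most abundant and most frequently present. In practice, the significance of an indicator value is usually assessed with a permutation test, which this sketch omits.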
Fish community assemblages, total fish CPUE, diversity and species richness in those lakes were then predicted from 15 abiotic and biotic factors using random forest (RF) and classification and regression tree (CART) predictive models. The predicted assignment of each site unit to the correct assemblage had an average success of 74.4% and 60.7% in the RF and CART models, respectively. The dominant variables for discriminating the three fish assemblages were water depth, distance to the bank and total phosphorus. The two most important variables for predicting fish CPUE, diversity and species richness were lake surface area and water depth; rotifer density and water depth; and water depth and water temperature, respectively. The overall percentages of successful prediction varied from 56.5% to 67% in leave-one-out cross-validation tests.
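The leave-one-out success percentage reported above can be sketched as follows. This is only an illustration of the validation loop: the classifier is a deliberately simple nearest-centroid stand-in, not the RF or CART models used in the study, and the function names, feature values and labels are hypothetical toy data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x.

    Stand-in classifier for illustration only (not RF or CART).
    """
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1

    def sq_dist(label):
        # Squared Euclidean distance from x to the centroid of this label.
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], x))

    # sorted() makes tie-breaking between labels deterministic.
    return min(sorted(sums), key=sq_dist)


def leave_one_out_success(xs, ys, predict):
    """Percentage of samples predicted correctly when each one is held out."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # all samples except the i-th
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return 100.0 * hits / len(xs)


# Hypothetical toy samples: [water depth (m), distance to the bank (m)].
sites = [[1.0, 5.0], [1.2, 6.0], [0.9, 4.0],
         [3.0, 50.0], [3.2, 55.0], [2.8, 45.0]]
labels = ["G1", "G1", "G1", "G3", "G3", "G3"]
success = leave_one_out_success(sites, labels, nearest_centroid_predict)
```

Any model with the same `predict(train_x, train_y, x)` shape could be passed in instead of the stand-in; each sample is predicted by a model that never saw it during training.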