Nowadays, autonomous taxis become a highly promising transportation mode, which helps relieve traffic congestion and avoid road accidents. However, it hinders the wide implementation of this service that traditional models fail to efficiently allocate the available fleet to deal with the imbalance of supply (autonomous taxis) and demand (trips), the poor cooperation of taxis, hardly satisfied resource constraints, and on-line platform's requirements. To figure out such urgent problems from a global and more farsighted view, we employ a Constrained Multi-agent Markov Decision Processes (CMMDP) to model fleet allocation decisions, which can be easily split into sub-problems formulated as a Dynamic assignment problem combining both immediate rewards and future gains. We also leverage a Column Generation algorithm to guarantee the efficiency and optimality in a large scale. Through extensive experiments, the proposed approach not only achieves remarkable improvements over the state-of-theart benchmarks in terms of the individuals efficiency (arriving at 12.40%, 6.54% rise of income and utilization, respectively) and the platforms profit (reaching 4.59% promotion) but also reveals a time-varying fleet adjustment policy to minimize the operation cost of the platform.