Ultra-dense networks (UDNs) are regarded as one of the key enablers for 5G, since they can provide an ultra-high spectral reuse factor by exploiting proximal transmissions. When the network infrastructure is densified, it becomes highly likely that each user has one or more dedicated serving base station antennas, which gives rise to the user-centric virtual cell paradigm. However, due to the irregular deployment of a large number of base station antennas, the interference environment becomes rather complex, causing severe interference among different virtual cells. This paper focuses on the downlink transmission scheme in UDNs where a large number of users and base station antennas are uniformly distributed over a certain area. An interference graph is first created based on large-scale fading to describe the potential interference relationships among the virtual cells. Then, the base station antennas and users of the virtual cells within the same maximal connected component are grouped together and merged into one new virtual cell cluster, where users are jointly served via zero-forcing (ZF) beamforming. A multi-virtual-cell minimum mean square error precoding scheme is further proposed to mitigate the inter-cluster interference. Additionally, an interference alignment framework is proposed based on the low-complexity virtual cell merging to eliminate the strong interference between different virtual cells. Simulation results show that the proposed interference graph-based virtual cell merging approach can attain the average user spectral efficiency of the grouping scheme based on virtual cell overlapping, with a smaller virtual cell size and reduced signal processing complexity. Moreover, the proposed user-centric transmission scheme greatly outperforms the BS-centric transmission scheme (maximum ratio transmission (MRT)) in terms of both the average user spectral efficiency and the edge user spectral efficiency.
Furthermore, interference alignment based on the low-complexity virtual cell merging achieves a much higher average user spectral efficiency than ZF and MRT precoding.
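The merging procedure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: each user's virtual cell is taken to be the `cell_size` antennas with the largest large-scale gain, an edge is drawn between two virtual cells when the strongest leakage gain reaches a hypothetical fraction `threshold` of the desired gain, and the function names (`virtual_cells`, `interference_graph`, `connected_components`, `zf_precoder`) are invented here.

```python
import numpy as np

def virtual_cells(gain, cell_size):
    """Assumed rule: each user is served by its cell_size strongest antennas.
    gain has shape (num_users, num_antennas) and holds large-scale gains."""
    return np.argsort(-gain, axis=1)[:, :cell_size]

def interference_graph(gain, cells, threshold):
    """Assumed edge rule: connect users i and j when an antenna of cell j
    reaches user i with at least threshold times user i's strongest gain."""
    num_users = gain.shape[0]
    adj = np.zeros((num_users, num_users), dtype=bool)
    for i in range(num_users):
        signal = gain[i, cells[i]].max()
        for j in range(num_users):
            if i != j and gain[i, cells[j]].max() >= threshold * signal:
                adj[i, j] = adj[j, i] = True
    return adj

def connected_components(adj):
    """Group users into maximal connected components via depth-first search."""
    num_users = adj.shape[0]
    seen = np.zeros(num_users, dtype=bool)
    clusters = []
    for start in range(num_users):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in np.flatnonzero(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    stack.append(int(v))
        clusters.append(sorted(members))
    return clusters

def zf_precoder(H):
    """Zero-forcing precoder for a (users x antennas) channel matrix H:
    the pseudo-inverse of H with each user's column scaled to unit norm."""
    W = np.linalg.pinv(H)
    return W / np.linalg.norm(W, axis=0, keepdims=True)

# Toy large-scale gains for 3 users and 4 antennas (made-up numbers).
gain = np.array([[1.00, 0.01, 0.50, 0.001],
                 [0.01, 1.00, 0.01, 0.001],
                 [0.40, 0.01, 1.00, 0.001]])
cells = virtual_cells(gain, cell_size=1)
clusters = connected_components(interference_graph(gain, cells, threshold=0.3))
print(clusters)  # [[0, 2], [1]]
```

In this toy case users 0 and 2 interfere strongly with each other and are merged into one cluster, while user 1 stays alone. Within a cluster, `zf_precoder` would be applied to the small-scale channel matrix restricted to the cluster's users (rows) and the union of their antennas (columns), which requires at least as many antennas as users for the intra-cluster interference to be nulled.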