In the clock network design, the trade-off between power consumption and timing closure is an important and difficult issue. The clock tree architecture has a shorter wire length and better power consumption, but it is more difficult to achieve timing closure with it. On the other hand, clock mesh architecture is easier to satisfy the clock skew constraint, but it usually has much more power consumption. Therefore, a hybrid clock network architecture that combines both the clock tree and clock mesh seems to be a promising solution. In a normal hybrid mesh/tree structure, a driving buffer is placed in the intersection of mesh lines. In this paper, we propose a novel cross-mesh architecture, and we distribute the buffers to balance the overall switching capacitance, reducing the number of registers connected to a subtree, and the load capacitance of a buffer. With the average dispersion of the overall driving force, our methodology creates small non-zero skew clock trees. In addition, we integrate clock gating, register clustering, and load balancing techniques to optimize clock skew and load capacitance simultaneously. The proposed methodology has four stages: cross-mesh planning, register clustering, mesh line connecting, and load balancing. Experimental results show that our cross-mesh architecture has high tolerance for process variation, and is robust in all the operation modes. Comparing it to the uniform mesh architecture, our methodology and algorithms reduce 28.9% of load capacitance and 80.4% of clock skew on average. Compared to the non-uniform mesh architecture, we also reduce capacitance by 22.4% and skew by 76.7% on average. This illustrates that we can obtain a feasible solution effectively and improve both power consumption and clock skew simultaneously.