Due to advancements in multi-core design technology, IC (Integrated Circuits) designers have expanded the single chip multi-core design. A privileged way of communication effectively between these multi-cores is a Network on-chip (NoC). Design of an effective routing algorithm capable of routing data to non-congested paths is the most notable research challenge in NoC, by retrieving congestion information of non-local nodes. This research proposed an improved congestion-aware load balancing routing algorithm. Non-local or distant links congestion awareness is done by propagating congestion information via data packets. By counting number of hops from the source node, in the quadrant of the destination node, an intermediate node has been defined, and after the calculation of the least congested route to the intermediate node, this route is also stored in the data packet for source routing. Furthermore, for load balancing network is partitioned into two areas called high congested area (HCA) and low congested area (LCA). For load balancing, from HCA a node in LCA is selected as output for data packets. Comparison of the proposed algorithm is done in the form of average latency, average throughput, power consumption, and scalability analysis under synthetic traffic patterns. Under simulation experiments, it is shown improvement in an average latency and throughput of the proposed algorithm is 31.28% and 5.28% respectively, than existing.