Network-on-Chip (NoC) is an emerging paradigm that can connect a large number of processing elements (PEs). However, as a distributed subsystem, the NoC's resources have not been fully exploited. Multipath parallel transmission, which splits one message into multiple parts and sends them simultaneously, has proven effective at utilizing NoC resources and reducing transmission latency. Previous works have not fully optimized this method, especially for emerging point-to-point NoCs, for three reasons: (i) only a limited number of shortest paths are chosen; (ii) static message-splitting strategies ignore the NoC utilization state and therefore increase contention; and (iii) the hardware that supports multipath parallel transmission has not been optimized, resulting in additional overhead. We therefore propose LAMP, a software-hardware co-design that efficiently utilizes resources and reduces latency in point-to-point NoCs through load-balanced multipath parallel transmission. Specifically, we propose a reinforcement learning-based algorithm that decides when and how to split messages and which paths to use according to traffic loads. We also propose temporal and spatial load-balancing algorithms that adjust message sizes to make better use of NoC resources. Moreover, we revise the hardware design to support multipath parallel transmission efficiently. Extensive experiments show that our algorithm improves performance by 18.0% to 29.9% compared with the state-of-the-art dual-path algorithm. Our hardware design reduces power and area by 23.2% and 10.3%, respectively, compared with the dual-path hardware.
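To make the idea of load-aware message splitting concrete, the following minimal sketch divides a message across candidate paths in proportion to each path's spare capacity. It is an illustration only and is not LAMP's reinforcement learning-based algorithm: the function name `split_message`, the path labels, and the spare-capacity heuristic are all assumptions introduced here.

```python
from typing import Dict


def split_message(total_flits: int, path_loads: Dict[str, float]) -> Dict[str, int]:
    """Split a message of `total_flits` across paths by spare capacity.

    `path_loads` maps a (hypothetical) path label to its current load,
    where 0.0 means idle and 1.0 means saturated. This heuristic is an
    assumption for illustration, not the method described in the paper.
    """
    if total_flits < 0:
        raise ValueError("total_flits must be non-negative")
    if not path_loads:
        raise ValueError("at least one candidate path is required")

    # Weight each path by its spare capacity, clamped to be non-negative.
    weights = {p: max(0.0, 1.0 - load) for p, load in path_loads.items()}
    total_weight = sum(weights.values())
    if total_weight == 0.0:
        # Every path is saturated: fall back to an even split.
        weights = {p: 1.0 for p in path_loads}
        total_weight = float(len(weights))

    # Give each path the floor of its proportional share.
    shares = {p: int(total_flits * w / total_weight) for p, w in weights.items()}

    # Hand leftover flits to the paths with the most spare capacity first.
    remainder = total_flits - sum(shares.values())
    order = sorted(weights, key=weights.get, reverse=True)
    for i in range(remainder):
        shares[order[i % len(order)]] += 1
    return shares


if __name__ == "__main__":
    # A half-loaded path and an idle path share a 10-flit message.
    print(split_message(10, {"path_a": 0.5, "path_b": 0.0}))
```

A static strategy would send the same fraction over each path regardless of load. Here the idle path receives the larger share, which is the intuition behind adjusting the split according to traffic.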