As an indispensable use case for the 5G wireless systems on the roadmap, ultra-reliable and low-latency communications (URLLC) is a crucial requirement for the coming era of wireless industrial automation. The key performance indicators for URLLC stand in sharp contrast to the requirements of enhanced mobile broadband (eMBB): low latency and ultra-reliability are paramount, but high data rates are often not required. This paper aims to develop communication techniques for making a paradigm shift from conventional human-type broadband communications to the emerging machine-type URLLC. One fundamental task for URLLC is to deliver short commands from a controller to a group of actuators within a stringent delay requirement and with high reliability. Motivated by the factory automation setting, in which tasks are assigned to groups of devices that work in close proximity to each other and can thus form clusters of reliable device-to-device (D2D) networks, this paper proposes a novel two-phase transmission protocol for achieving URLLC. In the first phase, within the latency requirement, the multi-antenna base station (BS) combines the messages of all devices within each group and multicasts them to the corresponding groups; messages for different groups are spatially multiplexed. In the second phase, the devices that have decoded the messages successfully, herein defined as the leaders, help relay the messages to the other devices in their groups. Under this protocol, we design an innovative leader-selection-based beamforming strategy at the BS by utilizing sparse optimization techniques. The proposed strategy leads to a desired sparsity pattern in user activity, with at least one leader being able to decode its message in each group in the first phase, thus ensuring full utilization of the reliability-enhancing D2D transmissions in the second phase.
Simulation results are provided to show that the proposed two-phase transmission protocol considerably improves the reliability of the entire system within the stringent latency requirement as compared to existing schemes for URLLC.
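
To give a rough feel for why a second D2D phase can help, the following toy calculation compares the probability that every device in one group obtains the message with and without relaying. It is only an illustrative sketch under simplifying assumptions that are not taken from the paper: each device decodes the phase-one multicast independently with probability `p`, each remaining device decodes the phase-two relay with probability `q` whenever at least one leader exists, and beamforming, fading, and latency are not modeled. The function names and parameters are hypothetical.

```python
from math import comb


def one_phase_group_reliability(n: int, p: float) -> float:
    """Probability that all n devices decode the BS multicast directly.

    Assumes (for illustration only) independent decoding with probability p.
    """
    return p ** n


def two_phase_group_reliability(n: int, p: float, q: float) -> float:
    """Probability that all n devices hold the message after both phases.

    Assumed toy model, not the paper's channel model:
      - phase 1: each device independently becomes a leader with probability p;
      - phase 2: if at least one leader exists, each non-leader independently
        decodes the relayed message with probability q.
    """
    total = 0.0
    # Sum over the number of leaders k; k = 0 means nobody can relay.
    for k in range(1, n + 1):
        prob_k_leaders = comb(n, k) * p ** k * (1 - p) ** (n - k)
        # All n - k non-leaders must succeed in the D2D phase.
        total += prob_k_leaders * q ** (n - k)
    return total


if __name__ == "__main__":
    n, p, q = 4, 0.9, 0.99  # hypothetical example values
    print("one phase :", one_phase_group_reliability(n, p))
    print("two phases:", two_phase_group_reliability(n, p, q))
```

In this toy model the two-phase probability equals $(p + (1-p)q)^n - ((1-p)q)^n$, and the subtracted term is the event that no device becomes a leader. This is consistent with the emphasis above on guaranteeing at least one leader per group in the first phase.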