Recent years have demonstrated the feasibility of using intracortical Brain-Machine Interfaces (iBMIs) to decode thoughts for communication and cursor control tasks. iBMIs are increasingly becoming wireless because of the risks of infection and mechanical failure typically associated with percutaneous connections. Wireless communication itself, however, further increases power consumption, and the total dissipation is strictly limited by the safe heating limits of cortical tissue. Since the power consumed by wireless communication is typically proportional to the communication bandwidth, the output Bit Rate (BR) must be minimised. Most iBMIs utilise Multi-Unit Activity (MUA), i.e. spike events. Whilst this in itself significantly reduces the output BR compared to raw data, the BR still limits the scalability (number of channels) that can be achieved. As such, additional compression of MUA signals is essential for fully-implantable, high-information-bandwidth systems. To meet this need, this work proposes various hardware-efficient, ultra-low-power MUA compression schemes. We investigate them in terms of their BRs and hardware requirements as a function of various on-implant conditions, such as the MUA Binning Period (BP) and the number of channels. We found that for BPs ≤ 10 ms, the delta-asynchronous method had the lowest total power and reduced the BR by almost an order of magnitude relative to classical methods (e.g. to approx. 151 bps/channel for a BP of 1 ms and 1000 channels on-implant). However, at larger BPs the synchronous method performed best (e.g. approx. 29 bps/channel for a BP of 50 ms, independent of channel count). As such, this work can guide the choice of MUA data compression scheme for BMI applications, where the BR can be significantly reduced in hardware-efficient ways. This enables the next generation of wireless iBMIs, with small implant sizes, high channel counts, low power, and a small hardware footprint.
All code and results have been made publicly available.
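
To put the per-channel figures above in context, the following is a minimal sketch of the bit-rate arithmetic. It is not taken from the published code: the function names are hypothetical, and the fixed-width baseline (a fixed number of bits sent per channel every BP) is an illustrative assumption rather than one of the schemes evaluated in this work.

```python
def aggregate_bit_rate(bps_per_channel: float, n_channels: int) -> float:
    """Total implant output bit rate (bits/s) from a per-channel rate."""
    return bps_per_channel * n_channels


def fixed_width_bit_rate(bits_per_bin: int, bp_seconds: float) -> float:
    """Per-channel bit rate (bits/s) if every bin is sent as a fixed-width word.

    Assumption for illustration only: one word of `bits_per_bin` bits is
    transmitted per channel every binning period (BP), with no compression.
    """
    return bits_per_bin / bp_seconds


# Approximate per-channel figures quoted in the abstract.
delta_async_total = aggregate_bit_rate(151, 1000)  # BP = 1 ms, 1000 channels
sync_total = aggregate_bit_rate(29, 1000)          # BP = 50 ms, 1000 channels

# Hypothetical uncompressed reference: 1 bit per bin at BP = 1 ms.
reference_per_channel = fixed_width_bit_rate(1, 1e-3)

print(f"delta-asynchronous, 1000 ch: {delta_async_total:.0f} bps")
print(f"synchronous, 1000 ch:        {sync_total:.0f} bps")
print(f"fixed-width reference:       {reference_per_channel:.0f} bps/channel")
```

The aggregate rate is what the wireless link has to carry, so it is the quantity that scales with channel count even when the per-channel rate does not.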