For massive machine-type communications, centralized control may incur a prohibitively high overhead. Grant-free non-orthogonal multiple access (NOMA) offers a possible solution, yet poses new challenges for efficient receiver design. In this paper, we develop a joint user identification, channel estimation, and signal detection (JUICESD) algorithm. Specifically, we divide the whole detection scheme into linear and non-linear modules. We then handle the linear module by leveraging existing approximate message passing (AMP) algorithms, and deal with the non-linear module based on generalized message passing. The exact calculation of the messages exchanged within the non-linear module and between the two modules is complicated by phase ambiguity. Noting that the messages under phase ambiguity exhibit a rotational invariance property, we propose a rotationally invariant Gaussian mixture (RiGm) model and develop an efficient JUICESD-RiGm algorithm. JUICESD-RiGm achieves a performance close to that of JUICESD at a much lower complexity. Capitalizing on the features of RiGm, we further analyze the performance of JUICESD-RiGm with state evolution techniques. Numerical results demonstrate that the proposed algorithms achieve a significant performance improvement over the existing alternatives and even outperform oracle linear minimum mean square error (LMMSE) receivers, and that the derived state evolution method predicts the system performance accurately.

be accurately characterized by the state evolution [24]. Furthermore, sparse signal recovery algorithms for more general system models have been developed recently, including turbo compressed sensing [16], orthogonal AMP (OAMP) [17], and vector AMP (VAMP) [18] for linear systems with a non-i.i.d. sensing matrix; generalized AMP (GAMP) [20], [21] for systems with non-linear output; and bilinear GAMP (BiGAMP) [22], [23] for bilinear systems.
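To make the AMP recursion mentioned above concrete, the following is a minimal sketch of the textbook real-valued AMP iteration with a soft-thresholding denoiser, for a linear model y = A x + w with an i.i.d. Gaussian sensing matrix. It is not the JUICESD or JUICESD-RiGm algorithm of this paper. The function names, the threshold rule `alpha * tau`, and the default parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding denoiser with threshold t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def amp_sparse_recovery(A, y, n_iter=30, alpha=1.5):
    """Illustrative AMP for y = A x + w with sparse x.

    Assumes A has i.i.d. zero-mean entries with variance 1/M.
    `alpha` (hypothetical tuning parameter) scales the threshold
    relative to the estimated effective noise level.
    """
    M, N = A.shape
    x = np.zeros(N)   # current signal estimate
    z = y.copy()      # current (Onsager-corrected) residual
    for _ in range(n_iter):
        # Empirical estimate of the effective noise standard deviation.
        tau = np.linalg.norm(z) / np.sqrt(M)
        # Pseudo-observation: behaves like x plus Gaussian noise.
        r = x + A.T @ z
        x_new = soft_threshold(r, alpha * tau)
        # Onsager term: (N/M) * mean of the denoiser derivative * old residual.
        # For soft thresholding the derivative is 1 on the nonzero outputs.
        onsager = (np.count_nonzero(x_new) / M) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x
```

The Onsager correction in the residual update is what distinguishes AMP from plain iterative thresholding, and it is the reason the per-iteration error can be tracked by a scalar state evolution recursion. As a usage example, one could draw `A` with entries of variance 1/M, generate a sparse `x`, form `y = A @ x` plus small noise, and call `amp_sparse_recovery(A, y)`.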
These message-passing-based algorithms provide the current state of the art for sparse signal reconstruction. Based on the aforementioned algorithms, joint designs of channel estimation, user identification, and/or signal detection have been pursued to improve the system performance. Specifically, under the assumption of perfect channel state information (CSI) at the receiver (CSIR), joint active user identification and signal detection algorithms were developed in [25], [26]. For systems without CSIR, [27]-[30] established joint channel estimation and active user identification algorithms, followed by separate signal detection operations. In addition, joint channel and data estimation algorithms were developed for massive MIMO systems [31]-[34] and for single-carrier systems [35]. Recently, [36] proposed a joint channel estimation and multiuser detection algorithm, named block sparsity adaptive subspace pursuit (BSASP). This algorithm converts the single-measurement-vector compressive sensing (SMV-CS) problem into a multiple-measurement-vector compressive sensing (MMV-CS) problem, and reconstructs the sparse signal by exploiting the in...