In recent years, our understanding of gamma-ray burst (GRB) prompt emission has been revolutionized by a combination of new instruments, new analysis methods and novel ideas. In this review, I describe the most recent observational results and the current theoretical interpretation. Observationally, a major development is the rise of time-resolved spectral analysis. This led to (I) identification of a distinct high-energy component, with GeV photons often seen at a delay; and (II) firm evidence for the existence of a photospheric (thermal) component in a large number of bursts. These results triggered many theoretical efforts aimed at understanding the physical conditions in the inner jet regions from which the prompt photons are emitted, as well as the spectral diversity observed. I highlight some areas of active theoretical research. These include: (I) understanding the role played by magnetic fields in shaping the dynamics of GRB outflows and spectra; (II) understanding the microphysics of kinetic and magnetic energy transfer, namely accelerating particles to high energies in both shock waves and magnetic reconnection layers; (III) understanding how sub-photospheric energy dissipation broadens the "Planck" spectrum; and (IV) geometrical light aberration effects. I highlight some of these efforts, and point towards gaps that still exist in our knowledge as well as promising directions for the future.

Since this is a rapidly evolving field, one has to be extra careful in describing the spectra of GRB prompt emission. As I will show below, the observed spectrum is, in fact, sensitive to the analysis method chosen. Thus, before describing the spectra, one has to describe the analysis method.

Typically, the spectral analysis is based on analyzing the flux integrated over the entire duration of the prompt emission; namely, the spectrum is time-integrated. Clearly, this is a trade-off, as enough photons need to be collected in order to analyze the spectrum.
For weak bursts this is the only thing one can do. However, there is a major drawback here: use of the time-integrated spectrum implies that important time-dependent signals could be lost or at least smeared. This can easily lead to the wrong theoretical interpretation.

A second point of caution is the analysis method, which relies on a forward-folding technique. This means the following. First, a model spectrum is chosen. Second, the chosen model is convolved with the detector response and compared to the detected counts spectrum. Third, the model parameters are varied in search of the minimal difference between model and data. The outcome is the set of best-fit parameters within the framework of the chosen model. This analysis method is the only one that can be used, due to the non-linearity of the detector's response matrix, which makes it impossible to invert.

However, the need to pre-determine the fitted model implies that the results are biased by the initial hypothesis. Two different models can fit the data equally well. This fact, which is often bei...
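
The three forward-folding steps described above (choose a model, convolve it with the detector response, vary the parameters to minimize the mismatch with the counts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the cutoff power-law model, the Gaussian toy response matrix, the energy grid, the chi-square-like statistic, and the brute-force parameter scan are hypothetical stand-ins, not the response of any real instrument nor the fitting procedure of any particular analysis package. The "observed" counts are generated noise-free from known parameters so that the outcome of the scan can be checked.

```python
import math

# Toy energy grid (keV): log-spaced bin edges. All numbers are illustrative.
N_BINS = 20
EDGES = [10.0 * (100.0 ** (i / N_BINS)) for i in range(N_BINS + 1)]
CENTERS = [math.sqrt(EDGES[i] * EDGES[i + 1]) for i in range(N_BINS)]
WIDTHS = [EDGES[i + 1] - EDGES[i] for i in range(N_BINS)]


def cutoff_power_law(energy, norm, index, e_cut):
    """Step 1 (assumed model): N(E) = norm * (E / 100 keV)**index * exp(-E / e_cut)."""
    return norm * (energy / 100.0) ** index * math.exp(-energy / e_cut)


def make_response(sigma=1.0, area=100.0):
    """Hypothetical response matrix R[channel][energy_bin] in cm^2.

    Each incident-energy bin is spread over neighbouring channels with
    Gaussian weights, mimicking a finite energy resolution.
    """
    response = [[0.0] * N_BINS for _ in range(N_BINS)]
    for j in range(N_BINS):
        weights = [math.exp(-0.5 * ((c - j) / sigma) ** 2) for c in range(N_BINS)]
        total = sum(weights)
        for c in range(N_BINS):
            response[c][j] = area * weights[c] / total
    return response


def fold(params, response, exposure=1.0):
    """Step 2: convolve the model with the response to predict counts per channel."""
    photons = [cutoff_power_law(e, *params) * w for e, w in zip(CENTERS, WIDTHS)]
    return [exposure * sum(r * p for r, p in zip(row, photons)) for row in response]


def fit_statistic(observed, predicted):
    """Chi-square-like distance between observed and predicted counts."""
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


def forward_fold_fit(observed, response, grid):
    """Step 3: scan the parameter grid and keep the smallest statistic."""
    best_params, best_stat = None, math.inf
    for params in grid:
        stat = fit_statistic(observed, fold(params, response))
        if stat < best_stat:
            best_params, best_stat = params, stat
    return best_params, best_stat


RESPONSE = make_response()
TRUE_PARAMS = (1.0, -1.0, 300.0)  # norm, index, e_cut (keV); assumed values
OBSERVED = fold(TRUE_PARAMS, RESPONSE)  # noise-free stand-in for detected counts

# Hypothetical parameter grid; a real fit would use a proper minimizer.
GRID = [(n, a, ec)
        for n in (0.5, 1.0, 2.0)
        for a in (-1.5, -1.25, -1.0, -0.75, -0.5)
        for ec in (100.0, 200.0, 300.0, 400.0, 500.0)]

BEST_PARAMS, BEST_STAT = forward_fold_fit(OBSERVED, RESPONSE, GRID)
print(BEST_PARAMS, BEST_STAT)
```

Because the toy data are noise-free and the grid contains the input parameters, the scan recovers them. Note that the result is only ever "best" within the assumed model family: the procedure compares folded model counts with the data and never reconstructs the incident photon spectrum directly.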