Noncoherent demodulation is an attractive choice for many wireless communication systems. It requires minimal protocol overhead for carrier synchronization, and it is robust to radio impairments commonly found in low-cost transceivers. Machine learning techniques, such as neural networks and deep learning, offer additional benefits for these systems. Practical communication systems often include nonlinearities, non-stationarity, and non-Gaussian noise, which complicate the mathematical derivation of optimum demodulators. Learning approaches can optimize demodulator performance directly from simulated or measured radio data, which is often plentiful in the design and verification of today's integrated transceivers. This paper examines several candidate neural network topologies for use in noncoherent demodulation and provides a mathematical framework for their comparison. Each is based on a complex-valued feature detection layer, which may be characterized as coherent or noncoherent, followed by one or more real-valued classification layers. Backpropagation equations for the noncoherent feature layer include a synchronization term that facilitates training with noncoherent input data. The coherent layer does not synchronize training data; however, a noncoherent demodulator can still be constructed by increasing the coherent layer capacity and adding a max pooling layer to marginalize the unknown signal phase. A frequency classification example highlights the differences between the topologies and confirms that optimum noncoherent demodulation can be learned in the presence of AWGN and random phase offsets. The topologies considered here are suitable for noncoherent demodulation of power-efficient modulations such as FSK and ASK, which are typical in today's short-range wireless communication systems. It is hoped that such topologies will lead to a future common architecture that can support the wide range of modulation formats in this space.
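
The two feature-layer ideas described above can be illustrated with a minimal sketch of frequency classification under AWGN and a random phase offset. This is not the paper's learned network: the templates below are fixed complex exponentials rather than trained weights, and all names (`tone_bank`, `noncoherent_classify`, `maxpool_classify`, `simulate`), the four orthogonal tones, the block length, and the SNR are assumptions chosen only for illustration. The noncoherent detector takes the magnitude of a complex correlation, while the coherent variant uses several phase-shifted real-valued correlators per tone and max-pools over phase.

```python
import numpy as np

def tone_bank(freqs, n):
    """Complex exponential templates, one row per candidate frequency (cycles/sample)."""
    t = np.arange(n)
    return np.exp(2j * np.pi * np.outer(freqs, t))

def noncoherent_classify(x, bank):
    """Noncoherent feature: magnitude of the complex correlation with each tone."""
    return int(np.argmax(np.abs(bank.conj() @ x)))

def maxpool_classify(x, bank, n_phases=8):
    """Coherent features at several assumed phases, max-pooled over phase."""
    phases = 2 * np.pi * np.arange(n_phases) / n_phases
    corr = bank.conj() @ x                                   # shape (n_freqs,)
    # Real part of the correlation against each phase-rotated template.
    coherent = np.real(corr[:, None] * np.exp(-1j * phases)[None, :])
    return int(np.argmax(coherent.max(axis=1)))

def simulate(n_trials=500, n=32, snr_db=0.0, seed=0):
    """Return (noncoherent accuracy, max-pool accuracy) for 4 orthogonal tones."""
    rng = np.random.default_rng(seed)
    freqs = np.arange(1, 5) / n           # hypothetical tone set, orthogonal over n samples
    bank = tone_bank(freqs, n)
    sigma2 = 10.0 ** (-snr_db / 10.0)     # per-sample noise variance for a unit-amplitude tone
    hits_nc = hits_mp = 0
    for _ in range(n_trials):
        k = int(rng.integers(len(freqs)))
        theta = rng.uniform(0.0, 2.0 * np.pi)                # unknown carrier phase
        noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(n)
                                         + 1j * rng.standard_normal(n))
        x = bank[k] * np.exp(1j * theta) + noise
        hits_nc += noncoherent_classify(x, bank) == k
        hits_mp += maxpool_classify(x, bank) == k
    return hits_nc / n_trials, hits_mp / n_trials

if __name__ == "__main__":
    acc_nc, acc_mp = simulate()
    print(f"noncoherent: {acc_nc:.3f}, coherent + max pooling: {acc_mp:.3f}")
```

With a finite number of pooled phases, the coherent variant only approximates the magnitude detector, so it needs more correlators per tone to approach the same performance; this mirrors the increased coherent-layer capacity mentioned above.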