This paper presents a multi-physics model of an electrostatic loudspeaker in which a single lightweight dielectric elastomer (DE) membrane serves as both the acoustic diaphragm and the actuator. We focus on the cone-shaped DE actuator (DEA) topology, which has a compact, self-standing architecture, requires no pneumatic loading system, and is potentially suitable for integration onto complex surfaces and structures. We propose an axisymmetric, nonlinear lumped-parameter model of the structural dynamics of the cone DEA and use it to predict the acoustic pressure field generated by the speaker. We then present a case study in which the model is used to compute the linearised mode shapes of a reference DEA, evaluate their effect on the acoustic frequency response, and compare the harmonic distortion resulting from different driving strategies.