Multiple features can be extracted from the time-frequency representation (TFR) of a signal for the purpose of acoustic event detection. However, many underwater acoustic signals are composed of multiple events (impulsive and tonal), which makes it difficult to obtain a high-resolution TFR for each component. To characterize such different events, we propose an anisotropic chirplet transform that achieves a TFR with high energy concentration. This transform applies a time-frequency-varying Gaussian window to compensate the energy of each component while suppressing unwanted noise. Using a set of directional chirplet ridges from the obtained TFR, a structure-split-merge algorithm is designed to reconstruct a multimodal sparse representation, which provides instantaneous frequency and time features. Specifically, a pulsed-to-tonal ratio, computed from these features, is used to distinguish between the two types of acoustic signals. The presented method is validated using experimental underwater acoustic communication signals from shallow water, and large sequences of harmonics and pulsed bursts from common whales.

Index Terms: Anisotropic chirplet transform (ACT), multimodal sparse representation (MSR), pulsed-to-tonal ratio (PTR), time-frequency representation (TFR), underwater acoustic (UWA) signals.