We introduce a truly online anomaly detection algorithm that sequentially processes data to detect anomalies in time series. In anomaly detection, while the anomalous data are arbitrary, the normal data have similarities and generally conforms to a particular model. However, the particular model that generates the normal data is generally unknown (even nonstationary) and needs to be learned sequentially. Therefore, a two stage approach is needed, where in the first stage, we construct a probability density function to model the normal data in the time series. Then, in the second stage, we threshold the density estimation of the newly observed data to detect anomalies. We approach this problem from an information theoretic perspective and propose minimax optimal schemes for both stages to create an optimal anomaly detection algorithm in a strong deterministic sense. To this end, for the first stage, we introduce a completely online density estimation algorithm that is minimax optimal with respect to the log-loss and achieves Merhav's lower bound for general nonstationary exponential-family of distributions without any assumptions on the observation sequence. For the second stage, we propose a threshold selection scheme that is minimax optimal (with logarithmic performance bounds) against the best threshold chosen in hindsight with respect to the surrogate logistic loss. Apart from the regret bounds, through synthetic and real life experiments, we demonstrate substantial performance gains with respect to the state-of-the-art density estimation based anomaly detection algorithms in the literature.