We study the problem of covert online decision making, in which an agent attempts to identify a parameter governing a system by probing the system while escaping detection by an adversary. The system is modeled as a Markov kernel whose input is controlled by the agent and whose two outputs are observed by the agent and the adversary, respectively. This problem is motivated by applications such as covert sensing or covert radar, in which one tries to perform a sensing task without arousing the suspicion of an adversary monitoring the environment for the presence of sensing signals. Specifically, we consider two situations corresponding to different amounts of knowledge of the system. If the kernel is known but governed by an unknown fixed parameter, we formulate the problem as a sequential hypothesis testing problem. If the kernel determining the observations of the agent is unknown but the kernel determining those of the adversary is known, we formulate the problem as a best arm identification problem in a bandit setting. In both situations, we characterize the exponent of the probability of identification error. As expected given the covertness requirement, the probability of identification error decays exponentially with the square root of the blocklength.