To achieve safe and high-quality decision-making and motion planning, autonomous vehicles should be able to generate accurate probabilistic predictions of the uncertain behavior of other road users. Moreover, in highly interactive driving scenarios, reactive predictions are necessary to answer the question "what if I take this action in the future?" for the autonomous vehicle. Many recently proposed methods based on probabilistic graphical models (PGM), neural networks (NN), and inverse reinforcement learning (IRL) have great potential to solve the problem. However, no unified framework exists to homogenize the problem formulation, representation simplification, and evaluation metrics for those methods. In this paper, we formulate a probabilistic reaction prediction problem and reveal the relationship between the reaction and situation prediction problems. We employ prototype trajectories with designated motion patterns, rather than "intention", to homogenize the representation so that the probabilities that different methods assign to each trajectory can be evaluated. We also discuss why "intention" is not suitable as a motion indicator in highly interactive scenarios. We propose to use the Brier score as the baseline metric for evaluation. To reveal the fatality of the consequences when the predictions are adopted for decision-making and planning, we propose a fatality-aware metric, which is a weighted Brier score based on the criticality of the trajectory pairs of the interacting entities. Conservatism and non-defensiveness are defined from the weighted Brier score to indicate the consequences caused by inaccurate predictions. Modified methods based on PGM, NN, and IRL are provided to generate probabilistic reaction predictions in an exemplar scenario of nudging from a highway ramp. The results are evaluated with the baseline and proposed metrics to construct a mini benchmark.
An analysis of the properties of each method is also provided by comparing the baseline and proposed metric scores.
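
As an illustration of the baseline metric, the multi-class Brier score over a set of prototype trajectories can be sketched as follows. This is a minimal sketch, not the formulation used in the paper: the function name `brier_score` and the per-trajectory `weights` argument are hypothetical, and the weights only stand in for a criticality-based weighting, which the abstract describes as defined over trajectory pairs of the interacting entities.

```python
def brier_score(probs, outcomes, weights=None):
    """Mean squared error between predicted probabilities and one-hot outcomes.

    probs    : list of per-sample probability lists, one entry per prototype trajectory
    outcomes : list of indices of the trajectory that actually occurred
    weights  : optional per-trajectory weights (hypothetical stand-in for
               criticality); defaults to 1.0 for every trajectory, which
               gives the standard multi-class Brier score
    """
    n_classes = len(probs[0])
    if weights is None:
        weights = [1.0] * n_classes

    total = 0.0
    for p, k in zip(probs, outcomes):
        for j in range(n_classes):
            observed = 1.0 if j == k else 0.0
            total += weights[j] * (p[j] - observed) ** 2
    return total / len(probs)


# A perfect prediction scores 0; a uniform guess over two trajectories scores 0.5.
perfect = brier_score([[1.0, 0.0]], [0])
uniform = brier_score([[0.5, 0.5]], [0])
```

Lower scores indicate better calibrated predictions. With non-uniform weights, errors on the more heavily weighted trajectories contribute more to the score.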