“…However, when employed as a distillation loss, the commonly used listwise losses can destroy the probabilistic meaning of the student model's predictions as pCTR, degrading the student model's calibration ability. For the CTR prediction task, calibration ability, i.e., whether the predicted click probability aligns with the actual click-through rate, is another important factor in measuring model performance [5,11,30,39,47,50,57]. For example, in a cost-per-click (CPC) system in online advertising, a candidate advertisement (ad) is ranked and charged according to its effective cost per mille (eCPM), computed as $\mathrm{eCPM} = 1000 \cdot \mathrm{pCTR} \cdot \mathrm{Bid}_{\mathrm{CPC}}$, where $\mathrm{Bid}_{\mathrm{CPC}}$ is the bid price from the advertiser and $\mathrm{pCTR}$ denotes the predicted click-through rate (predicted click probability) of the ad, which the predictive model produces.…”
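
The eCPM formula in the passage can be illustrated with a minimal Python sketch. The ad names, click-through rates, and bids below are hypothetical values chosen only for illustration, and the helper names `ecpm` and `rank_by_ecpm` are not from the source. The sketch suggests why calibration matters in addition to ranking quality: a model can keep the order of the pCTR values intact and still change the eCPM ordering if the scale of its predictions is distorted.

```python
def ecpm(pctr: float, bid_cpc: float) -> float:
    """Effective cost per mille: eCPM = 1000 * pCTR * Bid_CPC."""
    if not 0.0 <= pctr <= 1.0:
        raise ValueError("pctr must be a probability in [0, 1]")
    return 1000.0 * pctr * bid_cpc


def rank_by_ecpm(pctrs: dict, bids: dict) -> list:
    """Return ad names sorted by eCPM, highest first."""
    return sorted(pctrs, key=lambda ad: ecpm(pctrs[ad], bids[ad]), reverse=True)


# Hypothetical bids per click (assumed values, arbitrary currency).
bids = {"A": 1.00, "B": 1.50}

# Assumed well-calibrated predictions: pCTR matches the true click rate.
calibrated = {"A": 0.020, "B": 0.010}

# Assumed miscalibrated predictions: the pCTR order (A > B) is preserved,
# but ad A's probability is underestimated.
miscalibrated = {"A": 0.012, "B": 0.010}

print(rank_by_ecpm(calibrated, bids))     # A ranks above B
print(rank_by_ecpm(miscalibrated, bids))  # B ranks above A
```

In this toy setting both sets of predictions order the two ads identically by pCTR, yet the eCPM ranking flips, because eCPM multiplies the predicted probability by the bid and therefore depends on the absolute value of pCTR, not only on its relative order.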