The entire reason that we wrote this paper was to provide a concrete object around which to focus a broader discussion about prior choice, and we are extremely grateful to the editorial team at Statistical Science for this opportunity. David Dunson (DD), Jim Hodges (JH), Christian Robert and Judith Rousseau (jointly RR), and James Scott (JS) have taken this discussion in diverse and challenging directions, and over the next few pages we will try to respond to the main points they have raised.
"IF I COULD LOVE, I WOULD LOVE YOU ALL."-KIKI DURANEThe point of departure for our paper is that most modern statistical models are built to be flexible enough to model diverse data generating mechanisms. Good statistical practice requires us to limit this flexibility, which is typically controlled by a small number of parameters, to the amount "needed" to model the data at hand. The Bayesian framework provides a natural method for doing this although, as DD points out, this trend for penalising model complexity casts a broad shadow over all of modern statistics and data science.The PC prior framework argues for setting priors on these flexibility parameters that are specifically built to penalise a certain type of complexity and avoid overfitting. The discussants raised various points about this core idea. First, DD pointed out that while over-fitting a model is a bad thing, under-fitting is not better: we do not want Occam's razor to slit our throat. We saw this behaviour when using a half-Normal prior on the distance, while the exponential prior does not lead to obvious attenuation of the estimates. This is confirmed experimentally by Klein and Kneib (2016).Both DD and RR note our focus on a specific parameterisation and DD (as well as a large number of reviewers) note that our informal definition of overfitting is parameterisation dependent. We did this on purpose: most people who use complex statistical models do not understand prior mass conditions in terms of Kullback-Leibler balls and the theoretical results in the paper do not justify this level of mathematical sophistication. Our choice to sacrifice generality (and annoy reviewers) in the search for a clear exposition has lead us to a revelation: we can replace questions about prior choice with questions about parameterisation. This leads us to re-phrase DD's implied question: How should we parameterise a flexibility parameter so that we can use an exponential prior?The parameterisation we chose waswhere ξ is the original flexibility parameter indexing model f ξ and f 0 is the base model. JH correctly tweaks our nose over our inability to communicate this distance in a meaningful way (a heinous sin for people who abandoned measure theory in a quest for clarity). While we personally find our interpretationd(ξ ) is the amount of information you lose by abandoning the flexible component in favour of the base model-appealing, it is a bit dry and abstract. JH suggests communicating the distance by considering how much a coin would be weighted to achieve that distance from a fair co...