Tsunami fragility curves are statistical models which form a key component of tsunami risk models, as they provide a probabilistic link between a tsunami intensity measure (TIM) and building damage. Existing studies apply different TIMs (e.g. depth, velocity, force etc.) with conflicting recommendations of which to use. This paper presents a rigorous methodology using advanced statistical methods for the selection of the optimal TIM for fragility function derivation for any given dataset. This methodology is demonstrated using a unique, detailed, disaggregated damage dataset from the 2011 Great East Japan earthquake and tsunami (total 67,125 buildings), identifying the optimum TIM for describing observed damage for the case study locations. This paper first presents the proposed methodology, which is broken into three steps: (1) exploratory analysis, (2) statistical model selection and trend analysis and (3) comparison and selection of TIMs. The case study dataset is then presented, and the methodology is then applied to this dataset. In Step 1, exploratory analysis on the case study dataset suggests that fragility curves should be constructed for the sub-categories of engineered (RC and steel) and nonengineered (wood and masonry) construction materials. It is shown that the exclusion of buildings of unknown construction material (common practice in existing studies) may introduce bias in the results; hence, these buildings are estimated as engineered or nonengineered through use of multiple imputation (MI) techniques. In Step 2, a sensitivity analysis of several statistical methods for fragility curve derivation is conducted in order to select multiple statistical models with which to conduct further exploratory analysis and the TIM comparison (to draw conclusions which are non-model-specific). Methods of data aggregation and ordinary least squares parameter estimation (both used in existing studies) are rejected as they are quantitatively shown to reduce fragility curve accuracy and increase uncertainty. Partially ordered probit models and generalised additive models
123Nat Hazards (2016) 84:1257-1285 DOI 10.1007/s11069-016-2485 (GAMs) are selected for the TIM comparison of Step 3. In Step 3, fragility curves are then constructed for a number of TIMs, obtained from numerical simulation of the tsunami inundation of the 2011 GEJE. These fragility curves are compared using K-fold crossvalidation (KFCV), and it is found that for the case study dataset a force-based measure that considers different flow regimes (indicated by Froude number) proves the most efficient TIM. It is recommended that the methodology proposed in this paper be applied for defining future fragility functions based on optimum TIMs. With the introduction of several concepts novel to the field of fragility assessment (MI, GAMs, KFCV for model optimisation and comparison), this study has significant implications for the future generation of empirical and analytical fragility functions.