Purpose: To present our method and experience in commissioning dose models in water for spot scanning proton therapy in a commercial treatment planning system (TPS). Methods: The input data required by the TPS included in-air transverse profiles and integral depth doses (IDDs). All input data were obtained from Monte Carlo (MC) simulations that had been validated by measurements. MC-generated IDDs were converted to units of Gy mm 2 /MU using the measured IDDs at a depth of 2 cm employing the largest commercially available parallel-plate ionization chamber. The sensitive area of the chamber was insufficient to fully encompass the entire lateral dose deposited at depth by a pencil beam (spot). To correct for the detector size, correction factors as a function of proton energy were defined and determined using MC. The fluence of individual spots was initially modeled as a single Gaussian (SG) function and later as a double Gaussian (DG) function. The DG fluence model was introduced to account for the spot fluence due to contributions of large angle scattering from the devices within the scanning nozzle, especially from the spot profile monitor. To validate the DG fluence model, we compared calculations and measurements, including doses at the center of spread out Bragg peaks (SOBPs) as a function of nominal field size, range, and SOBP width, lateral dose profiles, and depth doses for different widths of SOBP. Dose models were validated extensively with patient treatment field-specific measurements. Results: We demonstrated that the DG fluence model is necessary for predicting the field size dependence of dose distributions. With this model, the calculated doses at the center of SOBPs as a function of nominal field size, range, and SOBP width, lateral dose profiles and depth doses for rectangular target volumes agreed well with respective measured values. With the DG fluence model for our scanning proton beam line, we successfully treated more than 500 patients from March 2010 through June 2012 with acceptable agreement between TPS calculated and measured dose distributions. However, the current dose model still has limitations in predicting field size dependence of doses at some intermediate depths of proton beams with high energies. Conclusions: We have commissioned a DG fluence model for clinical use. It is demonstrated that the DG fluence model is significantly more accurate than the SG fluence model. However, some deficiencies in modeling the low-dose envelope in the current dose algorithm still exist. Further improvements to the current dose algorithm are needed. The method presented here should be useful for commissioning pencil beam dose algorithms in new versions of TPS in the future.