Summary: This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves the performance of both in‐sample and out‐of‐sample pricing estimates. Text is found to decrease pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach. By doing so, the implicit prices for various words and phrases are estimated. The estimation focuses on simultaneous variable selection and estimation for linear models in the presence of a large number of variables using a penalized regression. The LASSO procedure and variants are shown to outperform least‐squares in out‐of‐sample testing. Copyright © 2016 John Wiley & Sons, Ltd.
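The tokenization and penalized-regression steps described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy listings, the three-word vocabulary, the log prices, and the helper names `tokenize`, `build_design`, and `lasso_cd` are hypothetical, and the plain coordinate-descent solver is a stand-in rather than the estimator used in the paper.

```python
import re

def tokenize(remarks):
    """Lowercase a remarks string and split it into single-word tokens."""
    return re.findall(r"[a-z]+", remarks.lower())

def build_design(listings, vocab):
    """One row per listing: living area (thousands of sqft) plus a 0/1 flag per token."""
    rows = []
    for sqft, remarks in listings:
        tokens = set(tokenize(remarks))
        rows.append([sqft / 1000.0] + [1.0 if w in tokens else 0.0 for w in vocab])
    return rows

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1.

    Columns are centered internally so the intercept b0 is left unpenalized.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual when every coefficient is 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual, adding back its own fit
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta

# Invented toy data: (living area in sqft, agent remarks)
listings = [
    (1500, "Granite counters, new roof"),
    (2000, "Fixer upper, needs TLC"),
    (1800, "Granite kitchen and large yard"),
    (1200, "Investor special fixer"),
    (2200, "Large yard, new roof"),
    (1600, "Move in ready"),
    (2500, "Granite, large yard"),
    (1400, "Needs work, fixer"),
]
vocab = ["granite", "fixer", "yard"]  # hypothetical vocabulary
X = build_design(listings, vocab)
# Invented log prices: "granite" adds value, "fixer" subtracts, "yard" is neutral
log_price = [12.0 + 0.5 * r[0] + 0.2 * r[1] - 0.3 * r[2] for r in X]

intercept, beta = lasso_cd(X, log_price, lam=0.001)
implicit_prices = dict(zip(["sqft_thousands"] + vocab, beta))
print(implicit_prices)
```

In this sketch each fitted coefficient on a token flag plays the role of an implicit price for that word, and raising `lam` drives more of those coefficients to exactly zero, which is how the penalty performs variable selection and estimation at the same time.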
We show that conventional hedonic models for commercial real estate prices ignore the utility investors derive from a building's extreme attributes. Analyzing geo-enriched data on nearly 4,800 hotel transactions in the United States, we find that the relative positioning of an asset's attributes, particularly at the extremes, has a significant impact on transaction prices. We also detect separating equilibria for extreme attributes across the premium and discount hotel segments. Extreme attributes "stand out" and are value enhancing in premium hotel segments. In contrast, extreme attributes are value diminishing in the discount hotel segment. The relative degree to which the asset's attributes are extreme is important. Being a locally largest asset has a negative effect on price; however, the negative effect is more than offset if the hotel is among the largest hotels nationally. The results suggest that locally extreme assets, unless also nationally extreme, are considered atypical and trade at a discount.
We provide a new framework for using text as data in empirical models. The framework identifies salient information in unstructured text that can control for multidimensional heterogeneity among assets. We demonstrate the efficacy of the framework by reexamining principal-agent problems in residential real estate markets. We show that the agent-owned premiums reported in the extant literature dissipate when the salient textual information is included. The results suggest the previously reported agent-owned premiums suffer from an omitted variable bias, which prior studies incorrectly ascribed to market distortions associated with asymmetric information. (JEL D82, G14, R00) Authors have furnished an Internet Appendix, which is available on the Oxford University Press Web site next to the link to the final published paper online.
We examine institutional investors’ entry into the equity side of the single‐family detached housing market using an asset illiquidity framework. We find that institutional investors purchased owner‐occupied houses after the real estate crisis for approximately 6.3–11.8% less than owner‐occupiers. The large discount was in addition to distressed sale and cash purchase discounts which, when combined, highlight the low liquidation value for owner‐occupied housing. The results suggest that asset illiquidity is an important cost of leverage in the owner‐occupied housing market.
The constant-quality assumption in repeat-sales house price indexes (HPIs) introduces a significant time-varying attribute bias. The direction, magnitude, and source of the bias vary throughout the market cycle and across metropolitan statistical areas (MSAs). We mitigate the bias using a data-driven textual analysis approach that identifies and includes salient text from real estate agent remarks in the repeat-sales estimation. Absent the text, MSA-level HPIs are biased downward by as much as 7 percent during the financial crisis and upward by as much as 20 percent after the crisis. The geographic concentration of the bias magnifies its effect on local HPIs. (JEL C43, E31, R11, R31)