Aim/Purpose: This study proposes and tests a new model for collecting, storing, and analyzing big data on customer feedback in the tourism industry, with a focus on the Vietnamese market.
Background: Big Data refers to the large datasets that businesses accumulate “silently” over time, including product information, customer information, and customer feedback. This information is valuable and grows rapidly in volume. However, businesses often pay it little attention or store it in scattered, decentralized locations rather than centrally. As a result, a very large resource is wasted, and both business analysis and data analysis are constrained.
Methodology: The study conducted an experiment by collecting customer feedback data on tourism, specifically tourism in Vietnam, from 2007 to 2022. The collected data were then stored, and latent topics were mined from them using a topic model. Cloud computing technology was used to build the collection and storage model, addressing challenges of scalability, system stability, cost optimization, and ease of access to the technology.
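The abstract does not state which topic model was used. As a minimal sketch, assuming Latent Dirichlet Allocation (LDA) fitted by collapsed Gibbs sampling, the topic-mining step could look like the following. The toy reviews, the function name `lda_gibbs`, and the hyperparameters (`alpha`, `beta`, number of topics, iterations) are illustrative assumptions, not the study's configuration; the study processed its data at scale with PySpark, which this standard-library sketch does not attempt to reproduce.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): one normalized word distribution
    (dict) per topic and one normalized topic distribution (list) per doc.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    ndk = [[0] * num_topics for _ in docs]
    nkw = [Counter() for _ in range(num_topics)]
    nk = [0] * num_topics
    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + v * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + v * beta) for w in vocab}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# Hypothetical toy reviews, already lowercased and tokenized by whitespace.
reviews = [
    "room old bathroom broken air conditioner noisy",
    "bathroom dirty room old elevator broken",
    "staff friendly helpful reception staff kind",
    "friendly staff helpful breakfast staff",
]
docs = [r.split() for r in reviews]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each topic is summarized by its highest-probability words; on real review data, a topic dominated by words such as "room", "bathroom", and "broken" would be the kind of signal behind a finding about facility complaints. A production pipeline would add stop-word removal and Vietnamese word segmentation before fitting.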
Contribution: The research makes four main contributions: (1) a model for Big Data collection, storage, and analysis; (2) an experimental, cloud-based implementation of this model that collects customer feedback data from large platforms such as Booking.com, Agoda.com, and Phuot.vn, focusing mainly on tourism in Vietnam; (3) a Data Lake of customer feedback and discussions in the tourism domain, which can support researchers in natural language processing; and (4) an experimental study of latent topic mining on the collected Big Data using a topic model.
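The abstract does not describe how the Data Lake is organized. One common convention, assumed here purely for illustration, is a raw zone of JSON Lines files partitioned Hive-style by source and year (`source=<name>/year=<yyyy>`), so that engines with partition discovery, such as Spark, can skip irrelevant partitions. The directory layout, the field names (`source`, `date`, `text`), the sample records, and the functions `write_raw_zone` and `read_partition` are hypothetical; a cloud deployment would write to object storage rather than a local temporary folder.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_raw_zone(records, root):
    """Append records to root/source=<source>/year=<year>/part-0000.jsonl.

    Hypothetical layout. Returns the list of partition files written.
    """
    groups = defaultdict(list)
    for rec in records:
        year = rec["date"][:4]  # assumes ISO dates such as "2019-05-01"
        groups[(rec["source"], year)].append(rec)
    written = []
    for (source, year), recs in sorted(groups.items()):
        part_dir = Path(root) / f"source={source}" / f"year={year}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0000.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            for rec in recs:
                # ensure_ascii=False keeps Vietnamese diacritics readable on disk.
                fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
        written.append(path)
    return written


def read_partition(root, source, year):
    """Read back every record stored in one source/year partition."""
    part_dir = Path(root) / f"source={source}" / f"year={year}"
    records = []
    for path in sorted(part_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records


# Hypothetical sample records standing in for scraped reviews.
sample = [
    {"source": "booking.com", "date": "2019-05-01", "text": "Phòng cũ, điều hòa ồn."},
    {"source": "agoda.com", "date": "2021-08-12", "text": "Staff were friendly."},
    {"source": "booking.com", "date": "2019-11-20", "text": "Bathroom was dirty."},
]
with tempfile.TemporaryDirectory() as root:
    files = write_raw_zone(sample, root)
    booking_2019 = read_partition(root, "booking.com", "2019")
print(len(files), len(booking_2019))  # 2 2
```

Keeping the raw feedback untouched in a partitioned layout lets later steps, such as cleaning or topic modeling, be rerun from the original data without collecting it again.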
Findings: The experimental results show that the Data Lake allows users to extract information easily, supporting administrators in making quick, timely decisions. In addition, PySpark big data processing and cloud computing speed up processing, reduce costs, and make it easier to build the model when moving to SaaS (Software as a Service). Finally, the topic model helps identify customer discussion trends and the latent topics customers care about, giving business owners a clearer picture of their potential customers and their business.
Recommendations for Practitioners: The empirical results show that facilities are the aspect of tourism and hospitality that customers in the Vietnamese market complain about most. This suggests that practitioners should lower their expectations regarding facilities, because the overall standard of physical facilities in the Vietnamese market is still weak and lags behind that of other countries. At the same time, this information can help administrators plan long-term facility upgrades.
Recommendation for Researchers: This research demonstrates the value of a Data Lake and establishes a model for big data collection, storage, and analysis. Researchers can apply the same model in other fields, or use the model and algorithm proposed in this study to collect and store big data from other platforms and domains.
Impact on Society: Collecting, storing, and analyzing big data in the tourism sector can help government strategists identify tourism trends and communication crises. With this information, government managers can make decisions and devise strategies to develop regional tourism, propose price levels, and support innovative programs. This is the main social value of this research.
Future Research: For each platform or website, the study had to build a separate query scenario and choose a different technological approach, which limits how easily the solution scales to multiple platforms. Future work will build and standardize query scenarios and processing technologies so that the solution can be extended to other platforms more easily.