Background
In recent years, homelessness has become a substantial issue around the globe. The largest social services organization in Thunder Bay, Ontario, Canada, has observed that a majority of the people experiencing homelessness in the city were from outside the city or province. Thus, to improve programming and resource allocation for people experiencing homelessness in the city, including shelter use, it was important to investigate the trends associated with homelessness and migration.
Objective
This study aimed to address 3 research questions related to homelessness and migration in Thunder Bay: (1) What factors predict whether a person who migrated to the city and is experiencing homelessness stays in or leaves shelters? (2) If an individual stays, how long are they likely to stay? (3) What factors predict stay duration?
Methods
We collected the required data from 2 sources: a survey conducted with people experiencing homelessness at 3 homeless shelters in Thunder Bay and the database of a homeless information management system. The records of 110 migrants were used for the analysis. Two feature selection techniques were used to address the first and third research questions, and 8 machine learning models were used to address the second research question. In addition, data augmentation was performed to increase the size of the data set and to resolve the class imbalance problem. The area under the receiver operating characteristic curve and cross-validation accuracy were used to measure the models’ performance while guarding against possible overfitting.
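The two evaluation measures named above can be illustrated with a minimal, standard-library sketch. This is not the study's code: the function names `roc_auc` and `k_fold_indices` are hypothetical, and the number of folds is an assumption, since the abstract does not state it. The first function computes the area under the receiver operating characteristic curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case; the second produces the train and test index splits used in k-fold cross-validation.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one record. Shuffling and
    stratification are omitted to keep the sketch short.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

For example, with 110 records and an assumed 5 folds, `k_fold_indices(110, 5)` yields 5 splits that each hold out 22 records for testing. Because every record is held out exactly once, the averaged accuracy is less sensitive to overfitting than a score computed on the training data.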
Results
For research question 1, factors predicting whether an individual stays in or leaves a shelter included home or previous district, highest educational qualification, recent receipt of mental health support, migrating to visit family or friends, and finding employment upon arrival. For research question 2, among the classification models developed for predicting the stay duration of migrants, the random forest and gradient boosting tree models presented better results, with area under the receiver operating characteristic curve values of 0.91 and 0.93, respectively. Finally, for research question 3, home district, band membership, status card, previous district, and recent support for drug and/or alcohol use were recognized as the factors predicting stay duration.
Conclusions
Applying machine learning enables researchers to make predictions related to migrants’ homelessness and investigate how various factors become determinants of the predictions. We hope that the findings of this study will aid future policy making and resource allocation to better serve people experiencing homelessness. However, further improvements in the data set size and interpretation of the identified factors in decision-making are required.