Today’s cities generate tremendous amounts of data, thanks to a boom in affordable smart devices and sensors. The resulting big data creates opportunities to develop diverse context-aware services and systems, ensuring that smart city services are optimized to the dynamic city environment. Critical resources in these smart cities can be deployed more rapidly to regions in need, and to those regions predicted to have an imminent or prospective need. For example, crime data analytics may be used to optimize the distribution of police, medical, and emergency services. However, as smart city services become dependent on data, they also become susceptible to disruptions in data streams, such as data loss caused by degraded signal quality or by power loss during data collection. This paper presents a dynamic network model for improving service resilience to data loss. The model identifies statistically significant shared temporal trends across multivariate spatiotemporal data streams and utilizes these trends to improve data prediction performance in the case of data loss. Its dynamics also allow the system to respond to changes in the data streams, such as the loss of existing information flows or the addition of new ones. The network model is demonstrated on city-based crime rates reported in Montgomery County, MD, USA. A resilient network is developed that utilizes shared temporal trends between cities to provide crime rate prediction that is more accurate and more robust to data loss than single-city autoregression. A maximum performance improvement of 7.8% is found for Silver Spring, with an average improvement of 5.6% among cities with high crime rates. The model also correctly identifies all the optimal network connections, according to prediction error minimization. City-to-city distance is identified as a predictor of shared temporal trends in crime, and weather is shown to be a strong predictor of crime in Montgomery County.
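The core idea above can be illustrated with a minimal sketch: augment a single-city autoregression with a neighboring city's lagged series, standing in for the paper's shared temporal trends. All names (`make_lagged`, `fit_predict`, the lag count, the synthetic data) are illustrative assumptions, not the authors' model or data.

```python
import numpy as np

def make_lagged(series, lags):
    """Stack lagged copies of a 1-D series into a design matrix."""
    n = len(series)
    cols = [series[lags - k - 1 : n - k - 1] for k in range(lags)]
    return np.column_stack(cols)

def fit_predict(y, lags, exog=None):
    """Least-squares AR(lags) fit; optionally add a second city's lags."""
    X = make_lagged(y, lags)
    if exog is not None:
        # Networked variant: borrow the shared trend from the other city.
        X = np.column_stack([X, make_lagged(exog, lags)])
    target = y[lags:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ coef, target

rng = np.random.default_rng(0)
trend = np.sin(np.linspace(0, 8 * np.pi, 400))   # shared temporal trend
city_a = trend + 0.3 * rng.normal(size=400)      # target city's series
city_b = trend + 0.3 * rng.normal(size=400)      # neighboring city's series

pred_solo, target = fit_predict(city_a, lags=4)
pred_net, _ = fit_predict(city_a, lags=4, exog=city_b)

mse_solo = np.mean((pred_solo - target) ** 2)
mse_net = np.mean((pred_net - target) ** 2)
# In-sample, adding regressors cannot increase the least-squares error,
# so the networked fit is at least as good here.
print(mse_net <= mse_solo)
```

In this toy setting the second city's lags carry information about the shared trend that the target city's own noisy history lacks, which is the mechanism the paper exploits for robustness when part of a city's own stream is lost.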
Named entity recognition (NER) is a key component of many scientific literature mining tasks, such as information retrieval, information extraction, and question answering; however, many modern approaches require large amounts of labeled training data in order to be effective. This severely limits the effectiveness of NER models in applications where expert annotations are difficult and expensive to obtain. In this work, we explore the effectiveness of transfer learning and semi-supervised self-training for improving the performance of NER models in biomedical settings with very limited labeled data (250–2000 labeled samples). We first pre-train a BiLSTM-CRF and a BERT model on a very large general biomedical NER corpus such as MedMentions or Semantic Medline, and then we fine-tune the model on a more specific target NER task that has very limited training data; finally, we apply semi-supervised self-training using unlabeled data to further boost model performance. We show that in NER tasks that focus on common biomedical entity types such as those in the Unified Medical Language System (UMLS), combining transfer learning with self-training enables an NER model such as a BiLSTM-CRF or BERT to achieve performance similar to that of the same model trained on 3×–8× the amount of labeled data. We further show that our approach can also boost performance in a low-resource application where entity types are rarer and not specifically covered in UMLS.
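The self-training stage described above follows a generic loop: train on the labeled set, pseudo-label the unlabeled pool, keep only high-confidence predictions, and retrain on the enlarged set. A minimal sketch of that loop is below, with a nearest-centroid classifier standing in for the BiLSTM-CRF/BERT tagger; the classifier, confidence score, threshold, and synthetic data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_centroids(X, y):
    """'Train' the stand-in model: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_with_confidence(centroids, X):
    """Label each point by nearest centroid; softmax over negative
    distances gives a rough confidence score."""
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    probs = np.exp(-d) / np.exp(-d).sum(axis=0)
    return np.array(classes)[probs.argmax(axis=0)], probs.max(axis=0)

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, rounds=3):
    """Iteratively move confident pseudo-labels into the training set."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        model = fit_centroids(X, y)
        if len(pool) == 0:
            break
        labels, conf = predict_with_confidence(model, pool)
        keep = conf >= threshold
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])            # add pseudo-labeled data
        y = np.concatenate([y, labels[keep]])
        pool = pool[~keep]                        # shrink the unlabeled pool
    return fit_centroids(X, y)

rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(3, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)               # tiny labeled set
X_unlab = np.vstack([rng.normal(0, 0.5, (50, 2)),
                     rng.normal(3, 0.5, (50, 2))])  # larger unlabeled pool
model = self_train(X_lab, y_lab, X_unlab)
```

In the paper's setting the stand-in classifier would be the pre-trained and fine-tuned tagger, and the pool would be unlabeled biomedical text; the loop structure is the same.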
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.