Real-time multilingual phrase detection during online video presentations, in support of instant remote diagnostics, requires near real-time visual (textual) object detection and preprocessing for further analysis. Connecting remote specialists and sharing specific ideas is most effective in the participants' native languages. The main objective of this paper is to analyze DEtection TRansformer (DETR) models, architectures, and hyperparameters, and to propose recommendations and specific procedures, with simplified methods, for achieving reasonable accuracy in real-time textual object detection for further analysis. The development of AI-supported real-time video conference translation has a relevant impact in the health sector, especially on clinical practice through better video consultation (VC) and remote diagnosis; the COVID-19 pandemic amplified the importance of this development. The challenge of this topic stems from the variety of languages and dialects spoken by the specialists involved, which usually requires human translators as intermediaries; these can be substituted by AI-enabled technological pipelines. The sensitivity of visual textual element localization is directly connected to the complexity, quality, and variety of the collected training datasets. In this research, we investigated the DETR model with several variations. The research highlights the differences between the most prominent real-time object detectors (YOLOv4, DETR, and Detectron2) and brings AI-based novelty to collaborative solutions combined with OCR. The performance of the procedures was evaluated across two research phases: a training dataset of 248/512 records (Phase 1/Phase 2) and a validation set of 55/110 instances, covering 7/10 application categories and 3/3 object categories, with the same object categories used for annotation. The achieved scores exceed the expected values for visual text detection, yielding high detection accuracy for textual data, with mean average precision ranging from 0.4 to 0.65.
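For context, the detection stage of such a pipeline can be exercised with a few lines of Python. The following is a minimal sketch, assuming the publicly available COCO-pretrained facebook/detr-resnet-50 checkpoint from the Hugging Face transformers library as a stand-in for a fine-tuned text-region detector; the frame path, confidence threshold, and the handoff of detected boxes to OCR are illustrative assumptions, not the paper's exact setup.

```python
import torch
from PIL import Image
from transformers import DetrForObjectDetection, DetrImageProcessor

# Generic COCO-pretrained DETR; a text-detection deployment would instead
# load a checkpoint fine-tuned on annotated textual object categories.
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
model.eval()

# One captured video frame (hypothetical path).
image = Image.open("frame.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes to (score, label, box) tuples in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.7
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    # Detected regions would be cropped and passed to an OCR engine downstream.
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```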
Background: One of the most critical topics in sports safety today is the reduction of injury risks through controlled fatigue using non-invasive athlete monitoring. Due to the risk of injuries, it is prohibited to use accelerometer-based smart trackers, activity measurement bracelets, and smart watches for recording health parameters during performance sports activities. This study analyzes the feasibility of combining medical radar sensor and tri-axial acceleration sensor data to predict key physical activity performance indexes in performance sports using machine learning (ML). The novelty of the method is that it uses a 24 GHz Doppler radar sensor to detect vital signs such as heartbeat and breathing without touching the person, combined with acceleration data from 3D accelerometers, to predict the intensity of physical activity. Methods: This study is based on data collected from professional athletes and on freely available datasets created for research purposes. A combination of sensor data was used: a contactless medical radar sensor to measure the heart rate (HR) and 3D accelerometers to measure the velocity of the activity. Various advanced ML methods and models were employed on top of the sensor data to analyze the vital parameters and predict key health activity performance indexes from three-axial acceleration, HR data, age, and activity level variances. Results: The ML models recognized the physical activity intensity and estimated the energy expenditure at a realistic level. Leave-one-out (LOO) cross-validation (CV) and out-of-sample testing (OST) were used to evaluate the accuracy of activity intensity prediction. Energy expenditure prediction from three-axial accelerometer data using linear regression achieved 97–99% accuracy on the selected sports (cycling, running, and soccer). The ML-based rating of perceived exertion (RPE) results using medical radar sensors on time-series HR data varied between 90 and 96% accuracy. The expected level of accuracy was examined with different models; the average accuracy across all models (RPE and metabolic equivalent of task, MET) and setups exceeded 90%. Conclusions: The ML models that classify the rating of perceived exertion and the metabolic equivalent of tasks perform consistently.
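To make the evaluation protocol concrete, here is a minimal sketch of LOO cross-validated linear regression in scikit-learn, of the kind used for the energy expenditure prediction above. The feature layout (acceleration statistics plus HR and age), the synthetic data, and the accuracy metric (1 - MAPE) are assumptions for illustration, not the study's actual dataset or reported formula.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the per-window feature matrix: columns could be
# tri-axial acceleration statistics, mean heart rate, and athlete age.
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 2.0, size=(40, 4))
true_w = np.array([1.5, 0.8, 0.4, 2.0])
y = 2.0 + X @ true_w + rng.normal(scale=0.1, size=40)  # hypothetical MET targets

# Leave-one-out CV: each sample is predicted by a model trained on all others.
y_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

# Accuracy expressed as 1 - MAPE, one plausible reading of the 97-99% figures.
print(f"LOO accuracy: {100 * (1 - mean_absolute_percentage_error(y, y_pred)):.1f}%")
```

With a roughly linear target and clean features, the LOO predictions track the held-out samples closely; on real accelerometer data the same protocol quantifies how well the regression generalizes to an unseen activity window.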