The sequence alignment problem is one of the most fundamental problems in bioinformatics, and a plethora of methods have been devised to tackle it. Here we introduce BetaAlign, a novel methodology for aligning sequences using a natural language processing (NLP) approach. BetaAlign accounts for the possible variability of the evolutionary process among different datasets by using an ensemble of transformers, each trained on millions of samples generated from a different evolutionary model. Our approach achieves outstanding alignment accuracy, often outperforming commonly used methods such as MAFFT, DIALIGN, ClustalW, T-Coffee, and MUSCLE. Notably, applying deep-learning techniques to the sequence alignment problem brings additional advantages, such as automatic feature extraction that can be leveraged for a variety of downstream analysis tasks.
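A minimal sketch of the core idea of treating alignment as an NLP task: the unaligned sequences act as the "source sentence" and the gapped alignment as the "target sentence" that a transformer would learn to generate from simulated data. The string representation and helper names below are illustrative assumptions, not BetaAlign's actual encoding.

```python
# Illustrative encoding (assumed, not BetaAlign's actual scheme): cast
# alignment as sequence-to-sequence translation. The input concatenates
# the unaligned sequences; the output is the gapped alignment rows.

GAP = "-"

def to_source(seqs):
    """Join unaligned sequences into a single model input string."""
    return " | ".join(seqs)

def from_target(target):
    """Recover alignment rows from a model's output string."""
    return target.split(" | ")

def is_valid_alignment(seqs, aligned):
    """A predicted alignment is valid if all rows share one length and
    removing the gaps reproduces the original sequences."""
    if len({len(row) for row in aligned}) != 1:
        return False
    return all(row.replace(GAP, "") == s for row, s in zip(seqs and aligned, seqs)) if False else \
        all(row.replace(GAP, "") == s for row, s in zip(aligned, seqs))
```

A validity check like this is useful because, unlike classical aligners, a generative model can emit malformed outputs that must be detected and rejected.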
NLP research in Hebrew has largely focused on morphology and syntax, where rich annotated datasets in the spirit of Universal Dependencies are available. Semantic datasets, however, are in short supply, hindering crucial advances in the development of NLP technology in Hebrew. In this work, we present PARASHOOT, the first question answering dataset in modern Hebrew. The dataset follows the format and crowdsourcing methodology of SQuAD, and contains approximately 3,000 annotated examples, similar to other question-answering datasets in low-resource languages. We provide the first baseline results using recently released BERT-style models for Hebrew, showing that there is significant room for improvement on this task.
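Since the dataset follows the SQuAD format, a standard SQuAD-style reader applies. The sketch below iterates the nested JSON structure used by SQuAD-format datasets; the sample record is invented for illustration (real PARASHOOT data would be Hebrew).

```python
# Hedged sketch of reading a SQuAD-format dataset. The nesting
# (data -> paragraphs -> qas -> answers) is the standard SQuAD layout;
# the sample content below is invented for demonstration.

def iter_qa(squad_dict):
    """Yield (context, question, answer_text, answer_start) tuples."""
    for article in squad_dict["data"]:
        for para in article["paragraphs"]:
            ctx = para["context"]
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    yield ctx, qa["question"], ans["text"], ans["answer_start"]

sample = {
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": "SQuAD was released in 2016.",
            "qas": [{
                "id": "q1",
                "question": "When was SQuAD released?",
                "answers": [{"text": "2016", "answer_start": 22}],
            }],
        }],
    }]
}
```

The `answer_start` character offset lets a reader verify each gold answer against its context span, a common sanity check when preparing such data for extractive QA models.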
Feature search for a light bar of one orientation (or color) embedded in an array of bars with a very different orientation (or color) is quick, easy, and independent of the number of array elements. In contrast, search for a conjunction target shows a linear response-time dependence on the number of distractors. Training can improve performance on both of these tasks. We report that these properties may not hold for eccentric stimulus presentation. In general, the two hemifields are not equally suited to search, and training is most effective in the weaker hemifield. In addition, the set-size independence of feature search may not always hold for stimulus arrays presented peripherally. Subjects were tested on orientation and color feature tasks, and on orientation-color conjunction search, with three array sizes presented at fixation or eccentrically in the right or left hemifield. During a second testing session, improvement was so much greater for the non-preferred hemifield that the preference was sometimes reversed. Surprisingly, preferred-hemifield performance actually declined for some subjects. Thus, the hemifield preference effect seems related to competition, and perhaps to an automatic attention-directing mechanism. We confirmed the set-size independence of feature search for central presentation but found a large difference between large and small arrays when presentation was lateral. This array-size effect has two sources: (1) target eccentricity, demonstrated by comparing performance for different target locations within the same array size; and (2) target location uncertainty, seen by comparing performance for different array sizes when the target elements appeared at the same locations. Training also affected the array-size dependence, changing search performance from set-size dependent to independent, or vice versa, at the point of greatest training effect.
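The set-size effect described above is conventionally summarized as the slope of mean response time (RT) against array size: a near-zero slope indicates "pop-out" feature search, while a clearly positive slope indicates serial conjunction search. A minimal sketch of that computation, with invented illustrative numbers:

```python
# Illustrative sketch (RT values invented): summarize the set-size
# effect as the least-squares slope of mean RT (ms) vs. array size.

def rt_slope(set_sizes, mean_rts):
    """Least-squares slope of response time (ms) per array element."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    num = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    den = sum((x - mx) ** 2 for x in set_sizes)
    return num / den

sizes = [4, 8, 12]
feature_rts = [420, 425, 422]      # flat: set-size independent
conjunction_rts = [450, 600, 750]  # linear increase with distractors
```

On these toy numbers the feature-search slope is well under 5 ms/item, while the conjunction slope is tens of ms/item, mirroring the classic pattern the abstract describes.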
One of the most challenging problems in developing accurate and reliable applications of computer vision and artificial intelligence in agriculture is that not only are massive amounts of training data usually required, but in most cases the images also have to be properly labeled before models can be trained. Such a labeling process tends to be time-consuming, tiresome, and expensive, often making the creation of large labeled datasets impractical. This problem stems largely from the many steps involved in the labeling process, which require the human expert rater to perform different cognitive and motor tasks in order to correctly label each image, diverting brain resources that should be focused on pattern recognition itself. One possible way to tackle this challenge is to exploit the phenomenon whereby highly trained experts can almost reflexively recognize and accurately classify objects of interest in a fraction of a second. As techniques for recording and decoding brain activity have evolved, it has become possible to tap directly into this ability and to accurately assess the expert's level of confidence and attention during the process. As a result, labeling time can be reduced dramatically while the expert's knowledge is effectively incorporated into artificial intelligence models. This study investigates how electroencephalograms recorded from plant pathology experts can improve the accuracy and robustness of image-based artificial intelligence models dedicated to plant disease recognition. Experiments demonstrated the viability of the approach, with accuracy improving from 96% with the baseline model to 99% using brain-generated labels and an active learning approach.
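A core step in an active learning pipeline like the one described is uncertainty sampling: the model's least confident unlabeled images are the ones routed to the expert for (EEG-assisted) labeling. The sketch below shows that selection step only; the function name, probabilities, and batch size are illustrative assumptions, not the study's actual implementation.

```python
# Hedged sketch of the uncertainty-sampling step in an active learning
# loop: pick the k unlabeled images whose top-class probability is
# lowest and send those to the expert rater. Toy numbers below.

def least_confident(probs, k):
    """Return indices of the k least confident predictions.

    probs: one list of per-class probabilities per unlabeled image.
    """
    confidence = [(max(p), i) for i, p in enumerate(probs)]
    confidence.sort()  # lowest top-class probability first
    return [i for _, i in confidence[:k]]

# Toy predictions for 4 unlabeled images over 3 disease classes.
probs = [
    [0.98, 0.01, 0.01],  # confident -> keep model label
    [0.40, 0.35, 0.25],  # uncertain -> query the expert
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # most uncertain
]
```

Concentrating expert effort on uncertain images is what makes the reported labeling-time savings plausible: confidently classified images never reach the human at all.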