Neural networks trained on datasets such as ImageNet have led to major advances in visual object classification. One obstacle that prevents networks from reasoning more deeply about complex scenes and situations, and from integrating visual knowledge with natural language, like humans do, is their lack of common sense knowledge about the physical world. Videos, unlike still images, contain a wealth of detailed information about the physical world. However, most labelled video datasets represent high-level concepts rather than detailed physical aspects about actions and scenes. In this work, we describe our ongoing collection of the "something-something" database of video prediction tasks whose solutions require a common sense understanding of the depicted situation. The database currently contains more than 100,000 videos across 174 classes, which are defined as caption-templates. We also describe the challenges in crowd-sourcing this data at scale.
We estimate the stiffness of single-walled carbon nanotubes by observing their freestanding roomtemperature vibrations in a transmission electron microscope. The nanotube dimensions and vibration amplitude are measured from electron micrographs, and it is assumed that the vibration modes are driven stochastically and are those of a clamped cantilever. Micrographs of 27 nanotubes in the diameter range 1.0-1.5 nm were measured to yield an average Young's modulus of ͗Y ͘ϭ1.25 TPa. This value is consistent with previous measurements for multiwalled nanotubes, and is higher than the currently accepted value of the in-plane modulus of graphite. ͓S0163-1829͑98͒00144-1͔
This paper presents the theory, design principles, implementation and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system. In addition, this document presents the rationale, design and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter's development. The PicHunter project makes four primary contributions to research on CBIR. First, PicHunter represents a simple instance of a general Bayesian framework which we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given the target image they want, PicHunter uses Bayes's rule to predict the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with one fourth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.