Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good [...]
World Wide Web (2006) 9: 431-456
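The abstract does not state any formula, so the following is only a minimal sketch of what a taxonomy-based (tree) information-theoretic similarity can look like. It assumes a Lin-style measure, sim(t1, t2) = 2 log Pr[t0] / (log Pr[t1] + log Pr[t2]), where t0 is the lowest common ancestor of the two topics and Pr[t] is the fraction of pages stored in the subtree rooted at t. The toy taxonomy, the page counts, and all function names are hypothetical, and the sketch covers only the tree case, not the graph-based measure the paper proposes.

```python
import math

# Hypothetical toy taxonomy, stored as child -> parent (the root maps to None).
PARENT = {
    "Top": None,
    "Arts": "Top",
    "Science": "Top",
    "Music": "Arts",
    "Movies": "Arts",
    "Physics": "Science",
}

# Hypothetical number of pages stored directly in each topic.
PAGES = {"Top": 0, "Arts": 2, "Science": 3, "Music": 5, "Movies": 4, "Physics": 6}


def ancestors(topic):
    """Return the path from a topic up to the root, starting with the topic."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_probability(topic):
    """Fraction of all pages stored in the topic or any of its descendants."""
    total = sum(PAGES.values())
    in_subtree = sum(n for t, n in PAGES.items() if topic in ancestors(t))
    return in_subtree / total


def lowest_common_ancestor(t1, t2):
    """Deepest topic that lies on both root paths."""
    above_t2 = set(ancestors(t2))
    for t in ancestors(t1):
        if t in above_t2:
            return t
    raise ValueError("topics are not in the same tree")


def tree_similarity(t1, t2):
    """Lin-style similarity; assumes every subtree holds at least one page."""
    if t1 == t2:
        return 1.0
    p0 = subtree_probability(lowest_common_ancestor(t1, t2))
    p1 = subtree_probability(t1)
    p2 = subtree_probability(t2)
    denominator = math.log(p1) + math.log(p2)
    if denominator == 0:
        # Both subtrees contain every page, so they carry the same information.
        return 1.0
    return 2 * math.log(p0) / denominator
```

Under these assumptions, two topics whose only common ancestor is the root score 0, because the root subtree has probability 1, while siblings under a narrower topic score higher. A tree-only measure like this ignores cross-links between branches, which is the kind of non-hierarchical structure the abstract says the proposed measure exploits.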
We argue that phishing IQ tests fail to measure susceptibility to phishing attacks. We conducted a study in which 40 subjects were asked to answer a selection of questions from existing phishing IQ tests, and in which we varied the portion (from 25% to 100%) of the questions that corresponded to phishing emails. We did not find any correlation between the actual number of phishing emails and the number of emails that the subjects indicated were phishing. Therefore, the tests did not measure the ability of the subjects. To further confirm this, we exposed all the subjects to existing phishing education after they had taken the test, after which each subject was asked to take a second phishing test, with the same design as the first one, but with different questions. The number of stimuli that were indicated as being phishing in the second test was, again, independent of the actual number of phishing stimuli in the test. However, a substantially larger portion of stimuli was indicated as being phishing in the second test, suggesting that the only measurable effect of the phishing education (from the point of view of the phishing IQ test) was an increased concern, not an increased ability.
Social bookmarks allow Web users to actively annotate individual Web resources. Researchers are exploring the use of these annotations to create implicit links between online resources. We define an implicit link as a relationship between two online resources established by the Web community. An individual may create or reinforce a relationship between two resources by applying a common tag or organizing them in a common folder. This has led to the exploration of techniques for building networks of resources, categories, and people using social annotations. Before these techniques can move from the lab to the real world, these potentially large networks must be built and maintained efficiently, which remains a major obstacle. Methods for assembling and indexing these large networks will allow researchers to run more rigorous assessments of their proposed techniques. Toward this goal, we explore an approach from the sparse matrix literature and apply it to our system, GiveALink.org. We also investigate distributing the assembly, allowing us to grow the network with the body of resources, annotations, and users. Dividing the network is effective for assembling a global network where the implicit links are dependent on global properties. Additionally, we explore alternative implicit link measures that remove global dependencies and thus allow the global network to be assembled incrementally, as each participant makes independent contributions. Finally, we evaluate three scalable similarity measures, two of which require a revision of the data model underlying our social annotations.
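To make the idea of an incrementally assembled implicit-link network concrete, here is a minimal sketch. It is not the GiveALink.org data model or any of the measures evaluated in the paper, which the abstract does not specify. It assumes a simple Jaccard overlap of tag sets as a stand-in for a measure without global dependencies, ignores which user applied each tag, and stores only resource pairs that actually share a tag, in the spirit of a sparse matrix. The class and method names are hypothetical.

```python
from collections import defaultdict


class ImplicitLinkNetwork:
    """Sparse, incrementally updated network of implicit links (illustrative)."""

    def __init__(self):
        self.tags_of = defaultdict(set)       # resource -> tags applied to it
        self.resources_of = defaultdict(set)  # tag -> resources carrying it
        self.shared = defaultdict(int)        # (a, b) pair -> number of shared tags

    @staticmethod
    def _pair(a, b):
        """Canonical key so that (a, b) and (b, a) hit the same entry."""
        return (a, b) if a <= b else (b, a)

    def annotate(self, resource, tag):
        """Record one annotation and update only the pairs it affects."""
        if tag in self.tags_of[resource]:
            return  # duplicate annotation, nothing changes
        for other in self.resources_of[tag]:
            self.shared[self._pair(resource, other)] += 1
        self.tags_of[resource].add(tag)
        self.resources_of[tag].add(resource)

    def similarity(self, a, b):
        """Jaccard overlap of the two tag sets; needs no global statistics."""
        if a == b:
            return 1.0
        common = self.shared.get(self._pair(a, b), 0)
        size_a = len(self.tags_of.get(a, ()))
        size_b = len(self.tags_of.get(b, ()))
        union = size_a + size_b - common
        return common / union if union else 0.0


# Usage with hypothetical resources and tags.
network = ImplicitLinkNetwork()
network.annotate("a.example", "news")
network.annotate("b.example", "news")
network.annotate("a.example", "tech")
```

Because each call to `annotate` touches only the pairs that share the new tag, contributions can arrive in any order without recomputing the whole network. A measure that depends on global properties, such as the total number of users or tags, would not have this property.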