The index is crucial to the efficiency of information retrieval. Unlike plain text, tagged data contains rich semantics that can improve the quality of search results. However, most existing indexes for keyword search do not consider the semantics of tags. After analyzing tagged data, we propose the concept of a result entity, based on relational database theory. We present a formula to quantify the semantics of tags and then introduce a novel semantic index for keyword search. Experimental results demonstrate that our approach dramatically reduces the size of the keyword inverted list for tagged documents and improves retrieval quality.
Keyword search is still the most effective means for users to obtain information. In the era of massive data, more and more structured and semi-structured data can be accessed directly by users. Unlike web data and text data, structured and semi-structured data contain a great deal of semantic information, which can improve the accuracy of search results. Based on information entropy theory, we calculate the information entropy of this semantic information to determine the semantic relevance of keywords, use the resulting degree of relevance between keywords and semantics, and finally construct a semantic model based on information entropy. Experiments demonstrate the effectiveness of our model.
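The abstract does not give its entropy formula, so the following is only a minimal sketch of the standard Shannon entropy, $H = -\sum_i p_i \log_2 p_i$, applied to the distribution of tags under which a keyword occurs. The function names `shannon_entropy` and `keyword_tag_entropy` and the `(keyword, tag)` input format are assumptions made for illustration, not the authors' model. Under this reading, a keyword that appears under a single tag has entropy 0, while a keyword spread evenly across many tags has high entropy.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed label frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keyword_tag_entropy(occurrences):
    """Map each keyword to the entropy of the tags it occurs under.

    `occurrences` is a list of (keyword, tag) pairs; this input format
    is a hypothetical choice for the sketch.
    """
    tags_by_keyword = {}
    for keyword, tag in occurrences:
        tags_by_keyword.setdefault(keyword, []).append(tag)
    return {kw: shannon_entropy(tags) for kw, tags in tags_by_keyword.items()}


if __name__ == "__main__":
    sample = [
        ("smith", "author"), ("smith", "author"),
        ("java", "title"), ("java", "keyword"),
    ]
    for kw, h in keyword_tag_entropy(sample).items():
        print(kw, h)
```

How such an entropy value would be turned into a relevance score between keywords and semantics is left open here, since the abstract does not specify it.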