Discovering Knowledge Structure in the Web
by Siddharth Ramu

In this project, we implement a new approach to extracting knowledge, or semantic meaning, from Internet data. We apply the Association Rule and the apriori principle to a chosen set of documents from the Internet and analyze the benefits and drawbacks of this approach. First, we find the highly relevant keywords in all documents using tf-idf. From these keywords, we find all keyword pairs (finite sequences of length 2) whose members occur within a distance of 30 words of each other. We then take the high-frequency keyword pairs and find keyword triplets (finite sequences of length 3) … and so on, until there are no more high-frequency finite keyword sequences. At each stage, we apply the Association Rule to find the primitive keywords, that is, the keywords that do not grow any longer. This is known as the Primitive Concept. From the non-primitive keywords, we then find the next set of keywords that can be associated. In this way, we find the longest keyword sequence that exists. We observe that such a frequent finite keyword sequence often represents a concept in the web. For example, the keyword pair "closeness centrality" represents social networking. Another, more generic example is "Wall Street"; this keyword pair represents the New York Stock Exchange. In this project, we apply the Association Rule and the apriori principle to a set of 21 IEEE papers on social networking. We limit our research to finding the primitive concept for each document and analyzing the results obtained.
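The level-wise procedure described above can be illustrated with a minimal Python sketch. It is not the project's implementation, and several details are assumptions made only for illustration: the function names (`tfidf_keywords`, `occurs`, `frequent_sequences`, `primitive_sequences`) are hypothetical, support is counted as the number of documents containing a sequence, a sequence is taken to occur when its keywords appear in order with the first and last at most `window` words apart, and a sequence is treated as primitive when it is not contained in any longer frequent sequence. The two sentences at the end are toy data, not the 21 IEEE papers.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Union of the top_k highest tf-idf terms of each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    keywords = set()
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.update(ranked[:top_k])
    return keywords


def occurs(tokens, seq, window=30):
    """True if the keywords of seq appear in order, with the first and last
    keyword at most `window` positions apart (assumed distance rule)."""
    def search(start, idx, first):
        if idx == len(seq):
            return True
        for pos in range(start, len(tokens)):
            if first is not None and pos - first > window:
                break
            if tokens[pos] == seq[idx]:
                anchor = pos if first is None else first
                if search(pos + 1, idx + 1, anchor):
                    return True
        return False
    return search(0, 0, None)


def frequent_sequences(docs, keywords, min_support=2, window=30):
    """Apriori-style growth: only frequent sequences of length n are extended
    to candidates of length n + 1. Returns {length: {sequence: support}}."""
    def support(seq):
        return sum(occurs(doc, seq, window) for doc in docs)

    current = {}
    for k in sorted(keywords):
        s = support((k,))
        if s >= min_support:
            current[(k,)] = s
    levels, length = {}, 1
    while current:
        levels[length] = current
        singles = [seq[0] for seq in levels[1]]
        candidates = sorted({seq + (k,) for seq in current
                             for k in singles if k not in seq})
        current = {}
        for cand in candidates:
            s = support(cand)
            if s >= min_support:
                current[cand] = s
        length += 1
    return levels


def is_subsequence(short, long):
    """True if `short` appears in `long` in order (not necessarily adjacent)."""
    it = iter(long)
    return all(item in it for item in short)


def primitive_sequences(levels):
    """Frequent sequences not contained in any frequent sequence one longer."""
    result = []
    for length, seqs in levels.items():
        longer = levels.get(length + 1, {})
        for seq in seqs:
            if not any(is_subsequence(seq, big) for big in longer):
                result.append(seq)
    return result


# Toy data for illustration only.
toy_docs = [
    "closeness centrality measures how near a node is to others".split(),
    "we compute closeness centrality for each graph node".split(),
]
toy_levels = frequent_sequences(toy_docs, {"closeness", "centrality"})
print(primitive_sequences(toy_levels))  # [('closeness', 'centrality')]
```

In this sketch the pair ("closeness", "centrality") is reported because both single keywords grow into it and no frequent triplet extends it. In practice the keyword set would come from `tfidf_keywords` rather than being written by hand, and the support threshold would need tuning for the corpus.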