Summary
Representing semantic web information in existing formats such as extensible markup language (XML), resource description framework (RDF) or web ontology language (OWL) is limited by their inability to maintain intradocument relationships between different semantic entities. Moreover, these standards are built for generic data representation; they have no inherent support for the semantic description of a document and, consequently, do not support features such as multidocument linkages and context‐sensitivity. To overcome these drawbacks, this paper presents a semantic web‐specific document representation approach, deep semantic XML (DS‐XML), which uses deep linking and machine learning. DS‐XML facilitates the use of semantics through a novel document structure that focuses on defining any web document in a semantic form. The structure includes ontologies and linkages between different vocabularies, which assists in achieving semantic interoperability. To make the approach future‐proof, a novel deep linking approach is integrated at the document level, which allows DS‐XML to understand more about the contents of the document and represent information in an aggregated manner. The deep linking layer is combined with a machine learning‐based classification layer that allows the document information to be represented in an interactive manner. The proposed DS‐XML approach is tested on different document retrieval applications, including a few Internet of Things (IoT) applications, and its representation performance is found to be superior to that of the existing semantic approaches (XML, RDF and OWL).