Information tag, empirical metadata, developer activity
CONTEXT: Empirical metadata describing developers' activities on source code are often stored as logs with only basic references to the source code. This approach has several problems. First, logs are attached to whole files, which makes the collected data hard to analyse. Second, source code is dynamic: if we have only logs of developers' activities, we cannot be sure that the referenced source code still exists or whether it has since been changed.
GOAL: If we transform logs into empirical metadata that are directly assigned to software artefacts, and we can guarantee the validity of these metadata, then we can assign meaning to the metadata, define empirical software metrics, and analyse characteristics of the source code at a particular time (a project state). This idea can be used to address the problem of the usability of empirical software metrics [1].
METHOD: To realize this idea, we store empirical metadata in the form of information tags created by a tagger from the stream of developers' activities. We have also proposed a robust location descriptor that exactly identifies a position in the source code [2]. In this work we describe an approach to real-time processing of developers' activities and to enriching source code with empirical metadata calculated from these activities. We collect developers' activities via extensions for integrated development environments and web browsers. Activities are processed on the server as a stream of events, and the calculated empirical metadata are stored in an information tag repository. We therefore store only the data collected within the time window that is necessary for matching event sequence templates; these templates identify significant developer activities that should be assigned to the code as information tags.
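The time-window idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Event` fields, the `WindowedMatcher` class, the action names, and the 60-second window are all hypothetical, and a plain ordered-subsequence check stands in for the event sequence templates.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One developer activity (all field names are hypothetical)."""
    timestamp: float  # seconds since some epoch
    developer: str
    action: str       # e.g. "copy", "edit", "paste"
    location: str     # stand-in for a location descriptor


class WindowedMatcher:
    """Keeps only events inside a time window and matches a template.

    The template is an ordered list of action names; a match is an
    ordered subsequence of buffered events that ends at the newest event.
    """

    def __init__(self, template, window_seconds):
        self.template = tuple(template)
        self.window_seconds = window_seconds
        self.buffer = deque()

    def push(self, event):
        """Add an event, evict expired ones, and return a match or None."""
        self.buffer.append(event)
        # Drop events that fell out of the time window.
        while event.timestamp - self.buffer[0].timestamp > self.window_seconds:
            self.buffer.popleft()
        return self._match()

    def _match(self):
        if self.buffer[-1].action != self.template[-1]:
            return None
        idx = len(self.template) - 1
        matched = []
        # Walk backwards from the newest event, consuming the template
        # from its last element to its first.
        for ev in reversed(self.buffer):
            if ev.action == self.template[idx]:
                matched.append(ev)
                idx -= 1
                if idx < 0:
                    return list(reversed(matched))
        return None


matcher = WindowedMatcher(["copy", "paste"], window_seconds=60)
first = matcher.push(Event(0.0, "alice", "copy", "A.java#foo"))
second = matcher.push(Event(10.0, "alice", "edit", "A.java#foo"))
third = matcher.push(Event(20.0, "alice", "paste", "B.java#bar"))
```

In this sketch only `third` yields a match, and an event older than the window is discarded before matching, so memory use is bounded by the window rather than by the length of the activity log.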
RESULTS: We implemented the proposed approach in a tagger based on processing linked stream data [3], which lets us take advantage of linked data inference and of event sequence templates expressed in a SPARQL-like language. The tagger consists of a repository of tagging rules, a linked stream data generator and processor, and a tagging action executor. The tagger creates information tags based on tagging rules stored in the repository; each rule specifies stream queries and tagging actions. Stream queries match sequences of events in the stream and select the queried data. Following a positive evaluation of the queries, the tagging actions use the queried data to create information tags.
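The relationship between tagging rules, stream queries, and tagging actions can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `edited-by` tag type, and the triple vocabulary are hypothetical, and an ordinary Python function stands in for a query that the actual tagger would express in a SPARQL-like language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class InformationTag:
    """Empirical metadata attached to a code location (hypothetical shape)."""
    tag_type: str
    target: str
    value: str


@dataclass
class TaggingRule:
    """A stream query paired with the tagging action it triggers."""
    name: str
    query: Callable[[List[Triple]], Optional[Dict[str, str]]]
    action: Callable[[Dict[str, str]], InformationTag]


class Tagger:
    """Holds a rule repository and runs actions for matching queries."""

    def __init__(self):
        self.rules: List[TaggingRule] = []
        self.tags: List[InformationTag] = []

    def register(self, rule: TaggingRule) -> None:
        self.rules.append(rule)

    def process(self, triples: List[Triple]) -> List[InformationTag]:
        created = []
        for rule in self.rules:
            bindings = rule.query(triples)
            if bindings is not None:  # positive evaluation of the query
                created.append(rule.action(bindings))
        self.tags.extend(created)
        return created


def edit_query(triples: List[Triple]) -> Optional[Dict[str, str]]:
    """Select the target and developer of the first 'Edit' event, if any."""
    props: Dict[str, Dict[str, str]] = {}
    for subject, predicate, obj in triples:
        props.setdefault(subject, {})[predicate] = obj
    for po in props.values():
        if po.get("type") == "Edit" and "target" in po and "developer" in po:
            return {"target": po["target"], "developer": po["developer"]}
    return None


def edit_action(bindings: Dict[str, str]) -> InformationTag:
    return InformationTag("edited-by", bindings["target"], bindings["developer"])


tagger = Tagger()
tagger.register(TaggingRule("edits", edit_query, edit_action))
created = tagger.process([
    ("ev1", "type", "Edit"),
    ("ev1", "target", "A.java#foo"),
    ("ev1", "developer", "alice"),
])
```

Keeping the query and the action as separate parts of a rule means new kinds of information tags can be added by registering a rule, without changing the processing loop.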
CONCLUSIONS: We performed a preliminary performance evaluation. We registered a query that selects objects consisting of five RDF triples. During the evaluation we repeatedly posted 100,000 objects to the tagger and measured its response time. This evaluation indicates that the tagger can process a stream of developers' activities in real time.