Wikipedia vandalism detection

Mola-Velasco, Santiago M.

doi:10.1145/1963192.1963349

Cited by 26 publications

(9 citation statements)

References 16 publications

“…[17][18][19][20]). As mentioned by van den Berg et al [14], it could be useful to apply similar approaches, methodologies and technologies, which have already been utilized in other open source projects and Web 2.0 encyclopedias, to detect vandalism in OSM and/or revert unconstructive changes.…”

Section: Introduction (mentioning)

confidence: 99%

Towards Automatic Vandalism Detection in OpenStreetMap

Neis

Goetz

Zipf

2012

IJGI

Abstract: The OpenStreetMap (OSM) project, a well-known source of freely available worldwide geodata collected by volunteers, has experienced a consistent increase in popularity in recent years. One of the main caveats closely related to this popularity increase is the different types of vandalism that occur in the project's database. Since the applicability and reliability of crowd-sourced geodata, as well as the success of the whole community, are heavily affected by such cases of vandalism, it is essential to counteract those occurrences. The question, however, is: How can the OSM project protect itself against data vandalism? To be able to give a sophisticated answer to this question, different cases of vandalism in the OSM project have been analyzed in detail. Furthermore, the current OSM database and its contributions have been investigated by applying a variety of tests based on other Web 2.0 vandalism detection tools. The results gathered from these prior steps were used to develop a rule-based system for the automated detection of vandalism in OSM. The developed prototype provides useful information about the vandalism types and their impact on the OSM project data.

“…• RQ4: The most used method has been the use of classifiers, in machine learning processes, for the detection of acts of vandalism, against a previously established corpus [17]. The analysis that the researchers carry out is the same as outlined by Adler et al [18], and relates to one of the four basic computational approaches: language characteristics, textual content characteristics, metadata relating to publications and the reputation of editors.…”

Section: Discussion (mentioning)

confidence: 99%

Research on Wikipedia Vandalism

Saz

Garrido

Sánchez-Casabón

2016

Proceedings of the 4th Spanish Conference on Information Retrieval

Research on vandalism in Wikipedia has been of interest for the last decade. This paper performs a literature review on the subject, with the goal of identifying the main research topics and the approaches, methods and techniques used. 67 papers have been reviewed. The main topic is the detection of vandalism, although there is an increasing interest in content quality. The most commonly used technique is machine learning, based on feature analysis. The review draws attention to the lack of research on the information behavior of vandals.
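The feature-based classification mentioned in this entry can be illustrated with a minimal sketch. It assumes one or two toy features for each of the four families named in the citation statement (language, textual content, metadata, editor reputation) and trains a small logistic regression on a made-up four-edit corpus. The `Edit` fields, the word list, the feature choices and the training data are all hypothetical and are not taken from any of the cited papers.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Toy word list, assumed for illustration only.
TOY_VULGAR_WORDS = {"stupid", "idiot", "sucks"}


@dataclass
class Edit:
    old_text: str
    new_text: str
    comment: str
    is_anonymous: bool
    editor_reputation: float  # hypothetical score in [0, 1]


def extract_features(edit: Edit) -> List[float]:
    """Map one edit to a fixed-length vector covering the four feature families."""
    words = [w.strip(".,!?").lower() for w in edit.new_text.split()]
    letters = [c for c in edit.new_text if c.isalpha()]
    vulgar_ratio = sum(w in TOY_VULGAR_WORDS for w in words) / len(words) if words else 0.0
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [
        vulgar_ratio,                          # language
        upper_ratio,                           # textual content
        1.0 if edit.is_anonymous else 0.0,     # metadata
        0.0 if edit.comment.strip() else 1.0,  # metadata: missing edit comment
        edit.editor_reputation,                # editor reputation
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Return the estimated probability that the edit is vandalism."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train(samples: List[List[float]], labels: List[int],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Made-up labelled corpus: 1 = vandalism, 0 = regular edit.
corpus = [
    (Edit("The river flows north.", "The river flows north through the valley.",
          "add detail", False, 0.9), 0),
    (Edit("Born in 1950.", "Born in 1950 in Madrid.", "birthplace", False, 0.7), 0),
    (Edit("The river flows north.", "THIS ARTICLE SUCKS", "", True, 0.1), 1),
    (Edit("Born in 1950.", "he is an idiot!!!", "", True, 0.2), 1),
]

X = [extract_features(edit) for edit, _ in corpus]
y = [label for _, label in corpus]
weights, bias = train(X, y)

for edit, label in corpus:
    print(label, round(predict(weights, bias, extract_features(edit)), 3))
```

A real system would be trained and evaluated on a labelled corpus with held-out data; this sketch only shows how heterogeneous signals end up in one feature vector that a classifier can consume.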

“…The best approaches of that competition were based on timing analysis of revisions [20], language features [16], and user reputation [1]; the three approaches were then unified in [3].…”

Section: Related Work (mentioning)

confidence: 99%

“…An estimate of contribution quality can be used to flag some contributions for review, as well as for producing initial rankings of new content. For the Wikipedia, there has been a large body of work on automated methods for detecting vandalism and flagging revisions for review [17,18,20,5,16,14]. These methods generally rely on a mix of machine learning and natural language processing; a yearly competition (PAN) compares the performance of such detection methods.…”

Section: Introduction (mentioning)

confidence: 99%

Predicting the quality of user contributions via LSTMs

Agrawal

deAlfaro

2016

Proceedings of the 12th International Symposium on Open Collaboration

In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used for instance to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself, and the past history of contributions by that user. In several approaches, the user's history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine learning models, such as neural nets, SVMs, etc. rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input. In particular, we show that Long-Short Term Memory (LSTM) neural nets are able to process directly the variable-length history of a user's activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user's past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution. We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone.
More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user's past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality. We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good quality contributions, and decreasing when the user performs a sequence of low-quali...
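The summarization step described in this abstract can be sketched as follows. This is only an illustration of why histories of different lengths are reduced to a fixed number of features before they reach a conventional classifier; it is not the authors' implementation. The `PastRevision` record, the three chosen summary features and the `-1.0` sentinel for an empty history are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastRevision:
    timestamp: float  # seconds, on any consistent clock
    quality: float    # hypothetical quality label in [0, 1]


def summarize_history(history: List[PastRevision], now: float) -> List[float]:
    """Reduce a variable-length history to a fixed-length feature vector.

    Returns [number of contributions, mean past quality, time since last revision].
    """
    if not history:
        # Assumed convention: -1.0 marks "no previous revision".
        return [0.0, 0.0, -1.0]
    count = len(history)
    mean_quality = sum(r.quality for r in history) / count
    last_timestamp = max(r.timestamp for r in history)
    return [float(count), mean_quality, now - last_timestamp]


# Two users with histories of different lengths.
short_history = [PastRevision(timestamp=100.0, quality=1.0)]
long_history = [
    PastRevision(timestamp=10.0, quality=0.0),
    PastRevision(timestamp=50.0, quality=1.0),
    PastRevision(timestamp=120.0, quality=1.0),
    PastRevision(timestamp=150.0, quality=0.0),
]

short_vector = summarize_history(short_history, now=200.0)
long_vector = summarize_history(long_history, now=200.0)

# Both vectors have the same length, so either can be concatenated with the
# features of the current revision and passed to a fixed-input classifier.
print(short_vector)
print(long_vector)
```

A sequence model such as the LSTM discussed in the abstract would instead consume the list of past revisions directly, which removes the need to choose these summary features by hand.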
