Research:Topical coverage of Edit Wars

From Meta, a Wikimedia project coordination wiki

Revision as of 18:23, 29 September 2017

Tracked in Phabricator: Task T171249
Created: 23:06, 16 September 2017 (UTC)
Duration: 2017-September – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Wikihounding is a type of harassment that can be described as stalking behavior spanning different topics and namespaces. It has a component of bullying and is not about the topic but about people: someone follows another contributor irrespective of topic.

This project aims to characterize and model this kind of behavior, giving the community tools to better understand and deal with this problem.


Methods

We are going to study user interactions by analyzing massive datasets.

More specifically, we propose to follow two lines of research: one focused on text analysis, using NLP techniques, and a content-agnostic approach, analyzing user interactions in a hypergraph model.

Content Agnostic Approach

Edit wars in Wikipedia have been widely studied. An edit war is usually considered to be the consequence of differing opinions about a specific topic between two or more users. Naturally, users with different political views might disagree on many politics-related articles, and these differences can escalate into a multi-article edit war. Such actions can be considered toxic, but they are not necessarily stalking behavior. However, if edit wars between the same users start happening across multiple topics (assuming such cases exist), this can be an indicator of a person-centered attack (instead of a topic-centered one) that might be categorized as wikihounding.

Taking advantage of the fact that edit wars can be detected with a content-agnostic approach (without analyzing the text), we propose to study the topical span of those wars, characterizing usual and unusual (potentially toxic) behaviors.
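As an illustration of what "content agnostic" can mean in practice, the sketch below detects reverts purely from revision checksums: when a page returns to an earlier state, every revision in between is treated as undone. This is one common heuristic, not necessarily the detection method this project will adopt. The function names, the (user, checksum) input format, and the min_each threshold are all assumptions made for the example.

```python
from collections import defaultdict


def find_reverts(revisions):
    """Return (reverting_user, reverted_user) pairs for one page.

    `revisions` is a chronological list of (user, checksum) tuples, where
    the checksum identifies the full page content after that revision
    (hypothetical input format for this sketch).
    """
    last_seen = {}  # checksum -> index of the latest revision with that content
    reverts = []
    for i, (user, checksum) in enumerate(revisions):
        if checksum in last_seen:
            # The page is back to an earlier state, so every revision made
            # in between has been undone by `user`.
            for reverted_user, _ in revisions[last_seen[checksum] + 1:i]:
                if reverted_user != user:
                    reverts.append((user, reverted_user))
        last_seen[checksum] = i
    return reverts


def mutual_revert_pairs(revisions, min_each=2):
    """Return user pairs who reverted each other at least `min_each` times.

    `min_each` is an illustrative threshold, not a value from the study.
    """
    counts = defaultdict(int)
    for reverter, reverted in find_reverts(revisions):
        counts[(reverter, reverted)] += 1
    pairs = set()
    for (a, b), n in counts.items():
        if n >= min_each and counts.get((b, a), 0) >= min_each:
            pairs.add(frozenset((a, b)))
    return pairs
```

No article text is read at any point: only who edited and whether the page content repeated. A pair returned by mutual_revert_pairs is a candidate edit war on that page, to be confirmed by whatever definition the study settles on.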

The main tasks to develop such a model are:

  • Generate a representative dataset of edit wars in Wikipedia.
  • Detect pairs or groups of users involved in more than X controversies (defining X is part of the study).
  • Define and implement a robust topic model for articles, suitable for this study.
    • Define a distance metric for topics (e.g., if Geography is N steps away from Politics and M steps away from Sports, is N > M or not?)
  • Apply an outlier detection mechanism to find potential cases of wikihounding.
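The tasks above can be sketched end to end on toy data. Everything in this example is an assumption made for illustration: the topic tree (TOPIC_PARENT), counting "steps" as edges in that tree, summarizing a pair's topical span as the mean pairwise topic distance, and using a plain z-score rule in place of the outlier detection mechanism, which the project has not yet specified.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy topic tree (child -> parent); a real topic model
# would replace this.
TOPIC_PARENT = {
    "Politics": "Society",
    "Sports": "Society",
    "Geography": "Places",
    "Society": "Root",
    "Places": "Root",
}


def ancestors(topic):
    """Return the path from `topic` up to the root, inclusive."""
    path = [topic]
    while topic in TOPIC_PARENT:
        topic = TOPIC_PARENT[topic]
        path.append(topic)
    return path


def topic_distance(a, b):
    """Number of tree edges ("steps") between two topics."""
    steps_from_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_from_a, t in enumerate(ancestors(a)):
        if t in steps_from_b:
            return steps_from_a + steps_from_b[t]
    raise ValueError("topics share no common ancestor")


def topical_span(topics):
    """Mean pairwise distance between the distinct topics of a pair's wars."""
    unique = sorted(set(topics))
    if len(unique) < 2:
        return 0.0
    return mean(topic_distance(a, b) for a, b in combinations(unique, 2))


def flag_outliers(edit_wars, min_wars=2, z_threshold=2.0):
    """Return {user pair: span} for pairs with an unusually wide topical span.

    `edit_wars` is an iterable of (user_a, user_b, topic) tuples. Only pairs
    with more than `min_wars` wars are scored (the X of the task list).
    """
    topics_by_pair = defaultdict(list)
    for a, b, topic in edit_wars:
        topics_by_pair[frozenset((a, b))].append(topic)
    spans = {pair: topical_span(topics)
             for pair, topics in topics_by_pair.items()
             if len(topics) > min_wars}
    if len(spans) < 2:
        return {}
    mu, sigma = mean(spans.values()), pstdev(spans.values())
    if sigma == 0:
        return {}  # every pair has the same span, so nothing stands out
    return {pair: span for pair, span in spans.items()
            if (span - mu) / sigma > z_threshold}
```

With this tree, Politics is 2 steps from Sports and 4 steps from Geography, so a pair of users who clash repeatedly across all three topics gets a wider span than a pair who only clash on Politics. A flagged pair is a candidate for manual review, not a finding of wikihounding.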

NLP Approach

TODO

Timeline

Q1, Q2

References