In answering these questions, we clarify our position on moderating discussions in the online space. Moderation is one way of responding to problematic content, but reactions to this kind of intervention are mixed.
Is it necessary to do this? What does the data say?
Discussions on social media are often full of problematic content such as hateful and violent speech or disinformation. The #hatefree initiative, founded by the technology company TrollWall and the PR agency Seesame, conducted two analyses of the prevalence of hate speech on Facebook in Slovakia in 2023. In the first round, they looked at more than half a million comments on 428 pages; in the second, at nearly 7 million comments on 492 profiles over a three-month period. The second analysis confirmed the trends observed in the first.
According to the first report, which monitored Slovak Facebook between 1 and 15 April 2023, 15.47% of all comments showed features of hate speech.
The second report, which monitored the period August-October 2023, found that 12.9% of all monitored comments were vulgar or hateful.
Profiles that chose to set rules for cultivated discussion, monitor the situation, and moderate comments kept the share of problematic comments at only 1.9% over the three-month analysis period.
Picture source: ChatGPT
Why is this a problem?
Slovakia is facing deep polarization, which is a breeding ground for hate speech. The online environment (thanks to characteristics like anonymity) lowers the barrier to hate speech, and the presence of such content makes the space toxic. By default, it is people experiencing strong emotions, such as anger, who comment the most. As a result, many comments in discussions are negative, but this does not faithfully represent the opinion of society. If hate speech is prevalent in the online space, people do not feel safe or included there.
Users spend a lot of time on social networks. Whether they want to or not, the level of discussion they experience online influences what they consider normal and acceptable (even) in the real world. The moment we take care of our social networks and make them a safe space for discussion, we also open up space for those who don't normally engage.
"Studies show that if space is left to hate, groups that are already less engaged in social dialogue, which in Slovakia means mostly women and various minorities (sexual, racial, other), will not engage in discussion because they don't perceive the space as safe." Tomáš Halász (CEO TrollWall) [1].
Why is social media management not used everywhere?
Many actors do not pay enough attention to managing the comments under their posts. There are several reasons for this and they often vary from case to case, but some of the most common include:
Low priority. Some people don't attach enough importance to cultivating discussion on their own profile. They consider the quality of their published posts to be their responsibility, but see an uncivil discussion under those posts as a failure of the individuals who wrote the comments, not their own.
Concerns about backlash. Fear of a negative response to this kind of intervention is also a common reason. Whenever profiles delete comments or block users based on rule violations, there are always some dissenting reactions calling for freedom of speech and an end to censorship.
Lack of capacity (human, financial, time).
On the personal profiles of ordinary people, the interaction rate is mostly low. But we should not forget the profiles of famous people, politicians, companies, institutions, media, non-profit organizations, and so on. Where there are many interactions, capacity becomes an issue. Smaller actors in particular often do not have a dedicated full-time position for managing social networks; it is one of many tasks within someone's job, and the time devoted to it within working hours corresponds to that.
Picture source: ChatGPT
What do we mean by managing the discussion?
Managing or moderating discussions is a complex topic, and community managers are not only responsible for monitoring problematic content. Their job also includes responding to positive comments, constructive disagreement or criticism, answering questions, and so on. In what follows, however, we will focus only on the options for responding to problematic comments (hateful, offensive, or disinformation).
The basic ones include responding, hiding, and deleting.
People who don't want to remove any problematic content from social media (e.g., those who advocate for absolute freedom of speech) always have the option of responding to a comment in the discussion, whether in terms of form (pointing out inappropriate tone or vocabulary, appealing for constructive responses and civil discussion) or content (addressing the point of the comment, providing a factual response to an attack, challenge, or question). To some, this may seem like a futile attempt to have a rational discussion with an irrational user, but three points are worth keeping in mind:
1. Not everyone who expresses themselves in a vulgar or hateful manner is necessarily irrational or radical.
2. People can change their minds, and the chances are certainly higher when we debate with them than when we give up on them.
3. You are not writing the response only for one specific person; many others will read it as well. Only a fraction of users create content on social networks. A larger group (but still a minority) comments and discusses. And then there is the silent majority who consume the content but don't feel the need to engage. You are writing your response to all of them.
If there is content you don't want to be visible to casual readers, you have the option to hide it. Many have surely experienced this themselves: below a post we see that there are 28 comments in the discussion, but when we click through and start reading, we suddenly see only 15. The hidden comments remain visible only to the people who wrote them and to their friends.
A more radical step than hiding is deleting the comment. Such a comment completely disappears from the social network.
If someone evaluates a comment as problematic, they also have the option to report it. If a profile repeatedly violates the rules, or there are clear indications that it is fake, there is the option to block its access to the page and report the profile. Some of the steps mentioned above should ideally be combined - for example, if I think a comment is so problematic that I want to remove it from the discussion, I should report it straight away. If a user is overstepping the bounds to the point that I decide to block their access to my account, it would also be good to report that person.
How can this be done and what are the specifics of different approaches?
Whatever moderation approach one chooses, it is good to communicate it transparently. It is recommended to write down a set of rules for discussion on the given profile (so it is clear what is allowed and what is out of bounds), make them publicly available, and then enforce them consistently (for example, if hate speech is against the rules, it does not matter whether someone calls for the assassination of a person I like or of an individual I believe is committing horrible acts - the comment should be treated the same).
One way to ensure compliance with established rules is human moderation. In this case, discussions are handled manually, whether by an individual or by a whole team of community managers. Sometimes this function is performed by internal team members; other times external help is hired. But once we talk about profiles with a larger number of followers, we quickly run into capacity limits. A single person can only get through a certain number of comments. In addition, coming into contact with unpleasant content is mentally demanding and can contribute to (already high) employee turnover. Each new person needs to be trained, which is both time-consuming and costly. This type of moderation depends on the human element, which is of varying quality: personal biases cannot be avoided entirely, only limited.
Another option is the involvement of artificial intelligence (AI). Every tech company works a little differently, but we can explain at least some of what happens behind the scenes using the example of a Slovak startup dedicated to moderating social networks with the help of AI. TrollWall, which celebrated its first birthday in March 2024, currently operates in seven countries and five languages. The tool automatically filters toxic comments and links to disinformation sites not only in Slovak but also in Czech, Polish, Romanian, and German. In a year of operation, it has checked more than 7,000,000 comments, hidden more than 1,800,000 problematic reactions, and protected more than 200 profiles. TrollWall monitors Facebook, Instagram, and YouTube. Future plans include adding new languages, expanding to other countries, and moving onto TikTok.
The AI is developed and trained language by language, as swearing, hate speech, and disinformation are highly language-specific. The company uses native speakers as annotators for hate and profanity. For disinformation, it relies on external data suppliers - local organizations with their own methodologies (e.g. konspiratori.sk in Slovakia or nelez.cz in the Czech Republic). With each additional language, the process becomes more efficient. "We worked on Slovak for maybe 9 months; now we can process a new language in less than 2 months," said the company's co-founder [2].
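As a rough illustration of one piece of such a setup, here is a minimal sketch (in Python) of how links in a comment might be checked against a partner-supplied list of flagged domains. The domain list, function name, and matching rule below are our own illustrative assumptions; they are not TrollWall's actual code or the partner organizations' actual data.

```python
import re
from urllib.parse import urlparse

# Illustrative only: a tiny, locally stored sample of flagged domains.
# In practice such lists come from partner organizations
# (e.g. konspiratori.sk, nelez.cz) and are updated regularly.
FLAGGED_DOMAINS = {"example-disinfo.sk", "hoax-news.cz"}

URL_PATTERN = re.compile(r"https?://\S+")

def links_to_flagged_site(comment_text: str) -> bool:
    """Return True if the comment contains a link to a flagged domain."""
    for url in URL_PATTERN.findall(comment_text):
        domain = urlparse(url).netloc.lower()
        # Strip a leading "www." so "www.example-disinfo.sk" still matches.
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in FLAGGED_DOMAINS:
            return True
    return False

print(links_to_flagged_site("Read this: https://www.example-disinfo.sk/article"))  # True
```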
Picture source: ChatGPT
Content that the AI evaluates as problematic is hidden. In addition to regular updates, the model is also improved through a back-check of how it has evaluated individual comments, carried out by three independent humans. Moreover, the final decision always rests with the profile administrators and moderators, who can keep the content hidden, delete it, or return it to the public discussion (a simplified sketch of this workflow follows the list below). The main benefits of AI involvement in social network moderation include:
Relieving staff. Managing discussions on social networks is not only time-consuming but also mentally demanding. Once profanity, hate speech and disinformation are filtered automatically, the people managing the social networks are left with some time to build a relationship with the community (answering questions, responding to constructive comments, creating additional content, etc.).
24/7 operation. Few actors in the region monitor their social networks around the clock. AI tools, however, can do this. "The data shows that although the total number of commenters decreases at night, the percentage of hateful posts increases. In an analysis of Slovak Facebook, the hate rate between one and four in the morning was 20%, while during the working day it was 13%" [3]. Furthermore, moderating discussions is not a job one can do 8 hours a day; in practice, it has been shown that after just 4 hours, accuracy and the number of comments processed drop significantly [4].
Immediate response. Another difference from human moderation is the speed of response. The AI hides problematic content within seconds, so it doesn't get a chance to be seen at all. "We know from the psychology of how social networks work that immediacy is very important. The first hateful comment starts a spiral of negativity and toxicity, which is then picked up either by other haters or by the people who are affected by it" [5].
Consistency. Each of the human moderators brings their own perspective. Even with detailed methodologies and a good training system, it is not possible to achieve the same level of consistency in content evaluation that a machine can have.
Scalability. External human moderation services have limited capacity. If multiple actors suddenly needed to implement AI moderation (e.g. several media outlets facing attacks at once), artificial intelligence tools are technologically prepared for such a scenario and can meet the demand.
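To make the workflow sketched before this list a bit more concrete, here is a minimal, purely illustrative Python sketch of an AI-first moderation loop with a retrospective human back-check. The function names, the trivial keyword "model", and the majority-vote rule are our assumptions for illustration; they do not reproduce TrollWall's implementation.

```python
from collections import Counter

def ai_is_problematic(text: str) -> bool:
    """Stand-in for the trained per-language model (here a trivial keyword check)."""
    return "hateword" in text.lower()

def moderate(comments: list[str]) -> tuple[list[str], list[str]]:
    """Hide what the AI flags; everything hidden also goes to a back-check queue."""
    hidden, review_queue = [], []
    for text in comments:
        if ai_is_problematic(text):
            hidden.append(text)        # hidden immediately, within seconds
            review_queue.append(text)  # queued for retrospective human review
    return hidden, review_queue

def back_check(review_queue: list[str], annotators: list) -> list[str]:
    """Independent annotators re-label each hidden comment (e.g. weekly).
    A majority of 'not problematic' votes marks it as a false positive to restore;
    administrators and moderators keep the final say in any case."""
    to_restore = []
    for text in review_queue:
        votes = Counter(annotator(text) for annotator in annotators)
        if votes[False] >= 2:  # majority disagrees with the AI
            to_restore.append(text)
    return to_restore

# Example usage with three (identical, purely illustrative) annotators.
annotators = [lambda t: "hateword" in t.lower()] * 3
hidden, queue = moderate(["nice post", "you hateword!"])
print(hidden)                         # ['you hateword!']
print(back_check(queue, annotators))  # [] -> the AI decision stands
```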
The last form we want to mention is the hybrid model, which combines AI and human moderation. An example of a Slovak company working this way is elv.ai. In June 2023, still under the name Elfovia.sk, the project was spun off from the New School Communications agency as the independent startup elv.ai. It currently moderates discussions in Slovakia and the Czech Republic, helping 35 Czech and 68 Slovak profiles. So far, it has checked over 19,000,000 comments, of which it has hidden over 3,000,000 because they contained profanity, hate speech, disinformation, or spam. It has also identified more than 5,000 fake profiles.
"Right now, our main focus is on adapting and training our AI models to different languages and integrating other platforms, including YouTube. Soon we want to focus on more countries in Central and Eastern Europe, but we will also continue to operate in Slovakia and the Czech Republic. Slovaks and Czechs are struggling with hoaxes and disinformation. This is where we perceive the greatest need and purpose of our services," said the director [6].
Picture source: ChatGPT
To give a better idea of the scale: hybrid moderation in the context of elv.ai means that one full-time person is needed for every 160,000 comments per month that require human moderation.
If we compare the hybrid model with purely AI-based moderation:
The advantage is a more accurate handling of the grey zone for which the model is not yet sufficiently trained. In elv.ai's experience, the toxicity level under the profiles they manage can reach 25% without moderation. Within this 25%, hate speech represents the majority (≈80%), followed by disinformation (≈8-10%), while the rest (≈10-12%) falls into the so-called grey zone. Their model automatically hides links to disinformation sites and disinformation that has already been debunked by their partner organizations, as well as hate speech the AI is at least 95% certain of. The rest requires human moderation: a moderator makes the decision, and someone from the senior team then approves or rejects it as a double check. This is also one way to catch false positives (hiding something that should not have been hidden) and false negatives (not hiding something that should have been hidden), which occur with every model [7].

It should be added, however, that models operating without paid human moderators also address errors, albeit by different means. TrollWall, for example, has set up multiple automatic systems to catch potentially erroneous detections, combining user feedback, random checks, review of decisions where the AI had lower confidence, and systematic capture of words and phrases not previously present in the datasets. All of this data is annotated independently by at least three people on a weekly basis, making the AI model adaptive. However, this happens retrospectively (the model has already classified the content; a human reviews it and may change the decision), not pre-emptively.
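As a rough sketch of the routing logic described above: the 95% confidence threshold and the split between auto-hiding and human review come from the description in this section, while the names and structure below are our own illustrative assumptions, not elv.ai's actual code.

```python
from dataclasses import dataclass

AUTO_HIDE_THRESHOLD = 0.95  # hate speech the AI is at least 95% certain of

@dataclass
class Verdict:
    action: str  # "hide" or "human_review"
    reason: str

def route(hate_confidence: float, links_to_debunked_disinfo: bool) -> Verdict:
    """Decide whether a comment is auto-hidden or sent to the human queue."""
    if links_to_debunked_disinfo:
        return Verdict("hide", "link to a debunked disinformation source")
    if hate_confidence >= AUTO_HIDE_THRESHOLD:
        return Verdict("hide", f"hate speech, model confidence {hate_confidence:.0%}")
    # Grey zone: a human moderator decides, and a senior team member
    # double-checks the decision (one way to catch false positives/negatives).
    return Verdict("human_review", "grey zone: queued for moderator + senior check")

print(route(hate_confidence=0.62, links_to_debunked_disinfo=False))
# Verdict(action='human_review', reason='grey zone: queued for moderator + senior check')
```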
Another specific feature is the ability to react promptly to emergent situations for which the model is not trained. Problems arising in real time, which would slip past a model trained on a dataset that does not yet contain them, can be caught by humans. At the same time, this immediately creates a dataset on which the model can quickly be trained [8]. The risk of this approach is that the decision is made by a small number of people under time pressure.
The disadvantage may be higher costs. Human moderation requires finding, training, and employing community managers. Collaboration with people proficient in each of the languages involved is also needed for longer and in greater volume than merely training the AI on a new language and updating it periodically. There is also the downside of the harmful impact on the people doing the work: constantly coming into contact with toxic content takes a toll on the human psyche. And compared to pure AI moderation, responses are less immediate - human moderators can guarantee a reaction on the order of minutes or hours.
Compared to purely human moderation, the hybrid model retains some of the advantages mentioned above (relieving staff, consistency, the ability to handle a larger volume of work).
How to choose?
The choice of the appropriate moderation method depends on many factors: the volume of work, the nature of the content, the required speed of intervention, the cost, legal and ethical aspects, and so on. Whichever form of moderation an actor chooses, what matters is that individual profiles on social media are aware of their social responsibility, and that by cultivating discussion on our own networks we can together create an environment that is less toxic and polarized, where there is less disinformation, where people do not hesitate to share constructive observations, and where decent discussion is the standard rather than a rarity or an unattainable ideal.
SOURCES:
[1] Interview with Tomáš Halász (CEO of TrollWall), 4 March 2024.
[2] Interview with Tomáš Halász (CEO of TrollWall), 4 March 2024.
[3] Interview with Tomáš Halász (CEO of TrollWall), 4 March 2024.
[4] Interview with Jakub Šuster (CEO of elv.ai), 3 April 2024.
[5] Interview with Tomáš Halász (CEO of TrollWall), 4 March 2024.
[6] Interview with Jakub Šuster (CEO of elv.ai), 3 April 2024.
[7] Interview with Jakub Šuster (CEO of elv.ai), 3 April 2024.
[8] Interview with Jakub Šuster (CEO of elv.ai), 3 April 2024.
Author: Ivana Ivanová
Illustrations: ChatGPT
This piece was published in partnership with PDCS - Partners for Democratic Change Slovakia.