How Twitter can improve the health of its platform and foster civil conversations

- 11 mins

July 11, 2021 was a special day for football fans. It was the Euro 2020 final between England and Italy, and while Italy had seen recent success on the international stage, England hadn’t won a major international trophy in over 50 years.

With the score 1-1 at 90 minutes, the game went into extra time, and then to penalties. What followed was misery for England. Three young players, Rashford (age 23), Sancho (age 21), and Saka (age 19), all missed their penalties and England lost the final.

No one wanted to win this final more than these young players. While most disappointed football fans dealt with the loss with a gripe and a pint, some got angry online. Rashford, Sancho, and Saka have all since been subjected to horrendous abuse across many social platforms.

All three of them took to Twitter to post their post-game reflections — Rashford, Sancho, and Saka.

This line from Saka’s message struck me in particular —

“I knew instantly the kind of hate that I was about to receive and that is a sad reality that your powerful platforms are not doing enough to stop these messages.”

As a Twitter employee, I often reflect on how we can make the platform more civil and constructive. Not just for VITs (Very Important Tweeters), but for everyone.

My goal here is to share potential product strategies based on the research I’ve come across on maintaining civil conversations on the web. I also explore how behavioral psychology can make Twitter conversations civil.

Twitter’s goal

Jack Dorsey, the CEO of Twitter, posted this thread back in 2018 to share Twitter’s health initiative. The whole thread is worth a read, but here are my favorite bits —

Twitter’s goal is to increase the collective health, openness, and civility of public conversation.

We have witnessed abuse, harassment, troll armies, manipulation through bots and human-coordination, misinformation campaigns, and increasingly divisive echo chambers. We aren’t proud of how people have taken advantage of our service, or our inability to address it fast enough.

We’ve focused most of our efforts on removing content against our terms, instead of building a systemic framework to help encourage more healthy debate, conversations, and critical thinking. This is the approach we now need.

While substantial progress has been made to improve the health of the platform, some key problems remain open.

The problem

Below, I’ve identified three groups of Twitter users involved in unhealthy conversations, and highlighted some of their key problems —

Entity receiving negative tweets

These users are the subject of harmful tweets.

Entity tweeting negative tweets

These users are the main culprits, initiating unhealthy conversations.

Entity observing negative tweets

The observers are the rest of the users who see the harmful conversation taking place on Twitter.


The hypotheses

In this section, I’ll highlight key hypotheses for solving the problems stated above, grouped by user type —

Entity receiving negative tweets

1) When these users are overwhelmed with negative feedback, providing them additional options to feel safe on the platform will not deteriorate their experience of using Twitter.

2) When these users are overwhelmed with negative feedback, providing them sections of tweets separated by mood will help them continue to be a part of the public conversation.

3) When these users are overwhelmed with negative feedback, providing them with an option to elect moderators for their replies will help them immediately filter out the negative feedback.

Entity tweeting negative tweets

1) When these users are attempting to post a negative/harmful tweet, bridging the empathy gap with their subjects will decrease the likelihood of them posting it.

2) When these users are composing a negative/harmful tweet, making them accountable for their nonconstructive conversations will decrease the likelihood of them repeating the behavior.


Entity observing negative tweets

1) When these users view an account subjected to harmful tweets, providing them with an incentive will make it more likely for them to take action.

Proposed Solutions

Safety Mode and Mood based Tweets

I started this article by describing how overwhelming Twitter can be for users. Certain events are beyond our control, and visiting Twitter during one of these emotionally charged moments can be an unpleasant experience. Harmful mentions, replies, and quote tweets can all be thrown at you when times are rough.

Twitter’s product team has made progress here, with a recent addition that lets you change your conversation settings after you’ve posted your tweet.

However, to truly make it a safe experience for users that are being subjected to harmful tweets, we need to give them 1) easier access to their safety controls, and 2) universal safety across their entire Twitter experience.

Enter ‘Safety Mode’.

Safety Mode will give users an option to instantly turn off mentions, replies, and quote tweets to their account for a set time. Further controls can allow these users to 1) let only mutual followers (or a certain list of members) bypass this, and 2) set the duration after which the safety filter turns off (X hours, days, etc.).

Another feature to explore would be classifying tweets by mood. By turning this on as a preference in Safety Mode, all replies, mentions, and quote tweets will be classified as ‘positive’, ‘negative’, or ‘controversial’, giving more control to the users receiving harmful tweets.
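The bucketing logic could be quite simple. Here is a minimal sketch, assuming a hypothetical upstream model that scores each tweet’s sentiment and the share of replies disputing it (the `Tweet` fields and thresholds below are illustrative, not Twitter’s actual API):

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    sentiment: float          # hypothetical model score in [-1.0, 1.0]
    reply_disagreement: float # hypothetical share of replies disputing the tweet

def classify_mood(tweet: Tweet) -> str:
    """Bucket a tweet into a mood section for Safety Mode filtering."""
    if tweet.reply_disagreement > 0.5:
        return "controversial"
    return "positive" if tweet.sentiment >= 0.0 else "negative"

def mood_sections(tweets):
    """Group replies/mentions so the user can open one mood at a time."""
    sections = {"positive": [], "negative": [], "controversial": []}
    for t in tweets:
        sections[classify_mood(t)].append(t)
    return sections
```

The point of the design is that nothing is deleted — negative replies are merely held in a section the user opens only when they feel ready.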

Scenario of Safety Mode -

Marcus Rashford has missed a penalty in the final of the Euros. The pressure was immense, and he knows that England fans are upset.

He still wants to connect with his true fans and share his feelings with them, but he knows how rough his Twitter engagement can be. He decides to turn on Safety Mode for the coming days. When visiting Twitter, he isn’t subjected to harmful comments anywhere, and he can choose to tweet and view replies filtered by mood.

Gamification of Twitter’s Health

This solution focuses on solving two problems —

1) There is no incentive for the observers to report tweets

2) There is a lack of accountability for harmful tweeters and a lack of incentive for them to be civil.

Solution - Twitter Profile Badges and Twitter Rewards

A research study of civil conversations on Reddit’s ChangeMyView subreddit shows that gamification can help encourage users to be civil to one another. When a Reddit user manages to change the original poster’s view, the user is rewarded. The award is a prestigious item for users, and it encourages them to reply in a civil, constructive manner.

On Twitter, by comparison, there is no incentive for observers to report harmful tweets. However, a gamification concept could encourage observers to get involved, and harmful tweeters to reconsider their approach to Twitter.

Twitter Rewards and Badge Scenario

An observer comes across a harmful tweet and reports it. The tweet is indeed deemed harmful by Twitter’s systems. This earns the observer a Twitter reward point. Collecting enough reward points earns the observer a ‘Helpful Tweeter’ badge, and once they have collected enough points/badges, they can redeem them for a month of free subscription to Twitter Blue.

Note — Reward values could either be constant or variable depending on the ‘importance of the report’ and other such factors.

This solution also helps Twitter understand which reports are to be taken more seriously. Reports from observers with a ‘helpful tweeter’ reputation can take priority over others for swift action in removal.
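A sketch of how points, badges, and report prioritization could fit together, assuming illustrative values (the point value, badge threshold, and `importance` multiplier are all hypothetical design parameters, not anything Twitter has specified):

```python
HELPFUL_BADGE_THRESHOLD = 50  # hypothetical: points needed for the badge

class ObserverAccount:
    """Tracks an observer's reward points from confirmed harmful-tweet reports."""

    def __init__(self):
        self.points = 0

    @property
    def has_helpful_badge(self) -> bool:
        return self.points >= HELPFUL_BADGE_THRESHOLD

    def record_confirmed_report(self, importance: float = 1.0):
        # Variable reward: more important reports earn more points.
        self.points += int(10 * importance)

def report_priority(observer: ObserverAccount) -> int:
    """Lower number = reviewed sooner; badged observers jump the queue."""
    return 0 if observer.has_helpful_badge else 1
```

Note that the badge’s only power here is queue priority — consistent with the mitigation above, a bought or bot-farmed ‘Helpful Tweeter’ account gains nothing beyond faster review of its reports.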

Challenges — There are many ways bad actors could game this feature. An account or bot could build up a ‘Helpful Tweeter’ reputation and then sell the account on the market because of that reputation. However, by limiting these users’ power to simply getting faster attention from the abuse-control team, we can potentially offset this issue. Twitter could also later decide to decentralize community badges if it explores this path. Which brings me to…

Decentralized moderation

Twitter recently announced a feature that I was very excited about — upvotes/downvotes on replies.

This is a step toward decentralizing content moderation. Users get to decide whether content appeals to them and how it should rank. A similar approach could be applied to the moderation of harmful content.

The problem of feeling overwhelmed by a barrage of negative tweets is serious. Moreover, the current systems are designed in a way that makes you feel alone in your battle against online hate. When subjected to hate, the targeted user may want support from users they trust.


Solution — Decentralize content moderation by having content classified as harmful by either 1) elected accounts, or 2) trusted Twitter account lists.

As these elected moderators come across content that is harmful to their subject, they mark it as harmful. Those tweets are then automatically moved into a separate section for the targeted user, who can choose to view them at a later time.

This ensures that harmful content is immediately set aside by the user’s elected, trusted contacts.
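The core mechanic is just a partition of the reply stream. A minimal sketch, assuming hypothetical data shapes (reply tuples, a flag map, and a set of elected moderator handles — none of these are real Twitter structures):

```python
def partition_replies(replies, moderator_flags, elected_moderators):
    """Split replies into a visible feed and a held-back 'flagged' section.

    replies:            list of (reply_id, text) tuples
    moderator_flags:    reply_id -> set of accounts that marked it harmful
    elected_moderators: accounts the targeted user has elected as moderators
    """
    visible, flagged = [], []
    for reply_id, text in replies:
        # A reply is set aside if any elected moderator flagged it.
        if moderator_flags.get(reply_id, set()) & elected_moderators:
            flagged.append((reply_id, text))
        else:
            visible.append((reply_id, text))
    return visible, flagged
```

Because flags from non-elected accounts are ignored, a brigade of hostile users can’t hide content from the target — only the target’s own trusted circle can.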

Bridging the Empathy gap

Twitter recently added a feature that prompts users to re-consider posting a harmful tweet. The results are encouraging — ‘If prompted, 34% of people revised their initial reply or decided to not send their reply at all.’

It’s also important to understand the state of the user composing the harmful tweets. It’s safe to say they are in a highly charged emotional state. Their intention is to hurt, and they are rewarded when they achieve that outcome in some way on Twitter. It’s a negative cycle that repeats itself. Prompting the user and then allowing them to send the tweet anyway, while still emotionally charged, may not be as effective.

Solution — Allow emotional triggers to be surfed out

If a tweet composed and submitted by a user is above a certain threshold of ‘harmfulness’, Twitter enforces a 1–3 hour wait period during which the user cannot send any new tweets.

Nir Eyal describes what some behavioral psychologists call “surfing the urge.” Emotional triggers are like waves: they rise in intensity, peak, and eventually crash. Given this wait period, it is very likely that the user’s emotional charge toward the subject subsides.
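The gate itself is a small piece of state. A sketch under stated assumptions — the harmfulness score comes from some hypothetical classifier, and the 0.8 threshold and 2-hour cooldown are illustrative values within the 1–3 hour range proposed above:

```python
import time

HARM_THRESHOLD = 0.8            # hypothetical classifier score above which a tweet is held
COOLDOWN_SECONDS = 2 * 60 * 60  # 2 hours, within the proposed 1-3 hour range

class ComposeGate:
    """Blocks all new tweets for a cooldown after a harmful submission."""

    def __init__(self, clock=time.time):
        self._clock = clock           # injectable clock for testing
        self._cooldown_until = 0.0

    def try_post(self, harmfulness: float) -> bool:
        """Return True if the tweet may be sent, False if it is held back."""
        now = self._clock()
        if now < self._cooldown_until:
            return False              # still surfing the urge: nothing goes out
        if harmfulness > HARM_THRESHOLD:
            self._cooldown_until = now + COOLDOWN_SECONDS
            return False              # tweet held; cooldown starts now
        return True
```

The key behavioral choice is that the cooldown blocks *all* tweets, not just the flagged one — rewording the same attack doesn’t bypass the wait.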

An area to explore would be bypassing this time limit by bridging the empathy gap. Forcing the harmful tweeter to watch a mandatory empathy training/video to bypass the limit could be one way.

There are challenges. Empathy training is not easy, and it may not be effective for a large chunk of Twitter users — but this is why we run experiments. Experimenting only on tweets classified as abusive with very high confidence might be a way to find out.

Good steady progress

Twitter continues to make steady progress in this area, and my goal with this article is to propose some additional product strategies Twitter could adopt. Research continues to show that maintaining civility on the internet is a tough problem to solve.

Safety Mode and decentralized moderation are two ways Twitter can protect targeted users from another party’s misbehavior, whereas gamification through incentives and bridging the empathy gap are two product strategies that can change the behavior of harmful tweeters themselves.

I explored expanding Twitter’s product strategy in a previous article as well. If you have any ideas or thoughts on the above, please DM me on Twitter and/or share your comments below 👇.
