Building a Bot to counter Negative Comments on Reddit using Natural Language Processing

Originally posted to Medium on 06/06/2018.

Reddit is an amazing website for sharing just about everything and bills itself as ‘the front page of the internet’. It has even recently overtaken Facebook to become the third most popular site in the US. Unlike many other social media sites, Reddit lets users post anonymously. This anonymity is part of what makes Reddit so popular but, as with anything good, it can also have some downsides.

The ability to post with little to no consequences means that in almost any submission there will be those who respond with negative comments. Although these are often constructive criticisms, a small number are sweeping, nasty and add nothing to the discussion. Fortunately, these are often quickly highlighted by other users and downvoted and, if unpopular enough, may be hidden. However, those who make original submissions to smaller groups (known as subreddits on Reddit) are more likely to be affected by these negative comments. This is based on my own experience after posting to some small subreddits and receiving comments such as the following:

It can take a lot of effort and confidence to post content about yourself, particularly pictures, and comments like these make it hard to want to continue contributing. It is unfortunate that our minds will naturally focus on the negative instead of the positive comments.

Whenever this occurs, my first question is always ‘Who are you to judge?’. Are you someone who has contributed to this subreddit and has earnt the right to be dismissive or are you just passing through and simply get a kick out of being mean towards someone else?

However, I am not so thin-skinned as to believe that negative comments, however they are written, are inherently bad. Anyone submitting to a subreddit needs to understand who they are posting to and whether their post is relevant, and taking constructive criticism is just part of the process. For example, after posting a collage of images of my childhood outfits I received the following response. Whether this is just one individual’s opinion could be argued, but I accepted that my submission was perhaps not appropriate for this subreddit and learnt from this going forward.

The Aim

I wanted to create a Python bot that relies on a user’s judgement of when it should be used and that, when called, could discredit cases where the negative comment was unfairly spiteful.

Based on my own observations from using Reddit, if someone made a negative comment that I felt was not constructive, I would like to answer the following questions:

1. How old is the user’s account? (i.e. Are they hiding behind a fresh account with no consequences for being downvoted?)

2. How many times has this user submitted their own posts to the subreddit?

3. What proportion of the user’s overall comments are negative?

4. What proportion of the user’s comments to this subreddit are negative?

If I can establish these, I hope to classify the user as either someone who seems to be continuously negative on the site or someone for whom this is unusual and who is perhaps just having a bad day.
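The first question, account age, is the simplest to answer. PRAW exposes an account’s creation time as a Unix timestamp (`Redditor.created_utc`), so a minimal sketch of the calculation might look like the following. The helper name `account_age_days` is hypothetical and not taken from the bot’s code:

```python
from datetime import datetime, timezone

def account_age_days(created_utc, now=None):
    """Whole days between account creation and `now`.

    `created_utc` is a Unix timestamp in seconds, the format PRAW exposes
    as `Redditor.created_utc`. `now` defaults to the current UTC time.
    """
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created).days
```

With a PRAW comment in hand, this would be called as `account_age_days(comment.author.created_utc)`.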

Setting up the Bot

To build the bot, I followed the guide written by pythonforengineers.com that uses the Python package PRAW to pull information from Reddit’s API.

Following this guide, the first step is to enable the bot to find comments that are negative. For the reasons mentioned earlier, the bot will be called specifically by users, using a specific text phrase. I have named the bot ‘FloBot’: if a user leaves a comment containing this name, the bot will find that comment and start analysing its parent comment. A test example of this is shown below:

For testing, I have created the subreddit /r/FloBotReview where a test post is shown that demonstrates the responses given by the bot. The bot will go through each comment made to this subreddit and if any of the comments contain ‘FloBot’ it will pick this up and then look at the comment that this is responding to:

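import re
# `subreddit` is the PRAW Subreddit object being monitored, e.g. /r/FloBotReview.
for comment_tracker in subreddit.stream.comments():
    if re.search("FloBot", comment_tracker.body, re.IGNORECASE):
        # The comment being replied to is the one to analyse.
        comment = comment_tracker.parent()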
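One case the loop above does not cover: when the ‘FloBot’ comment is a top-level reply to a post, `parent()` returns the submission itself, which has no `body`. A minimal sketch of a guard for this is shown below, written against plain objects so it can be tried without Reddit credentials. The names `find_target` and `TRIGGER` are hypothetical and not part of PRAW:

```python
import re

# Assumption: the trigger word is matched case-insensitively, as in the loop above.
TRIGGER = re.compile(r"FloBot", re.IGNORECASE)

def find_target(comment_tracker):
    """Return the parent comment to analyse, or None if there is nothing to do.

    `comment_tracker` is expected to behave like a PRAW Comment: it has a
    `body` string and a `parent()` method.
    """
    if not TRIGGER.search(comment_tracker.body):
        return None
    parent = comment_tracker.parent()
    # A top-level comment's parent is the submission, which has no `body`.
    if not hasattr(parent, "body"):
        return None
    return parent
```

Inside the stream loop this would be used as `comment = find_target(comment_tracker)`, skipping to the next comment whenever the result is `None`.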

We can then go through and find out some basic information (as shown in the picture above) about the comment such as the author, the text (in case it is removed later) and the current score.

print("Subreddit: ", comment.subreddit)
print("Author: ", comment.author)
print("Text: '", comment.body,"'")
print("Score: ", comment.score)

Natural Language Processing (NLP)

A big part of this process is using natural language processing to establish whether the comments written by the user in question are negative or positive. For now, I have used the Python package TextBlob, which allows us to very easily analyse the sentiment of the text. We use this package to determine the subjectivity and polarity of the comment, as shown in the example above. Polarity is what we will use to determine the negativity of the user’s historic comments and is used more extensively later.

import numpy as np
from textblob import TextBlob
print("Sentiment Analysis Subjectivity: ", np.round(TextBlob(comment.body).sentiment.subjectivity,4))
print("Sentiment Analysis Polarity: ", np.round(TextBlob(comment.body).sentiment.polarity,4))
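TextBlob reports polarity on a scale from -1.0 (most negative) to 1.0 (most positive) and subjectivity from 0.0 (objective) to 1.0 (subjective). A minimal sketch of turning the polarity number into a label follows. The function `label_polarity` and its `neutral_band` parameter are hypothetical helpers for illustration, not part of TextBlob:

```python
def label_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    `neutral_band` is a hypothetical tolerance around zero. The default of
    0.0 treats any score below zero as negative; widening it would stop
    very mild wording from being counted as negative.
    """
    if polarity < -neutral_band:
        return "negative"
    if polarity > neutral_band:
        return "positive"
    return "neutral"
```

For a real comment this would be called as `label_polarity(TextBlob(comment.body).sentiment.polarity)`.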

Analysing the User’s Historic Posts and Comments

We now have the basis for completing our analysis. We have connected with Reddit’s API to:

– Find the comments we want to counter

– Collect basic information about the user

– Apply some NLP to assess the negativity of the comment text.

Because we have collected the name of the negative comment’s author, we can now go through and collect all their comments and, in a for loop, calculate the polarity of each. We then collect the results in a Pandas data table with the following information:

With this table alone, we can calculate:

– The number of comments made overall

– The percentage of comments with a negative score

– The percentage of comments that have a negative polarity

– The number of comments posted to this subreddit

– The percentage of comments posted to this subreddit with a negative score

– The percentage of comments posted to this subreddit that have a negative polarity
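The figures listed above can be sketched in plain Python. This is a minimal illustration rather than the bot’s actual code: the field names `subreddit`, `score` and `polarity` and the sample rows are assumptions, and in the bot itself the same numbers come from the Pandas table.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, or 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0

def summarise(rows, subreddit=None):
    """Count comments and the share with a negative score or polarity.

    `rows` is a list of dicts with the assumed keys "subreddit", "score"
    and "polarity". Pass `subreddit` to restrict the summary to one subreddit.
    """
    if subreddit is not None:
        rows = [r for r in rows if r["subreddit"].lower() == subreddit.lower()]
    total = len(rows)
    return {
        "comments": total,
        "pct_negative_score": pct(sum(r["score"] < 0 for r in rows), total),
        "pct_negative_polarity": pct(sum(r["polarity"] < 0 for r in rows), total),
    }

# Made-up rows, purely to exercise the function.
history = [
    {"subreddit": "FloBotReview", "score": 3, "polarity": 0.4},
    {"subreddit": "FloBotReview", "score": -2, "polarity": -0.5},
    {"subreddit": "python", "score": 1, "polarity": -0.1},
    {"subreddit": "python", "score": 5, "polarity": 0.0},
]
overall = summarise(history)
here = summarise(history, subreddit="FloBotReview")
```

Calling `summarise` once without a subreddit and once with it gives the overall and subreddit-specific versions of each figure.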

Furthermore, we can use the Python package NLTK to find the words most frequently used in negative comments.
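To illustrate the idea without installing NLTK, the sketch below swaps in `collections.Counter` from the standard library in place of NLTK’s frequency tools. The stop-word list is a tiny hand-written assumption (NLTK ships a much fuller one), and the sample comments are made up:

```python
import re
from collections import Counter

# Assumption: a tiny hand-written stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "that", "to", "of", "i", "you"}

def top_words(comments, n=3):
    """Return the `n` most common non-stop-words across `comments`."""
    counts = Counter()
    for text in comments:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up negative comments, purely to exercise the function.
negative_comments = [
    "This is awful and boring",
    "Boring post, awful title",
    "That is boring",
]
print(top_words(negative_comments, 2))
```

In the bot, `comments` would be the bodies of the user’s comments whose polarity is below zero.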

Lastly, we can repeat the process for the user’s submissions to calculate the number of posts they have made to this subreddit. Once we have all this information, we can format it nicely inside some standard text to produce the following output:

It seems I have made my fair share of negatively worded comments, but overall fewer than 2% have been downvoted to a negative score. It seems my comments that link to imgur.com are voted down the most.

Further work

I have shared the full code of my bot on Kaggle. It does not currently post automatically, and before I deploy it I would need to ensure that the following issues are resolved.

Firstly, the quality of the Natural Language Processing needs to be assessed and, if possible, improved. This first stage is simply to demonstrate the idea and there is definitely room for improving the polarity measure.

Furthermore, before deploying this, I would need to consider the possibility that it may be abused and called by users inappropriately. I would like to have a check system in place before it responds automatically, to ensure that users are not using it simply to bully other users or for other nefarious reasons.

Lastly, I would like to perhaps add a bit more to the response including some sample phrases such as ‘This person doesn’t normally post negative comments to this subreddit, perhaps they are having a bad day.’ to add a bit more life to our bot.

I hope you enjoyed this idea and if you have any comments or suggestions please let me know.

Thanks

Phil
