Can Machine Learning Clean up Facebook?

December 26, 2018 - 9 minutes read


2018 hasn’t exactly been a stellar year for Facebook. The Menlo Park-based social media giant has faced a barrage of scandals and accusations, ranging from enabling Russian election interference to being used as a tool for genocide in Myanmar. More recently, it was reported that the company let corporate partners read users’ private messages.

Can machine learning (ML) development help the now-infamous platform clean its act up and regain the trust of consumers?

Apologies With AI

Marzuki Darusman, chairman of the Independent International Fact-Finding Mission on Myanmar, told reporters that Facebook played a “determining role” in the recent attacks on Rohingya Muslims in the country. A UN fact-finding report reached similar conclusions: the social media platform has an extensive history of being used to incite violence in Myanmar.

Before this story broke, it was already suspected that Facebook had played a role in helping Russia meddle in the 2016 U.S. presidential election, but the extent of that role was still a mystery. A recent Senate report revealed that Russia’s efforts were much greater than previously assumed, and that Facebook may even have been downplaying the entire event.

During Facebook CEO Mark Zuckerberg’s visit to Washington, D.C. last April, Senator Jeff Flake inquired about how the company would avoid being an accessory to harmful behavior in the future. Zuckerberg turned to a topic he would then go on to mention more than 30 times during his two days of congressional hearings: artificial intelligence (AI).

“Over the long term, building AI tools is going to be the scalable way to identify and root out most of this harmful content,” Zuckerberg explained. With AI, Facebook could finally fight fake news, detect inappropriate content, and prevent propaganda from spreading. Basically, AI would be a panacea for all maladies plaguing the platform.

And the company honestly doesn’t see any other option, according to Mike Schroepfer, Facebook’s CTO. “I think most people would feel uncomfortable with that,” says Schroepfer. He’s referring to the impossible (and creepy) alternative of having human reviewers check everything. “To me, AI is the best tool to implement the policy—I actually don’t know what the alternative is.”

Automating Content Moderation

Facebook’s initiative to automate content moderation with AI began long before all of the platform’s recent controversies. In 2014, Tanton Gibbs was brought on as an engineering director to focus on ad technology development.

But once he became aware of the content moderation challenges facing the company, his objectives changed. “They were strictly using humans to review reports for things like pornography, hate speech, or graphic violence. I saw we should automate that.”

Gibbs became head of a new Seattle-based team called CareML. And by embracing deep learning, a machine learning subset that has become much more versatile in recent years, CareML quickly proved its worth. Gibbs’ group developed algorithms to recognize sexually explicit content. Today, those algorithms are used to automatically detect and remove 96 percent of explicit images on the platform before anyone even reports them.

To put this into perspective, Facebook claims it removed 30.8 million explicit images in Q3 of 2018 alone. By that measure, roughly 1.3 million images got past the algorithms. That share is small, but views of such images have nearly doubled in just the last few months. “More nudity was posted on Facebook, and our systems did not catch all of it fast enough to prevent an increase in views,” the company said in a recent community standards enforcement report.
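The article does not say how the 1.3 million figure was derived. The short sketch below works through two plausible readings of the 96 percent rate; the function names and both interpretations are assumptions made for illustration, not Facebook's own accounting.

```python
# Figures quoted in the article; how they relate is an assumption.
removed_total = 30.8e6   # explicit images removed in Q3 2018
proactive_rate = 0.96    # share detected before any user report


def missed_if_rate_applies_to_removed(total, rate):
    """Reading A: 96% of the removed images were caught by algorithms."""
    return total * (1 - rate)


def missed_if_removed_is_the_caught_share(caught, rate):
    """Reading B: the removed images are the 96%; the remainder was missed."""
    return caught / rate - caught


reading_a = missed_if_rate_applies_to_removed(removed_total, proactive_rate)
reading_b = missed_if_removed_is_the_caught_share(removed_total, proactive_rate)

print(f"Reading A: {reading_a / 1e6:.2f} million")
print(f"Reading B: {reading_b / 1e6:.2f} million")
```

Reading B rounds to 1.3 million, so it appears to be the one closest to the figure quoted above. Reading A comes out slightly lower.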

Still, the efforts of Gibbs’ group remain a bragging point for Facebook’s executives. They serve to show that machine learning can help protect the users and integrity of the platform. Unfortunately, words seem to be a bigger obstacle.

The Trouble With Text

Deep learning algorithms are remarkably good at understanding and sorting images. But words present a different type of difficulty. In fact, algorithms are still unable to understand the context of simple text that would pose no challenge to humans. For example, the phrase “I’m going to beat you so bad” could give even the best AI some trouble.
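To see why context matters, consider a deliberately naive keyword filter. This is a toy sketch, not a description of Facebook's systems; the word list and the `naive_flag` function are hypothetical.

```python
import re

# Hypothetical word list, chosen only for illustration.
VIOLENT_TERMS = {"beat", "kill", "destroy"}


def naive_flag(text: str) -> bool:
    """Flag text if it contains any listed term, ignoring all context."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & VIOLENT_TERMS)


threat = "I'm going to beat you so bad"
banter = "I'm going to beat you so bad at chess tonight"

# Both sentences are flagged, although only one could be a threat.
print(naive_flag(threat), naive_flag(banter))
```

The filter cannot tell a threat from friendly trash talk, because the distinguishing information lies in the surrounding context rather than in any single word.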

Facebook claims that a little over half of all hate speech removed from its platform over the last three months was initially flagged by algorithms. People still review those posts to confirm their removal, though. Complete automation of this process isn’t the goal, at least for now. The company just wants its AI to work effectively enough that its team of 15,000 human reviewers is not overwhelmed.
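The division of labor described here, where algorithms flag and people confirm, can be sketched as a simple triage step. Everything in this example is an assumption for illustration: the `Post` class, the scores, and the 0.5 threshold are hypothetical and do not come from Facebook.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    score: float  # hypothetical model confidence that the post is hate speech


REVIEW_THRESHOLD = 0.5  # assumed value, not from Facebook


def triage(posts, threshold=REVIEW_THRESHOLD):
    """Split posts into a human review queue and an untouched pile."""
    flagged = [p for p in posts if p.score >= threshold]
    # Highest-confidence posts go to reviewers first.
    queue = sorted(flagged, key=lambda p: p.score, reverse=True)
    untouched = [p for p in posts if p.score < threshold]
    return queue, untouched


posts = [Post(1, 0.92), Post(2, 0.10), Post(3, 0.61), Post(4, 0.35)]
queue, untouched = triage(posts)

print([p.post_id for p in queue])      # posts sent to human reviewers
print([p.post_id for p in untouched])  # posts nobody looks at
```

Raising the threshold shrinks the review queue but lets more harmful posts through unseen, while lowering it does the opposite. That trade-off is why better models matter even when humans make the final call.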

That’s a tall order considering that Facebook is available in over 100 countries around the world. Still, with risks like abetting genocide or sabotaging political processes on the line, it’s an important endeavor in need of attention. “Many companies and people in academia are realizing this is an important task and problem, but the progress is not that satisfying so far,” says Ruihong Huang, a Texas A&M University professor. “The current models are not that intelligent in short, that’s the problem.”

Srinivas Narayanan leads the engineering behind Facebook’s Applied ML group. And he wholeheartedly agrees with Huang: “I think we’re still far away from being able to understand that deeply. I think machines can eventually, but we just don’t know how.”

Striving for Better AI

While improving AI to tackle text may feel like an impossible problem, that doesn’t mean Facebook’s giving up. The company has a big multinational AI laboratory working full-time on research that may one day make this viable.

One of the fruits of this labor is Rosetta, a large-scale AI system that extracts text from content like images and videos so that it can be fed to hate speech detectors for evaluation. Another initiative focused on improving Facebook’s image recognition capabilities by training AI with hashtags from Instagram.

Because machine learning models need to be trained on narrow, specific datasets, the company has also changed some of the ways its human moderators work in order to capture information that’s more relevant for the algorithms to learn from. To sidestep the challenge of gathering pertinent data for every language that Facebook is available in, the company is also adapting existing systems built for common languages like English to improve its systems for less widely spoken ones, like Romanian.

Whether these efforts will eventually improve the integrity of Facebook’s content is a question that only the future can answer. Schroepfer remains optimistic that they will help drive down the occurrence of hate speech and explicit content: “My hope is that in two or three or five years there is so little of it on the site that it’s sort of ridiculous to argue that it’s having a big effect on the world.”