Can Machine Learning Keep Facebook’s User Data Safe?

August 30, 2018 - 3 minutes read

After Facebook’s unforgettable data scandal involving a rogue app developer, Cambridge Analytica, and the 2016 Presidential election, the social media company has reportedly been focusing on improving security, tightening privacy controls and user permissions, and limiting developer access to data.

Headquartered just outside of San Francisco, Facebook recently opened up user data to researchers studying voters and elections. The company has also deleted more than half a billion fake accounts in 2018 alone. Now, Facebook is focused on developing machine learning (ML) as part of a multi-pronged approach to improve security.

Cue Machine Learning

Aanchal Gupta, Facebook’s director of security, says the company has already used ML to identify and delete half a million fraudulent accounts tied to a financial scam. ML still struggles to understand context, and Facebook plans to shore up that weakness while playing to its strengths. Using pattern matching, for example, Facebook can detect unusual login activity and alert you by email.

Gupta describes an example: “If we see, ‘This user always comes from California and she usually logs in at 9 p.m. PST and these are the IP addresses she logs in from,’ then when we see the same user logging in from a different country and at a very odd time, we see that their initial login was from California and then two hours later it’s from China. We know they couldn’t have flown that quickly from California to China.”
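To make the idea concrete, here is a minimal, hypothetical sketch of the kind of “impossible travel” rule Gupta describes: compare the time and location of two consecutive logins and flag the second one if the user would have had to move faster than a plane. The thresholds, data layout, and function names below are our own illustration, not Facebook’s actual system.

```python
from datetime import datetime, timezone
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Illustrative threshold: roughly the cruising speed of a commercial flight.
MAX_PLAUSIBLE_SPEED_KMH = 900

def is_impossible_travel(prev_login, new_login):
    """Flag a login if the implied travel speed between logins is implausible."""
    (t1, lat1, lon1), (t2, lat2, lon2) = prev_login, new_login
    hours = max((t2 - t1).total_seconds() / 3600, 1e-6)
    speed_kmh = haversine_km(lat1, lon1, lat2, lon2) / hours
    return speed_kmh > MAX_PLAUSIBLE_SPEED_KMH

# Example from the quote: a 9 p.m. login from California, then one from China two hours later.
california = (datetime(2018, 8, 29, 21, 0, tzinfo=timezone.utc), 37.77, -122.42)
china = (datetime(2018, 8, 29, 23, 0, tzinfo=timezone.utc), 39.90, 116.40)
if is_impossible_travel(california, china):
    print("Unusual login detected -- alert the account owner by email.")
```

A real system would fold in many more signals (known IP ranges, devices, typical login hours), but the core check is this simple distance-over-time comparison.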

Adolescent Algorithms

Another form of pattern recognition helps quickly identify and flag inappropriate content for removal before it spreads across the platform. But for inappropriate language, images, and posts, a team of humans still has to manually confirm that what the computer flagged is actually disallowed.

Of course, Gupta stressed, machine learning algorithms take time to train before they work accurately. Human teams are therefore a necessary part of early ML deployments, holding the algorithm’s hand until it can operate with high accuracy on its own. Every correct or incorrect catch of inappropriate content teaches the algorithm, so even the mistakes are useful data. The need for a manual review team is also a reminder of just how much ML still struggles to identify context.
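The workflow described here, where the model removes only what it is very sure about, routes borderline cases to humans, and feeds every verdict back into training, can be sketched roughly as follows. The classifier, thresholds, and review function are placeholders we invented for illustration, not Facebook’s pipeline.

```python
# Illustrative human-in-the-loop moderation sketch; everything here is a stand-in.

def toy_classifier(post: str) -> float:
    """Stand-in for a trained model: returns a probability that the post violates policy."""
    banned_words = {"scam", "spam"}
    hits = sum(word in post.lower() for word in banned_words)
    return min(0.3 + 0.4 * hits, 0.99)

AUTO_REMOVE = 0.95      # very confident: remove automatically
SEND_TO_HUMANS = 0.50   # uncertain: queue for manual review

training_labels = []    # (post, human_verdict) pairs fed into the next training run

def moderate(post: str, human_review) -> str:
    score = toy_classifier(post)
    if score >= AUTO_REMOVE:
        return "removed"
    if score >= SEND_TO_HUMANS:
        verdict = human_review(post)             # humans hold the algorithm's hand
        training_labels.append((post, verdict))  # every decision becomes training data
        return "removed" if verdict else "kept"
    return "kept"

# Example: a reviewer who removes anything mentioning a scam.
print(moderate("Totally legitimate scam, click here", lambda p: "scam" in p.lower()))  # removed
print(moderate("Look at my cat photos", lambda p: False))                              # kept
```

The design point is the feedback loop: as `training_labels` grows and the model is retrained, fewer posts should fall into the uncertain band that needs human review.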

With Facebook’s huge global user base, a lack of data is not the company’s problem. But given the constant barrage of fake accounts and duplicate profiles, not every user is a treasure trove of trustworthy data.

Do you still have a Facebook profile? What kind of things do you share with your friends on the platform? How do you feel about algorithms being trained with your profile? Let us know in the comments!
