How To Keep Data Harvesters Away From Your App’s API

August 30, 2016

Your app is a vault, filled with valuable data. Custom content. User emails. Images. Videos. Losing any of these things could be “game over” for a mobile app developer.

The bad news is, all this data is readily available via your app’s APIs.

API data harvesting is surprisingly common as a “growth hacking” tactic — a startup that recently asked me to scrape news articles from a competitor’s app springs to mind. (Needless to say, the answer was a firm “no.”) Otherwise legitimate startups have no problem overlooking the ethical implications in their race to become the next big thing.

The good news is, there are simple steps you can take to stop them.

In this article I’ll explain how API scraping works, why it’s a problem for app developers, and how to protect your data from prying eyes.

Why app APIs are vulnerable

Contrary to popular belief, native apps are just as vulnerable to hackers as their browser-based counterparts. The thing that makes them so vulnerable is the API ecosystem.

APIs are a double-edged sword. Efficiently sharing data between software is part of what’s made the mobile industry so successful and disruptive. Unfortunately, APIs are also a hacker’s dream, since they’re designed to share back-end server content without human intervention.

Most app data theft happens because of flaws in API setup that allow a hacker — rather than another app or user — to access back-end databases.

How API data scraping works

Let’s look at an example:

A hacker trying to access your app’s data is essentially trying to imitate legitimate web calls to “fool” the API into sharing data with them.

Your app naturally shares data with users and other apps — the hacker just needs to convince the database that their requests aren’t malicious.

The first thing they need is the set of web calls the app makes when it accesses the back-end API. Capturing them is remarkably straightforward: set up an intercepting proxy like Burp Proxy, then change the proxy settings on the phone or tablet so all web traffic flows through the desktop. This creates a “window” onto the data going back and forth between the app and back-end APIs.

Once they get their hands on the web calls, they can write a script to imitate those requests. (For a decent programmer, this would take less than an hour.)

If the app developer has been foolish enough to use easily guessed user IDs (001, 002, 003…), cycling through those IDs to access every single user takes minutes. Even if IDs are random, it takes just a little extra effort to code a script that navigates the app programmatically (just like a web scraper crawls websites following links).

These same principles apply to all kinds of app data and content. If your app provides hotel information, a hacker could use this method to steal your database of custom hotel descriptions and sell them to a competitor. A recipe app could steal recipes from similar startups. Payment info? Why not. The possibilities are endless.

So — how can app developers guard against this?

How to protect your API data

SSL Pinning: This simple precaution is surprisingly absent in many apps — which is too bad, since it makes setting up a proxy difficult or impossible for entry-level hackers.

You can envision SSL pinning as a “tighter” verified connection between the app and back-end APIs. The app only trusts a certificate (or public key) it already knows, so a proxy sitting in the middle and presenting its own certificate gets rejected.
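To make the idea concrete, here is a minimal sketch of certificate pinning in Python. In a real mobile app you would do this in the platform's own networking stack; this only illustrates the check itself. The pinned fingerprint and the function names are placeholders I made up for illustration, not part of any particular app.

```python
import hashlib
import hmac
import socket
import ssl

# Hypothetical pin: SHA-256 of the server certificate (DER form), as hex.
# A real app would ship the fingerprint of its own back-end certificate.
PINNED_SHA256 = "0" * 64


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare the presented certificate's fingerprint with the pinned one."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.lower())


def open_pinned_connection(host, port=443, pinned_hex=PINNED_SHA256):
    """Open a TLS connection and refuse it unless the certificate matches the pin."""
    context = ssl.create_default_context()  # normal CA and hostname checks still apply
    raw = socket.create_connection((host, port), timeout=10)
    try:
        sock = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    der_cert = sock.getpeercert(binary_form=True)
    if der_cert is None or not matches_pin(der_cert, pinned_hex):
        sock.close()
        raise ssl.SSLError("certificate does not match the pinned fingerprint")
    return sock
```

An intercepting proxy has to present its own certificate to read the traffic. Even if the attacker has installed the proxy's root certificate on the device, that certificate's fingerprint won't match the pin, so the connection is dropped before any API data is sent.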

Randomized IDs: Sometimes a hacker can work around SSL pinning by installing the app on a rooted device and using tools like Snoop-It and Cycript. Randomized IDs guard against scraping by making it difficult to scrape quickly once a hacker is “inside.”

If a hacker is able to acquire a user’s data with a query like “GET /api/user/123/profile,” chances are they can cycle 000–999 without throwing an error. But when the developer uses randomized IDs that don’t follow an ordered pattern, hackers are stuck navigating the app manually — which might not be possible.
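Here is a small sketch of the difference, using a toy in-memory “database” rather than a real API. The record layout, the example.com addresses, and the function names are all made up for illustration. Random IDs come from Python's standard `secrets` module.

```python
import secrets


def build_users(count, random_ids):
    """Create `count` toy user records keyed by sequential or random IDs."""
    users = {}
    for n in range(count):
        # token_urlsafe(16) draws 128 random bits, so one ID says nothing about the next
        user_id = secrets.token_urlsafe(16) if random_ids else f"{n:03d}"
        users[user_id] = {"email": f"user{n}@example.com"}
    return users


def enumerate_ids(users, guesses=1000):
    """Count how many records a scraper finds by trying 000, 001, 002, ..."""
    return sum(1 for n in range(guesses) if f"{n:03d}" in users)


sequential = build_users(500, random_ids=False)
randomized = build_users(500, random_ids=True)

print(enumerate_ids(sequential))  # 500: every record is found
print(enumerate_ids(randomized))  # 0: counting upward finds nothing
```

With sequential IDs, a thousand guesses recover the whole table. With random tokens, the same loop comes back empty, which is what forces the scraper to navigate the app instead.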

Intelligent UX: Making an app that’s simple enough for human users to navigate yet complex enough to confuse crawlers can be quite challenging. Hackers and UX designers have been in a bit of an arms race over this since the dawn of mobile.

Throwing up CAPTCHAs, as many websites do, would certainly be effective, but would almost certainly drive user churn. The trick is to build manual actions into the app, so that sensitive data is only revealed or queried if a human user makes an unmistakably human action.

In some cases, that could mean opening a drop-down, or even dragging one element to another part of the screen. Unfortunately, unnecessary user actions tend to feel like just that — unnecessary user actions.

Working around this dilemma is just one reason that UX designers and security experts should work closely in app development studios.

Rate-limit APIs: A user swiping through a dating app might view a profile every second, but it’s physically impossible for them to view 200 profiles a second. (Some Tinder users might disagree.)

Navigating an app rapidly, making rapid API requests in the process, is exactly the sort of thing an API scraping script will do.

So, app developers should set their APIs to block users when queries rise above a certain rate or reach a certain number within an hour.
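One way to sketch that rule is a sliding-window counter per client. The class name, the thresholds (5 requests a second, 1,000 an hour), and the `should_serve` helper below are assumptions for illustration; the right numbers depend on how real users actually move through your app.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client ID -> timestamps of recent requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the API could answer with HTTP 429
        hits.append(now)
        return True


# Hypothetical thresholds: a burst cap plus an hourly cap.
per_second = SlidingWindowLimiter(max_requests=5, window_seconds=1)
per_hour = SlidingWindowLimiter(max_requests=1000, window_seconds=3600)


def should_serve(client_id: str, now: Optional[float] = None) -> bool:
    """Serve the request only if the client is under both limits."""
    now = time.monotonic() if now is None else now
    return per_second.allow(client_id, now) and per_hour.allow(client_id, now)
```

The burst cap catches a script hammering the API, while the hourly cap catches a slower scraper that stays just under the per-second limit. The `now` parameter is only there so the limiter can be tested with fixed timestamps.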

The big picture: APIs and ethics

It’s not just hackers that use API scraping techniques like this, and the targets of app attacks aren’t always the user emails or payment information you might expect. User-generated reviews, for example, can be gold for e-commerce apps. Recipes could be scraped for selling to another cooking startup.

When you’re purchasing any sort of content for a new startup, be wary of where that content came from. Custom content, just like custom code, is a top expense for Los Angeles iPhone app developers. When the price is low, it’s a fair bet the source isn’t original.

Every piece of data in your app was expensive to acquire. Every time a hacker manages to steal it, the value of your startup diminishes.

I recommend using these techniques to protect it.
