Apple has long had a commitment to privacy that many of its competitors do not share. While Google and Microsoft are happy to suck up personalized data that hackers and the government could exploit, Apple has refused to do so. As an example, Apple announced at its Worldwide Developers Conference that all iOS apps must encrypt web communications by the end of the year.
But Apple needs data in order to personalize its services and learn what adjustments its customers want, so on Tuesday Craig Federighi, Apple’s senior vice president of software engineering, discussed a concept called differential privacy, which will be built into iOS 10.
According to Apple, differential privacy will “help discover the usage patterns of a large number of users without compromising individual privacy.” The idea is that while Apple can see user data in the aggregate to improve its services, it will be impossible for anyone to find data about any one user. This includes Apple itself, as well as hackers and governments.
The problems with privacy
How is it possible to get data in the aggregate but not at the individual level? To understand that, we need to start with the challenges of protecting user privacy.
Most companies do make some effort to protect your privacy, and they will often anonymize your data and refuse to publish personal information. But people can still use the data that is revealed to work out your personal information.
It is comparable to finding out an Internet forum user’s real-life identity. You won’t have their real name or phone number, but you can note that the forum user lives in New York and went on a date at a particular restaurant. By combining facts like these you can narrow down the possibilities until you discover their true identity. And as Wired pointed out, researchers were able to do something like this in 2007 with “anonymous” customer data that Netflix had published.
This shows that even if a company tries to hide personal information, hackers can use the information that is available to glean personal data. And if the company hides all the information it has, it cannot use that information itself.
But what if all the information is obscured?
The idea behind differential privacy
That is what differential privacy sets out to do. It works by algorithmically obscuring the data with noise so that hackers can never truly figure out what any one person said.
Many of the ideas behind differential privacy are theoretical, worked out by computer scientists and cryptographers. But Cynthia Dwork, whom Engadget credits as the co-inventor of differential privacy, gives an example of how it could work, using a surveyor who asks someone whether they have cheated on an exam:
Before responding, the person is asked to flip a coin. If it’s heads, the response should be honest but the outcome of the coin shouldn’t be shared. If the coin comes up tails, the person needs to flip a second coin; if that one is heads, the response should be “yes.” If the second is tails, it’s “no.”
Since a coin should come up heads or tails about fifty percent of the time over the long run, the surveyor can estimate roughly how many people in the whole group actually cheated on their exam. But if a malicious agency finds out that one particular individual answered “yes,” it has no way of knowing whether that is because the individual cheated on the test or because the coin flips came up tails and then heads.
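The coin-flip survey can be sketched in a few lines of Python. This is only an illustration of the example above, not Apple’s algorithm: the function names, the 30 percent cheating rate and the population size are all made up for the demonstration. The estimator relies on the fact that, with fair coins, the expected share of “yes” answers is half the true share plus one quarter, so the true share is roughly twice the observed share minus one half.

```python
import random


def randomized_response(truth, rng):
    """Answer one survey question using the two-coin scheme."""
    if rng.random() < 0.5:       # first coin is heads: answer honestly
        return truth
    return rng.random() < 0.5    # first coin is tails: second coin decides


def estimate_true_rate(answers):
    """Estimate the real share of "yes" from the noisy answers.

    Expected P(yes) = 0.5 * p + 0.25, so p is about 2 * P(yes) - 0.5.
    The result is clamped to the valid range [0, 1].
    """
    observed = sum(answers) / len(answers)
    return min(1.0, max(0.0, 2 * observed - 0.5))


# Hypothetical population: 30 percent really cheated (assumed for the demo).
rng = random.Random(42)
true_rate = 0.30
population = [rng.random() < true_rate for _ in range(100_000)]

answers = [randomized_response(person, rng) for person in population]
estimate = estimate_true_rate(answers)
print(f"estimated share of cheaters: {estimate:.3f}")
```

The surveyor only ever sees the noisy answers, yet the estimate should land close to the real share because the noise averages out across a large group.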
Actual differential privacy algorithms are much more complicated, but they work on the same principle as the coin flip example. Because mathematical “noise” obscures each individual’s data, no one can be sure of any one data point, even someone who knows the algorithm.
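To get a feel for how much a single noisy answer gives away, here is a small sketch that applies Bayes’ rule to the coin-flip example. It assumes fair coins and an eavesdropper who already has some prior guess about how likely the person is to have cheated; the function name and the prior values are hypothetical.

```python
def posterior_given_yes(prior):
    """Probability the person really cheated, given that they answered "yes"."""
    p_yes_if_cheated = 0.5 + 0.5 * 0.5   # honest "yes" plus forced "yes" = 0.75
    p_yes_if_innocent = 0.5 * 0.5        # only a forced "yes" = 0.25
    numerator = p_yes_if_cheated * prior
    return numerator / (numerator + p_yes_if_innocent * (1 - prior))


# Hypothetical prior beliefs held by the eavesdropper before seeing the answer.
for prior in (0.1, 0.3, 0.5):
    print(f"prior {prior:.0%} -> after seeing 'yes': {posterior_given_yes(prior):.1%}")
```

An eavesdropper who started out thinking there was a 10 percent chance the person cheated would move to 25 percent after seeing a “yes.” The answer shifts the odds somewhat, but it never settles the question.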
Differential privacy could let Apple and other companies collect data that helps them while still protecting their customers’ privacy. But much of the work on differential privacy has been theoretical, and there have been no small-scale tests of how it might work.
Implementing it on a large scale, as Apple plans to do with iOS, without small-scale trials is risky.
However, differential privacy is not nearly as useful on a small scale. The mathematical noise obscures the data more heavily in a small sample, increasing the chances of entirely inaccurate results. Think about the coin example above. If the surveyor surveyed only 10 people, it is possible that eight of them flipped “tails” on the first coin, and his survey would be worthless. But if he surveyed 10,000, it is far less likely that 8,000 of them flipped “tails,” and thus he can better trust his data.
Differential privacy is a hard-to-understand concept. But if Apple is successful, it could seriously change how companies acquire data. While there will still be companies happy to take user data, the existence of a way to collect data without compromising individual privacy could have a huge effect on the relationship between company and customer.
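The effect of sample size can be checked with a rough simulation of the coin-flip survey. The sketch below repeats the survey many times with 10 people and with 10,000 people and compares how far the estimate lands from the truth on average. The 30 percent cheating rate, the number of repetitions and the function names are assumptions made for the demonstration.

```python
import random
import statistics


def survey_estimate(n, true_rate, rng):
    """Run one coin-flip survey of n people and estimate the true "yes" share."""
    yes_count = 0
    for _ in range(n):
        cheated = rng.random() < true_rate
        if rng.random() < 0.5:            # first coin heads: honest answer
            answer = cheated
        else:                             # first coin tails: second coin decides
            answer = rng.random() < 0.5
        yes_count += answer
    # Unclamped estimate, so with tiny samples it can fall outside 0..1.
    return 2 * yes_count / n - 0.5


def typical_error(n, true_rate=0.30, trials=100, seed=0):
    """Average absolute error of the estimate over repeated surveys."""
    rng = random.Random(seed)
    errors = [abs(survey_estimate(n, true_rate, rng) - true_rate)
              for _ in range(trials)]
    return statistics.mean(errors)


small = typical_error(10)
large = typical_error(10_000)
print(f"average error with 10 people:     {small:.3f}")
print(f"average error with 10,000 people: {large:.3f}")
```

With only 10 respondents the estimate is usually off by a wide margin, while with 10,000 the error should shrink to around a percentage point, which is why the technique suits a user base the size of Apple’s.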