The ability to keep data encrypted while you use it for computations in the cloud could protect data from attackers and malicious insiders alike. There is still a performance hit, but open-source libraries already let you start taking advantage of the technique.
Cloud computing on a certified, compliant, properly-run cloud service like Microsoft Azure is likely to be far more secure than on-premise servers in your office or your data centre. Your data is encrypted at rest and in motion; cloud systems are probably patched more often and configured more securely than your servers; and admin access is locked down and only enabled for 'just enough access, just in time' to run specific commands within specific time windows. Also, the admins will have gone through background checks and work in secure locations that require biometric credentials to access.
There are still problems, though. You have to trust that the cloud service is storing and managing your data securely and not letting its admins or any third parties get hold of it. Your own admins have privileged access to the data on that cloud, so you need to protect against insider threats. And when you want to actually use that data -- for AI, analytics or just querying a database -- if you're using traditional encryption, either it needs to be downloaded and decrypted first, or you need to store your encryption keys in the cloud.
That can be a problem for confidential or privileged information (whether that's personally identifiable information protected by legislation like GDPR, financial records or medical data), especially if you're trying to use data that's been shared with you by another organization that controls the encryption. Services like Azure Data Share let organizations manage and control data sharing with partners, but the data is only encrypted in transit and at rest.
Homomorphic encryption preserves the mathematical structures that underlie the encrypted data, so you can do computation on the data without decrypting it. If the homomorphic function encrypted 400 as 4 and 200 as 2, you could divide the encrypted numbers by each other and get the same result as dividing the unencrypted numbers.
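To make that property concrete, here is a minimal Python sketch using textbook (unpadded) RSA with tiny hard-coded primes. This is an assumption chosen purely for illustration: it is not the lattice-based approach that real homomorphic encryption libraries use, it supports only multiplication, and it is not secure. The function names encrypt, decrypt and multiply_encrypted are hypothetical.

```python
# Toy illustration only: textbook RSA with tiny primes is NOT secure
# and is not how lattice-based homomorphic schemes work.
P, Q = 61, 53
N = P * Q                    # public modulus (3233)
PHI = (P - 1) * (Q - 1)      # 3120
E = 17                       # public exponent
D = 2753                     # private exponent, chosen so (E * D) % PHI == 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two ciphertexts without decrypting either of them."""
    return (c1 * c2) % N


if __name__ == "__main__":
    a, b = 7, 6
    product_ct = multiply_encrypted(encrypt(a), encrypt(b))
    # Only the key holder can decrypt the product of the hidden values.
    print(decrypt(product_ct) == a * b)
```

The party doing the multiplication never sees 7 or 6, only their ciphertexts. In this toy scheme the decrypted result matches the plain product only while that product stays below N.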
The encryption functions in real homomorphic schemes are considerably more complicated: they are often built on the Ring-Learning With Errors (RLWE) problem, a hard mathematical problem involving lattices with a high number of dimensions. That's considered at least as secure as standard encryption schemes and, unlike current public-key methods, it isn't known to be breakable by quantum computers.
The result of the calculation is also encrypted. That makes it useful for protecting workloads you're running in the cloud, for doing aggregate analytics in the cloud (where you're looking across large amounts of data, rather than the specific details of one thing), for automation and orchestration where encrypted data could trigger an event, and particularly for handling supply chain and partner information from outside your own organization.
If you're working with a partner or supplier, you can share encrypted data and only get access to the data that you have in common, so you can use your combined datasets for machine learning. Or organizations could collect information from customers that's already encrypted, where the customer owns the encryption key, and still work with it without ever seeing the actual data. Because homomorphic encryption can now be used with deep-learning algorithms, future versions of cloud machine-learning offerings like Azure Cognitive Services could operate over encrypted data, whether that's translating a confidential contract, OCRing medical records, or analysing genetic information to see if someone is at risk of a heart attack, without leaking information.
Anonymizing data isn't enough to protect it: once you start working with large amounts of data, correlations or user errors make it likely that data will be re-identified, accidentally or on purpose -- but that can't happen if it's never decrypted.
It's a grand vision: how much of that can you do today?
Making homomorphic encryption practical
Homomorphic encryption isn't a new idea, but it has taken some time to become practical. Although the idea was first proposed in 1978, there wasn't even a theoretical algorithm for it until 2009 -- and that would have taken a trillion times longer than an unencrypted calculation. By 2013 IBM Research got that down to a million times slower, so a data operation that would take one second without encryption would still take 12 days with homomorphic encryption.
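As a quick sanity check on those slowdown figures, the arithmetic can be sketched in a few lines of Python. The helper name encrypted_runtime_seconds is hypothetical, and the only inputs are the one-second baseline and the two slowdown factors quoted above.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def encrypted_runtime_seconds(plain_seconds: float, slowdown_factor: float) -> float:
    """Estimate the runtime of an operation after a given slowdown factor."""
    return plain_seconds * slowdown_factor


if __name__ == "__main__":
    # A one-second operation at a million-fold and a trillion-fold slowdown.
    million_fold = encrypted_runtime_seconds(1, 1e6)
    trillion_fold = encrypted_runtime_seconds(1, 1e12)
    print(f"{million_fold / SECONDS_PER_DAY:.1f} days")
    print(f"{trillion_fold / SECONDS_PER_YEAR:,.0f} years")
```

A million-fold slowdown turns one second into about 11.6 days, which rounds to the 12 days quoted. A trillion-fold slowdown turns the same second into tens of thousands of years.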
Microsoft Research took a slightly different approach, specifying some of the parameters for queries in advance, like the size of the data set, the specific fields that will be needed, limits on the data range in those fields (so an age field won't be negative or greater than 150, say) or how much computation you're going to do. That 'practical' homomorphic encryption is the basis of the Microsoft Research Simple Encrypted Arithmetic Library (SEAL), which it open-sourced in 2018.
SEAL is a C++ library with .NET Standard wrappers for C#, and it works on Windows, macOS and Linux. It's starting to get built into higher-level frameworks: Intel's HE-Transformer uses SEAL to do computation on encrypted data with the Nervana nGraph neural network compiler and frameworks like TensorFlow, for example. SEAL also includes demos to show you how to build it into applications: so far that covers using it with Azure Functions and in an Android app for tracking exercise. (Yes, that could be confidential data: several cloud exercise-tracking sites have ended up leaking information from user account details to the location of military bases.)
You can't run arbitrary computations, but specifying the parameters means that, while there's still a significant overhead, operations are only three or four orders of magnitude slower than working with unencrypted data, rather than 12. In 2015, neural networks using homomorphic encryption (which Microsoft calls CryptoNets) could recognize handwritten numbers with 99% accuracy at a rate of nearly 60,000 an hour (on a 3.5GHz Xeon PC running Windows 10). A single prediction from the OCR CryptoNet took 250 seconds, but so did 4,096 predictions, because homomorphic encryption is massively parallel.
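The throughput figure follows directly from the batching numbers. A minimal Python sketch of that arithmetic, using only the 250-second and 4,096-prediction figures quoted above (the function names are hypothetical):

```python
def predictions_per_hour(batch_size: int, batch_seconds: float) -> float:
    """Throughput when a whole batch finishes in batch_seconds."""
    return batch_size * 3600 / batch_seconds


def amortized_seconds(batch_size: int, batch_seconds: float) -> float:
    """Average time per prediction once the batch cost is shared."""
    return batch_seconds / batch_size


if __name__ == "__main__":
    # Compare a lone prediction with a full batch of 4,096.
    for batch_size in (1, 4096):
        rate = predictions_per_hour(batch_size, 250)
        each = amortized_seconds(batch_size, 250)
        print(f"batch of {batch_size}: {rate:,.1f} per hour, {each:.3f} s each")
```

A full batch works out to just under 59,000 predictions an hour, or roughly 61 milliseconds per prediction, whereas running one prediction at a time would manage only about 14 an hour. Filling the batch is what makes the overhead tolerable.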
On a desktop PC with a single-threaded Core i7, depending on the amount of computation the parameters specify you're going to do, multiplying numbers with SEAL takes anywhere from around 500 microseconds to 105 milliseconds. But when you average that out over all the numbers you can process in parallel, the time comes back down to nanoseconds per operation. Similarly, selecting data from a large encrypted data set takes seconds rather than days. Currently SEAL only uses the CPU, but because it's so parallelizable, adding GPU acceleration (which is on the roadmap) should improve performance by around two orders of magnitude.
That makes it practical, but SEAL is still more for targeted problems and small amounts of confidential information than handling all your data. Microsoft is using homomorphic encryption in its ElectionGuard system for end-to-end verification of voting. For more mainstream performance, SQL Server Always Encrypted simulates full homomorphic encryption on top of standard encryption by using trusted hardware.
The cryptography community also tends to need time to adopt new encryption techniques, to understand the issues and to work through standards. Typically that takes about ten years, and homomorphic encryption is starting to move through the standards process. Currently, the discussion is about specifications to say just how strong the encryption is; the next step will be getting a common set of APIs so that homomorphic encryption systems can interoperate.
Today, homomorphic encryption is something you'll need to think carefully about before applying, although it's worth doing if it lets you work with targeted data you otherwise couldn't use. But once standardisation and hardware acceleration arrive, it's likely to become much more broadly adopted, given how much confidential data organizations want to work with.