Always Encrypted: Deterministic vs. Randomized
Introduction
When it comes to securing a SQL server there are severeal things we need to think of. One of this things is the protection of our data in transit and/or at rest by using encryption.
Encryption can happen at several layers. E.g. for encryption on a drive level we would use BitLocker or NTFS encryption for folder level and TDE to encrypt on a file level.
At the same time we can encrypt columns and scalar values using T-SQL via a certificate (EncryptByCert
), asymmetric key (EncryptByAsymKey
), symmetric key (EncryptByKey
) or a passphrase (EncryptByPassPhrase
).
Now how does Always Encrypted fit in here and how does it work?
Always Encrypted is a feature designed to protect sensitive data, stored in Azure SQL Database or SQL Server databases from access by database administrators. It leverages client-side encryption where a database driver inside an application transparently encrypts data, before sending the data to the database. It therefor provides a seperation between those who own the data and those who manage the data but should have no access.
This feature is available in all editions of Azure SQL Database, starting with SQL Server 2016 (13.x) and all service tiers of SQL Database.
This is in contrast to the T-SQL functions mentioned beforehand, where the encryption and decryption process happens on the database engine. When using Always Encrypted the keys used by the client-side are never revealed to the DB engine.
This leads us to the types of keys involved, which are: Column Encryption Keys (CEK)
and Column Master Keys (CMK)
. The CEKs are used to actually encrypt the data and a CMK is required to protect the CEKs itself. This is necessariy because the CEKs are stored in the database and therefor need special protection. CMKs are usually held in an Azure Key Vault, Windows Certificate store or a hardware security module.
It's important to note that the database only contains metadata about the type and location of CMKs, and encrypted values of CEKs. This means that plaintext keys are never exposed to the database system ensuring that data protected using Always Encrypted is safe, even if the database system gets compromised.
Okay, now that we have covered the basics lets move on to the provided encryption types deterministic
and randomized
and compare them to each other.
Deterministic Encryption
I think that the deterministic encryption type is somewhat similar to the way hashing algorithms work. This method always generates the same encrypted value for any given plaintext value.
This means if we would encrypt a boolean column holding only true
values the encrypted payload would be the same for each of the values.
Input | Output |
---|---|
true | 0x1234ffff |
true | 0x1234ffff |
false | 0xffff0000 |
false | 0xffff0000 |
This method allows grouping, filtering by equality and joining tables based on encrypted values, but could also allow a malicious user to guess information by examining patterns and then deduce the plain-text value.
Use the deterministic encryption method when...
- values are mostly unique
- column values needs to be used as searching or grouping parameters
- column requires an index
Randomized Encryption
Randomized encryption generates a different encrypted value for the same plaintext each time. So if we would encrypt a boolean column holding only true
values, we would get a different encrypted payload for each of them.
Input | Output |
---|---|
true | 0xffffabcd |
true | 0x1234dddd |
false | 0xa1a1b2b2 |
false | 0xccdd1234 |
This encryption method is more secure then deterministic, but prevents equality searches, grouping, indexing and joining on encrypted columns.
Use the randomized encryption method when...
- values are forming a pattern (e.g. for enums, booleans, ...)
- there is no need to group or join tables based on this data
- there is no need for an index