This concerns the encryption of snapshot data before it leaves the nodes.
We have 3 types of cloud snapshot repositories: Google Cloud Storage, Azure Storage and Amazon S3. Amazon and Azure support client-side encryption in their Java clients, but Google does not.
Amazon and Azure, which support client-side encryption, allow the keys to be managed by the client (us) or by their Key Management Service (Vault-like). They both use the Envelope Encryption method: each blob is individually AES-256 encrypted with a locally generated random key, and this key (the Data/Content Encryption Key) is in turn encrypted with another Master Key (locally or by the Vault service) and then stored alongside the blob in its metadata. Envelope encryption facilitates Master Key rotation because only the small (D/C)EK has to be re-encrypted, rather than the complete blob.
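To make the envelope scheme concrete, here is a minimal sketch using only the JDK's `javax.crypto` APIs. It is not the code of either SDK: the class `EnvelopeEncryptionSketch`, the `EncryptedBlob` holder, and the choice of AES/GCM for the blob and `AESWrap` for the DEK are assumptions made for illustration only.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

class EnvelopeEncryptionSketch {

    /** Hypothetical holder: the encrypted blob plus the wrapped DEK that would live in its metadata. */
    static final class EncryptedBlob {
        final byte[] iv;
        final byte[] ciphertext;
        final byte[] wrappedDek;

        EncryptedBlob(byte[] iv, byte[] ciphertext, byte[] wrappedDek) {
            this.iv = iv;
            this.ciphertext = ciphertext;
            this.wrappedDek = wrappedDek;
        }
    }

    static SecretKey newAes256Key() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    /** Encrypts the DEK with the Master Key (AES key wrap). */
    static byte[] wrap(SecretKey dek, SecretKey masterKey) throws Exception {
        Cipher wrapper = Cipher.getInstance("AESWrap");
        wrapper.init(Cipher.WRAP_MODE, masterKey);
        return wrapper.wrap(dek);
    }

    static SecretKey unwrap(byte[] wrappedDek, SecretKey masterKey) throws Exception {
        Cipher unwrapper = Cipher.getInstance("AESWrap");
        unwrapper.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) unwrapper.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    }

    /** Encrypts one blob with a fresh, locally generated DEK and wraps that DEK with the Master Key. */
    static EncryptedBlob encrypt(byte[] plaintext, SecretKey masterKey) throws Exception {
        SecretKey dek = newAes256Key();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
        return new EncryptedBlob(iv, cipher.doFinal(plaintext), wrap(dek, masterKey));
    }

    static byte[] decrypt(EncryptedBlob blob, SecretKey masterKey) throws Exception {
        SecretKey dek = unwrap(blob.wrappedDek, masterKey);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, blob.iv));
        return cipher.doFinal(blob.ciphertext);
    }

    /** Master Key rotation: only the small wrapped DEK is re-encrypted, the blob bytes are untouched. */
    static EncryptedBlob rotateMasterKey(EncryptedBlob blob, SecretKey oldMaster, SecretKey newMaster) throws Exception {
        SecretKey dek = unwrap(blob.wrappedDek, oldMaster);
        return new EncryptedBlob(blob.iv, blob.ciphertext, wrap(dek, newMaster));
    }

    public static void main(String[] args) throws Exception {
        SecretKey oldMaster = newAes256Key();
        SecretKey newMaster = newAes256Key();
        byte[] data = "snapshot segment bytes".getBytes(StandardCharsets.UTF_8);

        EncryptedBlob blob = encrypt(data, oldMaster);
        EncryptedBlob rotated = rotateMasterKey(blob, oldMaster, newMaster);
        System.out.println("round trip after rotation ok: " + Arrays.equals(data, decrypt(rotated, newMaster)));
    }
}
```

In this sketch `rotateMasterKey` never touches `ciphertext`, which is the property that makes rotation cheap: the cost is proportional to the number of blobs, not to their size.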
On the ES side we discussed having a single fixed URN key handle at the repository settings level.
This URN identifies the Master Key; for example, it could point to a key on the Amazon Vault Service or to a key in each node's keystore. In this alternative it is not possible to rotate the keys via the repository API (it might be possible to do it outside ES, which is obviously preferable, but see below).
I believe this is the rough picture of the puzzle that we need to put together.
We oscillated between implementation alternatives, and I will lay out the one which I think is favorable. Whatever solution we initially implement, given that the Master Key identifier is a URN, we can multiplex multiple implementations for the same repository type.
We mirror the Envelope Encryption algorithm, employed by Amazon and Azure, at the BlobContainer level. The key is stored on each node's keystore (and is pointed to by the repository level URN reference).
Advantages:
- Implement once for all cloud repository types and it will even work for the file system repository type!
- Testing! We can have unit tests for the base implementation of the EncryptedBlobContainer and end-to-end integration tests in only one of the implementations (Amazon or FS), where we can decrypt the data on the service fixture.
Disadvantages:
- Duplicates code that the cloud SDKs already provide
- We're on our own with Key Rotation. We might need to implement our own command line tool to rotate keys (download the objects' metadata, decrypt the key and re-encrypt it). The tool will be "easy" to implement.
- Does not support Vault keys.
In the opposite corner, there could be this alternative:
We use the AWS cloud library facility to implement it only for the S3 repository type. The key is stored either on the node's keystore or on the AWS Key Management Service.
Advantages:
- Easiest to implement
- Supports AWS's Vault Service
- We might have support for key rotation, using Amazon's command line tool
Disadvantages:
- Only S3 repository type is supported
- Testing. We either mock the client and check that the code indeed calls the "crypto" APIs or we do an end-to-end integration test, where we decrypt the data on the fixture. Either way we kinda "test the library" rather than our code. This is pointless and brittle, but we need testing because the risks are too great.
EDITED 28.01.2021 Backlog:
- Add the new API that changes the password for a given encrypted repository.
The encrypted repository must already be configured with the correct password. The API iterates over all the associated wrapped DEKs (contained inside a dedicated blob container under the repository's base path), and proceeds to unwrap and re-wrap all the DEKs with the new password.
Finally, the old DEKs are removed, so that the old password cannot be used any longer.
- implement searchable and encrypted repositories.
This mainly requires implementing the AbstractBlobContainer#readBlob interface. This is slightly problematic because the association id between an encrypted blob and its DEK is prepended at the beginning of the blob, so that decryption at an internal position currently requires a seek to the beginning. Double check that the definition structure is reasonable (i.e. is it searchable and encrypted or vice-versa, ping David about this).
- ensure compressed & encrypted repositories work
- investigate metered and encrypted repositories
- thoroughly test failure scenarios where IOExceptions are thrown. Generally speaking (although not really true in practice), reads and writes contain two operations that can fail independently. Make sure testing covers this.
- create benchmarks (distributed team is working on something that measures the throughput of a repository)
- permit (and test) encrypted HDFS repositories (it should work)
- double check that it is acceptable for AbstractBlobContainer#blobExists and EncryptedBlobContainer#listBlobs/listBlobsByPrefix on an EncryptedBlobContainer to report a blob (return true) and for the subsequent read to fail because of decryption problems
- double check that we're not relying on system encoding (that strings are always written UTF-8 encoded, and reads are always decoded with the same UTF-8)
- ensure that there's no problem that EncryptedBlobContainer#writeBlobAtomic is not atomic (in general, it cannot be, because a write might also generate and write the DEK, so there are two operations that can fail independently)
- ensure that it's alright for EncryptedBlobContainer#listBlobs/listBlobsByPrefix to return the encrypted blob size (which is larger), instead of the expected blob size of the decrypted blob
- think about possibly renaming "password name" to "password label"
- investigate repository password situation on cloud.
On-premise repository passwords are cached in memory when the node starts, usually requiring a node restart when configuring a new snapshot repository. The security settings implementation on cloud is different, so that maybe we can read the repository passwords immediately after they've been added or changed, without requiring a restart.
- repository password min-length limit (an encrypted repository with a short password)
- make KDF parameters configurable
- investigate the naming of the delegating and delegated repositories (they are the same currently, is this a problem?)
- make crypto provider selectable for operations on the client side-encrypted repo
- test that encrypted repos can share the bucket (but with a different base path, otherwise we already test that the passwords must be the same) and can also share the repository client
- Revisit definition of password name in repository settings (see: Encrypted blob store reuse DEK #53352 (comment))
- Settle on the specification for encrypted and searchable snapshots (ping David about it)
- Investigate if versioning of individual encrypted blobs is necessary https://github.com/elastic/elasticsearch/pull/53352/files#r444383568
- Investigate if we can guarantee that DEKs do not change inside a given shard
- Report encryption stats (from Encrypted blob store reuse DEK #53352 (comment))
- test FIPS negative behaviour (that a short password doesn't crash the node or something).
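
The first backlog item (changing the repository password by unwrapping and re-wrapping every DEK) could look roughly like the sketch below. It is only an illustration, not the planned API: the class `PasswordChangeSketch`, the in-memory `Map` standing in for the dedicated DEK blob container, the fixed salt, and the PBKDF2 iteration count are all hypothetical choices.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.Key;
import java.util.HashMap;
import java.util.Map;

class PasswordChangeSketch {

    // Hypothetical KDF parameters; the backlog suggests these should become configurable.
    static final int KDF_ITERATIONS = 10_000;
    static final byte[] KDF_SALT = "hypothetical-salt".getBytes(StandardCharsets.UTF_8);

    /** Derives an AES-256 key encryption key (KEK) from the repository password. */
    static SecretKey deriveKek(char[] password) throws Exception {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512");
        byte[] keyBytes = factory.generateSecret(new PBEKeySpec(password, KDF_SALT, KDF_ITERATIONS, 256)).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    static byte[] wrap(Key dek, SecretKey kek) throws Exception {
        Cipher wrapper = Cipher.getInstance("AESWrap");
        wrapper.init(Cipher.WRAP_MODE, kek);
        return wrapper.wrap(dek);
    }

    static Key unwrap(byte[] wrappedDek, SecretKey kek) throws Exception {
        Cipher unwrapper = Cipher.getInstance("AESWrap");
        unwrapper.init(Cipher.UNWRAP_MODE, kek);
        // Fails with an exception if the KEK (that is, the password) is wrong.
        return unwrapper.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    }

    /**
     * Re-wraps every DEK (keyed by its id) under the new password. The returned map
     * replaces the old one, so the old password can no longer unwrap anything.
     */
    static Map<String, byte[]> changePassword(Map<String, byte[]> wrappedDeks, char[] oldPassword, char[] newPassword)
        throws Exception {
        SecretKey oldKek = deriveKek(oldPassword);
        SecretKey newKek = deriveKek(newPassword);
        Map<String, byte[]> rewrapped = new HashMap<>();
        for (Map.Entry<String, byte[]> entry : wrappedDeks.entrySet()) {
            rewrapped.put(entry.getKey(), wrap(unwrap(entry.getValue(), oldKek), newKek));
        }
        return rewrapped;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        Map<String, byte[]> deks = new HashMap<>();
        deks.put("dek-1", wrap(generator.generateKey(), deriveKek("old password".toCharArray())));

        Map<String, byte[]> rewrapped = changePassword(deks, "old password".toCharArray(), "new password".toCharArray());
        System.out.println("re-wrapped DEKs: " + rewrapped.keySet());
    }
}
```

Because the wrapped DEKs are two separate writes away from the blobs they protect, a real implementation would also have to decide what happens when the loop fails halfway, which ties into the failure-scenario items above.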