Client Side Encrypted Snapshot Repositories

This concerns the encryption of snapshot data before it leaves the nodes.

We have three cloud snapshot repository types: Google Cloud Storage, Azure Storage and Amazon S3. [Amazon](https://aws.amazon.com/articles/client-side-data-encryption-with-the-aws-sdk-for-java-and-amazon-s3/) and [Azure](https://docs.microsoft.com/en-us/azure/storage/common/storage-client-side-encryption-java) support client side encryption in their Java clients, but [Google](https://cloud.google.com/storage/docs/encryption/client-side-keys) does **not**.

Amazon and Azure, which support client side encryption, allow the keys to be managed by the client (us) or by their _Key Management Service_ (Vault-like). They both use the [Envelope Encryption](https://cloud.google.com/kms/docs/envelope-encryption) method: each blob is individually AES-256 encrypted with a key that is randomly generated locally (the Data/Content Encryption Key). This key is in turn encrypted with another _Master Key_ (locally or by the _Vault_ service) and then stored alongside the blob in its metadata. Envelope encryption facilitates _Master Key_ rotation because only the small _(D/C)EK_ key has to be re-encrypted, rather than the complete blob.
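To make the envelope scheme concrete, here is a minimal sketch using only the JDK's `javax.crypto`. It is not the code of either SDK. The class and method names are hypothetical, and the choice of AES/GCM for the blob and `AESWrap` for the DEK is an assumption made for illustration; the text above only says the blob is AES-256 encrypted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration of envelope encryption; names are not from any SDK.
final class EnvelopeEncryptionSketch {
    private static final int GCM_IV_BYTES = 12;   // assumed IV size for AES/GCM
    private static final int GCM_TAG_BITS = 128;  // assumed authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();

    // What ends up in storage: the encrypted blob plus the wrapped DEK (the "metadata").
    static final class Envelope {
        final byte[] wrappedDek;
        final byte[] ivAndCiphertext;

        Envelope(byte[] wrappedDek, byte[] ivAndCiphertext) {
            this.wrappedDek = wrappedDek;
            this.ivAndCiphertext = ivAndCiphertext;
        }
    }

    static SecretKey newAes256Key() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, RANDOM);
        return generator.generateKey();
    }

    static Envelope encrypt(byte[] blob, SecretKey masterKey) throws GeneralSecurityException {
        // 1. Generate a fresh DEK locally, one per blob.
        SecretKey dek = newAes256Key();
        // 2. Encrypt the blob with the DEK.
        byte[] iv = new byte[GCM_IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher blobCipher = Cipher.getInstance("AES/GCM/NoPadding");
        blobCipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = blobCipher.doFinal(blob);
        // 3. Wrap (encrypt) the small DEK with the Master Key.
        Cipher wrapCipher = Cipher.getInstance("AESWrap");
        wrapCipher.init(Cipher.WRAP_MODE, masterKey);
        byte[] wrappedDek = wrapCipher.wrap(dek);
        byte[] stored = ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
        return new Envelope(wrappedDek, stored);
    }

    static byte[] decrypt(Envelope envelope, SecretKey masterKey) throws GeneralSecurityException {
        // Unwrap the DEK with the Master Key, then decrypt the blob with the DEK.
        Cipher unwrapCipher = Cipher.getInstance("AESWrap");
        unwrapCipher.init(Cipher.UNWRAP_MODE, masterKey);
        SecretKey dek = (SecretKey) unwrapCipher.unwrap(envelope.wrappedDek, "AES", Cipher.SECRET_KEY);
        byte[] stored = envelope.ivAndCiphertext;
        Cipher blobCipher = Cipher.getInstance("AES/GCM/NoPadding");
        blobCipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(GCM_TAG_BITS, stored, 0, GCM_IV_BYTES));
        return blobCipher.doFinal(stored, GCM_IV_BYTES, stored.length - GCM_IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey masterKey = newAes256Key();
        byte[] blob = "snapshot blob contents".getBytes(StandardCharsets.UTF_8);
        Envelope envelope = encrypt(blob, masterKey);
        System.out.println("round trip ok: " + Arrays.equals(blob, decrypt(envelope, masterKey)));
    }
}
```

The point of the layout is that the _Master Key_ never touches the blob bytes: only the wrapped DEK depends on it, which is what makes rotation cheap.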

On the ES side we discussed having a _single fixed URN key handler_ at the repository settings level.
This URN identifies the _Master Key_; for example, it could point to a key on the Amazon Vault Service or to the keys in each node's keystore. In this alternative it is not possible to rotate the keys via the repository API (it might be possible to do it outside ES, which is obviously preferable, but see below).

I believe this is the rough picture of the puzzle that we need to put together.

We oscillated between implementation alternatives, and I will lay out the one which I think is favorable. Whatever solution we initially implement, given that the _Master Key_ identifier is a URN we can multiplex multiple implementations for the same repository type.

We mirror the _Envelope Encryption_ algorithm, employed by Amazon and Azure, at the `BlobContainer` level. The key is stored on each node's keystore (and is pointed to by the repository level URN reference).

Advantages:
* Implement once for all cloud repository types and it will even work for the file system repository type!
* Testing! We can have unit tests for the base implementation of the `EncryptedBlobContainer` and end-to-end integration tests in only one of the implementations (Amazon or FS), where we can decrypt the data on the service fixture.

Disadvantages:
* Duplicates code in one SDK
* We're on our own with key rotation. We might need to implement our own command line tool to rotate keys (download the objects' metadata, decrypt the key and re-encrypt it). The tool will be "easy" to implement.
* Does not support _Vault_ keys.
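
The core step of such a rotation tool (decrypt the wrapped key, re-encrypt it under the new _Master Key_) could look roughly like the sketch below. This is a hypothetical illustration: the class and method names are made up, `AESWrap` is an assumed wrapping algorithm, and downloading and re-uploading the object metadata is left out entirely.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical sketch of the re-wrap step of a key rotation tool.
final class DekRewrapSketch {

    static SecretKey newAes256Key() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    static byte[] wrap(Key dek, SecretKey masterKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return cipher.wrap(dek);
    }

    static Key unwrap(byte[] wrappedDek, SecretKey masterKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    }

    // Rotation touches only the small wrapped DEK; the encrypted blob is never read or rewritten.
    static byte[] rewrap(byte[] wrappedDek, SecretKey oldMasterKey, SecretKey newMasterKey)
            throws GeneralSecurityException {
        return wrap(unwrap(wrappedDek, oldMasterKey), newMasterKey);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey oldMaster = newAes256Key();
        SecretKey newMaster = newAes256Key();
        SecretKey dek = newAes256Key();

        byte[] rotated = rewrap(wrap(dek, oldMaster), oldMaster, newMaster);
        boolean sameDek = Arrays.equals(dek.getEncoded(), unwrap(rotated, newMaster).getEncoded());
        System.out.println("DEK recoverable with new master key: " + sameDek);
    }
}
```

A real tool would additionally have to handle partial failures, since a rotation that stops halfway leaves some DEKs wrapped with the old key and some with the new one.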

In the opposite corner, there could be this alternative:
We use the AWS cloud library facility to implement it only for the S3 repository type. The key is stored either on the node's keystore or on the AWS Key Management Service.

Advantages:
* Easiest to implement
* Supports AWS's Vault Service
* We _might_ have support for key rotation, using Amazon's [command line tool](https://aws.amazon.com/blogs/security/how-to-encrypt-and-decrypt-your-data-with-the-aws-encryption-cli/)

Disadvantages:
* Only S3 repository type is supported
* Testing. We either mock the client and check that the code indeed calls the "crypto" APIs or we do an end-to-end integration test, where we decrypt the data on the fixture. Either way we kinda "test the library" rather than our code. This is pointless and brittle, but we need testing because the risks are too great.

Relates https://github.com/elastic/elasticsearch/issues/34454 https://github.com/elastic/elasticsearch/pull/40416

----

**EDITED 28.01.2021 Backlog:**

* [ ] Add the new API that changes the password for a given encrypted repository.
The encrypted repository must already be configured with the correct password. The API iterates over all the associated
wrapped DEKs (contained inside a dedicated blob container under the repository's base path), and proceeds to unwrap and re-wrap all the DEKs with the new password.
Finally, the old DEKs are removed, so that the old password cannot be used any longer.
* [ ] implement searchable and encrypted repositories.
This mainly requires implementing the AbstractBlobContainer#readBlob interface. This is slightly problematic because the association id between an encrypted blob and its DEK is prepended at the beginning of the blob, so that decryption at an internal position currently requires a seek to the beginning. Double check that the definition structure is reasonable (i.e. is it searchable and encrypted or vice versa; ping David about this).
* [ ] ensure compressed & encrypted repositories work
* [ ] investigate metered and encrypted repositories
* [ ] thoroughly test failure scenarios where IOExceptions are thrown. Generally speaking (although not really true in practice), reads and writes contain
two operations that can fail independently. Make sure testing covers this.
* [ ] create benchmarks (distributed team is working on something that measures the throughput of a repository)
* [ ] permit (and test) encrypted HDFS repositories (it should work)
* [ ] double check whether it is a problem that AbstractBlobContainer#blobExists and EncryptedBlobContainer#listBlobs/listBlobsByPrefix can report a blob as existing for an EncryptedBlobContainer (e.g. `blobExists` returns `true`) while a subsequent read of that blob fails because of decryption problems
* [ ] double check that we're not relying on system encoding (that strings are always written UTF-8 encoded, and reads are always decoded with the same UTF-8)
* [ ] ensure that there's no problem that EncryptedBlobContainer#writeBlobAtomic is not atomic (in general, it cannot be because a write might also generate and write the DEK, so there are two operations that can fail independently)
* [ ] ensure that it's alright for EncryptedBlobContainer#listBlobs/listBlobsByPrefix to return the encrypted blob size (which is larger), instead of the expected blob size of the decrypted blob
* [ ] think about possibly renaming "password name" to "password label"
* [ ] investigate repository password situation on cloud.
On-premise repository passwords are cached in memory when the node starts, usually requiring a node restart when configuring a new snapshot repository. The security settings implementation on cloud is different, so maybe we can read the repository passwords immediately after they've been added or changed, without requiring a restart.
* [ ] repository password min-length limit (an encrypted repository with a short password)
* [ ] make KDF parameters configurable
* [ ] investigate the naming of the delegating and delegated repositories (they are the same currently, is this a problem?)
* [ ] make crypto provider selectable for operations on the client side-encrypted repo
* [ ] test that encrypted repos can share the bucket (but with different base paths, otherwise we already test that the passwords must be the same) and can also share the repository client
* [ ] Revisit definition of password name in repository settings (see: https://github.com/elastic/elasticsearch/pull/53352#discussion_r409314119 )
* [ ] Settle on the specification for encrypted and searchable snapshots (ping David about it)
* [ ] Investigate if versioning of individual encrypted blobs is necessary https://github.com/elastic/elasticsearch/pull/53352/files#r444383568
* [ ] Investigate if we can guarantee that DEKs do not change inside a given shard
* [ ] Report encryption stats (from https://github.com/elastic/elasticsearch/pull/53352#discussion_r432261031)
* [ ] test FIPS negative behaviour (that a short password doesn't crash the node or something).
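
To illustrate why the prepended DEK association id (see the searchable and encrypted item above) complicates reads at an internal position, here is a small sketch of such a layout. Everything in it is hypothetical: the class name, the fixed 16-byte id length and the helper methods are assumptions for illustration, not the actual on-disk format. It also encodes and decodes the id explicitly as UTF-8, in the spirit of the system-encoding item.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: [ DEK id (fixed length) | encrypted blob bytes ].
final class DekIdHeaderSketch {
    // Assumed fixed id length; the real format may differ.
    static final int DEK_ID_BYTES = 16;

    static byte[] prependDekId(String dekId, byte[] encryptedBlob) {
        // Always encode explicitly as UTF-8, never with the platform default charset.
        byte[] id = dekId.getBytes(StandardCharsets.UTF_8);
        if (id.length != DEK_ID_BYTES) {
            throw new IllegalArgumentException("DEK id must encode to " + DEK_ID_BYTES + " bytes");
        }
        byte[] stored = new byte[DEK_ID_BYTES + encryptedBlob.length];
        System.arraycopy(id, 0, stored, 0, DEK_ID_BYTES);
        System.arraycopy(encryptedBlob, 0, stored, DEK_ID_BYTES, encryptedBlob.length);
        return stored;
    }

    // Reading the id consumes the first bytes of the stored blob, so a ranged read
    // that starts in the middle still needs one extra read at offset 0 to find the DEK.
    static String readDekId(InputStream storedBlob) throws IOException {
        byte[] id = new byte[DEK_ID_BYTES];
        new DataInputStream(storedBlob).readFully(id);
        return new String(id, StandardCharsets.UTF_8);
    }

    // Translates a position within the encrypted payload to an offset in the stored blob.
    static long storedOffset(long payloadPosition) {
        return DEK_ID_BYTES + payloadPosition;
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = prependDekId("0123456789abcdef", new byte[] {1, 2, 3});
        System.out.println("DEK id: " + readDekId(new ByteArrayInputStream(stored)));
        System.out.println("payload position 0 is stored at offset " + storedOffset(0));
    }
}
```

With a layout like this, the listed blob size also exceeds the payload size by at least the header length, which is one contributor to the size mismatch noted in the backlog.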


Client Side Encrypted Snapshot Repositories #41910
