Skip to content

Conversation

@albertzaharovits
Copy link
Contributor

@albertzaharovits albertzaharovits commented Nov 29, 2018

Cloud providers (ms, google, aws) have two ways to offer client side encryption (besides the encryption at rest and TLS in transit):

  1. Using Customer Managed Keys
  2. Using Customer Supplied Keys

The distinction between the two is whether the "key encryption key" is generated and stored locally (2) or managed by the cloud provider (1). They all use envelope encryption, encrypting data with randomly locally generated symmetric keys, and then encrypting this key with another key that can be managed by the cloud's provider key vault service (1) or be generated locally (2). The encrypted symmetric key is usually stored alongside the object, in it's metadata, together with the identifier name of the key required to decrypt it. This mechanism has the benefit that during a key rotation the decrypt and re-encrypt operations can be carried out over the data encryption keys only (not re-encrypting the data)

Implementation wise I think we should address both ways, although this PR only deals with (1).
Variant (1) allows for hustle free key rotation - you do it from the cloud provider's management console and you don't even need to change the setting value on ES (you can define automatic policies for this).
Variant (2) does all the encryption locally but rotation implies changing the setting on the ES cluster.

Encryption is done at file level (blob) and for all the files.

I have opened this to garner early feedback and concerns. I intend to push forward with implementing both variants + documentation.

Related #34454 #11128
Sibling of #30513

CC @elastic/es-security

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@tlrx
Copy link
Member

tlrx commented Dec 3, 2018

It looks very promising, thanks for taking care of this.

I'd like to double check few things though:

  • are object names also encrypted? (we don't want that)
  • are vendor SDKs already supporting these features? Looks like the GCP one is in beta, do we really want to support it?

I'm also a bit concerned about our capacity to test these features correctly through third party tests. We are already lacking of tests on Azure so maybe we could just exclude the provider from the scope of this feature.

@albertzaharovits
Copy link
Contributor Author

Hi Tanguy,

are object names also encrypted?

No they're not. In particular the client does not require the encryption key to list a path.

are vendor SDKs already supporting these features? Looks like the GCP one is in beta, do we really want to support it?

This is indeed marked beta on this client SDK, but I've looked at the code for the latest one and it is not marked beta anymore.

I'm also a bit concerned about our capacity to test these features correctly through third party tests.

This is an important problem indeed that Yannick also raised in a private discussion. I have some hopes that for GCS we might test for encryption in some qa test. I haven't looked carefully into it yet.
If we can test for it in qa tests then I think we're in good shape. Otherwise we would require dedicated infrastructure to test this (the test would run in CI only and have valid GCS credentials). This would be a large burden, hence I have also raised the matter with the product team (Jason Zucchetto) to see how stringent is the need for such a feature and which provider and key management alternative is the best bet.

@tlrx
Copy link
Member

tlrx commented Dec 4, 2018

Great, thanks @albertzaharovits. It would be great to have qa tests for this, then 3rd party tests could be done as follow ups if they require infrastructure changes.

@albertzaharovits
Copy link
Contributor Author

@tlrx @ywelsch I have been unintentionally misleading you, I'm sorry!
While digging in the SDK code to see if I can intercept the calls for get-encryption-key so that we can feed into it inside integration tests, I have been stumped to learn that the sdk does not offer client side encryption . The customer managed or customer supplied variants are for server side encryption, the client sends the key identifier (or the encoded key, respectively) by the API header. I went back to the docs and I couldn't believe my eyes. It indeed does not 🤤 And they don't plan to add it anytime soon .

I had read the google sdk docs after reading the MS and AWS ones, and they use similar terminology and I fell prey to wishful thinking (MS Azure Storage and AWS S3 do have client side encryption)

However, I am now thinking we should not use client side encryption with the client SDK. The reason being that it is difficult to test it reliably. In integration tests, we would have to interpose in the communication protocol to intercept calls for encryption keys and check that data indeed looks encrypted with the key that was requested. This is testing someone else's feature. After a lot of effort it effectively binds us to a particular SDK client version, because these are sdk library internals that can easily change and we might not catch it in tests. In all truth this depends on the library SDK, from a glance I think it's easier to test for S3 because it exposes dependency objects that we can mock, instead of handling it inside the library.

BUT, what do you think of this feature at the BlobStoreRepository level, the base class of all repository implementations? I see it on the same level with the rate limiting and compression features. We would not be using any external library (JCE should be enough, we will know for sure when we decide on the algorithm). A basic implementation would just require a passphrase to generate a symmetric key and construct a CipherInputStream (possibly wrapped byRateLimitingInputStream, during snapshot). But we might be wishing to use asymmetric key encryption instead.

I foresee an important problem though. In order to rotate the keys we would need to have some kind of cmd line tool outside of ES to do the decryption with the old key and encryption with the new one.

WDY? Should I open a discuss email thread? Do you see any obvious thing I'm missing.

In any case I am going to try to see what I can achieve with the S3 client instead.
I'm going to politely try to hijack #30513 .

@ywelsch ywelsch added team-discuss :Security/Security Security issues without another label labels Dec 7, 2018
@jasontedor jasontedor added v6.7.0 and removed v6.6.0 labels Dec 19, 2018
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
@original-brownbear
Copy link
Contributor

@tlrx @albertzaharovits is this PR something you want to continue?

@albertzaharovits
Copy link
Contributor Author

No, We'll use a different strategy without using this library's managed-client-encryption-key approach.
Thanks for the ping @original-brownbear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement :Security/Security Security issues without another label team-discuss WIP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants