Skip to content

Conversation

@kellyzly
Copy link
Contributor

@kellyzly kellyzly commented Apr 1, 2015

Chimera is a project which strips code related to CryptoInputStream/CryptoOutputStream from Hadoop to facilitate AES-NI based data encryption in other projects. It provides JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec. JceAesCtrypoCodec uses encrypted algorithms jdk provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl provides. We can directly use Chimera to implement spark encrypted shuffle.

core/pom.xml Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not generally needed.

@vanzin
Copy link
Contributor

vanzin commented Apr 7, 2015

Left some initial comments. I like the idea of using a library for this, my main concern is about the stability of that library (and who'll be maintaining it going forward).

Aside from that, there are some parts of the code that need cleaning up and some style updates, and some broken error handling, but nothing too major.

@kellyzly
Copy link
Contributor Author

kellyzly commented Apr 9, 2015

@vanzin :

Left some initial comments. I like the idea of using a library for this, my main concern is about the stability of that library (and who'll be maintaining it going forward).

i have updated the code according to your valuable suggestions. Chimera is a project which is maintained by Intel team(My teammate dianfu is the contributor of it). Chimera is a project which strips code related to the encryption and decryption part of Transparent Encryption in HDFS from hadoop. Transparent Encryption is imported since hadoop 2.6 and the code is stable after hadoop 2.6 release. Although the current version of Chimera is 0.0.1, it is same stable as current hadoop because we just strip code from current trunk of hadoop. If someone reports new bugs to hadoop about the encryption and decryption part later, my teammate will watch it and patch the patches of those bugs to Chimera.

More information about encrypted shuffle you can reference https://issues.apache.org/jira/browse/SPARK-5682. Any suggestions/advices/guidance are welcome.

@kellyzly kellyzly force-pushed the use_chimera branch 3 times, most recently from 4021dc6 to f8a0083 Compare April 9, 2015 08:48
@kellyzly kellyzly force-pushed the use_chimera branch 2 times, most recently from 9121bb2 to e7908aa Compare April 30, 2015 13:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you cannot get the user credential, you can write a default number to configuration file every time when a certain user started the jobs, and retrieve this number in such case.
however, i think it maybe unsafe to store SPARK_SHUFFLE_TOKEN and related information in plain text in configuration file, it is better to cypher the texts then store the values.

@JoshRosen
Copy link
Contributor

@vanzin, could you help to clarify whether Spark currently includes encryption of shuffle data in transit? I believe that your SASL patch added this encryption? If so, I think that we should close this PR and the following JIRAs:

I don't see any mention of SASL on https://spark.apache.org/docs/1.5.0/security.html, so if we do indeed have encryption then we should update the docs to say so. Right now, the docs claim that "SSL is not supported yet for WebUI and block transfer service", making it sound like we have no encryption support for data in-transit.

@vanzin
Copy link
Contributor

vanzin commented Sep 17, 2015

Spark supports encrypted communication for shuffle data; we should fix the docs (I'll file a bug for that). But this PR is not about on-the-wire encryption, it's data at rest encryption (i.e. the shuffle files on disk).

@JoshRosen
Copy link
Contributor

Ah, gotcha. We should update this PR / JIRA title to better reflect this.

@JoshRosen
Copy link
Contributor

And to be clear, I do think we can close https://issues.apache.org/jira/browse/SPARK-6373, no?

@vanzin
Copy link
Contributor

vanzin commented Sep 17, 2015

re: SPARK-6373, it would just add another way of encrypting shuffle traffic; personally I don't see much value in it, but maybe others disagree.

@winningsix
Copy link

Hi @JoshRosen and @vanzin, do you have any further comments about this PR? The library Chimera is maintained by Intel. One way to address your comments is to move the library into Spark project. I'd like to move this jira forwards. So I am seeking for your suggestions and thoughts. Thank you!

@JoshRosen
Copy link
Contributor

@vanzin, given that you've been spending a lot of time on encryption-related things in Spark, I'll leave it to you to make a judgement call here.

@vanzin
Copy link
Contributor

vanzin commented Sep 21, 2015

@winningsix at the very least this PR needs to be updated to resolve conflicts.

I need to take a look at the library being pulled; my main concern is that being a new library, it may not be super stable, and having it outside the project would make it more difficult to quickly address bugs (because you have to go thorough another project's release cycle before you can fix Spark).

If it's not a big library, it might be worth it to pull it into Spark for the first release of this code, and later pull it out so that other Hadoop projects can also use it. Unless other projects are already using it, in which case my concerns are probably outdated.

@winningsix
Copy link

Thanks @vanzin for your prompt reply. In the current patch it consists of two parts. One is the encryption framework using JCE encryption and another is performance acceleration using Openssl library. Maybe we can get the first part in since it doesn't require the Chimera library. Then we could move the second part forwards either using Chimera or pull them into Spark. Any thoughts? Thank you!

@vanzin
Copy link
Contributor

vanzin commented Sep 22, 2015

@winningsix that would be awesome.

@winningsix
Copy link

@vanzin I create a new PR(#8880) addressing the common part and JCE key provider support. Could you help me review it? Ticket is also filed in SPARK-10771. Thank you!

@vanzin
Copy link
Contributor

vanzin commented Oct 24, 2015

@kellyzly given #8880, could you close this PR? I assume a new one will be built on to of that one to add native acceleration later.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@asfgit asfgit closed this in ce5fd40 Dec 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants