[Spark-5682] Add spark encrypted shuffle by using chimera lib #5307

kellyzly · 2015-04-01T07:38:04Z

Chimera is a project that extracts the CryptoInputStream/CryptoOutputStream code from Hadoop so that other projects can use AES-NI based data encryption. It provides JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides. We can use Chimera directly to implement Spark encrypted shuffle.
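For readers unfamiliar with the JCE path, the following is a minimal sketch of AES-CTR stream encryption using only the JDK's `javax.crypto` classes. It does not use Chimera or any code from this PR: the class `AesCtrStreamSketch` and its methods are hypothetical names chosen for illustration, and the in-memory streams and key/IV handling are simplifying assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical illustration class; not part of Chimera or Spark.
class AesCtrStreamSketch {

    // Build a JDK-provided AES cipher in CTR mode (a stream mode, so no padding).
    private static Cipher newCipher(int mode, byte[] key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Encrypt by wrapping the destination stream, as a shuffle writer might wrap a file stream.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                 new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Decrypt by wrapping the source stream with the same key and IV.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted) throws Exception {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                 new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key, generated here only for the demo
        byte[] iv = new byte[16];  // CTR counter block; must never repeat for the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] data = "shuffle block contents".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decrypt(key, iv, encrypt(key, iv, data));
        System.out.println(Arrays.equals(data, roundTrip));
    }
}
```

Because CTR is a stream mode, the ciphertext has the same length as the plaintext. An OpenSSL-backed codec would aim for the same stream interface with native AES-NI acceleration underneath.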

vanzin · 2015-04-07T23:11:46Z

core/pom.xml

This is not generally needed.

vanzin · 2015-04-07T23:33:26Z

Left some initial comments. I like the idea of using a library for this, my main concern is about the stability of that library (and who'll be maintaining it going forward).

Aside from that, there are some parts of the code that need cleaning up and some style updates, and some broken error handling, but nothing too major.

kellyzly · 2015-04-09T08:17:28Z

@vanzin :

> Left some initial comments. I like the idea of using a library for this, my main concern is about the stability of that library (and who'll be maintaining it going forward).

I have updated the code according to your suggestions. Chimera is maintained by a team at Intel (my teammate dianfu is a contributor). It extracts the encryption and decryption code of HDFS Transparent Encryption from Hadoop. Transparent Encryption was introduced in Hadoop 2.6, and the code has been stable since the Hadoop 2.6 release. Although the current version of Chimera is 0.0.1, it is as stable as current Hadoop because we just stripped the code from Hadoop's current trunk. If someone later reports new bugs in the encryption and decryption code to Hadoop, my teammate will watch for them and apply the fixes to Chimera.

For more information about encrypted shuffle, see https://issues.apache.org/jira/browse/SPARK-5682. Any suggestions, advice, or guidance are welcome.

hujy · 2015-07-02T01:35:41Z

core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala

If you cannot get the user credential, you could write a default number to the configuration file each time a given user starts a job, and retrieve that number in this case.
However, I think it may be unsafe to store SPARK_SHUFFLE_TOKEN and related information in plain text in the configuration file; it would be better to encrypt the text before storing the values.

JoshRosen · 2015-09-17T22:29:30Z

@vanzin, could you help to clarify whether Spark currently includes encryption of shuffle data in transit? I believe that your SASL patch added this encryption? If so, I think that we should close this PR and the following JIRAs:

I don't see any mention of SASL on https://spark.apache.org/docs/1.5.0/security.html, so if we do indeed have encryption then we should update the docs to say so. Right now, the docs claim that "SSL is not supported yet for WebUI and block transfer service", making it sound like we have no encryption support for data in-transit.

vanzin · 2015-09-17T22:32:42Z

Spark supports encrypted communication for shuffle data; we should fix the docs (I'll file a bug for that). But this PR is not about on-the-wire encryption, it's data at rest encryption (i.e. the shuffle files on disk).

JoshRosen · 2015-09-17T22:51:24Z

Ah, gotcha. We should update this PR / JIRA title to better reflect this.

JoshRosen · 2015-09-17T22:52:46Z

And to be clear, I do think we can close https://issues.apache.org/jira/browse/SPARK-6373, no?

vanzin · 2015-09-17T23:00:39Z

re: SPARK-6373, it would just add another way of encrypting shuffle traffic; personally I don't see much value in it, but maybe others disagree.

winningsix · 2015-09-21T06:45:06Z

Hi @JoshRosen and @vanzin, do you have any further comments about this PR? The Chimera library is maintained by Intel. One way to address your comments is to move the library into the Spark project. I'd like to move this JIRA forward, so I am seeking your suggestions and thoughts. Thank you!

JoshRosen · 2015-09-21T18:22:12Z

@vanzin, given that you've been spending a lot of time on encryption-related things in Spark, I'll leave it to you to make a judgement call here.

vanzin · 2015-09-21T23:35:37Z

@winningsix at the very least this PR needs to be updated to resolve conflicts.

I need to take a look at the library being pulled; my main concern is that being a new library, it may not be super stable, and having it outside the project would make it more difficult to quickly address bugs (because you have to go through another project's release cycle before you can fix Spark).

If it's not a big library, it might be worth it to pull it into Spark for the first release of this code, and later pull it out so that other Hadoop projects can also use it. Unless other projects are already using it, in which case my concerns are probably outdated.

winningsix · 2015-09-22T01:04:08Z

Thanks @vanzin for your prompt reply. The current patch consists of two parts. One is the encryption framework using JCE encryption, and the other is performance acceleration using the OpenSSL library. Maybe we can get the first part in, since it doesn't require the Chimera library. Then we could move the second part forward, either by using Chimera or by pulling the code into Spark. Any thoughts? Thank you!

vanzin · 2015-09-22T01:31:55Z

@winningsix that would be awesome.

winningsix · 2015-09-23T07:10:55Z

@vanzin I created a new PR (#8880) addressing the common part and JCE key provider support. Could you help me review it? A ticket is also filed as SPARK-10771. Thank you!

vanzin · 2015-10-24T00:15:25Z

@kellyzly given #8880, could you close this PR? I assume a new one will be built on top of that one to add native acceleration later.

AmplabJenkins · 2015-11-13T19:32:11Z

Can one of the admins verify this patch?

kellyzly force-pushed the use_chimera branch from 1b9baa8 to b401e0b Compare April 1, 2015 08:02

kellyzly force-pushed the use_chimera branch from b401e0b to 149bf77 Compare April 9, 2015 07:47

kellyzly force-pushed the use_chimera branch 3 times, most recently from 4021dc6 to f8a0083 Compare April 9, 2015 08:48

kellyzly force-pushed the use_chimera branch 2 times, most recently from 9121bb2 to e7908aa Compare April 30, 2015 13:26

[Spark-5682] Add spark encrypted shuffle by using chimera lib

3f8927f

kellyzly force-pushed the use_chimera branch from e7908aa to 3f8927f Compare May 6, 2015 05:24

winningsix mentioned this pull request Oct 9, 2015

[SPARK-5682][Core] Add encrypted shuffle in spark #8880

Closed

asfgit closed this in ce5fd40 Dec 17, 2015

6 participants