
Conversation

@esjewett
Member

@esjewett esjewett commented May 6, 2014

... with regards to saved map output. Wording taken partially from Matei Zaharia's email to the Spark user list. http://apache-spark-user-list.1001560.n3.nabble.com/performance-improvement-on-second-operation-without-caching-td5227.html

@AmplabJenkins

Can one of the admins verify this patch?

Contributor


Not your change, but I think this should say "will be persisted in memory or on disk on the nodes"

Contributor


Oh sorry - nevermind, this is explained below and this case only refers to calling persist() without arguments.

@esjewett
Copy link
Member Author

esjewett commented May 6, 2014

Just putting it out there: I'm not attached to any of this wording, so change away, or don't accept it. No problem either way. I just thought my question on the user list as to whether the programming guide could be updated was better stated as a pull request ;-)

Contributor


It's a great idea to have this here. This is a totally non-obvious fact and I think many users would like to know this.

My only thought is: would you mind moving this to the end of the "RDD Persistence" section? Also, at this point in the guide I don't think the concept of stages or jobs has been introduced. So it might be good to have something like:

Spark sometimes automatically persists intermediate state from RDD operations, even without users calling
persist() or cache(). In particular, if a shuffle happens when computing an RDD, Spark will keep the outputs
from the map side of the shuffle on disk to avoid re-computing the entire dependency graph if an RDD
is re-used. We still recommend users call persist() if they plan to re-use an RDD iteratively.
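To make the recommendation concrete, here is a minimal sketch of explicitly persisting a shuffled RDD that will be re-used. It assumes a live `SparkContext` named `sc` (e.g. from `spark-shell`); the input path is a placeholder:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input path; assumes `sc` is an existing SparkContext.
val pairs = sc.textFile("hdfs://path/to/input")
              .map(line => (line.split("\t")(0), 1))

// reduceByKey triggers a shuffle. Spark keeps the map-side shuffle output
// on disk automatically, but persist() is still recommended when the
// result will be re-used, so the whole RDD is served from the cache.
val counts = pairs.reduceByKey(_ + _).persist(StorageLevel.MEMORY_ONLY)

counts.count()   // first action: computes the RDD and materializes the cache
counts.collect() // re-use: reads the cached partitions instead of recomputing
```

Without the `persist()` call, only the map-side shuffle files would be retained; the post-shuffle reduction would still be re-run on each action.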

Text taken primarily from Patrick Wendell's comment on the pull request. Also changed wording in "RDD Operations" section so as not to imply a guarantee that RDDs are reprocessed if persist() is not run.
@esjewett
Member Author

esjewett commented May 6, 2014

@pwendell I like your wording. Switched to use it, and moved it to the end of the "RDD Persistence" section as requested. I also updated the "RDD Operations" section with a small change so as not to imply that RDDs that aren't persist()ed will always be reprocessed.

@pwendell
Contributor

pwendell commented May 7, 2014

Okay I can merge this, thanks!

asfgit pushed a commit that referenced this pull request May 7, 2014
... with regards to saved map output. Wording taken partially from Matei Zaharia's email to the Spark user list. http://apache-spark-user-list.1001560.n3.nabble.com/performance-improvement-on-second-operation-without-caching-td5227.html

Author: Ethan Jewett <[email protected]>

Closes #668 from esjewett/Doc-update and squashes the following commits:

11793ce [Ethan Jewett] Update based on feedback
171e670 [Ethan Jewett] Clarify Scala programming guide on caching ...
(cherry picked from commit 48ba3b8)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 48ba3b8 May 7, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025
3 participants