Skip to content

Conversation

@JoshRosen
Copy link
Contributor

This PR removes spark.cleaner.ttl and the associated TTL-based metadata cleaning code.

Now that we have the ContextCleaner and a timer to trigger periodic GCs, I don't think that spark.cleaner.ttl is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard to diagnose bugs.

For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timetsamps in hashmaps, and a handful fewer threads.

@JoshRosen JoshRosen changed the title [SPARK-7689][WIP] Remove TTL-based metadata cleaning [SPARK-7689][WIP] Remove TTL-based metadata cleaning in Spark 2.0 Dec 31, 2015
@jerryshao
Copy link
Contributor

Good to remove it :). It has been obsolete for a long time, and will potentially lead to some unexpected behaviors especially in Streaming (like block not found).

@SparkQA
Copy link

SparkQA commented Dec 31, 2015

Test build #48528 has finished for PR 10534 at commit 98b732a.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 1, 2016

LGTM. We should let @tdas take a look at this though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO.

@SparkQA
Copy link

SparkQA commented Jan 2, 2016

Test build #48592 has finished for PR 10534 at commit e6482fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 2, 2016

Test build #48590 has finished for PR 10534 at commit 5ffe30f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 3, 2016

Test build #48601 has finished for PR 10534 at commit 1d96791.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@gatorsmile
Copy link
Member

All the recent test builds failed due to the same issue.

@SparkQA
Copy link

SparkQA commented Jan 3, 2016

Test build #48616 has finished for PR 10534 at commit 1d96791.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jan 4, 2016

Test build #48645 has finished for PR 10534 at commit 1d96791.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

+1

@JoshRosen JoshRosen changed the title [SPARK-7689][WIP] Remove TTL-based metadata cleaning in Spark 2.0 [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 Jan 4, 2016
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt it simpler to keep the Scala map interface? Will minimize changes in rest of code in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about the pitfalls of .getOrElseUpdate not being atomic on ConcurrentHashMaps that had been wrapped into Scala maps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@SparkQA
Copy link

SparkQA commented Jan 4, 2016

Test build #48693 has finished for PR 10534 at commit 0868fab.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen JoshRosen force-pushed the remove-ttl-based-cleaning branch from 0868fab to 8b7ca1c Compare January 6, 2016 22:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this < removed??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks for catching that.

@tdas
Copy link
Contributor

tdas commented Jan 6, 2016

LGTM, pending tests.

@SparkQA
Copy link

SparkQA commented Jan 7, 2016

Test build #48878 has finished for PR 10534 at commit 9704bc2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jan 7, 2016

Test build #48889 has finished for PR 10534 at commit 2451963.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 7, 2016

Merging. Thanks.

@asfgit asfgit closed this in 8e19c76 Jan 7, 2016
@JoshRosen JoshRosen deleted the remove-ttl-based-cleaning branch August 29, 2016 19:21
@JoshRosen
Copy link
Contributor Author

Cross-references to related PRs (to aid other code archaeologists):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants