Conversation

@steveloughran
Contributor

This is the successor to PR #5423; it incorporates SPARK-11315 (PR #8744), which was split out for easier review.

It adds a history provider which uses the YARN timeline server for histories, reading the events published in the application by way of the #8744 publisher. It's very efficient at getting attempt summary data, as that is server

It also contains preparatory support for history server metrics (SPARK-11373 / #9571), i.e. it collects metrics but does not publish them, and for the cache updating of incomplete work from SPARK-7889 / #6935: the #8744 publisher includes an incrementing counter, which the history server uses to detect updates to histories.
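
As a rough illustration of the counter-based update check, here is a minimal sketch. `AttemptSummary`, `VersionProbe` and the `version` field are hypothetical names chosen for this example, not the classes in this patch; it assumes the counter only ever increases and that a completed attempt is never republished.

```scala
// Hypothetical types for illustration only; not the classes added by this PR.
final case class AttemptSummary(
    appId: String,
    attemptId: String,
    version: Long,       // assumed: the incrementing counter set by the publisher
    completed: Boolean)

object VersionProbe {
  /**
   * A cached history is considered out of date when the summary fetched from
   * the timeline server carries a higher counter than the one the cache was
   * built from. Completed attempts are assumed never to change again.
   */
  def isStale(cached: AttemptSummary, latest: AttemptSummary): Boolean =
    !cached.completed && latest.version > cached.version
}
```

The point of the design is that the check only needs the attempt summary, so deciding whether to reload does not require fetching or replaying the full event history.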

In comparison to the FS history provider, bootstrap time is fast, as there is no need to replay histories to extract that metadata. It does place load on the timeline server, hence the various options to configure the frequency of probing for updates, including disabling background refreshes until users actually reload pages. Because the YARN ATS service has different failure modes from HDFS, there is some more startup checking of service availability, with failure information collected and reported, as well as noted in metrics. (More succinctly: the FS history provider assumes HDFS doesn't fail.)
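
To make the refresh trade-off concrete, here is a small sketch of how such a probing policy could be expressed. The names `RefreshPolicy`, `Settings`, `backgroundIntervalMs` and `minOnDemandGapMs` are assumptions made up for this example; they are not the configuration options of the provider.

```scala
// Hypothetical policy object for illustration; option names are not from this PR.
object RefreshPolicy {

  /**
   * @param backgroundIntervalMs interval between background probes;
   *                             a value <= 0 disables background refresh
   * @param minOnDemandGapMs     minimum gap between probes triggered by page loads
   */
  final case class Settings(backgroundIntervalMs: Long, minOnDemandGapMs: Long)

  /** Decide whether the timeline server should be probed at time `nowMs`. */
  def shouldProbe(
      settings: Settings,
      lastProbeMs: Long,
      nowMs: Long,
      userRequested: Boolean): Boolean = {
    val elapsed = nowMs - lastProbeMs
    if (userRequested) {
      // A page load may trigger a probe, but not more often than the gap allows.
      elapsed >= settings.minOnDemandGapMs
    } else {
      // Background probes only run when an interval is configured.
      settings.backgroundIntervalMs > 0 && elapsed >= settings.backgroundIntervalMs
    }
  }
}
```

With `backgroundIntervalMs = 0`, the timeline server is only contacted when somebody actually looks at the history UI, which keeps the load on ATS proportional to real use.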

The new history server provider is added in yarn/src/history, along with its various tests. The code is only included in compilation, tests and scalastyle checks on Hadoop 2.6+, so it does not cause any compatibility issues when Spark is built against earlier Hadoop versions.

@SparkQA

SparkQA commented Jan 1, 2016

Test build #48568 has finished for PR 10545 at commit 21e0c76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor Author

I should add that I'm thinking of moving the core rest package from yarn/src/history to yarn/src/main. Why? It adds Hadoop authentication to the Jersey client, so it allows general REST access to any of the Hadoop services. The ATS client simply uses this to talk to ATS; it can just as easily be used to talk to anything (including Spark apps) through the YARN RM proxy, even when Kerberos/SPNEGO auth is enabled in YARN. This makes it more generic and useful elsewhere.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-1537-ATS branch from 21e0c76 to 8d781db Compare January 5, 2016 12:32
@SparkQA

SparkQA commented Jan 5, 2016

Test build #48767 has finished for PR 10545 at commit 8d781db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-1537-ATS branch from 8d781db to 690b686 Compare January 12, 2016 16:23
@SparkQA

SparkQA commented Jan 12, 2016

Test build #49241 has finished for PR 10545 at commit 690b686.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-1537-ATS branch from 690b686 to 83560db Compare January 27, 2016 18:09
@SparkQA

SparkQA commented Jan 27, 2016

Test build #50201 has finished for PR 10545 at commit 83560db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ter (the one with the service API merged in)
…request pushed. This is for more reliable polling for changes during integration with YARN-7889
…d track attempt versions. This is for more reliable polling for changes during integration with YARN-7889
…f improvement in test running in the process. Tests can register "failureActions" for execution on a test failure; closures to dump the state of things & so have better diagnostics
…s this. In production even 10s is probably too short, so it doesn't make things much worse
…ATS URL (as info level wasn't giving any details on whether/when entities were published, or under what); downgrade event drop to info & not warning
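
The "failureActions" idea mentioned in the commit notes above can be sketched roughly as follows. This is an illustration under assumptions, not the test code in this branch: the class name `FailureActions` and the methods `register` and `runTest` are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical helper for illustration; names are not taken from the patch.
class FailureActions {
  private val actions = ArrayBuffer.empty[() => Unit]

  /** Register a closure to run only if the test body fails. */
  def register(action: => Unit): Unit = actions += (() => action)

  /** Run a test body; on failure, run every registered action, then rethrow. */
  def runTest[T](body: => T): T =
    try {
      body
    } catch {
      case e: Throwable =>
        actions.foreach { action =>
          // A failing diagnostic must not hide the original test failure.
          try action() catch { case NonFatal(inner) => e.addSuppressed(inner) }
        }
        throw e
    }
}
```

A test would register closures that dump the state of the services it depends on, so that the diagnostics appear in the log only when something has actually gone wrong.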
@steveloughran steveloughran force-pushed the stevel/feature/SPARK-1537-ATS branch from 83560db to b6f3b99 Compare March 2, 2016 17:08
@SparkQA

SparkQA commented Mar 2, 2016

Test build #52326 has finished for PR 10545 at commit b6f3b99.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.


2 participants