@steveloughran
Contributor

This pull request incorporates the work of SPARK-11314 and SPARK-11315, adding in the history server side of the system: a subclass of ApplicationHistoryProvider which can enumerate application histories listed in the YARN timeline server, and retrieve them on demand.

It includes

  1. The history provider itself
  2. A REST client for the timeline service, one which supports Kerberos authentication against the remote timeline service. Authentication is keytab-based, not delegation-token-based: as with the FsHistoryProvider, the History Server needs to be started with a keytab.
  3. A set of integration tests which do end-to-end testing of the history, from application-side event publishing to web UI retrieval and validation.
  4. Tests to verify the robustness of the history provider against failures. The history enumeration is asynchronous, so transient failures will result in out-of-date listings rather than stack traces. Attempts to explicitly retrieve an application will fail when there are connectivity problems of any kind.
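
The failure behaviour described in item 4 can be sketched roughly as follows. This is only an illustration, not code from the patch: `AppSummary`, `CachedListing`, `fetchListing` and `fetchApp` are hypothetical names standing in for the provider and its REST calls. The point it shows is that a failed background refresh keeps the previous listing, while an explicit retrieval lets the failure reach the caller.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for an application summary; not a class from this patch.
final case class AppSummary(id: String, name: String)

// Hypothetical listing cache: fetchListing and fetchApp stand in for the REST calls.
class CachedListing(
    fetchListing: () => Seq[AppSummary],
    fetchApp: String => AppSummary) {

  private val listing = new AtomicReference[Seq[AppSummary]](Seq.empty)
  @volatile private var lastFailure: Option[Throwable] = None

  /** Intended to be called from a background thread: a failure is recorded, not rethrown. */
  def refresh(): Unit = Try(fetchListing()) match {
    case Success(apps) =>
      listing.set(apps)
      lastFailure = None
    case Failure(e) =>
      // Keep the previous (now stale) listing and remember why the refresh failed.
      lastFailure = Some(e)
  }

  /** Always answers from the cache, so connectivity problems never surface here. */
  def list: Seq[AppSummary] = listing.get()

  def lastRefreshFailure: Option[Throwable] = lastFailure

  /** Explicit retrieval goes to the remote service and lets failures propagate. */
  def get(appId: String): AppSummary = fetchApp(appId)
}
```

With this shape, a UI that renders `list` degrades to stale data during an outage, and `lastRefreshFailure` gives it something to report alongside the listing.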

@SparkQA

SparkQA commented Apr 8, 2015

Test build #29863 has finished for PR 5423 at commit 1543f47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampEvent(sparkEvent: SparkListenerEvent, time: Long)
    • class YarnEventListener(sc: SparkContext, service: YarnHistoryService)
    • class YarnHistoryProvider(sparkConf: SparkConf)
    • class YarnHistoryService extends AbstractService("History Service")
    • class WeakShutdownHook(service: YarnService) extends Runnable with Logging
  • This patch does not change any dependencies.

@steveloughran steveloughran changed the title SPARK-1537 Application Timeline Server integration SPARK-1537 [WiP] Application Timeline Server integration Apr 8, 2015
@SparkQA

SparkQA commented Apr 10, 2015

Test build #30029 has finished for PR 5423 at commit ba2e0a9.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampEvent(sparkEvent: SparkListenerEvent, time: Long, flush: Boolean = false)
    • class YarnEventListener(sc: SparkContext, service: YarnHistoryService)
    • class YarnHistoryProvider(sparkConf: SparkConf)
    • class YarnHistoryService extends AbstractService("History Service")
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

Contributor

Why is this needed, since it only calls the same parent method?

Contributor Author

Well caught; probably an override that I ended up not expanding.

@steveloughran
Contributor Author

This is a WiP build, with a lot more tests, including integration tests that go all the way from a wired-up Spark context to an in-memory ATS server. It still needs to be wrapped up with the GET calls to retrieve the data and verify a full round trip of all event structures.

Contributor

use logDebug

Contributor Author

fixed

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30378 timed out for PR 5423 at commit 8446042 after a configured wait of 120m.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30607 has finished for PR 5423 at commit 1256532.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampEvent(sparkEvent: SparkListenerEvent, time: Long, flush: Boolean = false)
    • class YarnHistoryProvider(sparkConf: SparkConf)
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30605 has finished for PR 5423 at commit 0f0f66c.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampEvent(sparkEvent: SparkListenerEvent, time: Long, flush: Boolean = false)
    • class YarnHistoryProvider(sparkConf: SparkConf)
    • trait YarnService extends Closeable
  • This patch adds the following new dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar
  • This patch removes the following dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar

@SparkQA

SparkQA commented Apr 21, 2015

Test build #30669 has finished for PR 5423 at commit 0d29785.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@steveloughran
Contributor Author

Note that this adds a new profile, hadoop-2.6, to pull in the 2.6 JARs and conditionally add the yarn/history source and tests to the build. Without that profile the tests (probably) aren't running. I say "probably" because whatever is looking at public interfaces/classes does appear to be looking into history/src/main/scala.

@SparkQA

SparkQA commented Apr 21, 2015

Test build #30695 has finished for PR 5423 at commit 4c0dd85.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@steveloughran
Contributor Author

This iteration has simpler service flush/shutdown logic, with a specific message for each action queued, and no attempt to trigger the YARN service stop when a stopApplication event is received.
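
As a rough illustration of "a specific message for each action queued", the sketch below models the queue with one message type per action. Everything in it is an assumption made for the example: `QueuedAction`, `PostEvent`, `Flush`, `Stop` and `ActionQueue` are hypothetical names, and the real service's threading and posting logic are not shown in this conversation.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical action messages; the names are illustrative, not the patch's.
sealed trait QueuedAction
final case class PostEvent(payload: String) extends QueuedAction
case object Flush extends QueuedAction
case object Stop extends QueuedAction

// `post` stands in for whatever publishes a batch of events to the timeline server.
class ActionQueue(post: Seq[String] => Unit) {
  private val queue = new LinkedBlockingQueue[QueuedAction]()
  private val pending = ArrayBuffer.empty[String]

  def enqueue(action: QueuedAction): Unit = queue.put(action)

  /** Drains the queue on the calling thread until a Stop message arrives. */
  def runUntilStopped(): Unit = {
    var running = true
    while (running) {
      queue.take() match {
        case PostEvent(payload) => pending += payload
        case Flush => flushPending()
        case Stop =>
          // Shutdown is just another message: flush what is buffered, then exit the loop.
          flushPending()
          running = false
      }
    }
  }

  private def flushPending(): Unit = {
    if (pending.nonEmpty) {
      post(pending.toList)
      pending.clear()
    }
  }
}
```

Treating flush and stop as ordinary queue entries keeps their ordering relative to events explicit, which is one plausible reading of why this makes the shutdown logic simpler.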

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30770 has finished for PR 5423 at commit f8509b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30780 has finished for PR 5423 at commit c8e73e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 22, 2015

Test build #30781 timed out for PR 5423 at commit 957daf5 after a configured wait of 150m.

@steveloughran
Contributor Author

There's no obvious reason why the Jenkins build failed; the console says all the tests passed.

@srowen
Member

srowen commented Apr 23, 2015

Yes, it says it timed out (two comments up)

@srowen
Member

srowen commented Apr 23, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30862 has finished for PR 5423 at commit 957daf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31164 has finished for PR 5423 at commit 0f6860b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnService extends Closeable
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented May 1, 2015

Test build #31575 has finished for PR 5423 at commit 6aea6c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PrivilegedFunction[T](function: (() => T)) extends PrivilegedExceptionAction[T]
    • trait YarnExtensionService extends Closeable

@steveloughran
Contributor Author

I'll mark PrivilegedFunction as private; all it does is take a function () => T and run it as a PrivilegedExceptionAction, making UGI.doAs slightly easier to integrate with.

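import java.security.PrivilegedExceptionAction

// Wraps a no-argument function so it can be passed wherever a PrivilegedExceptionAction is expected.
class PrivilegedFunction[T](function: (() => T)) extends PrivilegedExceptionAction[T] {
  override def run(): T = {
    function()
  }
}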
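
For illustration, here is a minimal sketch of how such a wrapper might shorten a call site. It assumes a stand-in object, `FakeUgi`, whose `doAs` only mirrors the shape of `UserGroupInformation.doAs` so that the example runs without Hadoop on the classpath; `FakeUgi` and `PrivilegedFunctionDemo` are hypothetical names and the stand-in performs no security checks at all.

```scala
import java.security.PrivilegedExceptionAction

// Same shape as the class quoted above.
class PrivilegedFunction[T](function: () => T) extends PrivilegedExceptionAction[T] {
  override def run(): T = function()
}

// Hypothetical stand-in: it simply runs the action, with no user switching.
object FakeUgi {
  def doAs[T](action: PrivilegedExceptionAction[T]): T = action.run()
}

object PrivilegedFunctionDemo {
  // Without the wrapper, the caller spells out an anonymous action class.
  def verbose(): String = FakeUgi.doAs(new PrivilegedExceptionAction[String] {
    override def run(): String = "listing"
  })

  // With the wrapper, the caller passes a plain closure.
  def concise(): String = FakeUgi.doAs(new PrivilegedFunction(() => "listing"))
}
```

Both methods return the same value; the wrapper only removes the anonymous-class ceremony at the call site.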

@SparkQA

SparkQA commented May 1, 2015

Test build #31593 has finished for PR 5423 at commit a166445.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PrivilegedFunction[T](function: (() => T)) extends PrivilegedExceptionAction[T]
    • trait YarnExtensionService extends Closeable

@SparkQA

SparkQA commented May 6, 2015

Test build #32022 has finished for PR 5423 at commit 854dc89.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • activation-1.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • pyrolite-2.0.1.jar
  • This patch removes the following dependencies:
    • jaxb-api-2.2.7.jar
    • jaxb-core-2.2.7.jar
    • jaxb-impl-2.2.7.jar
    • mesos-0.21.1-shaded-protobuf.jar
    • pmml-agent-1.1.15.jar
    • pmml-model-1.1.15.jar
    • pmml-schema-1.1.15.jar
    • pyrolite-4.4.jar
    • spark-unsafe_2.10-1.4.0-SNAPSHOT.jar

@SparkQA

SparkQA commented May 8, 2015

Test build #32223 has finished for PR 5423 at commit 9801cf2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait YarnExtensionService extends Closeable

@SparkQA

SparkQA commented May 8, 2015

Test build #32232 has started for PR 5423 at commit 477d30e.

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47267 has finished for PR 5423 at commit 6dac1bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47270 has finished for PR 5423 at commit 0adb70a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47346 has finished for PR 5423 at commit b0e25bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47495 has started for PR 5423 at commit 52b9a2e.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47550 has finished for PR 5423 at commit 20514a7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 12, 2015

Test build #47596 has finished for PR 5423 at commit e134e29.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ter (the one with the service API merged in)
…request pushed. This is for more reliable polling for changes during integration with YARN-7889
…d track attempt versions. This is for more reliable polling for changes during integration with YARN-7889
…f improvement in test running in the process. Tests can register "failureActions" for execution on a test failure; closures to dump the state of things & so have better diagnostics
…s this. In production even 10s is probably too short, so it doesn't make things much worse
…ATS URL (as info level wasn't giving any details on whether/when entities were published, or under what); downgrade event drop to info & not warning
@steveloughran steveloughran force-pushed the stevel/feature/SPARK-1537-ATS branch from e134e29 to 2a5a739 Compare December 23, 2015 18:54
@rxin
Contributor

rxin commented Dec 31, 2015

I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!

@asfgit asfgit closed this in 93b52ab Dec 31, 2015
@steveloughran
Contributor Author

Yes, it is still relevant; yes, it was awaiting review; no, I wasn't expecting it to be closed.

@srowen
Member

srowen commented Jan 1, 2016

@steveloughran I was under the impression this was not meant to be merged, as it would require YARN 2.6 (2.7?), and that can't yet be assumed in Spark. At this point Spark 2.x is on 2.2+, but here's an argument, maybe, for bumping that up. But I do agree that long-lived PRs probably aren't ideal here.

@steveloughran
Contributor Author

I'm about to resubmit it. The way the code is structured, the 2.6-specific stuff lives under yarn/src/history, as discussed at earlier points in this PR. Everything happily builds and tests on Hadoop <2.6; this feature and its tests only get built on 2.6+.

@steveloughran
Contributor Author

now succeeded by #10545

8 participants