-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12269][Streaming][Kinesis] Update aws-java-sdk version #10256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Have you tested the current kinesis and works with the new API? Looking online it seems like we might also need to bump our kinesis client library if we upgrade the general aws sdk version. |
|
Note we also depend depend on com.fasterxml.jackson.core 2.4.4 in Spark |
|
I ran the demo with the following two commands against a stream (test-stream1) on my AWS account. It performed as expected. I happy to bump the KCL library as well. With the bumped com.fasterxml.jackson.core version, the project build and passed all tests except docker-integration, which I can't run. AWS Java SDK 1.10.40 calls a function that isn't provided by Jackson 2.4.x. If raising that to 2.5.x isn't an option, then we can find whatever the last version of AWS Java SDK to not use it is. I know it works with 1.10.24, so somewhere between 1.10.24 and 1.10.40 could be a temporary solution. |
|
Cool - if we don't need to upgrade the KCL library then thats probably good. Maybe good to see if @tdas or someone similar can take a look? |
|
/cc @brkyvz RE: the KCL library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it dangerous to upgrade jackson? It's also used by SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes me nervous. I'll run the tests to see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen did you manage to run the tests? We'd need to be sure nothing breaks with this version bump!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MLnick the tests seem to have run OK in https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2203/consoleFull (although jenkins seems to be having some issues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holdenk Unfortunately the tests that actually mattered didn't run, they were ignored :(
[info] WithAggregationKinesisStreamSuite:
[info] - KinesisUtils API (202 milliseconds)
[info] - RDD generation (24 milliseconds)
[info] - basic operation [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - custom message handling [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - failure recovery [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Prepare KinesisTestUtils [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] WithoutAggregationKinesisStreamSuite:
[info] - KinesisUtils API (5 milliseconds)
[info] - RDD generation (6 milliseconds)
[info] - basic operation [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - custom message handling [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - failure recovery [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Prepare KinesisTestUtils [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] WithAggregationKinesisBackedBlockRDDSuite:
[info] - Basic reading from Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available in both block manager and Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available only in block manager, not in Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available only in Kinesis, not in block manager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available partially in block manager, rest in Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Test isBlockValid skips block fetching from block manager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Test whether RDD is valid after removing blocks from block anager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] WithoutAggregationKinesisBackedBlockRDDSuite:
[info] - Basic reading from Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available in both block manager and Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available only in block manager, not in Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available only in Kinesis, not in block manager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Read data available partially in block manager, rest in Kinesis [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Test isBlockValid skips block fetching from block manager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
[info] - Test whether RDD is valid after removing blocks from block anager [enable by setting env var ENABLE_KINESIS_TESTS=1] !!! IGNORED !!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests are i think enabled only in the default jenkins PR builder.
|
@BrianLondon What happens if you change the version of KCL to 1.6.1? Could you also please add a new line to anywhere in extras/kinesis-asl? Otherwise Kinesis tests don't run and we have no idea whether this is a safe change. Thanks! |
|
Test build #2203 has finished for PR 10256 at commit
|
|
@brkyvz Pushed up the KCL version and added a newline to KinesisReciever. Tests passed for me locally on a hadoop 2.6 build and the kinesis demos ran against a live AWS stream. |
|
@holdenk bump |
|
Does anyone working on SQL (e.g. @marmbrus ) have any concerns about the Jackson upgrade? |
|
@brkyvz @holdenk @tdas any particular reason we depend on |
|
I'm not really sure why - skimming the git logs it seems the aws-sdk-java sdk dependency was added at the same time as the kcl dependency. |
|
It appears kcl pulls in 1.10.20. I'm all for dropping the explicit dependency. That probably wouldn't have prevented the regression in Spark 1.5.2 though. At any rate, the jackson bump is necessary if someone wants to use the latest aws sdk (e.g. for Firehose). |
|
I removed the explicit dependence on the AWS Java SDK. There's a newline that was added to Tests and the kinesis example worked locally for me for what it's worth. |
|
FYI, a bunch of |
|
Yeah, that looks like the same underlying issue of using AWS Java SDK 1.10.0 or later with Spark. I believe this version bump will fix it. You can sometimes get around it by adding the following to your sbt build: dependencyOverrides ++= Set(
"com.fasterxml.jackson.core" % "jackson-core" % "2.4.4",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4",
"com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
)That doesn't work however if the calls to the AWS library use any Jackson 2.5.x functionality and some of them do. |
|
I've also run the Kinesis tests locally (against a live AWS account) against this PR and all tests pass. I'm +1 on this subject to agreement for the jackson dependency bump, for which I defer to SQL folks e.g. @marmbrus |
|
If the SQL/Hive tests are passing I'm good. @srowen usually has good insight on dependency changes. |
|
Can this wait until 2.0? I think it's going to anyway now. Then the dependency issue is fine. |
|
Test build #2271 has finished for PR 10256 at commit
|
|
Rebased against current master, ran tests (except for docker) locally, and tested kinesis example against aws hosted stream. All worked. My preference would be to get the version bump in as soon as practicable for two reasons. First, and mostly selfishly, the inability to link any library that requires Jackson 2.5.x features is causing us problems in our production environment. Second, we currently can just bump the version number because the Spark codebase appears compatible with both Jackson versions at present, but the longer we wait the more likely someone introduces an incompatible change. That said, I fully understand the reluctance to replace a heavily used library mid release cycle. |
|
Test build #2354 has finished for PR 10256 at commit
|
|
Ah right, you'll need to acknowledge the dep changes in the build file: This LGTM but I do think we'll have to keep it for 2.x only. |
|
Manifest updated. Is there documented somewhere what all the tests Travis runs are? It seems to differ from what's described at http://spark.apache.org/docs/latest/building-spark.html |
|
@BrianLondon it should be running |
|
Test build #2358 has finished for PR 10256 at commit
|
|
I'm going to go for this one since I think we may want to, in general, consider updating Spark dependencies in 2.x anyway. 2.5 is even old for Jackson, so good to bump up here as a side effect I think. |
|
Merged to master |
|
I know this issue is closed, but one suggestion I have would be to specify the latest Jackson 2.5 version, 2.5.5 instead of 2.5.3. While I understand caution in general, Jackson patch versions try to limit inclusion of fixes to safe ones and specifically there are no API changes. This also means that while it is good to keep even patch versions in sync across components there is very low risk of incompatibilities even if patch versions do differ. In this specific case, |
|
I think that's OK. I was intending to propose a series of dependency updates for 2.x like this sort of thing. |
|
@srowen Ah. Yes, just wanted to mention it; I don't have enough context so there may be specific reasons for picking up particular versions. Glad to see version upgrades coming along. My interests are selfish, being both the main author of Jackson, and a (relative new) Spark user at work. :) With that, if there are problems from Jackson side I could help, do not hesitate to cc me and/or file issues at https://github.com/FasterXML/jackson-databind (or other specific components). I know version compatibility is an annoying issue, and would like to help if possible. |
The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.