-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20189][DStream] Fix spark kinesis testcases to remove deprecated createStream and use Builders #17506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general comment is that we don't want to leave even deprecated methods untested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now testing a Scala API in a Java test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I however couldn't pass the org.apache.spark.api.java.function.Function to the buildWithHandler. Tried lot of other ways but only this seemed to make the api happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add more testcases for the deprecated API function createStream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old style testcases do not work with the aws session tokens. I could add a ENV variable based check if that seems like a better solution ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a change you should make then. If you're switching to Scala in a Java test to avoid a deprecated API then the test isn't quite testing what it is meant to. It also sounds like it could potentially leave the existing deprecated method untested.
Are you saying there's not an un-deprecated way to call this in the Java API? That sounds like a closely related issue that should be addressed first if so.
84d5352 to
7f86f39
Compare
|
@srowen - Have made following improvements to the patch to incorporate your points:
Let me know your thoughts. Thanks. |
|
So, is the point just removing calls to a deprecated API? It doesn't seem like it needs to be this complicated. If a test is not specifically intended to test the deprecated API, then it should just change to use the newer undeprecated API. If that doesn't really exist, don't change that instance. Later an undeprecated alternative ought to be added, really, for callers to use as well as tests. |
|
@srowen - Yes, 2 objectives for the patch:
For now I have added new test cases for the new api, so that we have the coverage for session tokens. I could remove the flag option, and just keep the new test cases if that suits better. Thoughts ? Thanks |
|
Why would you disable old API tests here? the primary purpose is not calling deprecated APIs in tests where it's not needed. Best thing is to just do that. If you happen to add a new test or two along the way to shore up coverage that's good too. I wouldn't make it more complex than that here. |
- add new kinesis api testcases - add flag to disable old kinesis api testcases
|
Test build #3635 has finished for PR 17506 at commit
|
7f86f39 to
b477090
Compare
|
@srowen - Thanks for the feedback. Appreciate it. Added a minimal simplified patch which fixes the testcases that fail with the old api. |
|
The Scala style check fail because of the double spaced lines probably. But that's how the existing code was so thought of keeping it that way. |
|
@yssharma you can't check in code that fails style checks. The existing code passes the checks. |
|
@srowen : It failed on the earlier patch with KinesisTestUtils.scala changes. This version is clean. Will wait for the next automated build :) Build logs: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3635/console |
|
@srowen - does the Jenkins re-test trigger automatically? |
|
Jenkins add to whitelist |
|
It does retrigger automatically but you need to push commits. |
|
Test build #75566 has finished for PR 17506 at commit
|
|
Thanks @srowen |
|
Is there anything else that can be done on this patch. The patch fixes all the deprecated api testcases that try to use the aws secret/id credentials instead of the builder. |
|
@srowen do you feel this patch could be merged now ? |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough to really evaluate
| localTestUtils.endpointUrl, localTestUtils.regionName, InitialPositionInStream.LATEST, | ||
| Seconds(10), StorageLevel.MEMORY_ONLY, | ||
| awsCredentials.getAWSAccessKeyId, awsCredentials.getAWSSecretKey) | ||
| val stream = KinesisInputDStream.builder.streamingContext(ssc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, it's no longer necessary to pass in the credentials explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thats true. The KinesisInputDStream.builder uses the Dafault credentials if no creds are passed, and the default creds work with both AWS Key and Session tokens.
So now "mvn test" works on both permanent aws keys & session based token environments.
| checkpointInterval.getOrElse(ssc.graph.batchDuration), | ||
| storageLevel.getOrElse(DEFAULT_STORAGE_LEVEL), | ||
| handler, | ||
| ssc.sc.clean(handler), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is required.
KinesisUtils used to send a cleaned Handler while creating stream, so we do this in the KinesisInputDstream now.
val cleanedHandler = ssc.sc.clean(messageHandler)
|
Merged to master |
|
Thanks @srowen |
…ed createStream and use Builders ## What changes were proposed in this pull request? The spark-kinesis testcases use the KinesisUtils.createStream which are deprecated now. Modify the testcases to use the recommended KinesisInputDStream.builder instead. This change will also enable the testcases to automatically use the session tokens automatically. ## How was this patch tested? All the existing testcases work fine as expected with the changes. https://issues.apache.org/jira/browse/SPARK-20189 Author: Yash Sharma <[email protected]> Closes apache#17506 from yssharma/ysharma/cleanup_kinesis_testcases.
What changes were proposed in this pull request?
The spark-kinesis testcases use the KinesisUtils.createStream which are deprecated now. Modify the testcases to use the recommended KinesisInputDStream.builder instead.
This change will also enable the testcases to automatically use the session tokens automatically.
How was this patch tested?
All the existing testcases work fine as expected with the changes.
https://issues.apache.org/jira/browse/SPARK-20189