@yssharma yssharma commented Apr 1, 2017

What changes were proposed in this pull request?

The spark-kinesis test cases use KinesisUtils.createStream, which is now deprecated. This change modifies the test cases to use the recommended KinesisInputDStream.builder instead.
It also lets the test cases pick up AWS session tokens automatically.
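
For readers unfamiliar with the builder, here is a minimal sketch of the kind of call the test cases move to. It assumes the `spark-streaming-kinesis-asl` module from Spark 2.2 or later is on the classpath. The stream name, application name, endpoint, and region are placeholder values that do not come from the patch, and `buildDefaultStream` is a hypothetical helper name.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kinesis.KinesisInputDStream

object BuilderSketch {
  // Hypothetical helper: builds a Kinesis DStream without passing any
  // credentials, so the builder's default credentials are used.
  def buildDefaultStream(ssc: StreamingContext): DStream[Array[Byte]] =
    KinesisInputDStream.builder
      .streamingContext(ssc)
      .streamName("example-stream")       // placeholder value
      .checkpointAppName("example-app")   // placeholder value
      .endpointUrl("https://kinesis.us-west-2.amazonaws.com") // placeholder value
      .regionName("us-west-2")            // placeholder value
      .checkpointInterval(Seconds(10))
      .storageLevel(StorageLevel.MEMORY_ONLY)
      .build()
}
```

The initial stream position is left at the builder's default here, because the setter for it differs between Spark versions.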

How was this patch tested?

All the existing test cases pass with the changes.

https://issues.apache.org/jira/browse/SPARK-20189

Member

@srowen srowen left a comment

My general comment is that we don't want to leave even deprecated methods untested.

Member

This is now testing a Scala API in a Java test?

Author

I agree. However, I couldn't pass an org.apache.spark.api.java.function.Function to buildWithHandler. I tried several other approaches, but only this one satisfied the API.

Author

I will add more test cases for the deprecated createStream API function.

Author

The old-style test cases do not work with AWS session tokens. I could add an environment-variable-based check if that seems like a better solution?

Member

I don't think this is a change you should make then. If you're switching to Scala in a Java test to avoid a deprecated API then the test isn't quite testing what it is meant to. It also sounds like it could potentially leave the existing deprecated method untested.

Are you saying there's not an un-deprecated way to call this in the Java API? That sounds like a closely related issue that should be addressed first if so.

@yssharma yssharma force-pushed the ysharma/cleanup_kinesis_testcases branch 2 times, most recently from 84d5352 to 7f86f39 Compare April 3, 2017 09:47
Author

yssharma commented Apr 3, 2017

@srowen - I have made the following improvements to the patch to incorporate your points:

  • Kept the old test cases so that the old API is still checked
  • Added new test cases that exercise the new Kinesis API
  • Added an option to tag the old-API test cases with a flag (which also makes them easy to identify in the future)
  • Added an option to set an environment variable that disables the old-API Kinesis tests, so that developers can run the complete test suites on AWS-session-token-based infrastructure without any changes

Let me know your thoughts. Thanks.

Member

srowen commented Apr 3, 2017

So, is the point just removing calls to a deprecated API? It doesn't seem like it needs to be this complicated.

If a test is not specifically intended to test the deprecated API, then it should just change to use the newer undeprecated API. If that doesn't really exist, don't change that instance. Later an undeprecated alternative ought to be added, really, for callers to use as well as tests.

Author

yssharma commented Apr 3, 2017

@srowen - Yes, 2 objectives for the patch:

  • add new testcases for new api (that also tests aws session tokens)
  • ability to disable old api test cases

For now I have added new test cases for the new API, so that we have coverage for session tokens.
I could remove the part that disables the old test cases, but then running the whole test suite would produce a few failures in environments with session tokens and assumed roles, and developers might wonder why we have broken test cases.

I could remove the flag option and keep just the new test cases if that suits better. Thoughts?

Thanks

Member

srowen commented Apr 3, 2017

Why would you disable old API tests here? The primary purpose is to avoid calling deprecated APIs in tests where it's not needed. The best thing is to just do that. If you happen to add a new test or two along the way to shore up coverage, that's good too. I wouldn't make it more complex than that here.

- add new kinesis api testcases
- add flag to disable old kinesis api testcases

SparkQA commented Apr 3, 2017

Test build #3635 has finished for PR 17506 at commit 7f86f39.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yssharma yssharma force-pushed the ysharma/cleanup_kinesis_testcases branch from 7f86f39 to b477090 Compare April 3, 2017 12:24
Author

yssharma commented Apr 3, 2017

@srowen - Thanks for the feedback, I appreciate it. I have pushed a minimal, simplified patch that fixes the test cases that fail with the old API.

Author

yssharma commented Apr 4, 2017

The Scala style check probably fails because of the double-spaced lines. That is how the existing code was written, so I kept it that way.

Member

srowen commented Apr 4, 2017

@yssharma you can't check in code that fails style checks. The existing code passes the checks.

Author

yssharma commented Apr 4, 2017

@srowen: It failed on the earlier patch, which included the KinesisTestUtils.scala changes. This version is clean. I will wait for the next automated build :)

Build logs: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3635/console
Commit : 7f86f39

========================================================================
Running Scala style checks
========================================================================
Scalastyle checks failed at following occurrences:
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:220: File line length exceeds 100 characters
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:242: File line length exceeds 100 characters
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:246: File line length exceeds 100 characters
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:256: File line length exceeds 100 characters
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:219:0: Use Javadoc style indentation for multiline comments
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:227:0: Use Javadoc style indentation for multiline comments
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:234:0: Use Javadoc style indentation for multiline comments
[error] (streaming-kinesis-asl/compile:scalastyle) errors exist
[error] Total time: 20 s, completed Apr 3, 2017 5:14:05 AM
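
The log above reports two kinds of problem: lines longer than 100 characters, and multiline comments that are not indented in Javadoc style. The following is a small, dependency-free sketch of code written to avoid both. `StyleSketch` and `describe` are hypothetical names used only for illustration and are not part of KinesisTestUtils.scala.

```scala
object StyleSketch {
  /**
   * Javadoc-style multiline comment: each continuation line begins with an
   * asterisk aligned under the first asterisk of the opening marker, rather
   * than being indented two further spaces as in Scaladoc style.
   */
  def describe(shardCount: Int, streamName: String): String = {
    // Split long expressions into named parts so that no single line
    // grows past the 100-character limit.
    val prefix = s"stream=$streamName"
    val suffix = s"shards=$shardCount"
    s"$prefix, $suffix"
  }
}
```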

Author

yssharma commented Apr 6, 2017

@srowen - Does the Jenkins retest trigger automatically?
If not, could I request a retest of this patch, please?

Member

srowen commented Apr 6, 2017

Jenkins add to whitelist

Member

srowen commented Apr 6, 2017

It does retrigger automatically, but you need to push commits.
Hm, it may be that the Kinesis code somehow wasn't passing style checks and that went uncaught, because it's not generally tested unless it's changed. In any event, yes, definitely fix up any errors, old or new.


SparkQA commented Apr 6, 2017

Test build #75566 has finished for PR 17506 at commit b477090.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Author

yssharma commented Apr 6, 2017

Thanks @srowen

Author

yssharma commented Apr 8, 2017

Is there anything else that needs to be done on this patch? It fixes all the deprecated-API test cases that use the AWS access key ID and secret key credentials instead of the builder.

@yssharma
Author

@srowen, do you feel this patch could be merged now?
Thanks

Member

@srowen srowen left a comment

I don't know enough to really evaluate

localTestUtils.endpointUrl, localTestUtils.regionName, InitialPositionInStream.LATEST,
Seconds(10), StorageLevel.MEMORY_ONLY,
awsCredentials.getAWSAccessKeyId, awsCredentials.getAWSSecretKey)
val stream = KinesisInputDStream.builder.streamingContext(ssc)
Member

Just for my understanding, it's no longer necessary to pass in the credentials explicitly?

Author

Yes, that's true. KinesisInputDStream.builder uses the default credentials if none are passed, and the default credentials work with both AWS access keys and session tokens.
So now "mvn test" works in environments with permanent AWS keys as well as in those with session-based tokens.
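
To make the contrast concrete, here is a hedged sketch of how a caller could still supply static keys through the builder when they are available, and otherwise leave the builder untouched so the default credentials apply. It assumes the `SparkAWSCredentials` builder and the `kinesisCredentials` setter from the `spark-streaming-kinesis-asl` module in Spark 2.2 or later. `withOptionalKeys` is a hypothetical helper and is not part of the patch.

```scala
import org.apache.spark.streaming.kinesis.{KinesisInputDStream, SparkAWSCredentials}

object CredentialsSketch {
  // Hypothetical helper: attaches static keys only when the caller supplies
  // them. With None the builder is returned unchanged, so the default
  // credentials are used when the stream is built.
  def withOptionalKeys(
      base: KinesisInputDStream.Builder,
      keys: Option[(String, String)]): KinesisInputDStream.Builder =
    keys match {
      case Some((accessKeyId, secretKey)) =>
        base.kinesisCredentials(
          SparkAWSCredentials.builder.basicCredentials(accessKeyId, secretKey).build())
      case None => base
    }
}
```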

checkpointInterval.getOrElse(ssc.graph.batchDuration),
storageLevel.getOrElse(DEFAULT_STORAGE_LEVEL),
handler,
ssc.sc.clean(handler),
Member

Is this related?

Author

Yes, this is required.
KinesisUtils used to pass a cleaned handler when creating the stream, so we now do this in KinesisInputDStream.

val cleanedHandler = ssc.sc.clean(messageHandler)

Member

srowen commented Apr 13, 2017

Merged to master

@asfgit asfgit closed this in ec68d8f Apr 13, 2017
@yssharma
Author

Thanks @srowen

peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…ed createStream and use Builders

Author: Yash Sharma <[email protected]>

Closes apache#17506 from yssharma/ysharma/cleanup_kinesis_testcases.