-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-1981] Add AWS Kinesis streaming support #1434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
Resolved conflict: project/SparkBuild.scala
project/SparkBuild.scala
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a copy-paste error? Did you mean to use "spark-kinesis-asl" here instead of "spark-ganglia-lgpl"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, stephen. thanks!
|
Jenkins, this is ok to test |
|
Chris, is this new code from scratch, or is it based on Parviz's old pull request? |
|
QA tests have started for PR 1434. This patch merges cleanly. |
|
@mateiz - this is a completely brand-new, from-scratch implementation. parviz's old code was actually a Scala port of the Java-based Kinesis sample app found here: https://github.com/aws/aws-sdk-java/blob/master/src/samples/AmazonKinesisApplication/SampleRecordProcessor.java this was fine for a quick/dirty sample of kinesis functionality, but my goal was to make this code more reusable, testable, readable, configurable, production-ready, and well-documented. the old code did not support the new Streaming 1.0 API and extras/ build structure (due to ASL-license). i also updated the AWS Java SDK and Kinesis Client Libraries to their latest versions. here is parviz's PR for reference/comparison: #223. thanks and i look forward to getting this merged! -chris |
|
also, can someone address the questions i have here regarding the ec2 scripts and other peripheral aspects of this PR: https://issues.apache.org/jira/browse/SPARK-1981?focusedCommentId=14072761 thanks! -chris |
|
this PR worked for @srosenthal , btw. |
bin/run-kinesis-example
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a need for a complete script to run the kinesis example? Shouldnt the standard bin/run-example be sufficient?
The thing is that we want to have one streamlined way for running all the examples. The more special cases we make, the more we have to explicitly maintain in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdeyhim explained me that this script is necessary as the jar for kinesis is not included in the spark jar. That makes sense.
But its best not to put such module-specific scripts in the main spark/bin directory. Its best to put this in extras/kinesis/bin/ .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, yup. this is a special circumstance due to the licensing. i didn't want to complicate the examples build by depending on an optional package, so this was the best workaround.
i put the scripts alongside the other scripts to reduce confusion, but i can move it to extras/kinesis/bin for sure. i'll update the docs accordingly.
|
@pdeyhim can you take a look over this too when you have a chance? |
|
Jenkins, this is ok to test. |
|
Jenkins, test this please. |
|
QA tests have started for PR 1434. This patch DID NOT merge cleanly! |
|
QA results for PR 1434: |
|
Hey @cfregly, My apologies for being so late to review this. Seems like the PR requires merging with the master, probably because of changes to pom.xml. Would you get a chance to merge with the master once again? |
extras/spark-kinesis-asl/pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename is the directory spark-kinesis-asl to just kinesis. Its already in spark code base, so does not quite need another spark in the directory name. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, I understand, you must have followed the existing spark-ganglia-gpl directory name. Let me figure out why it was named so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I spoke to @pwendell and there is not need to have spark- as part of the name directory name and profile name. So can you make it kinesis-asl at all places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. i'll update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline comment style should /* ... / .... /* is reserved only for scala docs.
updated the build to only include kinesis-asl inside the examples jar when -Pkinesis-asl is specified
|
QA tests have started for PR 1434. This patch merges cleanly. |
extras/kinesis-asl/pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Change this, this will be super confusing in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yikes. yeah, that's just plain wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction, this is not a nit. Its already confusing ... please change this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, didnt see your response. Good 👍
|
QA results for PR 1434: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to extend Logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed and cleaned up imports
retries around store() method
|
QA tests have started for PR 1434. This patch merges cleanly. |
|
QA results for PR 1434: |
|
Thanks you very much @cfregly! I have merged this!! |
Author: Chris Fregly <[email protected]> Closes #1434 from cfregly/master and squashes the following commits: 4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method 0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl 691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams 0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master' 74e5c7c [Chris Fregly] updated per TD's feedback. simplified examples, updated docs e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master' bf614e9 [Chris Fregly] per matei's feedback: moved the kinesis examples into the examples/ dir d17ca6d [Chris Fregly] per TD's feedback: updated docs, simplified the KinesisUtils api 912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master' 21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master' 6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client 338997e [Chris Fregly] improve build docs for kinesis 828f8ae [Chris Fregly] more cleanup e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master' cd68c0d [Chris Fregly] fixed typos and backward compatibility d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master' b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support (cherry picked from commit 91f9504) Signed-off-by: Tathagata Das <[email protected]>
|
This was an epic pull request. Nice work, people. |
Author: Chris Fregly <[email protected]> Closes apache#1434 from cfregly/master and squashes the following commits: 4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method 0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl 691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams 0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master' 74e5c7c [Chris Fregly] updated per TD's feedback. simplified examples, updated docs e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master' bf614e9 [Chris Fregly] per matei's feedback: moved the kinesis examples into the examples/ dir d17ca6d [Chris Fregly] per TD's feedback: updated docs, simplified the KinesisUtils api 912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master' 21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master' 6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client 338997e [Chris Fregly] improve build docs for kinesis 828f8ae [Chris Fregly] more cleanup e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master' cd68c0d [Chris Fregly] fixed typos and backward compatibility d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master' b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why do we need this log4j file here?
No description provided.