-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-7501] [Streaming] DAG visualization: show DStream operations #6034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32313 has started for PR 6034 at commit |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32316 has started for PR 6034 at commit |
|
Test build #32313 timed out for PR 6034 at commit |
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
Test build #32316 has finished for PR 6034 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not for this PR. The other PR still not merged in master .. isnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops I can't believe I accidentally merged these two PRs.
|
I think there should be a few unit tests to test this. See how the creation sites of the DStream are tested in StreamingContextSuite, test "DStream and generated RDD creation sites" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you store only the name in as a DStream field? I does not make sense to save this arbitrary mostly unnecessary RDDOperationScope object to be serialized with DStreams as part of DStream checkpoints.
|
This doesnt yet set the scope for input stream (that is DStream that are directly created, not using DStream transformations). I think you need to use the withScope in places like StreamingContext.socketStream(). Example: This needs to be done for every subclass of |
e85b787 to
c121047
Compare
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32650 has started for PR 6034 at commit |
This covers all the places where we instantiate DStreams.
|
Test build #32650 has finished for PR 6034 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32933 has started for PR 6034 at commit |
|
Test build #32933 has finished for PR 6034 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #33002 has started for PR 6034 at commit |
|
LGTM! Thanks @andrewor14 for this patch. |
|
Test build #33002 has finished for PR 6034 at commit
|
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
retest this please |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #33009 has started for PR 6034 at commit |
|
Test build #33009 has finished for PR 6034 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Great, merging into master 1.4. |
This is similar to #5999, but for streaming. Roughly 200 lines are tests. One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way. tdas zsxwing ------------------------ **Before** <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/> -------------------------- **After** <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/> Author: Andrew Or <[email protected]> Closes #6034 from andrewor14/dag-viz-streaming and squashes the following commits: 932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e685df9 [Andrew Or] Rename createRDDWith 84d0656 [Andrew Or] Review feedback 697c086 [Andrew Or] Fix tests 53b9936 [Andrew Or] Set scopes for foreachRDD properly 1881802 [Andrew Or] Refactor DStream scope names again af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming fd07d22 [Andrew Or] Make MQTT lower case f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within 1af0b0e [Andrew Or] Fix style 074c00b [Andrew Or] Review comments d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e4a93ac [Andrew Or] Fix tests? 25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 9113183 [Andrew Or] Add tests for DStream scopes b3806ab [Andrew Or] Fix test bb80bbb [Andrew Or] Fix MIMA? 5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 5703939 [Andrew Or] Rename operations that create InputDStreams 7c4513d [Andrew Or] Group RDDs by DStream operations and batches bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 05c2676 [Andrew Or] Wrap many more methods in withScope c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 65ef3e9 [Andrew Or] Fix NPE a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations (cherry picked from commit b93c97d) Signed-off-by: Andrew Or <[email protected]>
This is similar to apache#5999, but for streaming. Roughly 200 lines are tests. One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way. tdas zsxwing ------------------------ **Before** <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/> -------------------------- **After** <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/> Author: Andrew Or <[email protected]> Closes apache#6034 from andrewor14/dag-viz-streaming and squashes the following commits: 932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e685df9 [Andrew Or] Rename createRDDWith 84d0656 [Andrew Or] Review feedback 697c086 [Andrew Or] Fix tests 53b9936 [Andrew Or] Set scopes for foreachRDD properly 1881802 [Andrew Or] Refactor DStream scope names again af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming fd07d22 [Andrew Or] Make MQTT lower case f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within 1af0b0e [Andrew Or] Fix style 074c00b [Andrew Or] Review comments d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e4a93ac [Andrew Or] Fix tests? 25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 9113183 [Andrew Or] Add tests for DStream scopes b3806ab [Andrew Or] Fix test bb80bbb [Andrew Or] Fix MIMA? 5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 5703939 [Andrew Or] Rename operations that create InputDStreams 7c4513d [Andrew Or] Group RDDs by DStream operations and batches bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 05c2676 [Andrew Or] Wrap many more methods in withScope c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 65ef3e9 [Andrew Or] Fix NPE a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
This is similar to apache#5999, but for streaming. Roughly 200 lines are tests. One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way. tdas zsxwing ------------------------ **Before** <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/> -------------------------- **After** <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/> Author: Andrew Or <[email protected]> Closes apache#6034 from andrewor14/dag-viz-streaming and squashes the following commits: 932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e685df9 [Andrew Or] Rename createRDDWith 84d0656 [Andrew Or] Review feedback 697c086 [Andrew Or] Fix tests 53b9936 [Andrew Or] Set scopes for foreachRDD properly 1881802 [Andrew Or] Refactor DStream scope names again af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming fd07d22 [Andrew Or] Make MQTT lower case f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within 1af0b0e [Andrew Or] Fix style 074c00b [Andrew Or] Review comments d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e4a93ac [Andrew Or] Fix tests? 25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 9113183 [Andrew Or] Add tests for DStream scopes b3806ab [Andrew Or] Fix test bb80bbb [Andrew Or] Fix MIMA? 5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 5703939 [Andrew Or] Rename operations that create InputDStreams 7c4513d [Andrew Or] Group RDDs by DStream operations and batches bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 05c2676 [Andrew Or] Wrap many more methods in withScope c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 65ef3e9 [Andrew Or] Fix NPE a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
This is similar to apache#5999, but for streaming. Roughly 200 lines are tests. One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way. tdas zsxwing ------------------------ **Before** <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/> -------------------------- **After** <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/> Author: Andrew Or <[email protected]> Closes apache#6034 from andrewor14/dag-viz-streaming and squashes the following commits: 932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e685df9 [Andrew Or] Rename createRDDWith 84d0656 [Andrew Or] Review feedback 697c086 [Andrew Or] Fix tests 53b9936 [Andrew Or] Set scopes for foreachRDD properly 1881802 [Andrew Or] Refactor DStream scope names again af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming fd07d22 [Andrew Or] Make MQTT lower case f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within 1af0b0e [Andrew Or] Fix style 074c00b [Andrew Or] Review comments d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming e4a93ac [Andrew Or] Fix tests? 25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 9113183 [Andrew Or] Add tests for DStream scopes b3806ab [Andrew Or] Fix test bb80bbb [Andrew Or] Fix MIMA? 5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 5703939 [Andrew Or] Rename operations that create InputDStreams 7c4513d [Andrew Or] Group RDDs by DStream operations and batches bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 05c2676 [Andrew Or] Wrap many more methods in withScope c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming 65ef3e9 [Andrew Or] Fix NPE a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
This is similar to #5999, but for streaming. Roughly 200 lines are tests.
One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a
try finallyblock to set the relevant variables in a safer way.@tdas @zsxwing
Before

After
