-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SQL] [outdated] SPARK-6981: Factor out SparkPlanner and QueryExecution from SQLContext #5556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
|
ok to test |
|
Test build #30612 has started for PR 5556 at commit |
|
Test build #30612 has finished for PR 5556 at commit
|
|
Test FAILed. |
|
Test build #30655 has started for PR 5556 at commit |
|
Test build #30655 has finished for PR 5556 at commit
|
|
Test FAILed. |
|
Test build #30656 has started for PR 5556 at commit |
|
Test build #30656 has finished for PR 5556 at commit
|
|
Test FAILed. |
|
Test build #30660 has started for PR 5556 at commit |
|
Test build #30660 has finished for PR 5556 at commit
|
|
Test FAILed. |
|
I may be mistaken here but my code does not change any of PySpark, and the stack trace seems to be pointing at a config error?
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30660/console |
|
I can confirm that the pyspark/sql/dataframe.py test case passes successfully on my branch. |
|
Sorry, it is possible that you hit a flakey test. If you fix the merge conflicts we can try again. |
|
Test build #30825 has started for PR 5556 at commit |
|
I tried to rebase. Hope it's not a problem. |
|
Test build #30825 has finished for PR 5556 at commit
|
|
Test PASSed. |
|
Can one of the admins verify this patch? |
This patch contains an `IvyTestUtils` file, which dynamically generates jars and pom files to test the `--packages` feature without having to rely on the internet, and Maven Central. cc pwendell I know that there existed Util functions to create Jars and stuff already, but they didn't really serve my purposes as they appended random prefixes that was breaking things. I also added the local repository tests. Notice that they work without passing the `repo` to `resolveMavenCoordinates`. Author: Burak Yavuz <[email protected]> Closes apache#5790 from brkyvz/maven-utils and squashes the following commits: 3ec79b7 [Burak Yavuz] addressed comments v0.2 a39151b [Burak Yavuz] address comments v0.1 172dfef [Burak Yavuz] use Ivy format 7476d06 [Burak Yavuz] added mock repository generator
…RN/HDFS Current Spark apps running on Secure YARN/HDFS would not be able to write data to HDFS after 7 days, since delegation tokens cannot be renewed beyond that. This means Spark Streaming apps will not be able to run on Secure YARN. This commit adds basic functionality to fix this issue. In this patch: - new parameters are added - principal and keytab, which can be used to login to a KDC - the client logs in, and then get tokens to start the AM - the keytab is copied to the staging directory - the AM waits for 60% of the time till expiry of the tokens and then logs in using the keytab - each time after 60% of the time, new tokens are created and sent to the executors Currently, to avoid complicating the architecture, we set the keytab and principal in the SparkHadoopUtil singleton, and schedule a login. Once the login is completed, a callback is scheduled. This is being posted for feedback, so I can gather feedback on the general implementation. There are currently a bunch of things to do: - [x] logging - [x] testing - I plan to manually test this soon. If you have ideas of how to add unit tests, comment. - [x] add code to ensure that if these params are set in non-YARN cluster mode, we complain - [x] documentation - [x] Have the executors request for credentials from the AM, so that retries are possible. Author: Hari Shreedharan <[email protected]> Closes apache#4688 from harishreedharan/kerberos-longrunning and squashes the following commits: 36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments. 611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens. 09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml 6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to. 072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils. f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning 42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens. ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates. 7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning 8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required. e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message. 0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning 7f1bc58 [Hari Shreedharan] Minor fixes, cleanup. bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup. f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files. 2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pikc up the latest suffix from HDFS if the AM is restarted. 61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS. 62c45ce [Hari Shreedharan] Relogin from keytab periodically. fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues. 42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master. 0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning 55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity. 9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes. f4fd711 [Hari Shreedharan] Fix SparkConf usage. 2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough for it to cause issues on HDFS. af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required. f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file. f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials. 5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues. b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS 0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire. d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens() 8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object. fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start() 41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests. bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil. f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled. 2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS. ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos 77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
…parkBuild Added ml.recommendation, ml.regression to SparkBuild CC: mengxr Author: Joseph K. Bradley <[email protected]> Closes apache#5758 from jkbradley/SPARK-7207 and squashes the following commits: a28158a [Joseph K. Bradley] Added ml.recommendation, ml.regression to SparkBuild
…ecure YARN/HDFS" This reverts commit 6c65da6.
JIRA: https://issues.apache.org/jira/browse/SPARK-7196 Author: Liang-Chi Hsieh <[email protected]> Closes apache#5777 from viirya/jdbc_precision and squashes the following commits: f40f5e6 [Liang-Chi Hsieh] Support precision and scale for NUMERIC type. 49acbf9 [Liang-Chi Hsieh] Add unit test. a509e19 [Liang-Chi Hsieh] Support precision and scale of decimal type for JDBC.
…; add facade in front of Unsafe; remove use of Unsafe.setMemory This patch suppresses compiler warnings due to our use of `sun.misc.Unsafe` (introduced in apache#5725). These warnings can only be suppressed via the `-XDignore.symbol.file` javac flag; the `SuppressWarnings` annotation won't work for these. In order to restrict uses of this compiler flag to the `unsafe` module, I placed a facade in front of `Unsafe` so that other modules won't call it directly. This facade also will also help us to avoid accidental usage of deprecated Unsafe methods or methods that aren't supported in Java 6. I also removed an unnecessary use of `Unsafe.setMemory`, which isn't present in certain versions of Java 6, and excluded the new `unsafe` module from Javadoc. Author: Josh Rosen <[email protected]> Closes apache#5814 from JoshRosen/unsafe-compiler-warnings-fixes and squashes the following commits: 9e8c483 [Josh Rosen] Exclude new unsafe module from Javadoc ba75ecf [Josh Rosen] Only apply -XDignore.symbol.file flag in unsafe project. 7403345 [Josh Rosen] Put facade in front of Unsafe. 50230c0 [Josh Rosen] Remove usage of Unsafe.setMemory 96d41c9 [Josh Rosen] Use -XDignore.symbol.file to suppress warnings about sun.misc.Unsafe usage
SQL
```
select key from (select key,value from t1 limit 100) t2 limit 10
```
Optimized Logical Plan before modifying
```
== Optimized Logical Plan ==
Limit 10
Project key#228
Limit 100
MetastoreRelation default, t1, None
```
Optimized Logical Plan after modifying
```
== Optimized Logical Plan ==
Limit 10
Limit 100
Project key#228
MetastoreRelation default, t1, None
```
After this, we can combine limits
Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>
Closes apache#5797 from DoingDone9/ProjectLimit and squashes the following commits:
70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala
dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala
485c61c [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request apache#4 from apache/master
c87e8b6 [DoingDone9] Merge pull request apache#3 from apache/master
cb1852d [DoingDone9] Merge pull request apache#2 from apache/master
c3f046f [DoingDone9] Merge pull request apache#1 from apache/master
Author: Reynold Xin <[email protected]> Closes apache#6071 from rxin/parserdialect and squashes the following commits: ca2eb31 [Reynold Xin] Rename Dialect -> ParserDialect.
add docs for https://issues.apache.org/jira/browse/SPARK-6994 Author: vidmantas zemleris <[email protected]> Closes apache#6030 from vidma/docs/row-with-named-fields and squashes the following commits: 241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
Signed-off-by: Edoardo Vacchi <[email protected]>
…to sqlctx-refactor Conflicts: sql/core/src/main/scala/org/apache/spark/sql/QueryExecution.scala sql/core/src/main/scala/org/apache/spark/sql/SparkPlanner.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQueryExecution.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
|
can you restart the Jenkins build? let's see if anything got messed up. |
|
Jenkins, ok to test. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32598 has started for PR 5556 at commit |
|
Test build #32598 has finished for PR 5556 at commit
|
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
cleaned up in PR #6122. Disregard this PR |
Dependent types add additional, unnecessary complexity to third-parties who may want to extend SQLContext with new rewriting strategies; moreover, HiveContext benefits from this simplifying change as well.