[SQL] [outdated] SPARK-6981: Factor out SparkPlanner and QueryExecution from SQLContext #5556

evacchi · 2015-04-17T13:47:31Z

Dependent types add unnecessary complexity for third parties who may want to extend SQLContext with new rewriting strategies; HiveContext benefits from this simplification as well.
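To illustrate the point about dependent types, here is a minimal sketch in plain Scala. It is a toy model, not Spark code: `Context`, `QueryExecution`, `executePlan` and `describe` are hypothetical stand-ins chosen only to show how an inner class yields a path-dependent type, and how a top-level class taking the context as a constructor argument avoids it.

```scala
// Toy model only: every name below is a hypothetical stand-in, not Spark's API.
object DependentTypesSketch {
  type LogicalPlan = String

  object Before {
    class Context {
      // Inner class: its full type is `someContext.QueryExecution`,
      // so it depends on the enclosing instance.
      class QueryExecution(val logical: LogicalPlan) {
        def describe: String = s"nested($logical)"
      }
      def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(plan)
    }
    // External code needs a type projection to accept a value from any context.
    def describe(qe: Context#QueryExecution): String = qe.describe
  }

  object After {
    class Context {
      def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(this, plan)
    }
    // Top-level class: one ordinary type, with the context passed explicitly.
    class QueryExecution(val context: Context, val logical: LogicalPlan) {
      def describe: String = s"flat($logical)"
    }
    def describe(qe: QueryExecution): String = qe.describe
  }
}
```

In the `After` variant, an extension only has to mention `QueryExecution`, with no reference to a particular context instance or to a projection type.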

AmplabJenkins · 2015-04-17T13:52:12Z

Can one of the admins verify this patch?

marmbrus · 2015-04-20T23:33:37Z

ok to test

SparkQA · 2015-04-20T23:38:39Z

Test build #30612 has started for PR 5556 at commit 0b29d80.

SparkQA · 2015-04-20T23:38:45Z

Test build #30612 has finished for PR 5556 at commit 0b29d80.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)
This patch does not change any dependencies.
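For readers unfamiliar with the `QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]` signature listed above, the following is a rough, self-contained sketch of a strategy-based planner with a self-referential type bound. Only the shape of the trait header comes from the report; `Strategy`, `Scan`, `ScanStrategy`, `ToyPlanner` and the string-based `LogicalPlan` are assumptions made up for illustration and do not reproduce Spark's implementation.

```scala
// Toy model only: apart from the shape of the QueryPlanner header,
// all names here are hypothetical.
object PlannerSketch {
  // F-bounded tree: a node's children have the same node type as the node itself.
  trait TreeNode[T <: TreeNode[T]] { def children: Seq[T] }

  final case class LogicalPlan(op: String)

  sealed trait PhysicalPlan extends TreeNode[PhysicalPlan]
  final case class Scan(table: String) extends PhysicalPlan {
    val children: Seq[PhysicalPlan] = Nil
  }

  // A strategy proposes zero or more physical plans for a logical plan.
  trait Strategy[P <: TreeNode[P]] { def apply(plan: LogicalPlan): Seq[P] }

  trait QueryPlanner[P <: TreeNode[P]] {
    def strategies: Seq[Strategy[P]]
    // Try every strategy in order and lazily yield all candidates.
    def plan(logical: LogicalPlan): Iterator[P] =
      strategies.iterator.flatMap(s => s(logical))
  }

  object ScanStrategy extends Strategy[PhysicalPlan] {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan] =
      if (plan.op.startsWith("scan:")) Seq(Scan(plan.op.stripPrefix("scan:"))) else Nil
  }

  // Third-party strategies are passed in and tried before the built-in one.
  class ToyPlanner(extra: Seq[Strategy[PhysicalPlan]]) extends QueryPlanner[PhysicalPlan] {
    val strategies: Seq[Strategy[PhysicalPlan]] = extra :+ ScanStrategy
  }
}
```

Because the planner is a standalone class here, adding a strategy is a matter of constructing `new ToyPlanner(Seq(myStrategy))` rather than subclassing a context.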

AmplabJenkins · 2015-04-20T23:38:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30612/
Test FAILed.

SparkQA · 2015-04-21T08:33:39Z

Test build #30655 has started for PR 5556 at commit 7cc8fd5.

SparkQA · 2015-04-21T08:35:11Z

Test build #30655 has finished for PR 5556 at commit 7cc8fd5.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)
This patch does not change any dependencies.

AmplabJenkins · 2015-04-21T08:35:12Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30655/
Test FAILed.

SparkQA · 2015-04-21T08:43:50Z

Test build #30656 has started for PR 5556 at commit 5a0a9f8.

SparkQA · 2015-04-21T08:48:52Z

Test build #30656 has finished for PR 5556 at commit 5a0a9f8.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)
This patch does not change any dependencies.

AmplabJenkins · 2015-04-21T08:48:53Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30656/
Test FAILed.

SparkQA · 2015-04-21T09:08:32Z

Test build #30660 has started for PR 5556 at commit 95a26c3.

SparkQA · 2015-04-21T10:25:19Z

Test build #30660 has finished for PR 5556 at commit 95a26c3.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)
This patch does not change any dependencies.

AmplabJenkins · 2015-04-21T10:25:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30660/
Test FAILed.

evacchi · 2015-04-21T10:29:11Z

I may be mistaken, but my code does not touch PySpark, and the stack trace seems to point at a configuration error:

akka.ConfigurationException: Logger specified in config can't be loaded [akka.event.slf4j.Slf4jLogger] due to [akka.event.Logging$LoggerInitializationException: Logger log1-Slf4jLogger did not respond with LoggerInitialized, sent instead [TIMEOUT]]

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30660/console

evacchi · 2015-04-22T13:41:13Z

I can confirm that the pyspark/sql/dataframe.py test case passes successfully on my branch.

marmbrus · 2015-04-22T18:40:48Z

Sorry, you may have hit a flaky test. If you fix the merge conflicts, we can try again.

SparkQA · 2015-04-23T08:08:48Z

Test build #30825 has started for PR 5556 at commit 77a5df4.

evacchi · 2015-04-23T08:09:00Z

I tried to rebase. Hope it's not a problem.

SparkQA · 2015-04-23T09:47:47Z

Test build #30825 has finished for PR 5556 at commit 77a5df4.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- abstract class NumericType extends NativeType
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)
This patch does not change any dependencies.

AmplabJenkins · 2015-04-23T09:47:51Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30825/
Test PASSed.

AmplabJenkins · 2015-04-27T18:18:29Z

Can one of the admins verify this patch?

This patch contains an `IvyTestUtils` file, which dynamically generates jars and pom files to test the `--packages` feature without having to rely on the internet and Maven Central.

cc pwendell I know that Util functions to create jars already existed, but they didn't really serve my purposes, as they appended random prefixes that were breaking things.

I also added the local repository tests. Notice that they work without passing the `repo` to `resolveMavenCoordinates`.

Author: Burak Yavuz <[email protected]>

Closes apache#5790 from brkyvz/maven-utils and squashes the following commits:
- 3ec79b7 [Burak Yavuz] addressed comments v0.2
- a39151b [Burak Yavuz] address comments v0.1
- 172dfef [Burak Yavuz] use Ivy format
- 7476d06 [Burak Yavuz] added mock repository generator

…RN/HDFS

Current Spark apps running on Secure YARN/HDFS would not be able to write data to HDFS after 7 days, since delegation tokens cannot be renewed beyond that. This means Spark Streaming apps will not be able to run on Secure YARN. This commit adds basic functionality to fix this issue.

In this patch:
- new parameters are added - principal and keytab, which can be used to login to a KDC
- the client logs in, and then gets tokens to start the AM
- the keytab is copied to the staging directory
- the AM waits for 60% of the time till expiry of the tokens and then logs in using the keytab
- each time after 60% of the time, new tokens are created and sent to the executors

Currently, to avoid complicating the architecture, we set the keytab and principal in the SparkHadoopUtil singleton, and schedule a login. Once the login is completed, a callback is scheduled.

This is being posted for feedback, so I can gather feedback on the general implementation.

There are currently a bunch of things to do:
- [x] logging
- [x] testing - I plan to manually test this soon. If you have ideas of how to add unit tests, comment.
- [x] add code to ensure that if these params are set in non-YARN cluster mode, we complain
- [x] documentation
- [x] Have the executors request for credentials from the AM, so that retries are possible.

Author: Hari Shreedharan <[email protected]>

Closes apache#4688 from harishreedharan/kerberos-longrunning and squashes the following commits:
- 36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
- 611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
- 09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
- 6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
- 072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
- f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- 42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
- ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
- 7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- 8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
- e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
- 0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- 7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
- bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
- f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
- 2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pick up the latest suffix from HDFS if the AM is restarted.
- 61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
- 62c45ce [Hari Shreedharan] Relogin from keytab periodically.
- fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
- 42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
- 0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- 55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
- 9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
- f4fd711 [Hari Shreedharan] Fix SparkConf usage.
- 2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough for it to cause issues on HDFS.
- af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
- f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
- f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
- 5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
- b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
- 0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
- d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
- 8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
- fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
- 41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
- d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
- bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
- f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
- 2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
- ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
- 77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
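The "wait for 60% of the time till expiry" rule in the commit description above can be sketched as a small delay calculation. This is only an illustration under stated assumptions: `RenewalScheduleSketch`, `RenewalPercent` and `delayUntilRenewalMs` are hypothetical names, and the actual patch (which, per the commit list, later switched to using the token renewal interval) may compute the delay differently.

```scala
// Illustrative sketch; names and structure are assumptions, not the patch's code.
object RenewalScheduleSketch {
  // Percentage of the remaining token lifetime to wait before logging in again.
  val RenewalPercent: Long = 60L

  /** Milliseconds to wait, measured from `nowMs`, before obtaining new tokens. */
  def delayUntilRenewalMs(nowMs: Long, expiryMs: Long): Long = {
    // An already expired token gives a zero delay, i.e. renew immediately.
    val remainingMs = math.max(0L, expiryMs - nowMs)
    remainingMs * RenewalPercent / 100L
  }
}
```

With a 7-day lifetime (604,800,000 ms) and `nowMs = 0`, the sketch yields 362,880,000 ms, which is 4.2 days.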

…parkBuild Added ml.recommendation, ml.regression to SparkBuild CC: mengxr Author: Joseph K. Bradley <[email protected]> Closes apache#5758 from jkbradley/SPARK-7207 and squashes the following commits: a28158a [Joseph K. Bradley] Added ml.recommendation, ml.regression to SparkBuild

…ecure YARN/HDFS" This reverts commit 6c65da6.

JIRA: https://issues.apache.org/jira/browse/SPARK-7196

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#5777 from viirya/jdbc_precision and squashes the following commits:
- f40f5e6 [Liang-Chi Hsieh] Support precision and scale for NUMERIC type.
- 49acbf9 [Liang-Chi Hsieh] Add unit test.
- a509e19 [Liang-Chi Hsieh] Support precision and scale of decimal type for JDBC.

…; add facade in front of Unsafe; remove use of Unsafe.setMemory

This patch suppresses compiler warnings due to our use of `sun.misc.Unsafe` (introduced in apache#5725). These warnings can only be suppressed via the `-XDignore.symbol.file` javac flag; the `SuppressWarnings` annotation won't work for these. In order to restrict uses of this compiler flag to the `unsafe` module, I placed a facade in front of `Unsafe` so that other modules won't call it directly. This facade will also help us to avoid accidental usage of deprecated Unsafe methods or methods that aren't supported in Java 6.

I also removed an unnecessary use of `Unsafe.setMemory`, which isn't present in certain versions of Java 6, and excluded the new `unsafe` module from Javadoc.

Author: Josh Rosen <[email protected]>

Closes apache#5814 from JoshRosen/unsafe-compiler-warnings-fixes and squashes the following commits:
- 9e8c483 [Josh Rosen] Exclude new unsafe module from Javadoc
- ba75ecf [Josh Rosen] Only apply -XDignore.symbol.file flag in unsafe project.
- 7403345 [Josh Rosen] Put facade in front of Unsafe.
- 50230c0 [Josh Rosen] Remove usage of Unsafe.setMemory
- 96d41c9 [Josh Rosen] Use -XDignore.symbol.file to suppress warnings about sun.misc.Unsafe usage

SQL
```
select key from (select key,value from t1 limit 100) t2 limit 10
```

Optimized Logical Plan before modifying
```
== Optimized Logical Plan ==
Limit 10
 Project key#228
  Limit 100
   MetastoreRelation default, t1, None
```

Optimized Logical Plan after modifying
```
== Optimized Logical Plan ==
Limit 10
 Limit 100
  Project key#228
   MetastoreRelation default, t1, None
```

After this, we can combine limits

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5797 from DoingDone9/ProjectLimit and squashes the following commits:
- 70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala
- dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala
- 485c61c [Zhongshuai Pei] Update Optimizer.scala
- f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
- f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
- f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
- 34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
- 802261c [DoingDone9] Merge pull request apache#7 from apache/master
- d00303b [DoingDone9] Merge pull request apache#6 from apache/master
- 98b134f [DoingDone9] Merge pull request apache#5 from apache/master
- 161cae3 [DoingDone9] Merge pull request apache#4 from apache/master
- c87e8b6 [DoingDone9] Merge pull request apache#3 from apache/master
- cb1852d [DoingDone9] Merge pull request apache#2 from apache/master
- c3f046f [DoingDone9] Merge pull request apache#1 from apache/master
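The rewrite described in this commit message (moving the Project below the inner Limit so that the two limits become adjacent, then combining them) can be sketched on a toy plan tree. This is not Catalyst code: `Plan`, `Relation`, `Project`, `Limit`, `swapProjectAndLimit` and `combineLimits` are hypothetical stand-ins, and the sketch assumes projections are deterministic so the swap does not change results.

```scala
// Toy plan tree and rules; all names are hypothetical stand-ins for Catalyst's.
object LimitPushdownSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  final case class Limit(n: Int, child: Plan) extends Plan

  // Rule 1: Project(Limit(child)) becomes Limit(Project(child)).
  def swapProjectAndLimit(plan: Plan): Plan = plan match {
    case Project(cols, Limit(n, child)) =>
      Limit(n, Project(cols, swapProjectAndLimit(child)))
    case Project(cols, child) => Project(cols, swapProjectAndLimit(child))
    case Limit(n, child)      => Limit(n, swapProjectAndLimit(child))
    case r: Relation          => r
  }

  // Rule 2: two adjacent limits collapse into the smaller one.
  def combineLimits(plan: Plan): Plan = plan match {
    case Limit(a, Limit(b, child)) => combineLimits(Limit(math.min(a, b), child))
    case Limit(n, child)           => Limit(n, combineLimits(child))
    case Project(cols, child)      => Project(cols, combineLimits(child))
    case r: Relation               => r
  }
}
```

Applied to `Limit(10, Project(Seq("key"), Limit(100, Relation("t1"))))`, the first rule produces the "after" shape from the commit message, and the second rule then reduces it to a single `Limit(10, ...)`.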

Author: Reynold Xin <[email protected]> Closes apache#6071 from rxin/parserdialect and squashes the following commits: ca2eb31 [Reynold Xin] Rename Dialect -> ParserDialect.

add docs for https://issues.apache.org/jira/browse/SPARK-6994 Author: vidmantas zemleris <[email protected]> Closes apache#6030 from vidma/docs/row-with-named-fields and squashes the following commits: 241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name

Signed-off-by: Edoardo Vacchi <[email protected]>

…to sqlctx-refactor Conflicts: sql/core/src/main/scala/org/apache/spark/sql/QueryExecution.scala sql/core/src/main/scala/org/apache/spark/sql/SparkPlanner.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQueryExecution.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala

evacchi · 2015-05-13T07:39:11Z

Can you restart the Jenkins build? Let's see if anything got messed up.

rxin · 2015-05-13T07:40:04Z

Jenkins, ok to test.

AmplabJenkins · 2015-05-13T07:42:16Z

Merged build triggered.

AmplabJenkins · 2015-05-13T07:42:24Z

Merged build started.

SparkQA · 2015-05-13T07:42:50Z

Test build #32598 has started for PR 5556 at commit 382e933.

SparkQA · 2015-05-13T07:47:30Z

Test build #32598 has finished for PR 5556 at commit 382e933.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]]
- protected[sql] class QueryExecution(val sqlContext: SQLContext, val logical: LogicalPlan)
- protected[sql] class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies
- protected[sql] class HiveQueryExecution(hiveContext: HiveContext, logicalPlan: LogicalPlan)

AmplabJenkins · 2015-05-13T07:47:31Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-13T07:47:32Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32598/
Test FAILed.

evacchi · 2015-05-13T14:52:12Z

Cleaned up in PR #6122. Please disregard this PR.

pwendell and others added 8 commits April 30, 2015 01:02

[HOTFIX] Disabling flaky test (fix in progress as part of SPARK-7224)

47bf406

Revert "[SPARK-5342] [YARN] Allow long running Spark apps to run on s…

e0628f2

…ecure YARN/HDFS" This reverts commit 6c65da6.

rxin and others added 17 commits May 11, 2015 22:06

[SQL] Rename Dialect -> ParserDialect.

1669675

Author: Reynold Xin <[email protected]> Closes apache#6071 from rxin/parserdialect and squashes the following commits: ca2eb31 [Reynold Xin] Rename Dialect -> ParserDialect.

Refactor out SparkPlanner from SQLContext

d916ad9

Signed-off-by: Edoardo Vacchi <[email protected]>

Cleanup HiveContext, following SparkContext refactoring

72f35d8

Signed-off-by: Edoardo Vacchi <[email protected]>

Refactor out QueryExecution from SQLContext

78e74a0

Signed-off-by: Edoardo Vacchi <[email protected]>

Factor out HiveQueryExecution from HiveContext

b96d2dc

Signed-off-by: Edoardo Vacchi <[email protected]>

Revert erroneous test rename

0c7fcd6

Signed-off-by: Edoardo Vacchi <[email protected]>

Move prepareForExecution inside QueryExecution

51194d3

Add Apache license headers

31be7f2

Fix Thriftserver Build

1801efc

Refactor out SparkPlanner from SQLContext

6c3af85

Signed-off-by: Edoardo Vacchi <[email protected]>

Cleanup HiveContext, following SparkContext refactoring

aef1974

Signed-off-by: Edoardo Vacchi <[email protected]>

Refactor out QueryExecution from SQLContext

1e957ff

Signed-off-by: Edoardo Vacchi <[email protected]>

Factor out HiveQueryExecution from HiveContext

59544f9

Signed-off-by: Edoardo Vacchi <[email protected]>

Revert erroneous test rename

cccf924

Signed-off-by: Edoardo Vacchi <[email protected]>

Move prepareForExecution inside QueryExecution

e8ace9c

evacchi mentioned this pull request May 13, 2015

[SQL] SPARK-6981: Factor out SparkPlanner and QueryExecution from SQLContext #6122

Closed

evacchi closed this May 13, 2015

evacchi deleted the sqlctx-refactor branch May 18, 2015 08:12

evacchi changed the title ~~[SQL] SPARK-6981: Factor out SparkPlanner and QueryExecution from SQLContext~~ [SQL] [outdated] SPARK-6981: Factor out SparkPlanner and QueryExecution from SQLContext May 22, 2015


78 participants