Branch 1.6 #10879
Closed
Conversation
… recovery issue Fixed a minor race condition in #10017 Closes #10017 Author: jerryshao <[email protected]> Author: Shixiong Zhu <[email protected]> Closes #10074 from zsxwing/review-pr10017. (cherry picked from commit f292018) Signed-off-by: Shixiong Zhu <[email protected]>
…g up TestHive.reset() When profiling HiveCompatibilitySuite, I noticed that most of the time seems to be spent in expensive `TestHive.reset()` calls. This patch speeds up suites based on HiveComparisionTest, such as HiveCompatibilitySuite, with the following changes: - Avoid `TestHive.reset()` whenever possible: - Use a simple set of heuristics to guess whether we need to call `reset()` in between tests. - As a safety-net, automatically re-run failed tests by calling `reset()` before the re-attempt. - Speed up the expensive parts of `TestHive.reset()`: loading the `src` and `srcpart` tables took roughly 600ms per test, so we now avoid this by using a simple heuristic which only loads those tables for tests that reference them. This is based on simple string matching over the test queries, which errs on the side of loading in more situations than might be strictly necessary. After these changes, HiveCompatibilitySuite seems to run in about 10 minutes. This PR is a revival of #6663, an earlier experimental PR from June, where I played around with several possible speedups for this suite. Author: Josh Rosen <[email protected]> Closes #10055 from JoshRosen/speculative-testhive-reset. (cherry picked from commit ef6790f) Signed-off-by: Reynold Xin <[email protected]>
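A sketch of the kind of string-matching heuristic described above (illustrative only; the actual check in the patch may normalize queries differently or cover more tables):
```
// Only load the expensive `src`/`srcpart` test tables for queries that
// actually mention them, erring on the side of loading too often.
def referencedTestTables(queries: Seq[String]): Set[String] = {
  val candidateTables = Seq("src", "srcpart")
  val lowerCased = queries.map(_.toLowerCase)
  candidateTables.filter(t => lowerCased.exists(_.contains(t))).toSet
}
```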
The issue is that the output committer is not idempotent and retry attempts will fail because the output file already exists. It is not safe to clean up the file as this output committer is by design not retryable. Currently, the job fails with a confusing file-exists error. This patch is a stopgap to tell the user to look at the top of the error log for the proper message. This is difficult to test locally as Spark is hardcoded not to retry. Manually verified by upping the retry attempts. Author: Nong Li <[email protected]> Author: Nong Li <[email protected]> Closes #10080 from nongli/spark-11328. (cherry picked from commit 47a0abc) Signed-off-by: Yin Huai <[email protected]>
…data source When querying a Timestamp or Date column like the following val filtered = jdbcdf.where($"TIMESTAMP_COLUMN" >= beg && $"TIMESTAMP_COLUMN" < end) the generated SQL query is "TIMESTAMP_COLUMN >= 2015-01-01 00:00:00.0". It should have quotes around the Timestamp/Date value, such as "TIMESTAMP_COLUMN >= '2015-01-01 00:00:00.0'". Author: Huaxin Gao <[email protected]> Closes #9872 from huaxingao/spark-11788. (cherry picked from commit 5a8b5fd) Signed-off-by: Yin Huai <[email protected]>
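A minimal sketch of the quoting described above (the helper name `compileValue` is used here for illustration; the patch's actual code may differ):
```
import java.sql.{Date, Timestamp}

// Render a filter value as a SQL literal for the generated JDBC predicate.
// Timestamp and Date values must be wrapped in single quotes so that e.g.
// TIMESTAMP_COLUMN >= '2015-01-01 00:00:00.0' is valid SQL.
def compileValue(value: Any): Any = value match {
  case s: String    => s"'${s.replace("'", "''")}'"
  case t: Timestamp => s"'$t'"
  case d: Date      => s"'$d'"
  case other        => other
}
```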
https://issues.apache.org/jira/browse/SPARK-11352 Author: Yin Huai <[email protected]> Closes #10072 from yhuai/SPARK-11352. (cherry picked from commit 5872a9d) Signed-off-by: Yin Huai <[email protected]>
…ild of the current TreeNode, we should only return the simpleString. In TreeNode's argString, if a TreeNode is not a child of the current TreeNode, we will only return the simpleString. I tested the [following case provided by Cristian](https://issues.apache.org/jira/browse/SPARK-11596?focusedCommentId=15019241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15019241). ``` val c = (1 to 20).foldLeft[Option[DataFrame]] (None) { (curr, idx) => println(s"PROCESSING >>>>>>>>>>> $idx") val df = sqlContext.sparkContext.parallelize((0 to 10).zipWithIndex).toDF("A", "B") val union = curr.map(_.unionAll(df)).getOrElse(df) union.cache() Some(union) } c.get.explain(true) ``` Without the change, `c.get.explain(true)` took 100s. With the change, `c.get.explain(true)` took 26ms. https://issues.apache.org/jira/browse/SPARK-11596 Author: Yin Huai <[email protected]> Closes #10079 from yhuai/SPARK-11596. (cherry picked from commit e96a70d) Signed-off-by: Michael Armbrust <[email protected]>
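A rough sketch of the distinction being made (hypothetical node type; the real `TreeNode.argString` handles many more argument shapes): arguments that are children get the full recursive rendering, while other plan nodes only contribute their one-line `simpleString`, which avoids repeatedly walking large shared subtrees.
```
// Hypothetical tree node: format an argument differently depending on
// whether it is one of this node's children.
abstract class Node(val children: Seq[Node]) {
  def simpleString: String = getClass.getSimpleName
  def formatArg(arg: Any): String = arg match {
    case n: Node if children.contains(n) => n.toString    // full recursion is fine
    case n: Node                         => n.simpleString // avoid re-walking shared subtrees
    case other                           => other.toString
  }
}
```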
Garbage collection triggers cleanups. If the driver JVM is huge and there is little memory pressure, we may never clean up shuffle files on executors. This is a problem for long-running applications (e.g. streaming). Author: Andrew Or <[email protected]> Closes #10070 from andrewor14/periodic-gc. (cherry picked from commit 1ce4adf) Signed-off-by: Josh Rosen <[email protected]>
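A minimal sketch of the idea, assuming a periodic task on the driver (the interval below is illustrative, not the configuration added by the patch):
```
import java.util.concurrent.{Executors, TimeUnit}

// Trigger GC on the driver at a fixed interval so that weakly referenced,
// out-of-scope RDDs and shuffles get enqueued for cleanup even when the
// driver heap is large and never under memory pressure.
val periodicGCIntervalSeconds = 30L * 60  // illustrative default
val gcExecutor = Executors.newSingleThreadScheduledExecutor()
gcExecutor.scheduleAtFixedRate(
  new Runnable { override def run(): Unit = System.gc() },
  periodicGCIntervalSeconds, periodicGCIntervalSeconds, TimeUnit.SECONDS)
```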
The existing `spark.memory.fraction` (default 0.75) gives the system 25% of the space to work with. For small heaps, this is not enough: e.g. default 1GB leaves only 250MB system memory. This is especially a problem in local mode, where the driver and executor are crammed in the same JVM. Members of the community have reported driver OOM's in such cases. **New proposal.** We now reserve 300MB before taking the 75%. For 1GB JVMs, this leaves `(1024 - 300) * 0.75 = 543MB` for execution and storage. This is proposal (1) listed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-12081). Author: Andrew Or <[email protected]> Closes #10081 from andrewor14/unified-memory-small-heaps. (cherry picked from commit d96f8c9) Signed-off-by: Andrew Or <[email protected]>
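The arithmetic as a small sketch (the 300MB reservation and the 0.75 fraction are from the description above; the helper name is illustrative):
```
// Usable memory for execution + storage after reserving a fixed 300MB
// for the system.
val reservedSystemMemoryBytes = 300L * 1024 * 1024

def usableMemoryBytes(heapBytes: Long, memoryFraction: Double = 0.75): Long =
  ((heapBytes - reservedSystemMemoryBytes) * memoryFraction).toLong

// For a 1GB heap: (1024 - 300) * 0.75 = 543MB left for execution and storage.
val forExecutionAndStorage = usableMemoryBytes(1024L * 1024 * 1024)
```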
Use try to match the behavior for single distinct aggregation with Spark 1.5, but that's not scalable; we should be robust by default and have a flag to address the performance regression for low-cardinality aggregation. cc yhuai nongli Author: Davies Liu <[email protected]> Closes #10075 from davies/agg_15. (cherry picked from commit 96691fe) Signed-off-by: Yin Huai <[email protected]>
…HadoopFiles The JobConf object created in `DStream.saveAsHadoopFiles` is used concurrently in multiple places: * The JobConf is updated by `RDD.saveAsHadoopFile()` before the job is launched * The JobConf is serialized as part of the DStream checkpoints. These concurrent accesses (updating in one thread while another thread is serializing it) can lead to a ConcurrentModificationException in the underlying Java hashmap used in the internal Hadoop Configuration object. The solution is to create a new JobConf in every batch that is updated by `RDD.saveAsHadoopFile()`, while the checkpointing serializes the original JobConf. Tests to be added in #9988 will fail reliably without this patch. Keeping this patch really small to make sure that it can be added to previous branches. Author: Tathagata Das <[email protected]> Closes #10088 from tdas/SPARK-12087. (cherry picked from commit 8a75a30) Signed-off-by: Shixiong Zhu <[email protected]>
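The shape of the fix, as a hedged sketch (not the actual saveAsHadoopFiles code): the checkpointed JobConf is never handed to the Hadoop write path; each batch gets a fresh copy that `RDD.saveAsHadoopFile()` is free to mutate.
```
import org.apache.hadoop.mapred.JobConf

// Clone the checkpointed JobConf for each batch so the write path mutates
// its own copy while the original is serialized for checkpointing in
// another thread.
def jobConfForBatch(checkpointedConf: JobConf): JobConf = new JobConf(checkpointedConf)
```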
Following up #10038. We can use bitmasks to determine which grouping expressions need to be set as nullable. cc yhuai Author: Liang-Chi Hsieh <[email protected]> Closes #10067 from viirya/fix-cube-following. (cherry picked from commit 0f37d1d) Signed-off-by: Yin Huai <[email protected]>
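A hypothetical sketch of the bitmask idea, assuming the convention that a set bit means the corresponding grouping expression is replaced by null in that grouping set (the real analyzer code is structured differently):
```
// Expression i must be marked nullable if any grouping set's bitmask nulls
// it out, i.e. has bit i set under the assumed convention.
def nullableByBitmasks(numGroupingExprs: Int, bitmasks: Seq[Int]): Seq[Boolean] =
  (0 until numGroupingExprs).map { i =>
    bitmasks.exists(mask => (mask & (1 << i)) != 0)
  }
```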
Author: Davies Liu <[email protected]> Closes #10090 from davies/fix_coalesce. (cherry picked from commit 4375eb3) Signed-off-by: Davies Liu <[email protected]>
…ons Across Different Languages I have tried to address all the comments in pull request #2447. Note that the second commit (using the new method in all internal code of all components) is quite intrusive and could be omitted. Author: Jeroen Schot <[email protected]> Closes #9767 from schot/master. (cherry picked from commit 128c290) Signed-off-by: Sean Owen <[email protected]>
When examining plans of complex queries with multiple joins, a pain point of mine is that, it's hard to immediately see the sibling node of a specific query plan node. This PR adds tree lines for the tree string of a `TreeNode`, so that the result can be visually more intuitive. Author: Cheng Lian <[email protected]> Closes #10099 from liancheng/prettier-tree-string. (cherry picked from commit a1542ce) Signed-off-by: Yin Huai <[email protected]>
cc mengxr noel-smith I worked on this issue based on #8729. ehsanmok, thank you for your contribution! Author: Yu ISHIKAWA <[email protected]> Author: Ehsan M.Kermani <[email protected]> Closes #9338 from yu-iskw/JIRA-10266. (cherry picked from commit de07d06) Signed-off-by: Xiangrui Meng <[email protected]>
Author: Yadong Qi <[email protected]> Closes #10096 from watermen/patch-1. (cherry picked from commit d0d7ec5) Signed-off-by: Reynold Xin <[email protected]>
…laDoc This fixes SPARK-12000, verified on my local machine with JDK 7. It seems that `scaladoc` tries to match method names and gets mixed up with annotations. cc: JoshRosen jkbradley Author: Xiangrui Meng <[email protected]> Closes #10114 from mengxr/SPARK-12000.2. (cherry picked from commit 9bb695b) Signed-off-by: Xiangrui Meng <[email protected]>
…ritySuite We should try increasing a timeout in NettyBlockTransferSecuritySuite in order to reduce that suite's flakiness in Jenkins. Author: Josh Rosen <[email protected]> Closes #10113 from JoshRosen/SPARK-12082. (cherry picked from commit ae40253) Signed-off-by: Reynold Xin <[email protected]>
…toString. https://issues.apache.org/jira/browse/SPARK-12109 The change of https://issues.apache.org/jira/browse/SPARK-11596 exposed the problem. In the SQL plan visualization, the filter node's display was broken (screenshots omitted); after the changes in this PR, the visualization is back to normal. Author: Yin Huai <[email protected]> Closes #10111 from yhuai/SPARK-12109. (cherry picked from commit ec2b6c2) Signed-off-by: Reynold Xin <[email protected]>
Per the java.sql.Connection spec, `boolean getAutoCommit() throws SQLException` throws SQLException if a database access error occurs or this method is called on a closed connection. So if conn.getAutoCommit is called on a closed connection, a SQLException will be thrown. Even though the code catches the SQLException and the program can continue, I think we should check conn.isClosed before calling conn.getAutoCommit to avoid the unnecessary SQLException. Author: Huaxin Gao <[email protected]> Closes #10095 from huaxingao/spark-12088. (cherry picked from commit 5349851) Signed-off-by: Sean Owen <[email protected]>
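A small sketch of the guard (hypothetical helper around a plain JDBC connection):
```
import java.sql.Connection

// Only consult autoCommit while the connection is still open; calling
// getAutoCommit on a closed connection is specified to throw SQLException.
def autoCommitIfOpen(conn: Connection): Option[Boolean] =
  if (conn != null && !conn.isClosed) Some(conn.getAutoCommit) else None
```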
cc mengxr Author: Jeff Zhang <[email protected]> Closes #10093 from zjffdu/mllib_typo. (cherry picked from commit 7470d9e) Signed-off-by: Sean Owen <[email protected]>
This fixes some typos in external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala. Author: microwishing <[email protected]> Closes #10121 from microwishing/master. (cherry picked from commit 95b3cf1) Signed-off-by: Sean Owen <[email protected]>
…conflicts with dplyr shivaram Author: felixcheung <[email protected]> Closes #10119 from felixcheung/rdocdplyrmasked. (cherry picked from commit 43c575c) Signed-off-by: Shivaram Venkataraman <[email protected]>
…ition met in Master Downgrade to warning log for unexpected state transition. andrewor14 please review, thanks a lot. Author: jerryshao <[email protected]> Closes #10091 from jerryshao/SPARK-12059. (cherry picked from commit 7bc9e1d) Signed-off-by: Andrew Or <[email protected]>
…r and AppClient `SynchronousQueue` cannot cache any task. This issue is similar to #9978. It's an easy fix. Just use the fixed `ThreadUtils.newDaemonCachedThreadPool`. Author: Shixiong Zhu <[email protected]> Closes #10108 from zsxwing/fix-threadpool. (cherry picked from commit 649be4f) Signed-off-by: Shixiong Zhu <[email protected]>
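The general shape of the fix, sketched with the plain JDK equivalent of `ThreadUtils.newDaemonCachedThreadPool` (a cached pool grows threads on demand instead of rejecting tasks when the SynchronousQueue has no idle taker):
```
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Cached thread pool with daemon threads: tasks are handed to an idle
// thread or a new one is created, so submissions are not rejected, and the
// daemon flag keeps the pool from blocking JVM shutdown.
def newDaemonCachedThreadPool(prefix: String): ExecutorService = {
  val counter = new AtomicInteger(0)
  val factory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }
  Executors.newCachedThreadPool(factory)
}
```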
**Problem.** Event logs in 1.6 were much bigger than 1.5. I ran page rank and the event log size in 1.6 was almost 5x that in 1.5. I did a bisect to find that the RDD callsite added in #9398 is largely responsible for this. **Solution.** This patch removes the long form of the callsite (which is not used!) from the event log. This reduces the size of the event log significantly. *Note on compatibility*: if this patch is to be merged into 1.6.0, then it won't break any compatibility. Otherwise, if it is merged into 1.6.1, then we might need to add more backward compatibility handling logic (currently does not exist yet). Author: Andrew Or <[email protected]> Closes #10115 from andrewor14/smaller-event-logs. (cherry picked from commit 688e521) Signed-off-by: Andrew Or <[email protected]>
Use `coefficients` to replace `weights`; I hope these are the last two. mengxr Author: Yanbo Liang <[email protected]> Closes #10065 from yanboliang/coefficients. (cherry picked from commit d576e76) Signed-off-by: Xiangrui Meng <[email protected]>
I haven't created a JIRA. If we absolutely need one I'll do it, but I'm fine with not getting mentioned in the release notes if that's the only purpose it'll serve. cc marmbrus - We should include this in 1.6-RC2 if there is one. I can open a second PR against branch-1.6 if necessary. Author: Nicholas Chammas <[email protected]> Closes #10109 from nchammas/spark-ec2-versions. (cherry picked from commit ad7cea6) Signed-off-by: Shivaram Venkataraman <[email protected]>
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: #10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <[email protected]> Closes #10721 from hhbyyh/branch-1.4. (cherry picked from commit 7bd2564) Signed-off-by: Joseph K. Bradley <[email protected]>
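A sketch of the overflow itself (variable names are illustrative; the fix widens the accumulator so the learning-rate formula above stays sensible):
```
// Counting ~3.5 billion trained words in an Int wraps to a negative value.
val wordsProcessed = 3500000000L
val overflowed: Int = wordsProcessed.toInt // -794967296
val widened: Long = wordsProcessed         // 3500000000

// The learning-rate adjustment quoted above, with a Long word count.
def alpha(learningRate: Double, numPartitions: Int,
          wordCount: Long, trainWordsCount: Long): Double =
  learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
```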
…thon3 This replaces the `execfile` used for running custom python shell scripts with explicit open, compile and exec (as recommended by 2to3). The reason for this change is to make the pythonstartup option compatible with python3. Author: Erik Selin <[email protected]> Closes #10255 from tyro89/pythonstartup-python3. (cherry picked from commit e4e0b3f) Signed-off-by: Josh Rosen <[email protected]>
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:91)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
```
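A minimal sketch of the null guard described above (class and field names are simplified; the real sorter manages much more state):
```
// free() may run from a TaskCompletionListener even when the sorter was
// constructed with a null memory consumer, so guard before freeing.
class InMemorySorterSketch(consumer: AnyRef /* may be null */) {
  private var buffer: Array[Long] = new Array[Long](1024)
  def free(): Unit = {
    if (consumer != null && buffer != null) {
      // here the real code would return the buffer's memory to the consumer
      buffer = null
    }
  }
}
```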
Author: Carson Wang <[email protected]>
Closes #10637 from carsonwang/FixNPE.
(cherry picked from commit eabc7b8)
Signed-off-by: Josh Rosen <[email protected]>
…number of features is large jira: https://issues.apache.org/jira/browse/SPARK-12026 The issue is valid, as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger. I tested locally; the change improves performance and the running time was stable. Author: Yuhao Yang <[email protected]> Closes #10146 from hhbyyh/chiSq. (cherry picked from commit 021dafc) Signed-off-by: Joseph K. Bradley <[email protected]>
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that no IOException occurs when the flag is set. Author: Bryan Cutler <[email protected]> Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844. (cherry picked from commit 56cdbd6) Signed-off-by: Sean Owen <[email protected]>
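A sketch of the read loop's shape (hypothetical class; the real FileAppender also handles flushing, buffer sizes, and Spark's logging utilities):
```
import java.io.{IOException, InputStream, OutputStream}

// Copy the process output stream until told to stop. An IOException seen
// after markForStop() means the stream was closed because the process was
// destroyed, so it is ignored; any other IOException propagates.
class AppenderSketch(in: InputStream, out: OutputStream) {
  @volatile private var stopped = false
  def markForStop(): Unit = { stopped = true }

  def appendStreamToFile(): Unit = {
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1 && !stopped) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
    } catch {
      case _: IOException if stopped => // stream closed during shutdown: ignore
    } finally {
      out.flush()
    }
  }
}
```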
… allocation Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically. Author: Shixiong Zhu <[email protected]> Closes #10728 from zsxwing/SPARK-12784. (cherry picked from commit 501e99e) Signed-off-by: Shixiong Zhu <[email protected]>
If a sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with an error (screenshot omitted). It's similar to SPARK-4313. Author: root <root@R520T1.(none)> Author: Koyo Yoshida <[email protected]> Closes #10663 from yoshidakuy/SPARK-12708. (cherry picked from commit 32cca93) Signed-off-by: Kousuke Saruta <[email protected]>
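A sketch of the general fix (encode the column name before embedding it in the page URL, decode when reading the request parameter back; helper names are illustrative):
```
import java.net.{URLDecoder, URLEncoder}

// Column names such as "Executor ID / Host" contain characters that break
// the sort link when it passes through the YARN proxy, so encode/decode
// them around the URL parameter.
def encodeSortColumn(name: String): String = URLEncoder.encode(name, "UTF-8")
def decodeSortColumn(param: String): String = URLDecoder.decode(param, "UTF-8")
```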
Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Closes #9613 from olarayej/SPARK-11031. (cherry picked from commit ba4a641) Signed-off-by: Shivaram Venkataraman <[email protected]>
…read completion Changed Logging FileAppender to use join in `awaitTermination` to ensure that thread is properly finished before returning. Author: Bryan Cutler <[email protected]> Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701. (cherry picked from commit ea104b8) Signed-off-by: Sean Owen <[email protected]>
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline ``` val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model") ``` should be ``` val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model") ``` cc: jkbradley Author: Jeff Lam <[email protected]> Closes #10769 from Agent007/SPARK-12722. (cherry picked from commit 86972fa) Signed-off-by: Sean Owen <[email protected]>
…plied in GROUP BY clause Addresses the comments from Yin. #10520 Author: Dilip Biswal <[email protected]> Closes #10758 from dilipbiswal/spark-12558-followup. (cherry picked from commit db9a860) Signed-off-by: Yin Huai <[email protected]> Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
…ures Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names. cc mengxr Author: Eric Liang <[email protected]> Closes #10323 from ericl/spark-12346. (cherry picked from commit 5e492e9) Signed-off-by: Xiangrui Meng <[email protected]>
…ntegration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Shixiong Zhu <[email protected]> Closes #10746 from zsxwing/flume-doc. (cherry picked from commit a973f48) Signed-off-by: Tathagata Das <[email protected]>
… integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author: Shixiong Zhu <[email protected]> Closes #10822 from zsxwing/kinesis-doc. (cherry picked from commit 721845c) Signed-off-by: Tathagata Das <[email protected]>
In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be resolved and actually we don't need a better alias for this case. This PR moves the cast wrapping logic to `Column.named` so that we only do it when we need an alias name. Backport of #10781 to 1.6. Author: Wenchen Fan <[email protected]> Closes #10819 from cloud-fan/bug.
… in interface.scala Author: proflin <[email protected]> Closes #10824 from proflin/master. (cherry picked from commit c00744e) Signed-off-by: Reynold Xin <[email protected]>
Change the assertion's message so it's consistent with the code. The old message said that the invoked method was lapack.dports, whereas in fact it was the lapack.dppsv method. Author: Wojciech Jurczyk <[email protected]> Closes #10818 from wjur/wjur/rename_error_message. (cherry picked from commit ebd9ce0) Signed-off-by: Sean Owen <[email protected]>
…ReaderBase It looks like there's one place left in the codebase, SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's reflective accesses of TaskAttemptContext methods, which could create problems when using a single Spark artifact with both Hadoop 1.x and 2.x. Author: Josh Rosen <[email protected]> Closes #10843 from JoshRosen/SPARK-12921.
https://issues.apache.org/jira/browse/SPARK-12747 Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real". Author: Liang-Chi Hsieh <[email protected]> Closes #10695 from viirya/fix-postgres-jdbc. (cherry picked from commit 55c7dd0) Signed-off-by: Reynold Xin <[email protected]>
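A simplified sketch of the mapping (the real PostgresDialect covers more types and works with Spark's DataType objects rather than strings):
```
// Postgres expects FLOAT4/FLOAT8 for single- and double-precision floats
// rather than "real", so map those type names explicitly.
def postgresTypeName(catalystTypeName: String): Option[String] =
  catalystTypeName match {
    case "FloatType"  => Some("FLOAT4")
    case "DoubleType" => Some("FLOAT8")
    case _            => None
  }
```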
Can one of the admins verify this patch?
Please close this PR @CurriedEgg
…s don't fit in Streaming page Added CSS style to force names of input streams with receivers to wrap Author: Alex Bozarth <[email protected]> Closes #10873 from ajbozarth/spark12859. (cherry picked from commit 358a33b) Signed-off-by: Kousuke Saruta <[email protected]>
…local vs cluster srowen thanks for the PR at #10866! Sorry it took me a while. This is related to #10866; basically, the assignment in the lambda expression in the Python example is actually invalid ``` In [1]: data = [1, 2, 3, 4, 5] In [2]: counter = 0 In [3]: rdd = sc.parallelize(data) In [4]: rdd.foreach(lambda x: counter += x) File "<ipython-input-4-fcb86c182bad>", line 1 rdd.foreach(lambda x: counter += x) ^ SyntaxError: invalid syntax ``` Author: Mortada Mehyar <[email protected]> Closes #10867 from mortada/doc_python_fix. (cherry picked from commit 56f57f8) Signed-off-by: Sean Owen <[email protected]>
…al vs cluster mode in closure handling Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode Author: Sean Owen <[email protected]> Closes #10866 from srowen/SPARK-12760. (cherry picked from commit aca2a01) Signed-off-by: Sean Owen <[email protected]>
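A Scala sketch of the same pitfall and the documented alternative (assumes an existing `SparkContext` named `sc`, as in the guide's examples):
```
// Assumes a live SparkContext `sc`.
val data = sc.parallelize(1 to 5)

// Broken in cluster mode (and not guaranteed even in local mode): each task
// mutates its own serialized copy of `counter`; the driver's copy stays 0.
var counter = 0
data.foreach(x => counter += x)

// Intended pattern: use an accumulator, which the driver can read reliably.
val acc = sc.accumulator(0L)
data.foreach(x => acc += x.toLong)
println(acc.value)
```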
…ialize HiveContext in PySpark
davies Mind reviewing?
This is the error message after this PR:
```
15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
/Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
warnings.warn("You must build Spark with Hive. "
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
return DataFrameReader(self)
File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
self._jreader = sqlContext._ssql_ctx.read()
File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
raise e
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
```
Author: Jeff Zhang <[email protected]>
Closes #10126 from zjffdu/SPARK-12120.
(cherry picked from commit e789b1d)
Signed-off-by: Josh Rosen <[email protected]>
…to Python rows When the actual row length doesn't conform to the specified schema field length, we should give a better error message instead of throwing an unintuitive `ArrayOutOfBoundsException`. Author: Cheng Lian <[email protected]> Closes #10886 from liancheng/spark-12624. (cherry picked from commit 3327fd2) Signed-off-by: Yin Huai <[email protected]>
…e failure Author: Andy Grove <[email protected]> Closes #10865 from andygrove/SPARK-12932. (cherry picked from commit d8e4805) Signed-off-by: Sean Owen <[email protected]>
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped. This contribution is my original work, and I license this work to the Spark project under the project's open source license. Author: Michael Allman <[email protected]> Closes #10700 from mallman/stop_event_logger_first. (cherry picked from commit 4ee8191) Signed-off-by: Sean Owen <[email protected]>