Branch 1.6 #11024
Closed
Conversation
…conflicts with dplyr shivaram Author: felixcheung <[email protected]> Closes #10119 from felixcheung/rdocdplyrmasked. (cherry picked from commit 43c575c) Signed-off-by: Shivaram Venkataraman <[email protected]>
…ition met in Master Downgrade to warning log for unexpected state transition. andrewor14 please review, thanks a lot. Author: jerryshao <[email protected]> Closes #10091 from jerryshao/SPARK-12059. (cherry picked from commit 7bc9e1d) Signed-off-by: Andrew Or <[email protected]>
…r and AppClient `SynchronousQueue` cannot cache any task. This issue is similar to #9978. It's an easy fix. Just use the fixed `ThreadUtils.newDaemonCachedThreadPool`. Author: Shixiong Zhu <[email protected]> Closes #10108 from zsxwing/fix-threadpool. (cherry picked from commit 649be4f) Signed-off-by: Shixiong Zhu <[email protected]>
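For context, a minimal sketch of the underlying JDK behavior (plain `java.util.concurrent`, not Spark internals): a bounded pool backed by a `SynchronousQueue` rejects submissions once its threads are busy, while a cached pool grows on demand.
```scala
import java.util.concurrent.{SynchronousQueue, ThreadPoolExecutor, TimeUnit}

// A fixed-size pool over a SynchronousQueue cannot queue tasks: once both
// threads are busy, the next submit() is rejected instead of being cached.
val fixed = new ThreadPoolExecutor(
  2, 2, 60L, TimeUnit.SECONDS, new SynchronousQueue[Runnable]())

// A cached pool (roughly what ThreadUtils.newDaemonCachedThreadPool wraps,
// minus the daemon thread factory) starts at 0 core threads and grows, so
// extra submits spawn new threads rather than being rejected.
val cached = new ThreadPoolExecutor(
  0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue[Runnable]())
```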
**Problem.** Event logs in 1.6 were much bigger than in 1.5. I ran PageRank and the event log size in 1.6 was almost 5x that in 1.5. I did a bisect to find that the RDD callsite added in #9398 is largely responsible for this. **Solution.** This patch removes the long form of the callsite (which is not used!) from the event log. This reduces the size of the event log significantly. *Note on compatibility*: if this patch is merged into 1.6.0, then it won't break any compatibility. Otherwise, if it is merged into 1.6.1, then we might need to add more backward compatibility handling logic (which does not exist yet). Author: Andrew Or <[email protected]> Closes #10115 from andrewor14/smaller-event-logs. (cherry picked from commit 688e521) Signed-off-by: Andrew Or <[email protected]>
Use ```coefficients``` to replace ```weights```; I hope these are the last two. mengxr Author: Yanbo Liang <[email protected]> Closes #10065 from yanboliang/coefficients. (cherry picked from commit d576e76) Signed-off-by: Xiangrui Meng <[email protected]>
I haven't created a JIRA. If we absolutely need one I'll do it, but I'm fine with not getting mentioned in the release notes if that's the only purpose it'll serve. cc marmbrus - We should include this in 1.6-RC2 if there is one. I can open a second PR against branch-1.6 if necessary. Author: Nicholas Chammas <[email protected]> Closes #10109 from nchammas/spark-ec2-versions. (cherry picked from commit ad7cea6) Signed-off-by: Shivaram Venkataraman <[email protected]>
…tdown after test Author: Tathagata Das <[email protected]> Closes #10124 from tdas/InputStreamSuite-flaky-test. (cherry picked from commit a02d472) Signed-off-by: Tathagata Das <[email protected]>
…ck param and fix doc and add tests. Spark submit expects comma-separated list Author: felixcheung <[email protected]> Closes #10034 from felixcheung/sparkrinitdoc. (cherry picked from commit 2213441) Signed-off-by: Shivaram Venkataraman <[email protected]>
…tConf. TaskAttemptContext's constructor will clone the configuration instead of referencing it. Calling setConf after creating TaskAttemptContext makes any changes to the configuration made inside setConf unperceived by RecordReader instances. As an example, Titan's InputFormat will change conf when calling setConf. They wrap their InputFormat around Cassandra's ColumnFamilyInputFormat, and append Cassandra's configuration. This change fixes the following error when using Titan's CassandraInputFormat with Spark: *java.lang.RuntimeException: org.apache.thrift.protocol.TProtocolException: Required field 'keyspace' was not present! Struct: set_key space_args(keyspace:null)* There's a discussion of this error here: https://groups.google.com/forum/#!topic/aureliusgraphs/4zpwyrYbGAE Author: Anderson de Andrade <[email protected]> Closes #10046 from adeandrade/newhadooprdd-fix. (cherry picked from commit f434f36) Signed-off-by: Marcelo Vanzin <[email protected]>
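A hedged sketch of the ordering fix (simplified from what NewHadoopRDD does; the method is illustrative): let a `Configurable` InputFormat mutate the configuration before the TaskAttemptContext is constructed, because the context clones the configuration at construction time.
```scala
import org.apache.hadoop.conf.{Configurable, Configuration}
import org.apache.hadoop.mapreduce.{InputFormat, TaskAttemptID}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

def newContext(format: InputFormat[_, _], conf: Configuration,
               attemptId: TaskAttemptID): TaskAttemptContextImpl = {
  // setConf must run BEFORE the context is built: TaskAttemptContextImpl
  // clones conf, so changes made afterwards are invisible to RecordReaders.
  format match {
    case c: Configurable => c.setConf(conf)
    case _               => // nothing to configure
  }
  new TaskAttemptContextImpl(conf, attemptId)
}
```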
… same name. Author: Sun Rui <[email protected]> Closes #10118 from sun-rui/SPARK-12104. (cherry picked from commit 5011f26) Signed-off-by: Shivaram Venkataraman <[email protected]>
…fter recovering StreamingContext from checkpoint Author: Tathagata Das <[email protected]> Closes #10127 from tdas/SPARK-12122. (cherry picked from commit 4106d80) Signed-off-by: Tathagata Das <[email protected]>
…ferenced When the spillable sort iterator was spilled, it was mistakenly keeping the last page in memory rather than the current page. This causes the current record to get corrupted. Author: Nong <[email protected]> Closes #10142 from nongli/spark-12089. (cherry picked from commit 95296d9) Signed-off-by: Davies Liu <[email protected]>
Python tests require access to the `KinesisTestUtils` file. When this file exists under src/test, python can't access it, since it is not available in the assembly jar. However, if we move KinesisTestUtils to src/main, we need to add the KinesisProducerLibrary as a dependency. In order to avoid this, I moved KinesisTestUtils to src/main, and extended it with ExtendedKinesisTestUtils which is under src/test that adds support for the KPL. cc zsxwing tdas Author: Burak Yavuz <[email protected]> Closes #10050 from brkyvz/kinesis-py.
…s in SparkR. Author: Sun Rui <[email protected]> Closes #9804 from sun-rui/SPARK-11774. (cherry picked from commit c8d0e16) Signed-off-by: Shivaram Venkataraman <[email protected]>
Need to match existing method signature Author: felixcheung <[email protected]> Closes #9680 from felixcheung/rcorr. (cherry picked from commit 895b6c4) Signed-off-by: Shivaram Venkataraman <[email protected]>
… be consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. ~~Note: If we cannot catch up with the 1.6 release, it will be a breaking change for 1.7 that we also need to explain in the release notes.~~ cc sun-rui felixcheung shivaram Author: Yanbo Liang <[email protected]> Closes #10123 from yanboliang/spark-12115. (cherry picked from commit 6979edf) Signed-off-by: Shivaram Venkataraman <[email protected]>
1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related variable functions: ```isNaN, isNull, isNotNull```. 2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0. ~~3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.~~ cc shivaram sun-rui felixcheung Author: Yanbo Liang <[email protected]> Closes #10037 from yanboliang/spark-12044. (cherry picked from commit b6e8e63) Signed-off-by: Shivaram Venkataraman <[email protected]>
Author: gcc <[email protected]> Closes #10101 from rh99/master. (cherry picked from commit 04b6799) Signed-off-by: Sean Owen <[email protected]>
When \u appears in a comment block (i.e. in /**/), code gen will break. So, in Expression and CodegenFallback, we escape \u to \\u. yhuai Please review it. I did reproduce it and it works after the fix. Thanks! Author: gatorsmile <[email protected]> Closes #10155 from gatorsmile/escapeU. (cherry picked from commit 49efd03) Signed-off-by: Yin Huai <[email protected]>
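A minimal sketch of the escaping idea (the helper name is hypothetical, not the actual Expression/CodegenFallback code): double any literal `\u` before it lands in a generated-code comment, so javac's Unicode-escape preprocessing never sees it.
```scala
// Hypothetical helper: escape \u so it cannot be parsed as a Unicode
// escape inside the /* ... */ comments of generated Java source.
def escapeUnicodeInComment(text: String): String =
  text.replace("\\u", "\\\\u")

// "\\u0041" below is the literal characters \u0041, not the letter A.
assert(escapeUnicodeInComment("see \\u0041") == "see \\\\u0041")
```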
…y when Jenkins load is high We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz <[email protected]> Closes #10110 from brkyvz/batch-wal-test-fix. (cherry picked from commit 6fd9e70) Signed-off-by: Tathagata Das <[email protected]>
This PR:
1. Suppress all known warnings.
2. Clean up test cases and fix some errors in test cases.
3. Fix errors in HiveContext-related test cases. These test cases were actually not run previously due to a bug in creating TestHiveContext.
4. Support 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'.
5. Make sure the default Hadoop file system is local when running test cases.
6. Turn warnings into errors.
Author: Sun Rui <[email protected]> Closes #10030 from sun-rui/SPARK-12034. (cherry picked from commit 39d677c) Signed-off-by: Shivaram Venkataraman <[email protected]>
Currently, the current line is not cleared by Ctrl-C.
After this patch
```
>>> asdfasdf^C
Traceback (most recent call last):
File "~/spark/python/pyspark/context.py", line 225, in signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt
```
It's still worse than 1.5 (and before).
Author: Davies Liu <[email protected]>
Closes #10134 from davies/fix_cltrc.
(cherry picked from commit ef3f047)
Signed-off-by: Davies Liu <[email protected]>
…ner not present The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovery from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to it. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004). While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das <[email protected]> Closes #9988 from tdas/SPARK-11932. (cherry picked from commit 5d80d8c) Signed-off-by: Tathagata Das <[email protected]>
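A hedged sketch of the resilience check (illustrative names, not the actual TrackStateRDD code): reuse the recovered RDD when its partitioner matches, otherwise repartition it before chaining the next batch on top.
```scala
import scala.reflect.ClassTag
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Illustrative: checkpoint recovery can drop the partitioner (SPARK-12004),
// so repartition the recovered state RDD if the expected one is missing.
def ensurePartitioned[K: ClassTag, V: ClassTag](
    rdd: RDD[(K, V)], expected: Partitioner): RDD[(K, V)] = {
  if (rdd.partitioner.contains(expected)) rdd
  else rdd.partitionBy(expected)
}
```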
https://issues.apache.org/jira/browse/SPARK-11963 Author: Xusen Yin <[email protected]> Closes #9962 from yinxusen/SPARK-11963. (cherry picked from commit 871e85d) Signed-off-by: Joseph K. Bradley <[email protected]>
…cala doc In SPARK-11946 the API for pivot was changed a bit and got updated doc, the doc changes were not made for the python api though. This PR updates the python doc to be consistent. Author: Andrew Ray <[email protected]> Closes #10176 from aray/sql-pivot-python-doc. (cherry picked from commit 36282f7) Signed-off-by: Yin Huai <[email protected]>
Switched from using SQLContext constructor to using getOrCreate, mainly in model save/load methods. This covers all instances in spark.mllib. There were no uses of the constructor in spark.ml. CC: mengxr yhuai Author: Joseph K. Bradley <[email protected]> Closes #10161 from jkbradley/mllib-sqlcontext-fix. (cherry picked from commit 3e7e05f) Signed-off-by: Xiangrui Meng <[email protected]>
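A minimal sketch of the pattern in a save method, assuming Spark 1.6's API (the surrounding method body is illustrative):
```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

def save(sc: SparkContext, path: String): Unit = {
  // Before: `new SQLContext(sc)` created a fresh context and could conflict
  // with an already-active (Hive)Context; getOrCreate reuses the active one.
  val sqlContext = SQLContext.getOrCreate(sc)
  // ... build a DataFrame of model data with sqlContext and write it to path
}
```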
…ing include_example Made a new patch containing only the markdown examples moved to the example/ folder. Only three Java code examples were not shifted since they contained compilation errors; these classes are 1) StandardScale 2) NormalizerExample 3) VectorIndexer. Author: Xusen Yin <[email protected]> Author: somideshmukh <[email protected]> Closes #10002 from somideshmukh/SomilBranch1.33. (cherry picked from commit 78209b0) Signed-off-by: Xiangrui Meng <[email protected]>
Add since annotation to ml.classification Author: Takahashi Hiroshi <[email protected]> Closes #8534 from taishi-oss/issue10259. (cherry picked from commit 7d05a62) Signed-off-by: Xiangrui Meng <[email protected]>
…mple code Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear. Author: Yanbo Liang <[email protected]> Closes #10006 from yanboliang/spark-11958. (cherry picked from commit 4a39b5a) Signed-off-by: Xiangrui Meng <[email protected]>
…means Value Author: cody koeninger <[email protected]> Closes #10132 from koeninger/SPARK-12103. (cherry picked from commit 48a9804) Signed-off-by: Sean Owen <[email protected]>
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped. This contribution is my original work, and I license this work to the Spark project under the project's open source license. Author: Michael Allman <[email protected]> Closes #10700 from mallman/stop_event_logger_first. (cherry picked from commit 4ee8191) Signed-off-by: Sean Owen <[email protected]>
JIRA: https://issues.apache.org/jira/browse/SPARK-12961 To prevent memory leak in snappy-java, just call the method once and cache the result. After the library releases new version, we can remove this object. JoshRosen Author: Liang-Chi Hsieh <[email protected]> Closes #10875 from viirya/prevent-snappy-memory-leak. (cherry picked from commit 5936bf9) Signed-off-by: Sean Owen <[email protected]>
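A sketch of the call-once-and-cache pattern the fix applies; `loadNativeVersion` below is a hypothetical stand-in for the leaky snappy-java call, not the real API:
```scala
object SnappyVersionHolder {
  // lazy val evaluates the right-hand side at most once; repeated reads
  // reuse the cached value instead of re-invoking the leaking method.
  lazy val version: String = loadNativeVersion()

  // Hypothetical stand-in for the snappy-java call that leaks per invocation.
  private def loadNativeVersion(): String = "1.1.2" // placeholder value
}
```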
… hive metadata format This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table metadata in hive metadata format. While this could be useful in general, the specific use-case for this change is that Hive doesn't handle wide schemas well (see https://issues.apache.org/jira/browse/SPARK-12682 and https://issues.apache.org/jira/browse/SPARK-6024) which in turn prevents such tables from being queried in SparkSQL. Author: Sameer Agarwal <[email protected]> Closes #10826 from sameeragarwal/skip-hive-metadata. (cherry picked from commit 08c781c) Signed-off-by: Yin Huai <[email protected]>
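A sketch of how the option might be passed when creating a persistent data source table (the table name and path are placeholders; `skip_hive_metadata` is the option this patch adds):
```scala
// Assumes an existing sqlContext (e.g. a HiveContext in Spark 1.6).
sqlContext.sql(
  """CREATE TABLE wide_table
    |USING parquet
    |OPTIONS (path '/tmp/wide_table', skip_hive_metadata 'true')
  """.stripMargin)
```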
Author: Yin Huai <[email protected]> Closes #10925 from yhuai/branch-1.6-hot-fix.
Previously (when the PR was first created) not specifying b= explicitly was fine (and treated as default null); instead, be explicit about b being None in the test. Author: Holden Karau <[email protected]> Closes #10564 from holdenk/SPARK-12611-fix-test-infer-schema-local. (cherry picked from commit 13dab9c) Signed-off-by: Yin Huai <[email protected]>
…vaList Backport of SPARK-12834 for branch-1.6 Original PR: #10772 Original commit message: We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in `PythonMLLibAPI`, then deserialize them with `PickleSerializer` in Python side. However, there is no need to transform them in such an inefficient way. Instead of it, we can use type conversion to convert them, e.g. `list(JavaArray)` or `list(JavaList)`. What's more, there is an issue to Ser/De Scala Array as I said in https://issues.apache.org/jira/browse/SPARK-12780 Author: Xusen Yin <[email protected]> Closes #10941 from jkbradley/yinxusen-SPARK-12834-1.6.
…ith `None` triggers cryptic failure
The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type class org.json4s.JsonAST$JNull$" to be more informative about what is not supported. Also, StructType metadata now handles JNull correctly, i.e., {'a': None}. test_metadata_null is added to tests.py to show the fix works.
Author: Jason Lee <[email protected]>
Closes #8969 from jasoncl/SPARK-10847.
(cherry picked from commit edd4737)
Signed-off-by: Yin Huai <[email protected]>
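A minimal json4s sketch of the null-handling idea; this is a simplified stand-in for the metadata value parser, not the actual Spark code:
```scala
import org.json4s.JsonAST._

// Simplified stand-in: map JSON null to a Scala null instead of falling
// through to the generic "Do not support type ..." error.
def fromJsonValue(value: JValue): Any = value match {
  case JNull      => null
  case JInt(i)    => i.toLong
  case JString(s) => s
  case JBool(b)   => b
  case other      =>
    throw new RuntimeException(s"Do not support type ${other.getClass}.")
}
```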
…to branch-1.6 SPARK-13082 actually fixed by #10559. However, it's a big PR and not backported to 1.6. This PR just backported the fix of 'read.json(rdd)' to branch-1.6. Author: Shixiong Zhu <[email protected]> Closes #10988 from zsxwing/json-rdd.
Apparently chrome removed `SVGElement.prototype.getTransformToElement`, which is used by our JS library dagre-d3 when creating edges. The real diff can be found here: andrewor14/dagre-d3@7d6c000, which is taken from the fix in the main repo: cpettitt/dagre-d3@1ef067f Upstream issue: https://github.com/cpettitt/dagre-d3/issues/202 Author: Andrew Or <[email protected]> Closes #10986 from andrewor14/fix-dag-viz. (cherry picked from commit 70e69fc) Signed-off-by: Andrew Or <[email protected]>
…uildPartitionedTableScan Hello Michael & All: We had some issues submitting the new code in the other PR (#10299), so we closed that PR and opened this one with the fix. The reason for the previous failure is that the projection for the scan, when there is a filter that is not pushed down (the "left-over" filter), could be different, in elements or ordering, from the original projection. With the new code, the approach to solve this problem is: insert a new Project if the "left-over" filter is nonempty and (the original projection is not empty and the projection for the scan has more than one element, which could otherwise cause a different ordering in the projection). We create 3 test cases to cover the otherwise failing cases. Author: Kevin Yu <[email protected]> Closes #10388 from kevinyu98/spark-12231. (cherry picked from commit fd50df4) Signed-off-by: Cheng Lian <[email protected]>
JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace an alias by the corresponding attribute. However, this will cause an issue exposed by the following case:
```scala
val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")
val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
```
In this case, both `Data.A` and `Data.B` are aliases in `WindowSpecDefinition`. If we replace these alias expressions by their alias names, we are unable to know what they are, since they will not be put in `missingExpr` either. Author: gatorsmile <[email protected]> Author: xiaoli <[email protected]> Author: Xiao Li <[email protected]> Closes #10963 from gatorsmile/seletStarAfterColDrop. (cherry picked from commit 33c8a49) Signed-off-by: Michael Armbrust <[email protected]>
ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO <[email protected]> Closes #10901 from maropu/DocFix. (cherry picked from commit da9146c) Signed-off-by: Michael Armbrust <[email protected]>
Changed a target at branch-1.6 from #10635. Author: Takeshi YAMAMURO <[email protected]> Closes #10915 from maropu/pr9935-v3.
It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`. The current code worked fine when the grouping expressions were simple, but when they were a derived value this blew up at execution time. Author: Michael Armbrust <[email protected]> Closes #11011 from marmbrus/groupByFunction.
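A hypothetical repro shape under Spark 1.6's Dataset API (assuming an existing `sqlContext` with implicits imported); grouping by a derived value like this used to blow up at execution time:
```scala
import sqlContext.implicits._

val ds = Seq(1, 2, 3, 4).toDS()
// The grouping key (_ % 2) is derived, not an attribute the child produced,
// which is exactly the case the naive toAttribute call mishandled.
ds.groupBy(_ % 2).count().collect()
```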
Can one of the admins verify this patch?
Contributor
@RD1991 Could you close this?
Author: Michael Armbrust <[email protected]> Closes #11014 from marmbrus/seqEncoders. (cherry picked from commit 29d9218) Signed-off-by: Michael Armbrust <[email protected]>
…ML python models' properties Backport of [SPARK-12780] for branch-1.6 Original PR for master: #10724 This fixes StringIndexerModel.labels in pyspark. Author: Xusen Yin <[email protected]> Closes #10950 from jkbradley/yinxusen-spark-12780-backport.
I've tried to solve some of the issues mentioned in https://issues.apache.org/jira/browse/SPARK-12629. Please let me know what you think. Thanks! Author: Narine Kokhlikyan <[email protected]> Closes #10580 from NarineK/sparkrSavaAsRable. (cherry picked from commit 8a88e12) Signed-off-by: Shivaram Venkataraman <[email protected]>
Java mapWithState with Function3 had a wrong conversion of the Java `Optional` to the Scala `Option`; the fixed code uses the same conversion used in the mapWithState call that takes Function4 as an input. `Optional.fromNullable(v.get)` fails if v is `None`; it is better to use `JavaUtils.optionToOptional(v)` instead. Author: Gabriele Nizzoli <[email protected]> Closes #11007 from gabrielenizzoli/branch-1.6.
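A sketch contrasting the two conversions (Guava's `Optional`; note `JavaUtils` is private[spark], so this compiles only inside Spark's own source tree and is shown purely to illustrate the fix):
```scala
import com.google.common.base.Optional
import org.apache.spark.api.java.JavaUtils

val v: Option[String] = None

// Buggy: v.get throws NoSuchElementException when v is None.
// val bad: Optional[String] = Optional.fromNullable(v.get)

// Safe: None becomes Optional.absent(), Some(x) becomes Optional.of(x).
val good: Optional[String] = JavaUtils.optionToOptional(v)
```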
…lumn name duplication Fixes problem and verifies fix by test suite. Also - adds optional parameter: nullable (Boolean) to: SchemaUtils.appendColumn and deduplicates SchemaUtils.appendColumn functions. Author: Grzegorz Chilkiewicz <[email protected]> Closes #10741 from grzegorz-chilkiewicz/master. (cherry picked from commit b1835d7) Signed-off-by: Joseph K. Bradley <[email protected]>
Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like `{ "a": "somestring", "b": null }`, then query like `SELECT col["b"] FROM t1;` and an NPE would be thrown. Author: Daoyuan Wang <[email protected]> Closes #10964 from adrian-wang/npewriter. (cherry picked from commit 358300c) Signed-off-by: Michael Armbrust <[email protected]> Conflicts: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
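The repro from the description as a runnable snippet, assuming an existing `sqlContext`; the wrapper case class is hypothetical, used only so schema inference sees a map column:
```scala
// Hypothetical wrapper so createDataFrame can infer a MapType column.
case class Record(col: Map[String, String])

val df = sqlContext.createDataFrame(
  Seq(Record(Map("a" -> "somestring", "b" -> null))))
df.registerTempTable("t1")

// Before the fix this threw a NullPointerException; it should return NULL.
sqlContext.sql("""SELECT col["b"] FROM t1""").show()
```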
The example will throw error like <console>:20: error: not found: value StructType Need to add this line: import org.apache.spark.sql.types._ Author: Kevin (Sangwoo) Kim <[email protected]> Closes #10141 from swkimme/patch-1. (cherry picked from commit b377b03) Signed-off-by: Michael Armbrust <[email protected]>
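The missing import in context, with a small schema built from the imported types (field names are illustrative):
```scala
// Without this import, StructType / StructField / StringType are not found.
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))
```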
Contributor
@RD1991 please close this PR
https://issues.apache.org/jira/browse/SPARK-13122 A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition. The initial proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID. Author: Adam Budde <[email protected]> Closes #11012 from budde/master. (cherry picked from commit ff71261) Signed-off-by: Andrew Or <[email protected]> Conflicts: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
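A hedged sketch of the proposed locking shape; the fields below stand in for MemoryStore's real members and are not the actual Spark code:
```scala
import scala.collection.mutable

class UnrollSketch {
  private val memoryManager = new Object
  private val unrollMemoryMap = mutable.HashMap[Long, Long]()

  private def currentTaskAttemptId(): Long = Thread.currentThread().getId

  def unrollSafely(blockId: String, values: Iterator[Any]): Unit =
    // One lock around read-unroll-update, so two threads reporting the same
    // task attempt id cannot interleave and corrupt the bookkeeping.
    memoryManager.synchronized {
      val taskId = currentTaskAttemptId()
      val initial = unrollMemoryMap.getOrElse(taskId, 0L)
      // ... perform the unroll here, reserving more memory as needed ...
      unrollMemoryMap(taskId) = initial // placeholder bookkeeping update
    }
}
```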
…uration columns I have clearly prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'. Author: Mario Briggs <[email protected]> Author: mariobriggs <[email protected]> Closes #11022 from mariobriggs/spark-12739. (cherry picked from commit e9eb248) Signed-off-by: Shixiong Zhu <[email protected]>
…ld not fail analysis of encoder Nullability should only be considered an optimization rather than part of the type system, so instead of failing analysis for mismatched nullability, we should pass analysis and add a runtime null check. Backport of #11035 to 1.6. Author: Wenchen Fan <[email protected]> Closes #11042 from cloud-fan/branch-1.6.
minor fix for api link in ml onevsrest Author: Yuhao Yang <[email protected]> Closes #11068 from hhbyyh/onevsrestDoc. (cherry picked from commit c2c956b) Signed-off-by: Xiangrui Meng <[email protected]>