Skip to content

Conversation

@yhuai
Copy link
Contributor

@yhuai yhuai commented Jun 24, 2015

This is #6964 for branch 1.4.

yhuai and others added 30 commits May 27, 2015 20:04
This PR has three changes:
1. Renaming the table of `ThriftServer` to `SQL`;
2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and
3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`.

https://issues.apache.org/jira/browse/SPARK-7907

Author: Yin Huai <[email protected]>

Closes #6448 from yhuai/JDBCServer and squashes the following commits:

eadcc3d [Yin Huai] Update test.
9168005 [Yin Huai] Use SQL as the tab name.
221831e [Yin Huai] Rename ThriftServer to JDBCServer.

(cherry picked from commit 3c1f1ba)
Signed-off-by: Yin Huai <[email protected]>
…at the same time

This is a somewhat obscure bug, but I think that it will seriously impact KryoSerializer users who use custom registrators which disabled auto-reset. When auto-reset is disabled, then this breaks things in some of our shuffle paths which actually end up creating multiple OutputStreams from the same shared SerializerInstance (which is unsafe).

This was introduced by a patch (SPARK-3386) which enables serializer re-use in some of the shuffle paths, since constructing new serializer instances is actually pretty costly for KryoSerializer.  We had already fixed another corner-case (SPARK-7766) bug related to this, but missed this one.

I think that the root problem here is that KryoSerializerInstance can be used in a way which is unsafe even within a single thread, e.g. by creating multiple open OutputStreams from the same instance or by interleaving deserialize and deserializeStream calls. I considered a smaller patch which adds assertions to guard against this type of "misuse" but abandoned that approach after I realized how convoluted the Scaladoc became.

This patch fixes this bug by making it legal to create multiple streams from the same KryoSerializerInstance.  Internally, KryoSerializerInstance now implements a  `borrowKryo()` / `releaseKryo()` API that's backed by a "pool" of capacity 1. Each call to a KryoSerializerInstance method will borrow the Kryo, do its work, then release the serializer instance back to the pool. If the pool is empty and we need an instance, it will allocate a new Kryo on-demand. This makes it safe for multiple OutputStreams to be opened from the same serializer. If we try to release a Kryo back to the pool but the pool already contains a Kryo, then we'll just discard the new Kryo. I don't think there's a clear benefit to having a larger pool since our usages tend to fall into two cases, a) where we only create a single OutputStream and b) where we create a huge number of OutputStreams with the same lifecycle, then destroy the KryoSerializerInstance (this is what's happening in the bypassMergeSort code path that my regression test hits).

Author: Josh Rosen <[email protected]>

Closes #6415 from JoshRosen/SPARK-7873 and squashes the following commits:

00b402e [Josh Rosen] Initialize eagerly to fix a failing test
ba55d20 [Josh Rosen] Add explanatory comments
3f1da96 [Josh Rosen] Guard against duplicate close()
ab457ca [Josh Rosen] Sketch a loan/release based solution.
9816e8f [Josh Rosen] Add a failing test showing how deserialize() and deserializeStream() can interfere.
7350886 [Josh Rosen] Add failing regression test for SPARK-7873

(cherry picked from commit 852f4de)
Signed-off-by: Patrick Wendell <[email protected]>
Author: Sandy Ryza <[email protected]>

Closes #6440 from sryza/sandy-spark-7896 and squashes the following commits:

49d8a0d [Sandy Ryza] Fix bug introduced when reading over record boundaries
6006856 [Sandy Ryza] Fix overflow issues
006b4b2 [Sandy Ryza] Fix scalastyle by removing non ascii characters
8b000ca [Sandy Ryza] Add ascii art to describe layout of data in metaBuffer
f2053c0 [Sandy Ryza] Fix negative overflow issue
0368c78 [Sandy Ryza] Initialize size as 0
a5a4820 [Sandy Ryza] Use explicit types for all numbers in ChainedBuffer
b7e0213 [Sandy Ryza] SPARK-7896. Allow ChainedBuffer to store more than 2 GB

(cherry picked from commit bd11b01)
Signed-off-by: Patrick Wendell <[email protected]>
This contribution is my original work and I license the work to the project under the project's open source license

Author: Matt Wise <[email protected]>

Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:

e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration

(cherry picked from commit 3541061)
Signed-off-by: Reynold Xin <[email protected]>
Current behaviour::
In spark UI
![screen shot 2015-05-27 at 3 27 51 pm](https://cloud.githubusercontent.com/assets/3919211/7837541/47d330ba-04a5-11e5-89d1-e5b11da1a513.png)

In YARN
![screen shot 2015-05-27 at 3](https://cloud.githubusercontent.com/assets/3919211/7837594/aebd1d36-04a5-11e5-8216-86e03c07d2bd.png)

In jira
![screen shot 2015-05-27 at 3_2](https://cloud.githubusercontent.com/assets/3919211/7837616/d3fedce2-04a5-11e5-9e68-960ed54e5d83.png)

Author: zuxqoj <[email protected]>

Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits:

cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue

(cherry picked from commit e838a25)
Signed-off-by: Reynold Xin <[email protected]>
…10/src to src

Since `spark-streaming-kafka` now is published for both Scala 2.10 and 2.11, we can move `KafkaWordCount` and `DirectKafkaWordCount` from `examples/scala-2.10/src/` to `examples/src/` so that they will appear in `spark-examples-***-jar` for Scala 2.11.

Author: zsxwing <[email protected]>

Closes #6436 from zsxwing/SPARK-7895 and squashes the following commits:

c6052f1 [zsxwing] Update examples/pom.xml
0bcfa87 [zsxwing] Fix the sleep time
b9d1256 [zsxwing] Move Kafka examples from scala-2.10/src to src

(cherry picked from commit 000df2f)
Signed-off-by: Patrick Wendell <[email protected]>
…lize) being called multiple times

~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work. So I added a workaround for this. If a Python UDF is applied to a Python UDT. This will put the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010

Author: Xiangrui Meng <[email protected]>

Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:

c257d2a [Xiangrui Meng] add a workaround for VectorUDT

(cherry picked from commit 530efe3)
Signed-off-by: Xiangrui Meng <[email protected]>
Fix the bug that entering only 1 arg will cause array out of bounds exception in PageRank example.

Author: Li Yao <[email protected]>

Closes #6455 from lastland/patch-1 and squashes the following commits:

de06128 [Li Yao] Fix the bug that entering only 1 arg will cause array out of bounds exception.

(cherry picked from commit c771589)
Signed-off-by: Andrew Or <[email protected]>
…tion.

The location of the IDE setup information has changed, so this just updates the link on the Building Spark page.

Author: Mike Dusenberry <[email protected]>

Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and squashes the following commits:

75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building Spark documentation by pointing to new location.

(cherry picked from commit 3e312a5)
Signed-off-by: Sean Owen <[email protected]>
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases:

1. DoubleType with ML attribute: carry over
2. DoubleType without ML attribute: numeric value
3. Scalar type: numeric value
4. VectorType with all ML attributes: carry over and update names
5. VectorType with number of ML attributes: assume all numeric
6. VectorType without ML attributes: check the first row and get the number of attributes

jkbradley

Author: Xiangrui Meng <[email protected]>

Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:

a9d2469 [Xiangrui Meng] add space
facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes

(cherry picked from commit 7859ab6)
Signed-off-by: Joseph K. Bradley <[email protected]>
See comments on #3913

Author: Reynold Xin <[email protected]>

Closes #6471 from rxin/sizeestimator and squashes the following commits:

c057095 [Reynold Xin] Fixed import.
2da478b [Reynold Xin] Remove SizeEstimator from o.a.spark package.

(cherry picked from commit 0077af2)
Signed-off-by: Reynold Xin <[email protected]>
https://issues.apache.org/jira/browse/SPARK-7853

This fixes the problem introduced by my change in #6435, which causes that Hive Context fails to create in spark shell because of the class loader issue.

Author: Yin Huai <[email protected]>

Closes #6459 from yhuai/SPARK-7853 and squashes the following commits:

37ad33e [Yin Huai] Do not use hiveQlTable at all.
47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf.
005649b [Yin Huai] Update comment.
35d86f3 [Yin Huai] Access TTable directly to make sure Hive will not internally use any metastore utility functions.
3737766 [Yin Huai] Recursively find all jars.

(cherry picked from commit 572b62c)
Signed-off-by: Yin Huai <[email protected]>
CC jkbradley

Author: Xusen Yin <[email protected]>

Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:

e2dc32e [Xusen Yin] rename colums
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer

(cherry picked from commit 1bd63e8)
Signed-off-by: Joseph K. Bradley <[email protected]>
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6475 from rxin/whitespace-streaming and squashes the following commits:

810dae4 [Reynold Xin] Fixed tests.
89068ad [Reynold Xin] [SPARK-7927] whitespace fixes for streaming.

(cherry picked from commit 3af0b31)
Signed-off-by: Reynold Xin <[email protected]>
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6478 from rxin/whitespace-hive and squashes the following commits:

e01b0e0 [Reynold Xin] Fixed tests.
a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and ThriftServer.

(cherry picked from commit ee6a0e1)
Signed-off-by: Reynold Xin <[email protected]>
Looks like this was added by accident when pwendell merged a commit back in September: fe2b1d6

Author: Kay Ousterhout <[email protected]>

Closes #6485 from kayousterhout/SPARK-7933 and squashes the following commits:

7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from merge script

(cherry picked from commit 66c49ed)
Signed-off-by: Patrick Wendell <[email protected]>
rxin

Author: Xiangrui Meng <[email protected]>

Closes #6481 from mengxr/mllib-scalastyle and squashes the following commits:

3ca4d61 [Xiangrui Meng] revert scalastyle config
30961ba [Xiangrui Meng] adjust spaces in mllib/test
571b5c5 [Xiangrui Meng] fix spaces in mllib

(cherry picked from commit 04616b1)
Signed-off-by: Reynold Xin <[email protected]>
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6477 from rxin/whitespace-sql-core and squashes the following commits:

ce6e369 [Reynold Xin] Fixed tests.
6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core.

(cherry picked from commit ff44c71)
Signed-off-by: Reynold Xin <[email protected]>
Author: Reynold Xin <[email protected]>

Closes #6480 from rxin/whitespace-example and squashes the following commits:

8a4a3d4 [Reynold Xin] [SPARK-7929] Remove Bagel examples & whitespace fix for examples.

(cherry picked from commit 2881d14)
Signed-off-by: Reynold Xin <[email protected]>
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6476 from rxin/whitespace-catalyst and squashes the following commits:

650409d [Reynold Xin] Fixed tests.
51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module.

(cherry picked from commit 8da560d)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6473 from rxin/whitespace-core and squashes the following commits:

058195d [Reynold Xin] Fixed tests.
fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core.

(cherry picked from commit 7f7505d)
Signed-off-by: Reynold Xin <[email protected]>
So we can enable a whitespace enforcement rule in the style checker to save code review time.

Author: Reynold Xin <[email protected]>

Closes #6474 from rxin/whitespace-graphx and squashes the following commits:

4d3cd26 [Reynold Xin] Fixed tests.
869dde4 [Reynold Xin] [SPARK-7927] whitespace fixes for GraphX.

(cherry picked from commit b069ad2)
Signed-off-by: Reynold Xin <[email protected]>
Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excludes `serpent` from its dependencies because we don't use it in Spark.
~~~
[info]   +-net.jpountz.lz4:lz4:1.3.0
[info]   +-net.razorvine:pyrolite:4.4
[info]   +-net.sf.py4j:py4j:0.8.2.1
~~~

davies

Author: Xiangrui Meng <[email protected]>

Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:

7b3c6bf [Xiangrui Meng] use the official Pyrolite release

(cherry picked from commit c45d58c)
Signed-off-by: Xiangrui Meng <[email protected]>
`make clean html` under `python/doc` returns
~~~
/Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list ends without a blank line; unexpected unindent.
~~~

harsha2010

Author: Xiangrui Meng <[email protected]>

Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the following commits:

91e2dad [Xiangrui Meng] fix RegressionEvaluator doc

(cherry picked from commit 834e699)
Signed-off-by: Xiangrui Meng <[email protected]>
The existing code rounds down to the nearest percent when computing the proportion
of a task's time that was spent on each phase of execution, and then computes
the scheduler delay proportion as 100 - sum(all other proportions).  As a result,
a few extra percent can end up in the scheduler delay. This commit eliminates
the rounding so that the time visualizations correspond properly to the real times.

sarutak If you could take a look at this, that would be great! Not sure if there's a good
reason to round here that I missed.

cc shivaram

Author: Kay Ousterhout <[email protected]>

Closes #6484 from kayousterhout/SPARK-7932 and squashes the following commits:

1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay visualization

(cherry picked from commit 04ddcd4)
Signed-off-by: Kay Ousterhout <[email protected]>
Shutdown hook for temp directories had priority 100 while SparkContext was 50. So the local root directory was deleted before SparkContext was shutdown. This leads to scary errors on running jobs, at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shutdown.

The fix in this PR is to make the temp directory shutdown priority lower than SparkContext, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is change from default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared.

Author: Tathagata Das <[email protected]>

Closes #6482 from tdas/SPARK-7930 and squashes the following commits:

d7cbeb5 [Tathagata Das] Removed unnecessary line
1514d0b [Tathagata Das] Fixed shutdown hook priorities

(cherry picked from commit cd3d9a5)
Signed-off-by: Patrick Wendell <[email protected]>
Expose user/item factors in DataFrames. This is to be more consistent with the pipeline API. It also helps maintain consistent APIs across languages. This PR also removed fitting params from `ALSModel`.

coderxiang

Author: Xiangrui Meng <[email protected]>

Closes #6468 from mengxr/SPARK-7922 and squashes the following commits:

7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark
1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS

(cherry picked from commit db95137)
Signed-off-by: Xiangrui Meng <[email protected]>
Attempts to restart the socket receiver when it is supposed to be stopped causes undesirable error messages.

Author: Tathagata Das <[email protected]>

Closes #6483 from tdas/SPARK-7931 and squashes the following commits:

09aeee1 [Tathagata Das] Do not restart receiver when stopped
koeninger and others added 24 commits June 19, 2015 17:36
Author: cody koeninger <[email protected]>

Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:

26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
…uator to get correct cross validation

JIRA: https://issues.apache.org/jira/browse/SPARK-8468

Author: Liang-Chi Hsieh <[email protected]>

Closes #6905 from viirya/cv_min and squashes the following commits:

930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
c3dd8d9 [Liang-Chi Hsieh] For comments.
b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.

(cherry picked from commit 0b89951)
Signed-off-by: Joseph K. Bradley <[email protected]>
The issue link [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
Currently,when we insert data to the dynamic partition with speculative tasks we will get the Exception
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```
This pr try to write the data to temporary dir when using dynamic parition  avoid the speculative tasks writing the same file

Author: jeanlyn <[email protected]>

Closes #6833 from jeanlyn/speculation and squashes the following commits:

64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
8860af0 [jeanlyn] remove the never using code
e19a3bd [jeanlyn] avoid speculative tasks write same file

(cherry picked from commit a1e3649)
Signed-off-by: Cheng Lian <[email protected]>
…elease 1.4

Reorganized docs a bit.  Added migration guides.

**Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```?  It would be a lot.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:

4bf26d6 [Joseph K. Bradley] tiny fix
8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4

(cherry picked from commit a189442)
Signed-off-by: Xiangrui Meng <[email protected]>
…lidatorSuite

Ref. #6905
ping yhuai

Author: Liang-Chi Hsieh <[email protected]>

Closes #6929 from viirya/hot_fix_cv_test and squashes the following commits:

b1aec53 [Liang-Chi Hsieh] Hotfix branch-1.4 by removing avgMetrics in CrossValidatorSuite.
Author: Cheng Lian <[email protected]>

Closes #6932 from liancheng/spark-8406-for-1.4 and squashes the following commits:

a0168fe [Cheng Lian] Backports SPARK-8406 and PR #6864 to branch-1.4
…branch-1.4)

This is branch 1.4 backport of #6888.

Below is the original description.

In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we tried to cast the String instead into a Timestamp.  However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered.

This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.

Author: Michael Armbrust <michaeldatabricks.com>

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust <[email protected]>

Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings
…ession.py`

[[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511)

Author: Yu ISHIKAWA <[email protected]>

Closes #6926 from yu-iskw/SPARK-8511 and squashes the following commits:

7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()`
4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py`

(cherry picked from commit 5d89d9f)
Signed-off-by: Joseph K. Bradley <[email protected]>

Conflicts:
	python/pyspark/mllib/tests.py
…/parquet/jdbc always override mode

https://issues.apache.org/jira/browse/SPARK-8532

This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.

Author: Yin Huai <[email protected]>

Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:

f972d5d [Yin Huai] davies's comment.
d37abd2 [Yin Huai] style.
d21290a [Yin Huai] Python doc.
889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
d696dff [Yin Huai] Python style.
88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
c40c461 [Yin Huai] Regression test.

(cherry picked from commit 5ab9fcf)
Signed-off-by: Yin Huai <[email protected]>
… unit test under jdk8

To reproduce that:
```
JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 | build/sbt -Phadoop-2.3 -Phive  'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite'
```

A simple workaround to fix that is update the original query, for getting the output size instead of the exact elements of the array (output by collect_set())

Author: Cheng Hao <[email protected]>

Closes #6402 from chenghao-intel/windowing and squashes the following commits:

99312ad [Cheng Hao] add order by for the select clause
edf8ce3 [Cheng Hao] update the code as suggested
7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK

(cherry picked from commit 13321e6)
Signed-off-by: Yin Huai <[email protected]>
… files

[[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548)

- This is the result of `lint-r`
    https://gist.github.com/yu-iskw/0019b37a2c1167f33986

Author: Yu ISHIKAWA <[email protected]>

Closes #6945 from yu-iskw/SPARK-8548 and squashes the following commits:

0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files

(cherry picked from commit 44fa7df)
Signed-off-by: Shivaram Venkataraman <[email protected]>
…ax bins

Author: Holden Karau <[email protected]>

Closes #6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits:

2894695 [Holden Karau] remove extra blank line
2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too
3a09170 [Holden Karau] add maxBins to to the train method as well
af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100

(cherry picked from commit 164fe2a)
Signed-off-by: Joseph K. Bradley <[email protected]>
A minor change but one which is (presumably) visible on the public api docs webpage.

Author: Scott Taylor <[email protected]>

Closes #6942 from megatron-me-uk/patch-3 and squashes the following commits:

fbed000 [Scott Taylor] test the absolute error in approx doctests

(cherry picked from commit f0dcbe8)
Signed-off-by: Josh Rosen <[email protected]>
Author: Hari Shreedharan <[email protected]>

Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:

9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
…e writer

Author: Holden Karau <[email protected]>

Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:

f807832 [Holden Karau] Log error if we can't throw it
855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
039d620 [Holden Karau] Add missing closeandwriteoutput
30e558d [Holden Karau] go back to try/finally
e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
ae0b7a7 [Holden Karau] Fix the test
2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions

(cherry picked from commit 0f92be5)
Signed-off-by: Josh Rosen <[email protected]>
…feshuffle writer"

This reverts commit 3348245.

Reverting because `catch (Exception e) ... throw e` doesn't compile under
Java 6 unless the method declares that it throws Exception.
the syntax was incorrect in the example in explode

Author: lockwobr <[email protected]>

Closes #6943 from lockwobr/master and squashes the following commits:

3d864d1 [lockwobr] updated the documentation for explode

(cherry picked from commit 4f7fbef)
Signed-off-by: Kousuke Saruta <[email protected]>
…ce between label and features vector

fix LabeledPoint parser when there is a whitespace between label and features vector, e.g.
(y, [x1, x2, x3])

Author: Oleksiy Dyagilev <[email protected]>

Closes #6954 from fe2s/SPARK-8525 and squashes the following commits:

0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang
c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position

(cherry picked from commit a803118)
Signed-off-by: Xiangrui Meng <[email protected]>
…s used in booelan expression

It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `|` instead.

Author: Davies Liu <[email protected]>

Closes #6961 from davies/column_bool and squashes the following commits:

9f19beb [Davies Liu] update message
af74bd6 [Davies Liu] fix tests
07dff84 [Davies Liu] address comments, fix tests
f70c08e [Davies Liu] raise Exception if column is used in booelan expression

(cherry picked from commit 7fb5ae5)
Signed-off-by: Davies Liu <[email protected]>
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala
@AmplabJenkins
Copy link

Build triggered.

@yhuai yhuai closed this Jun 24, 2015
@AmplabJenkins
Copy link

Build started.

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35619 has started for PR 6965 at commit 8a4684b.

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35619 has finished for PR 6965 at commit 8a4684b.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Build finished. Test PASSed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.