Conversation

@markhamstra

No description provided.

Davies Liu and others added 15 commits September 28, 2015 14:40
The UTF8String may come from an UnsafeRow, in which case its underlying buffer is not copied, so we should clone it before holding it in Stats.
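The underlying issue is the classic defensive-copy pattern: a value backed by a shared, reused row buffer must be copied before it is retained beyond the row's lifetime. A minimal sketch in Java (class and method names are illustrative, not Spark's API):

```java
import java.util.Arrays;

public class StatsHolder {
    private byte[] latest;

    // The caller may reuse `buffer` for the next row, so take a copy
    // before holding a reference to it in long-lived stats.
    public void record(byte[] buffer) {
        latest = Arrays.copyOf(buffer, buffer.length);
    }

    public byte[] latest() {
        return latest;
    }
}
```

Without the `Arrays.copyOf`, mutating the caller's buffer for the next row would silently corrupt the retained value.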

cc yhuai

Author: Davies Liu <[email protected]>

Closes apache#8929 from davies/pushdown_string.

(cherry picked from commit ea02e55)
Signed-off-by: Yin Huai <[email protected]>
In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps for permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under. "Pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE has a pointer to each license in the other project's source tree.

The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.

Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.

The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.

Author: Sean Owen <[email protected]>

Closes apache#8919 from srowen/SPARK-10833.

(cherry picked from commit bf4199e)
Signed-off-by: Sean Owen <[email protected]>
…AllocationSuite

Fix the following issues in StandaloneDynamicAllocationSuite:

1. It should not assume master and workers start in order
2. It should not assume master and workers get ready at once
3. It should not assume the application is already registered with master after creating SparkContext
4. It should not access Master.app and idToApp which are not thread safe

The changes include:
* Use `eventually` to wait until master and workers are ready, to fix 1 and 2
* Use `eventually` to wait until the application is registered with master, to fix 3
* Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info, to fix 4
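The `eventually` pattern used above can be sketched as a small polling helper (a hedged Java sketch of ScalaTest's `eventually`, with illustrative parameter names): poll a condition until it holds or a timeout elapses, so the test stops assuming the cluster is ready at a fixed point in time.

```java
public class Eventually {
    // Poll `condition` until it holds, failing if `timeoutMs` elapses first.
    public static void eventually(java.util.function.BooleanSupplier condition,
                                  long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError("condition not met within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```

A test then writes `eventually(() -> workerCount() == 2, timeout, interval)` instead of asserting immediately after startup.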

Author: zsxwing <[email protected]>

Closes apache#8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.

(cherry picked from commit dba95ea)
Signed-off-by: Andrew Or <[email protected]>
Author: Ryan Williams <[email protected]>

Closes apache#8939 from ryan-williams/errmsg.

(cherry picked from commit b7ad54e)
Signed-off-by: Andrew Or <[email protected]>
…Suite

Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/

This failure is because `HeartbeatReceiverSuite. heartbeatReceiver` may receive `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121).

There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous. This PR also fixed them.
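One way to test asynchronous listener callbacks deterministically, sketched here in Java under the assumption that callbacks run on a single-threaded event loop: post a no-op barrier task and block on it, so every previously posted event has been processed before the test asserts.

```java
import java.util.concurrent.ExecutorService;

public class EventDrain {
    // Submitting an empty task and waiting for it guarantees all tasks
    // submitted earlier to this single-threaded executor have completed.
    public static void drain(ExecutorService eventLoop) {
        try {
            eventLoop.submit(() -> { }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

After `drain`, assertions about state updated by `onExecutorAdded`/`onExecutorRemoved`-style callbacks are race-free.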

Author: zsxwing <[email protected]>

Closes apache#8946 from zsxwing/SPARK-10058.

(cherry picked from commit 9b3e776)
Signed-off-by: Marcelo Vanzin <[email protected]>
The fix is to coerce `c("a", "b")` into a list so that it can be serialized for the call into the JVM.

Author: felixcheung <[email protected]>

Closes apache#8961 from felixcheung/rselect.

(cherry picked from commit 721e8b5)
Signed-off-by: Shivaram Venkataraman <[email protected]>
I don't believe the API changed at all.

Author: Avrohom Katz <[email protected]>

Closes apache#8957 from akatz/kcl-upgrade.

(cherry picked from commit 883bd8f)
Signed-off-by: Sean Owen <[email protected]>
`Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned input, but the bytes of an unsafe array are not guaranteed to be a whole number of words.
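The constraint itself is simple: a word-oriented hash reads 8 bytes per step, so it is only safe when the input length is a multiple of 8; otherwise a byte-level variant must be used. An illustrative check (not Spark's actual dispatch logic):

```java
public class HashDispatch {
    // A word-oriented hash consumes 8 bytes at a time, so it is only
    // applicable when the input length is a whole number of 8-byte words;
    // any other length must fall back to a byte-level hash.
    public static boolean canHashAsWords(int lengthInBytes) {
        return lengthInBytes % 8 == 0;
    }
}
```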

Author: Wenchen Fan <[email protected]>

Closes apache#8987 from cloud-fan/hash.
This should go into 1.5.2 also.

The issue is that we were no longer adding the `__app__.jar` to the system classpath.

Author: Thomas Graves <[email protected]>
Author: Tom Graves <[email protected]>

Closes apache#8959 from tgravescs/SPARK-10901.

(cherry picked from commit e978360)
Signed-off-by: Marcelo Vanzin <[email protected]>
This PR implements the following features for both `master` and `branch-1.5`.
1. Display the failed output op count in the batch list
2. Display the failure reason of output op in the batch detail page

Screenshots:
<img width="1356" alt="1" src="https://cloud.githubusercontent.com/assets/1000778/10198387/5b2b97ec-67ce-11e5-81c2-f818b9d2f3ad.png">
<img width="1356" alt="2" src="https://cloud.githubusercontent.com/assets/1000778/10198388/5b76ac14-67ce-11e5-8c8b-de2683c5b485.png">

There are still two remaining problems in the UI.
1. If an output operation doesn't run any Spark job, we cannot get its duration, since it's currently computed as the sum of all jobs' durations.
2. If an output operation doesn't run any Spark job, we cannot get its description, since it's taken from the latest job's call site.

We need to add new `StreamingListenerEvent` about output operations to fix them. So I'd like to fix them only for `master` in another PR.

Author: zsxwing <[email protected]>

Closes apache#8950 from zsxwing/batch-failure.

(cherry picked from commit ffe6831)
Signed-off-by: Tathagata Das <[email protected]>
Currently, if it isn't set, it scans `/lib/*` and adds every dir to the
classpath, which makes the env too large, and every command called
afterwards fails.
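The intended behavior can be sketched as: only extend the classpath when the variable is explicitly set, instead of globbing unconditionally. A hedged Java sketch (`hiveHome` stands in for the env var; this is not the actual launcher code):

```java
import java.io.File;

public class HiveClasspath {
    // Extend the classpath with Hive's lib directory only when a Hive home
    // is explicitly configured; otherwise leave the classpath untouched
    // rather than scanning /lib/* and bloating the environment.
    public static String extend(String classpath, String hiveHome) {
        if (hiveHome == null || hiveHome.isEmpty()) {
            return classpath;
        }
        return classpath + File.pathSeparator
            + hiveHome + File.separator + "lib" + File.separator + "*";
    }
}
```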

Author: Kevin Cox <[email protected]>

Closes apache#8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
The created decimal is wrong when using `Decimal(unscaled, precision, scale)` with unscaled > 1e18, precision > 18, and scale > 0.

This bug exists since the beginning.
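The safe construction for high-precision values is to go through an arbitrary-precision unscaled value rather than any long-based fast path, since for precision > 18 the unscaled value can exceed what fits in the long-backed representation. A minimal Java sketch of the correct semantics (illustrative helper, not Spark's `Decimal` API):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalSketch {
    // Build a decimal from an unscaled value and a scale, e.g.
    // unscaled=2000000000000000000, scale=2 -> 20000000000000000.00.
    // Using BigInteger avoids overflow for unscaled values beyond 1e18.
    public static BigDecimal fromUnscaled(BigInteger unscaled, int scale) {
        return new BigDecimal(unscaled, scale);
    }
}
```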

Author: Davies Liu <[email protected]>

Closes apache#9014 from davies/fix_decimal.

(cherry picked from commit 37526ac)
Signed-off-by: Davies Liu <[email protected]>
…ifferent Oops size.

UnsafeRow contains 3 pieces of information when pointing to some data in memory (an object, a base offset, and a length). When the row is serialized with Java/Kryo serialization, the object layout in memory can change if two machines have different pointer widths (Oops in the JVM).

To reproduce, launch Spark using

```
MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
```

and then run the following:

```
scala> sql("select 1 xx").collect()
```

Author: Reynold Xin <[email protected]>

Closes apache#9030 from rxin/SPARK-10914.

(cherry picked from commit 84ea287)
Signed-off-by: Reynold Xin <[email protected]>
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.
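The shape of the change is a config-gated warning at startup rather than a hard failure. A hedged Java sketch (the config key matches Spark's documented `spark.dynamicAllocation.enabled`; the class and wording are illustrative):

```java
import java.util.Map;
import java.util.logging.Logger;

public class DynamicAllocationCheck {
    private static final Logger LOG = Logger.getLogger("streaming");

    // Warn, rather than fail, when dynamic allocation is enabled for a
    // streaming application; returns whether it was enabled.
    public static boolean warnIfEnabled(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(
            conf.getOrDefault("spark.dynamicAllocation.enabled", "false"));
        if (enabled) {
            LOG.warning("Dynamic allocation is enabled for this streaming "
                + "application; executors holding buffered data may be "
                + "removed, which can lose data.");
        }
        return enabled;
    }
}
```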

Author: Hari Shreedharan <[email protected]>

Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits:

462b264 [Hari Shreedharan] Improve log message.
2733d94 [Hari Shreedharan] Minor change to warning message.
eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.

(cherry picked from commit 0984129)
Signed-off-by: Tathagata Das <[email protected]>
BryanCutler and others added 3 commits October 8, 2015 22:23
…rain with given regParam and convergenceTol parameters

These params were being passed into the `StreamingLogisticRegressionWithSGD` constructor but not transferred to the call for model training; same for `StreamingLinearRegressionWithSGD`. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value.
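The bug pattern is easy to reproduce in miniature: a wrapper accepts hyperparameters in its constructor but the training call silently uses defaults. A hedged Java sketch (names are illustrative, not MLlib's API); the fix is simply to forward the stored values:

```java
public class StreamingRegression {
    private final double regParam;
    private final double convergenceTol;

    public StreamingRegression(double regParam, double convergenceTol) {
        this.regParam = regParam;
        this.convergenceTol = convergenceTol;
    }

    // Before the fix, a call like this would pass defaults (or the wrong
    // argument in the wrong position) instead of the stored fields.
    public String train() {
        return "train(regParam=" + regParam
            + ", convergenceTol=" + convergenceTol + ")";
    }
}
```

Passing the arguments by name, as the commit does, also guards against positional mix-ups like the intercept/regularization swap mentioned above.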

Author: Bryan Cutler <[email protected]>

Closes apache#9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.

(cherry picked from commit 5410747)
Signed-off-by: Xiangrui Meng <[email protected]>
…n on Aggregate

For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.

Author: Wenchen Fan <[email protected]>

Closes apache#8548 from cloud-fan/support-order-by-non-attribute.
@yeweizhang

Can we also pull this fix?

https://issues.apache.org/jira/browse/SPARK-10389

This will fix the 100+ failures we ran into when comparing the native and Spark SQL results. Thank you.

@markhamstra
Author

Already did.

markhamstra added a commit that referenced this pull request Oct 9, 2015
@markhamstra markhamstra merged commit ce28740 into alteryx:csd-1.5 Oct 9, 2015