@mengxr
Contributor

@mengxr mengxr commented Jul 16, 2014

We saw task serialization problems with large feature dimensions, which could be avoided if we don't serialize data directly into the task but use broadcast variables instead. This PR uses broadcast in both training and prediction and adds tests to make sure the task size is small.
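As a rough illustration of why this helps, here is a toy Python sketch. It is not Spark code and not part of this PR: `pickle` stands in for task serialization, and `BROADCAST_STORE`, `broadcast`, `direct_task_payload`, and `broadcast_task_payload` are hypothetical names invented for the example. It only models the idea that a task carrying the weights grows with the feature dimension, while a task carrying a broadcast handle stays small.

```python
import pickle

# Hypothetical stand-in for a broadcast store. In a real cluster the
# executors would fetch the value by id instead of receiving it per task.
BROADCAST_STORE = {}


def broadcast(value):
    """Register a value once and return a small integer handle for it."""
    handle = len(BROADCAST_STORE)
    BROADCAST_STORE[handle] = value
    return handle


def direct_task_payload(weights):
    """Model a task that carries the full weight vector in its payload."""
    return pickle.dumps({"op": "predict", "weights": weights})


def broadcast_task_payload(handle):
    """Model a task that carries only the handle of a broadcast value."""
    return pickle.dumps({"op": "predict", "weights_id": handle})


def task_sizes(dim):
    """Return (direct_size, broadcast_size) in bytes for a given dimension."""
    weights = [float(i) for i in range(dim)]
    direct = len(direct_task_payload(weights))
    via_broadcast = len(broadcast_task_payload(broadcast(weights)))
    return direct, via_broadcast


if __name__ == "__main__":
    for dim in (1_000, 100_000):
        direct, via_broadcast = task_sizes(dim)
        print(f"dim={dim}: direct={direct} bytes, broadcast={via_broadcast} bytes")
```

In this model the direct payload scales with the dimension, whereas the broadcast payload does not depend on it, which is the property the task size tests in this PR are meant to guard.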

@SparkQA

SparkQA commented Jul 16, 2014

QA tests have started for PR 1427. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16703/consoleFull

@rxin
Contributor

rxin commented Jul 16, 2014

This is out of scope for the PR, but it would be great to have some auto-broadcast mechanism ...

@mengxr
Contributor Author

mengxr commented Jul 16, 2014

Actually, the JIRA is for auto-switching between direct serialization and broadcast. It would be nice to implement it in sc.broadcast instead of making it specific to MLlib.
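One way such auto-switching could look, sketched in plain Python under loud assumptions: the threshold value, the name `choose_shipping_mode`, and the use of `pickle` to estimate size are all hypothetical and do not describe the JIRA's design or any Spark API.

```python
import pickle

# Hypothetical cutoff; no such setting is named anywhere in this thread.
AUTO_BROADCAST_THRESHOLD_BYTES = 100 * 1024


def choose_shipping_mode(value, threshold=AUTO_BROADCAST_THRESHOLD_BYTES):
    """Return "broadcast" if the serialized value exceeds the threshold,
    otherwise "direct" (ship the value inside the task itself)."""
    size = len(pickle.dumps(value))
    return "broadcast" if size > threshold else "direct"
```

A small value such as a two-element list would be shipped directly, while a vector of 100,000 floats would exceed the assumed 100 KiB threshold and be broadcast.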

@SparkQA

SparkQA commented Jul 16, 2014

QA results for PR 1427:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16703/consoleFull

@mengxr
Contributor Author

mengxr commented Jul 16, 2014

The failed test is from streaming and is unrelated to the change in this PR.

Test Result (1 failure / +1)
org.apache.spark.streaming.NetworkReceiverSuite.block generator throttling

@mengxr
Contributor Author

mengxr commented Jul 16, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 16, 2014

QA tests have started for PR 1427. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16713/consoleFull

@SparkQA

SparkQA commented Jul 16, 2014

QA results for PR 1427:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16713/consoleFull

@mateiz
Contributor

mateiz commented Jul 26, 2014

Do you want to merge this in the meantime, until we can see whether the RDD change goes into 1.1, or wait for that? It does seem like a useful fix.

@rxin
Contributor

rxin commented Jul 27, 2014

Merged in master. Thanks.

@rxin
Contributor

rxin commented Jul 27, 2014

We can revert this if the broadcast change gets in.

@asfgit asfgit closed this in aaf2b73 Jul 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…y into task closure

We saw task serialization problems with large feature dimensions, which could be avoided if we don't serialize data directly into the task but use broadcast variables instead. This PR uses broadcast in both training and prediction and adds tests to make sure the task size is small.

Author: Xiangrui Meng <[email protected]>

Closes apache#1427 from mengxr/broadcast-new and squashes the following commits:

b9a1228 [Xiangrui Meng] style update
b97c184 [Xiangrui Meng] minimal change to LBFGS
9ebadcc [Xiangrui Meng] add task size test to RowMatrix
9427bf0 [Xiangrui Meng] add task size tests to linear methods
e0a5cf2 [Xiangrui Meng] add task size test to GD
28a8411 [Xiangrui Meng] add test for NaiveBayes
380778c [Xiangrui Meng] update KMeans test
bccab92 [Xiangrui Meng] add task size test to LBFGS
02103ba [Xiangrui Meng] remove print
e73d68e [Xiangrui Meng] update tests for k-means
174cb15 [Xiangrui Meng] use local-cluster for test with a small akka.frameSize
1928a5a [Xiangrui Meng] add test for KMeans task size
e00c2da [Xiangrui Meng] use broadcast in GD, KMeans
010d076 [Xiangrui Meng] modify NaiveBayesModel and GLM to use broadcast
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
This bundles Boson with Spark 3.2.0.

Tested via the following commands:
```
BOSON_CONF_DIR=$HOME/git/boson/conf $SPARK_HOME/bin/spark-shell \
  --conf spark.sql.extensions=com.apple.boson.BosonSparkSessionExtensions \
  --conf spark.sql.adaptive.forceApply=true
```