
Conversation

@mengxr (Contributor) commented Mar 17, 2014

The current implementation uses `Array(1.0, features: _*)` to construct a new array with the intercept. This is not efficient for big arrays because `Array.apply` uses a for loop that iterates over the arguments. `Array.+:` is a better choice here.

Also, I don't see a reason to set the initial weights to ones, so I set them to zeros.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1260
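
For reference, a minimal sketch of the two changes described above; the method names `withInterceptSlow`/`withInterceptFast` are illustrative, not the actual Spark code:

```scala
// Before: Array.apply walks its varargs sequence element by element.
def withInterceptSlow(features: Array[Double]): Array[Double] =
  Array(1.0, features: _*)

// After: +: allocates the result array once and copies the input in bulk.
def withInterceptFast(features: Array[Double]): Array[Double] =
  1.0 +: features

// Initial weights: zeros instead of ones. A fresh JVM array is
// zero-initialized, so no explicit fill is needed.
val numFeatures = 10 // illustrative
val initialWeights = new Array[Double](numFeatures)
```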

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13213/

@rxin (Contributor) commented Mar 18, 2014

Thanks. I've merged this!

asfgit closed this in e108b9a Mar 18, 2014
mengxr deleted the sgd branch March 18, 2014 22:40
mengxr added a commit to mengxr/spark that referenced this pull request Mar 19, 2014
The current implementation uses `Array(1.0, features: _*)` to construct a new array with intercept. This is not efficient for big arrays because `Array.apply` uses a for loop that iterates over the arguments. `Array.+:` is a better choice here.

Also, I don't see a reason to set initial weights to ones. So I set them to zeros.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1260

Author: Xiangrui Meng <[email protected]>

Closes apache#161 from mengxr/sgd and squashes the following commits:

b5cfc53 [Xiangrui Meng] set default weights to zeros
a1439c2 [Xiangrui Meng] faster construction of features with intercept
ericl pushed a commit to ericl/spark that referenced this pull request Jan 23, 2017
## What changes were proposed in this pull request?

This PR adds a new project `sql-kafka-0-8` to support Kafka 0.8 for Structured Streaming. It follows the design of the Kafka 0.10 source except:
- It doesn't support `subscribePattern`, because without the 0.10 Kafka APIs we would have to fetch all topics from ZooKeeper and filter them ourselves.
- It doesn't support the `failOnDataLoss` option, which means the user cannot delete topics while a query is running, otherwise the query will fail.

In addition, compared to the DStream Kafka 0.8 source, it has one additional feature:
- It supports discovering new partitions of a topic when the user uses the `subscribe` option (see the usage sketch below).

Author: Shixiong Zhu <[email protected]>

Closes apache#161 from zsxwing/kafka08.
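
A minimal usage sketch for such a source. The format name `kafka08` and the option keys are assumptions borrowed from the Kafka 0.10 source, not confirmed by this PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka08-demo").getOrCreate()

// Option keys mirror the Kafka 0.10 source; the short name actually
// registered by the 0.8 source may differ (assumption).
val df = spark.readStream
  .format("kafka08")
  .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
  .option("subscribe", "topic1,topic2") // new partitions are discovered automatically
  .load()

// Kafka rows carry binary key/value columns; cast them for display.
val query = df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()
```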
ericl pushed a commit to ericl/spark that referenced this pull request Jan 23, 2017
## What changes were proposed in this pull request?

A follow-up PR for apache#161 to disallow unsupported options.

## How was this patch tested?

`test("unsupported options")`

Author: Shixiong Zhu <[email protected]>

Closes apache#169 from zsxwing/kafka08-errors.
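
A hypothetical sketch of how such validation might look; the option names, method, and exception type are illustrative assumptions, not the actual code:

```scala
// Hypothetical: options the 0.8 source cannot honor, lower-cased for
// case-insensitive matching. Names taken from the PR description above.
val unsupportedOptions = Set("subscribepattern", "failondataloss")

def validateOptions(parameters: Map[String, String]): Unit = {
  val keys = parameters.keySet.map(_.toLowerCase)
  unsupportedOptions.intersect(keys).foreach { key =>
    throw new IllegalArgumentException(
      s"Option '$key' is not supported by the Kafka 0.8 source")
  }
}
```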
ash211 referenced this pull request in palantir/spark Mar 3, 2017
* Allow setting memory on the driver submission server.

* Address comments

* Address comments

(cherry picked from commit f6823f3)
lins05 pushed a commit to lins05/spark that referenced this pull request Apr 23, 2017
* Allow setting memory on the driver submission server.

* Address comments

* Address comments
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
* Allow setting memory on the driver submission server.

* Address comments

* Address comments
yoonlee95 pushed a commit to yoonlee95/spark that referenced this pull request Aug 17, 2017
YSPARK-713: Made changes to spark-env-gen.sh to resolve keystore and truststore URLs on the QE cluster
jlopezmalla pushed a commit to jlopezmalla/spark that referenced this pull request Feb 27, 2018
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
…emporary disconnection between driver and Mesos master. (apache#161)
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Refactor Ansible jobs to keep their structure consistent
microbearz pushed a commit to microbearz/spark that referenced this pull request Dec 15, 2020
microbearz pushed a commit to microbearz/spark that referenced this pull request Dec 15, 2020
* Revert "release r49 (apache#162)"

This reverts commit 62da28f.

* Revert "release r48 (apache#161)"

This reverts commit 1441531.

* revert ae release r52

* Revert "release r48 (apache#161)"

This reverts commit 1441531.

Co-authored-by: 7mming7 <[email protected]>
microbearz pushed a commit to microbearz/spark that referenced this pull request Dec 15, 2020
* Revert "release r49 (apache#162)"

This reverts commit 62da28f.

* Revert "release r48 (apache#161)"

This reverts commit 1441531.

* revert ae skew release r51

Co-authored-by: 7mming7 <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
turboFei added a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…33] Backport insert operation lock (apache#197)

* [HADP-40184] Backport insert operation lock (#15)

[HADP-31946] Fix data duplication on application retry and support concurrent writes to different partitions in the same table.
[HADP-33040][HADP-33041] Optimize merging staging files to the output path and detect conflicts with the HDFS file lease.
[HADP-34738] During commitJob, merge paths with multiple threads (apache#218)
[HADP-36251] Enhance the concurrent lock mechanism for insert operations (apache#272)
[HADP-37137] Add option to disable the insert operation lock when writing partitioned tables (apache#286)

* [HADP-46224] Do not overwrite the lock file when creating lock (apache#133)

* [HADP-46868] Fix Spark merge path race condition (apache#161)

* [HADP-50903] Ignore the error message if insert operation lock file has been deleted (apache#271)

* [HADP-50733] Enhance the error message on picking insert operation lock failure (apache#267)

* Fix

* Fix

* Fix

* fix

* Fix

* Fix

* Fix

* Fix

* Fix

* [HADP-50574] Support to create the lock file for EC enabled path (apache#263)

* [HADP-50574][FOLLOWUP] Add parameter type when getting overwrite method (apache#265)

* [HADP-50574][FOLLOWUP] Add UT for creating ec disabled lock file and use underlying DistributedFileSystem for ViewFileSystem (apache#266)

* Fix

* Fix

* Fix

* [HADP-34612][FOLLOWUP] Do not show the insert local error by removing the being written stream from dfs client (apache#288)

* Enabled Hadoop 3

---------

Co-authored-by: fwang12 <[email protected]>