Commit 658a478

ogeagla authored and jkbradley committed
[SPARK-5726] [MLLIB] Elementwise (Hadamard) Vector Product Transformer
[SPARK-5726] [MLLIB] Elementwise (Hadamard) Vector Product Transformer

See https://issues.apache.org/jira/browse/SPARK-5726

Author: Octavian Geagla <[email protected]>
Author: Joseph K. Bradley <[email protected]>

Closes #4580 from ogeagla/spark-mllib-weighting and squashes the following commits:

fac12ad [Octavian Geagla] [SPARK-5726] [MLLIB] Use new createTransformFunc.
90f7e39 [Joseph K. Bradley] small cleanups
4595165 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove erroneous test case.
ded3ac6 [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks.
37d4705 [Octavian Geagla] [SPARK-5726] [MLLIB] Incorporated feedback.
1dffeee [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks.
e436896 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove 'TF' from 'ElementwiseProductTF'
cb520e6 [Octavian Geagla] [SPARK-5726] [MLLIB] Rename HadamardProduct to ElementwiseProduct
4922722 [Octavian Geagla] [SPARK-5726] [MLLIB] Hadamard Vector Product Transformer
1 parent 347a329 commit 658a478

File tree

4 files changed: +234 −0 lines changed


docs/mllib-feature-extraction.md

Lines changed: 54 additions & 0 deletions
@@ -477,3 +477,57 @@ sc.stop();
</div>
</div>

## ElementwiseProduct

ElementwiseProduct multiplies each input vector by a provided "weight" vector, using element-wise multiplication. In other words, it scales each column of the dataset by a scalar multiplier. This represents the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29) between the input vector `v` and the transforming vector `w`, yielding a result vector.

`\[ \begin{pmatrix}
v_1 \\
\vdots \\
v_N
\end{pmatrix} \circ \begin{pmatrix}
w_1 \\
\vdots \\
w_N
\end{pmatrix}
= \begin{pmatrix}
v_1 w_1 \\
\vdots \\
v_N w_N
\end{pmatrix}
\]`

[`ElementwiseProduct`](api/scala/index.html#org.apache.spark.mllib.feature.ElementwiseProduct) has the following parameter in the constructor:

* `w`: the transforming vector.

`ElementwiseProduct` implements [`VectorTransformer`](api/scala/index.html#org.apache.spark.mllib.feature.VectorTransformer), which can apply the weighting on a `Vector` to produce a transformed `Vector`, or on an `RDD[Vector]` to produce a transformed `RDD[Vector]`.

### Example

The example below demonstrates how to load a simple vectors file, extract a set of vectors, and then transform those vectors using a transforming vector value.

<div class="codetabs">
<div data-lang="scala">
{% highlight scala %}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.ElementwiseProduct
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data:
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble)))

val transformingVector = Vectors.dense(0.0, 1.0, 2.0)
val transformer = new ElementwiseProduct(transformingVector)

// Batch transform and per-row transform give the same results:
val transformedData = transformer.transform(parsedData)
val transformedData2 = parsedData.map(x => transformer.transform(x))

{% endhighlight %}
</div>
</div>
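The column-by-column scaling in the formula above is easy to check by hand. The following is a plain-Python illustration (not the Spark API) of the same element-wise product, using the transforming vector `[0.0, 1.0, 2.0]` from the Scala example; the input row is a made-up example value:

```python
# Plain-Python sketch of the element-wise (Hadamard) product described above.
# This mirrors what ElementwiseProduct does to each row of the dataset;
# it is an illustration, not the Spark API.
def elementwise_product(v, w):
    if len(v) != len(w):
        raise ValueError(
            "vector sizes do not match: expected %d but found %d" % (len(w), len(v)))
    # Multiply component i of the input by component i of the weight vector.
    return [vi * wi for vi, wi in zip(v, w)]

# Scale the example row [1.0, 0.5, 3.0] by the transforming vector
# [0.0, 1.0, 2.0] used in the Scala example.
print(elementwise_product([1.0, 0.5, 3.0], [0.0, 1.0, 2.0]))  # → [0.0, 0.5, 6.0]
```

Note that a zero in the weight vector zeroes out the corresponding column, which is how this transformer can also be used to drop features.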
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.ml.feature

import org.apache.spark.annotation.AlphaComponent
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.Param
import org.apache.spark.mllib.feature
import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
import org.apache.spark.sql.types.DataType

/**
 * :: AlphaComponent ::
 * Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a
 * provided "weight" vector. In other words, it scales each column of the dataset by a scalar
 * multiplier.
 */
@AlphaComponent
class ElementwiseProduct extends UnaryTransformer[Vector, Vector, ElementwiseProduct] {

  /**
   * the vector to multiply with input vectors
   * @group param
   */
  val scalingVec: Param[Vector] = new Param(this, "scalingVector", "vector for hadamard product")

  /** @group setParam */
  def setScalingVec(value: Vector): this.type = set(scalingVec, value)

  /** @group getParam */
  def getScalingVec: Vector = getOrDefault(scalingVec)

  override protected def createTransformFunc: Vector => Vector = {
    require(params.contains(scalingVec), s"transformation requires a weight vector")
    val elemScaler = new feature.ElementwiseProduct($(scalingVec))
    elemScaler.transform
  }

  override protected def outputDataType: DataType = new VectorUDT()
}
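The `spark.ml` class above follows the `UnaryTransformer` pattern: it holds the weight vector as a settable parameter and `createTransformFunc` returns a closure that is applied to every input row. A hedged plain-Python sketch of that pattern (class and method names here are illustrative, not Spark's API):

```python
# Illustrative sketch of the UnaryTransformer pattern used above:
# a transformer stores a parameter via a fluent setter, and
# create_transform_func returns a per-row closure over that parameter.
class ElementwiseProductTransformer:
    def __init__(self):
        self.scaling_vec = None  # analogous to the scalingVec Param

    def set_scaling_vec(self, value):
        self.scaling_vec = value
        return self  # fluent setter, like setScalingVec returning this.type

    def create_transform_func(self):
        # Mirrors the require(params.contains(scalingVec), ...) check.
        if self.scaling_vec is None:
            raise ValueError("transformation requires a weight vector")
        w = self.scaling_vec
        return lambda v: [vi * wi for vi, wi in zip(v, w)]

rows = [[1.0, 4.0], [2.0, 0.5]]
f = ElementwiseProductTransformer().set_scaling_vec([2.0, 0.5]).create_transform_func()
print([f(r) for r in rows])  # → [[2.0, 2.0], [4.0, 0.25]]
```

Capturing the parameter in a closure once, rather than re-reading it per row, matches how the Scala code builds a single `feature.ElementwiseProduct` and reuses its `transform`.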
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.mllib.feature

import org.apache.spark.annotation.Experimental
import org.apache.spark.mllib.linalg._

/**
 * :: Experimental ::
 * Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a
 * provided "weight" vector. In other words, it scales each column of the dataset by a scalar
 * multiplier.
 * @param scalingVector The values used to scale the reference vector's individual components.
 */
@Experimental
class ElementwiseProduct(val scalingVector: Vector) extends VectorTransformer {

  /**
   * Does the Hadamard product transformation.
   *
   * @param vector vector to be transformed.
   * @return transformed vector.
   */
  override def transform(vector: Vector): Vector = {
    require(vector.size == scalingVector.size,
      s"vector sizes do not match: Expected ${scalingVector.size} but found ${vector.size}")
    vector match {
      case dv: DenseVector =>
        val values: Array[Double] = dv.values.clone()
        val dim = scalingVector.size
        var i = 0
        while (i < dim) {
          values(i) *= scalingVector(i)
          i += 1
        }
        Vectors.dense(values)
      case SparseVector(size, indices, vs) =>
        val values = vs.clone()
        val dim = values.length
        var i = 0
        while (i < dim) {
          values(i) *= scalingVector(indices(i))
          i += 1
        }
        Vectors.sparse(size, indices, values)
      case v => throw new IllegalArgumentException("Does not support vector type " + v.getClass)
    }
  }
}
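The sparse branch above only touches the stored entries: each stored value is multiplied by the weight at its own index, so the index set (and therefore the sparsity pattern) is preserved. A plain-Python sketch of that branch, with a sparse vector modeled as a `(size, indices, values)` triple (an assumption for illustration, not Spark's representation):

```python
# Sketch of the SparseVector branch of ElementwiseProduct.transform:
# only stored values are scaled, each by the weight at its own index,
# so the index set is unchanged.
def sparse_elementwise_product(size, indices, values, scaling):
    if size != len(scaling):
        raise ValueError(
            "vector sizes do not match: expected %d but found %d" % (len(scaling), size))
    # values(i) *= scalingVector(indices(i)) in the Scala code.
    scaled = [v * scaling[i] for i, v in zip(indices, values)]
    return (size, indices, scaled)

# Mirrors the sparse test case below: entries (1, -1.0) and (2, -3.0)
# scaled by the weight vector [1.0, 0.0, 0.5].
result = sparse_elementwise_product(3, [1, 2], [-1.0, -3.0], [1.0, 0.0, 0.5])
print(result)
```

One consequence worth noting: a zero weight does not shrink the sparse representation, it only stores an explicit zero at that index, which is consistent with the test's expected result of `Vectors.sparse(3, Seq((1, 0.0), (2, -1.5)))`.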
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.mllib.feature

import org.scalatest.FunSuite

import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}
import org.apache.spark.mllib.util.MLlibTestSparkContext
import org.apache.spark.mllib.util.TestingUtils._

class ElementwiseProductSuite extends FunSuite with MLlibTestSparkContext {

  test("elementwise (hadamard) product should properly apply vector to dense data set") {
    val denseData = Array(
      Vectors.dense(1.0, 4.0, 1.9, -9.0)
    )
    val scalingVec = Vectors.dense(2.0, 0.5, 0.0, 0.25)
    val transformer = new ElementwiseProduct(scalingVec)
    val transformedData = transformer.transform(sc.makeRDD(denseData))
    val transformedVecs = transformedData.collect()
    val transformedVec = transformedVecs(0)
    val expectedVec = Vectors.dense(2.0, 2.0, 0.0, -2.25)
    assert(transformedVec ~== expectedVec absTol 1E-5,
      s"Expected transformed vector $expectedVec but found $transformedVec")
  }

  test("elementwise (hadamard) product should properly apply vector to sparse data set") {
    val sparseData = Array(
      Vectors.sparse(3, Seq((1, -1.0), (2, -3.0)))
    )
    val dataRDD = sc.parallelize(sparseData, 3)
    val scalingVec = Vectors.dense(1.0, 0.0, 0.5)
    val transformer = new ElementwiseProduct(scalingVec)
    val data2 = sparseData.map(transformer.transform)
    val data2RDD = transformer.transform(dataRDD)

    assert((sparseData, data2, data2RDD.collect()).zipped.forall {
      case (v1: DenseVector, v2: DenseVector, v3: DenseVector) => true
      case (v1: SparseVector, v2: SparseVector, v3: SparseVector) => true
      case _ => false
    }, "The vector type should be preserved after hadamard product")

    assert((data2, data2RDD.collect()).zipped.forall((v1, v2) => v1 ~== v2 absTol 1E-5))
    assert(data2(0) ~== Vectors.sparse(3, Seq((1, 0.0), (2, -1.5))) absTol 1E-5)
  }
}
