
Commit e605dd6

add LIBSVM loader
1 parent f639674 commit e605dd6

1 file changed

+38
-2
lines changed

docs/mllib-data-types.md

Lines changed: 38 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -146,8 +146,44 @@ neg = LabeledPoint(0.0, SparseVector(3, [0, 2], [1.0, 3.0]))
146146
</div>
147147
</div>
148148

149-
It is very common in practice to see sparse training data.
150-
MLlib supports reading
149+
***Sparse data***
150+
151+
It is very common in practice to have sparse training data.
152+
MLlib supports reading training examples stored in `LIBSVM` format,
153+
which is the default format used by [`LIBSVM`](http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and [`LIBLINEAR`](http://www.csie.ntu.edu.tw/~cjlin/liblinear/).
154+
It is a text format.
155+
Each line represents a labeled sparse feature vector using the following format:
156+
157+
~~~
158+
label index1:value1 index2:value2 ...
159+
~~~
160+
161+
where the indices are one-based and in ascending order.
162+
After loading, the feature indices are converted to zero-based.
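
The one-based to zero-based index conversion described above can be illustrated with a minimal sketch. This is not the MLlib loader: `parse_libsvm_line` is a hypothetical helper written only for illustration, and it assumes well-formed input with whitespace-separated `index:value` pairs.

```python
def parse_libsvm_line(line):
    """Parse 'label index1:value1 index2:value2 ...' (hypothetical helper).

    Returns (label, zero_based_indices, values).
    """
    tokens = line.strip().split()
    label = float(tokens[0])
    indices, values = [], []
    for item in tokens[1:]:
        index_str, value_str = item.split(":")
        # LIBSVM indices are one-based; shift to zero-based.
        indices.append(int(index_str) - 1)
        values.append(float(value_str))
    return label, indices, values


label, indices, values = parse_libsvm_line("1.0 1:0.5 3:2.0")
print(label, indices, values)  # 1.0 [0, 2] [0.5, 2.0]
```

The zero-based indices and values are what a sparse vector constructor would typically take, alongside a vector size that has to be known or inferred separately.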
163+
164+
<div class="codetabs">
165+
<div data-lang="scala" markdown="1">
166+
[`MLUtils.loadLibSVMData`](api/mllib/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
167+
examples stored in LIBSVM format.
168+
169+
{% highlight scala %}
170+
import org.apache.spark.mllib.util.MLUtils
171+
172+
val training: RDD[LabeledPoint] = MLUtils.loadLibSVMData(sc, "hdfs://...")
173+
{% endhighlight %}
174+
</div>
175+
176+
<div data-lang="java" markdown="1">
177+
[`MLUtils.loadLibSVMData`](api/mllib/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
178+
examples stored in LIBSVM format.
179+
180+
{% highlight java %}
181+
import org.apache.spark.mllib.util.MLUtils;
182+
183+
RDD<LabeledPoint> training = MLUtils.loadLibSVMData(sc, "hdfs://...");
184+
{% endhighlight %}
185+
</div>
186+
</div>
151187

152188
## Local matrix
153189
