Commit a000b5c

techaddict authored and mateiz committed
SPARK-1637: Clean up examples for 1.0
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <[email protected]>

Closes #571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)

1 parent 39b8b14 commit a000b5c

40 files changed: 69 additions and 72 deletions
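
For scripts that still call `./bin/run-example` with the old class names, the package moves in this commit can be sketched as a prefix mapping. This is an illustrative helper, not part of the commit: the names `OLD_TO_NEW_PACKAGES` and `migrate_class_name` are hypothetical, and the two prefixes are only the ones visible in the renamed files and docs below.

```python
# Hypothetical helper (not part of the commit): rewrite pre-1.0 example
# class names to the package layout introduced by this change.
OLD_TO_NEW_PACKAGES = {
    "org.apache.spark.streaming.examples": "org.apache.spark.examples.streaming",
    "org.apache.spark.mllib.examples": "org.apache.spark.examples.mllib",
}


def migrate_class_name(class_name):
    """Return class_name with an old example package prefix replaced, if any."""
    for old_prefix, new_prefix in OLD_TO_NEW_PACKAGES.items():
        # Match on the package boundary so unrelated names are not rewritten.
        if class_name.startswith(old_prefix + "."):
            return new_prefix + class_name[len(old_prefix):]
    # Names outside the old packages are returned unchanged.
    return class_name
```

For example, `migrate_class_name("org.apache.spark.streaming.examples.NetworkWordCount")` yields the class name now used in the streaming guide, while `org.apache.spark.examples.SparkPi` passes through untouched.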

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -24,11 +24,11 @@ right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/

 # Running the Examples and Shell

-Spark comes with several sample programs. Scala and Java examples are in the `examples` directory, and Python examples are in `python/examples`.
+Spark comes with several sample programs. Scala, Java and Python examples are in the `examples/src/main` directory.
 To run one of the Java or Scala sample programs, use `./bin/run-example <class> <params>` in the top-level Spark directory
 (the `bin/run-example` script sets up the appropriate paths and launches that program).
 For example, try `./bin/run-example org.apache.spark.examples.SparkPi local`.
-To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./python/examples/pi.py local`.
+To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./examples/src/main/python/pi.py local`.

 Each example prints usage help when run with no parameters.

docs/python-programming-guide.md

Lines changed: 2 additions & 2 deletions
@@ -161,9 +161,9 @@ some example applications.

 # Where to Go from Here

-PySpark also includes several sample programs in the [`python/examples` folder](https://github.com/apache/spark/tree/master/python/examples).
+PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python).
 You can run them by passing the files to `pyspark`; e.g.:

-./bin/spark-submit python/examples/wordcount.py
+./bin/spark-submit examples/src/main/python/wordcount.py

 Each program prints usage help when run without arguments.
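
The Python example move shown in the two docs diffs above amounts to a directory prefix change. A minimal sketch, assuming POSIX-style relative paths from the top-level Spark directory; `migrate_example_path` and the two constants are hypothetical names, not anything the commit adds.

```python
import posixpath

# Directory names taken from the docs diffs; the constant names are hypothetical.
OLD_DIR = "python/examples"
NEW_DIR = "examples/src/main/python"


def migrate_example_path(path):
    """Map a path under python/examples to examples/src/main/python."""
    # normpath drops a leading "./" so both spellings compare equal.
    normalized = posixpath.normpath(path)
    if normalized == OLD_DIR or normalized.startswith(OLD_DIR + "/"):
        return NEW_DIR + normalized[len(OLD_DIR):]
    # Paths outside the old directory are returned normalized but otherwise unchanged.
    return normalized
```

The returned path is normalized, so `./python/examples/pi.py` becomes `examples/src/main/python/pi.py` without the leading `./`.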

docs/streaming-programming-guide.md

Lines changed: 11 additions & 12 deletions
@@ -129,7 +129,7 @@ ssc.awaitTermination() // Wait for the computation to terminate
 {% endhighlight %}

 The complete code can be found in the Spark Streaming example
-[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala).
+[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala).
 <br>

 </div>
@@ -215,7 +215,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
 {% endhighlight %}

 The complete code can be found in the Spark Streaming example
-[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
+[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java).
 <br>

 </div>
@@ -234,12 +234,12 @@ Then, in a different terminal, you can start the example by using
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 {% highlight bash %}
-$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
 {% endhighlight %}
 </div>
 <div data-lang="java" markdown="1">
 {% highlight bash %}
-$ ./bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999
 {% endhighlight %}
 </div>
 </div>
@@ -268,7 +268,7 @@ hello world
 {% highlight bash %}
 # TERMINAL 2: RUNNING NetworkWordCount or JavaNetworkWordCount

-$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
 ...
 -------------------------------------------
 Time: 1357008430000 ms
@@ -609,7 +609,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFu
 The update function will be called for each word, with `newValues` having a sequence of 1's (from
 the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete
 Scala code, take a look at the example
-[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
+[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala).

 <h4>Transform Operation</h4>

@@ -1135,7 +1135,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
 If the directory does not exist (i.e., running for the first time),
 then the function `functionToCreateContext` will be called to create a new
 context and set up the DStreams. See the Scala example
-[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala).
+[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala).
 This example appends the word counts of network data into a file.

 You can also explicitly create a `StreamingContext` from the checkpoint data and start the
@@ -1174,7 +1174,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
 If the directory does not exist (i.e., running for the first time),
 then the function `contextFactory` will be called to create a new
 context and set up the DStreams. See the Scala example
-[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/JavaRecoverableWordCount.scala)
+[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/JavaRecoverableWordCount.scala)
 (note that this example is missing in the 0.9 release, so you can test it using the master branch).
 This example appends the word counts of network data into a file.

@@ -1374,7 +1374,6 @@ package and renamed for better clarity.
 [ZeroMQUtils](api/java/org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
 [MQTTUtils](api/java/org/apache/spark/streaming/mqtt/MQTTUtils.html)

-* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
-and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
-* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and
-[video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
+* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
+and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
+* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and [video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
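
The streaming guide context in the diff above describes the update function passed to `updateStateByKey`: it receives the new 1's for a word plus the previous count. The sketch below imitates that contract in plain Python with no Spark involved. The names `update_function` and `apply_batch`, and the use of `None` for "no previous count", are assumptions made for illustration and are not the Spark API.

```python
def update_function(new_values, running_count):
    """Fold one batch's values for a word into its previous count."""
    # Assumption for this sketch: None means the word has no previous state.
    previous = 0 if running_count is None else running_count
    return previous + sum(new_values)


def apply_batch(state, pairs):
    """Apply one batch of (word, 1) pairs to a dict of running counts."""
    grouped = {}
    for word, value in pairs:
        grouped.setdefault(word, []).append(value)
    new_state = dict(state)
    # Simplification: only words present in this batch are updated here.
    for word, values in grouped.items():
        new_state[word] = update_function(values, state.get(word))
    return new_state
```

Feeding successive batches through `apply_batch` keeps a running count per word, which is the behaviour the guide attributes to `runningCounts`.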

examples/src/main/java/org/apache/spark/mllib/examples/JavaALS.java renamed to examples/src/main/java/org/apache/spark/examples/mllib/JavaALS.java

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 * limitations under the License.
 */

-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;

 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;

examples/src/main/java/org/apache/spark/mllib/examples/JavaKMeans.java renamed to examples/src/main/java/org/apache/spark/examples/mllib/JavaKMeans.java

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 * limitations under the License.
 */

-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;

 import java.util.regex.Pattern;

examples/src/main/java/org/apache/spark/mllib/examples/JavaLR.java renamed to examples/src/main/java/org/apache/spark/examples/mllib/JavaLR.java

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 * limitations under the License.
 */

-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;

 import java.util.regex.Pattern;

examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java renamed to examples/src/main/java/org/apache/spark/examples/streaming/JavaFlumeEventCount.java

Lines changed: 2 additions & 1 deletion
@@ -15,9 +15,10 @@
 * limitations under the License.
 */

-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;

 import org.apache.spark.api.java.function.Function;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.*;
 import org.apache.spark.streaming.api.java.*;
 import org.apache.spark.streaming.flume.FlumeUtils;

examples/src/main/java/org/apache/spark/streaming/examples/JavaKafkaWordCount.java renamed to examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java

Lines changed: 3 additions & 2 deletions
@@ -15,7 +15,7 @@
 * limitations under the License.
 */

-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;

 import java.util.Map;
 import java.util.HashMap;
@@ -26,6 +26,7 @@
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
@@ -44,7 +45,7 @@
 * <numThreads> is the number of threads the kafka consumer should use
 *
 * Example:
-* `./bin/run-example org.apache.spark.streaming.examples.JavaKafkaWordCount local[2] zoo01,zoo02,
+* `./bin/run-example org.apache.spark.examples.streaming.JavaKafkaWordCount local[2] zoo01,zoo02,
 * zoo03 my-consumer-group topic1,topic2 1`
 */

examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java renamed to examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java

Lines changed: 3 additions & 2 deletions
@@ -15,14 +15,15 @@
 * limitations under the License.
 */

-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;

 import com.google.common.collect.Lists;
 import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
 import scala.Tuple2;
 import org.apache.spark.api.java.function.FlatMapFunction;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
@@ -39,7 +40,7 @@
 * To run this on your local machine, you need to first run a Netcat server
 * `$ nc -lk 9999`
 * and then run the example
-* `$ ./run org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999`
+* `$ ./run org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999`
 */
 public final class JavaNetworkWordCount {
 private static final Pattern SPACE = Pattern.compile(" ");

examples/src/main/java/org/apache/spark/streaming/examples/JavaQueueStream.java renamed to examples/src/main/java/org/apache/spark/examples/streaming/JavaQueueStream.java

Lines changed: 2 additions & 1 deletion
@@ -15,13 +15,14 @@
 * limitations under the License.
 */

-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;

 import com.google.common.collect.Lists;
 import scala.Tuple2;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
