docs/streaming-programming-guide.md: 40 changes (34 additions, 6 deletions)
@@ -58,11 +58,21 @@ do is as follows.

<div class="codetabs">
<div data-lang="scala" markdown="1" >
First, we import the names of the Spark Streaming classes, and some implicit
conversions from StreamingContext into our environment, to add useful methods to
other classes we need (like DStream).

[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
main entry point for all streaming functionality.

{% highlight scala %}
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
{% endhighlight %}

Then we create a
[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
Besides Spark's configuration, we specify that any DStream will be processed
in 1-second batches.

{% highlight scala %}
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

// Print a few of the counts to the console
wordCounts.print()
{% endhighlight %}
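
Putting the pieces together, the complete program wires a socket source into this
word-count pipeline. Below is a minimal, self-contained sketch rather than the guide's
verbatim listing; the `local[2]` master, the app name, and the `localhost:9999` source
are illustrative assumptions.

{% highlight scala %}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// Sketch only: master URL, app name, and source address are assumptions.
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))

// DStream of text lines read from a TCP socket
val lines = ssc.socketTextStream("localhost", 9999)

// Split lines into words and count each word per 1-second batch
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()             // start receiving and processing data
ssc.awaitTermination()  // block until the computation stops
{% endhighlight %}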

The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -178,7 +188,7 @@ JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
return i1 + i2;
}
});
wordCounts.print(); // Print a few of the counts to the console
{% endhighlight %}

The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
</td>
</table>

If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
shell, you should start the shell with the Spark configuration pre-configured to
discard old batches periodically. The `spark.cleaner.ttl` property (in seconds) tells
Spark to forget metadata and persisted RDDs older than that age, which keeps a
long-running interactive session from accumulating state:

{% highlight bash %}
$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
{% endhighlight %}

... and create your StreamingContext by wrapping the existing interactive shell
SparkContext object, `sc`:

{% highlight scala %}
val ssc = new StreamingContext(sc, Seconds(1))
{% endhighlight %}
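
From here, the same word-count pipeline can be assembled interactively against this
`ssc`. A minimal sketch, assuming a netcat server (`nc -lk 9999`) is listening on
localhost:

{% highlight scala %}
// Sketch for the interactive shell; localhost:9999 is an assumption.
import org.apache.spark.streaming.StreamingContext._

val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()  // counts for each 1-second batch appear in the shell console
{% endhighlight %}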

When working with the shell, you may also need to send an end-of-file (`^D`) to your
netcat session to force the pipeline to print the word counts to the console at the sink.

***************************************************************************************************

# Basics