Spark Streaming can receive streaming data from any arbitrary data source beyond
the ones for which it has built-in support (that is, beyond Flume, Kafka, Kinesis, files, sockets, etc.).
This requires the developer to implement a *receiver* that is customized for receiving data from
the concerned data source. This guide walks through the process of implementing a custom receiver
and using it in a Spark Streaming application. Note that custom receivers can be implemented
in Scala or Java.
A custom receiver must extend this abstract class by implementing two methods:

- `onStart()`: Things to do to start receiving data.
- `onStop()`: Things to do to stop receiving data.

Both `onStart()` and `onStop()` must not block indefinitely. Typically, `onStart()` would start the threads
that are responsible for receiving the data, and `onStop()` would ensure that these receiving threads
are stopped. The receiving threads can also use `isStopped()`, a `Receiver` method, to check whether they
should stop receiving data.
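To make this concrete, here is a minimal sketch of the pattern just described; the class name and the trivial receive loop are illustrative stand-ins, not part of the socket example developed later in this guide.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Minimal sketch: onStart() spawns the receiving thread and returns quickly;
// onStop() relies on that thread polling isStopped() to terminate itself.
class SketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart() {
    new Thread("Sketch Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
    // Nothing to do: the receiving thread checks isStopped() and exits on its own
  }

  private def receive() {
    while (!isStopped()) {
      store("sample record") // stand-in for data read from a real source
    }
  }
}
```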
Once the data is received, that data can be stored inside Spark
by calling `store(data)`, which is a method provided by the Receiver class.
There are a number of flavors of `store()` which allow one to store the received data
record-at-a-time or as a whole collection of objects / serialized bytes. Note that the flavor of
`store()` used to implement a receiver affects its reliability and fault-tolerance semantics.
This is discussed [later](#receiver-reliability) in more detail.
Any exception in the receiving threads should be caught and handled properly to avoid silent
failures of the receiver.
Here is the skeleton of a custom receiver (in Scala) that receives a stream of text over a socket; parts not included in this excerpt are elided:

```scala
class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // ...

  def onStop() {
    // There is nothing much to do as the thread calling receive()
    // is designed to stop by itself if isStopped() returns false
  }

  /** Create a socket connection and receive data until receiver is stopped */
  // ...
}
```
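The `receive()` method itself is elided above. A sketch of one way it might be implemented for this socket receiver (assuming imports of `java.net.Socket`, `java.io.BufferedReader`, `java.io.InputStreamReader`, and `java.nio.charset.StandardCharsets`):

```scala
private def receive() {
  try {
    // Connect to the host and port this receiver was constructed with
    val socket = new Socket(host, port)
    val reader = new BufferedReader(
      new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))
    var userInput = reader.readLine()
    while (!isStopped() && userInput != null) {
      store(userInput) // one record at a time; see the note on reliability below
      userInput = reader.readLine()
    }
    reader.close()
    socket.close()
    // The server closed the connection; restart to try to reconnect
    restart("Trying to connect again")
  } catch {
    case e: java.net.ConnectException =>
      restart("Error connecting to " + host + ":" + port, e)
    case t: Throwable =>
      restart("Error receiving data", t)
  }
}
```

Note how any exception is caught and reported via `restart()`, rather than being allowed to kill the receiving thread silently.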
The same receiver skeleton in Java:

```java
public class JavaCustomReceiver extends Receiver<String> {

  // ...

  public void onStop() {
    // There is nothing much to do as the thread calling receive()
    // is designed to stop by itself if isStopped() returns false
  }

  /** Create a socket connection and receive data until receiver is stopped */
  // ...
}
```
The custom receiver can be used in a Spark Streaming application by using
`streamingContext.receiverStream(<instance of custom receiver>)`. This will create
an input DStream using data received by the instance of custom receiver, as shown below:

<div class="codetabs">
<div data-lang="scala" markdown="1">
## Receiver Reliability

There are two kinds of receivers based on their reliability and fault-tolerance semantics:

1. *Reliable Receiver* - For reliable sources that allow sent data to be acknowledged, a reliable
receiver correctly acknowledges to the source that the data has been received
and stored in Spark reliably (that is, replicated successfully). Usually,
implementing this receiver involves careful consideration of the semantics of source
acknowledgements.
1. *Unreliable Receiver* - An *unreliable receiver* does *not* send acknowledgement to a source. This can be used for sources that do not support acknowledgement, or even for reliable sources when one does not want or need to go into the complexity of acknowledgement.
To implement a *reliable receiver*, you have to use `store(multiple-records)` to store data.
This flavor of `store` is a blocking call which returns only after all the given records have
been stored inside Spark. If the receiver's configured storage level uses replication
(enabled by default), then this call returns after replication has completed.
Thus it ensures that the data is reliably stored, and the receiver can now acknowledge the
source appropriately. This ensures that no data is lost when the receiver fails in the middle
of replicating data -- the buffered data will not be acknowledged and hence will be later resent
by the source.
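As a rough sketch of this pattern inside a `Receiver[String]` subclass -- the `AckingSource` trait and its `pollBatch()`/`ack()` methods are hypothetical stand-ins for a real source's API; only `store()` and `isStopped()` are Spark's:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical source interface, for illustration only
trait AckingSource {
  def pollBatch(): (Seq[String], Long) // returns (records, batchId)
  def ack(batchId: Long): Unit
}

// The receiving loop of a reliable receiver (a method of a Receiver[String])
private def receiveReliably(source: AckingSource) {
  while (!isStopped()) {
    val (records, batchId) = source.pollBatch()
    val buffer = new ArrayBuffer[String]
    buffer ++= records

    // Blocking call: returns only after all records have been stored in
    // Spark (and replicated, if the storage level uses replication)
    store(buffer)

    // Only now is it safe to acknowledge; if the receiver fails before
    // this point, the unacknowledged batch will be resent by the source
    source.ack(batchId)
  }
}
```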
An *unreliable receiver* does not have to implement any of this logic. It can simply receive
records from the source and insert them one-at-a-time using `store(single-record)`. While it does
not get the reliability guarantees of `store(multiple-records)`, it has the following advantages:
- The system takes care of chunking that data into appropriately sized blocks (look for block
interval in the [Spark Streaming Programming Guide](streaming-programming-guide.html)).