conf/spark-env.sh
export HADOOP_CONF_DIR=/home/ocdc/hadoop-2.3.0-cdh5.0.0-och3.1.0/etc/hadoop
export SPARK_YARN_APP_JAR=/home/ocdc/spark_0.9.1_streaming/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar
export SPARK_JAR=/home/ocdc/spark_0.9.1_streaming/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.3.0-cdh5.0.0.jar
conf/Sample.xml
................
Step 1: com.asiainfo.ocdc.streaming.impl.KafkaSource
  Specifies the ZooKeeper connection string in the form hostname:port, where
  hostname and port are the host and port of a node in your ZooKeeper cluster.
  To allow connecting through other ZooKeeper nodes when that host is down,
  you can also specify multiple hosts in the form
  hostname1:port1,hostname2:port2,hostname3:port3.
    ZooKeeper host:   cmbb3
    Topic:            topicProducername
    Consumer group:   test-consumer-group
    Threads:          3
    Input fields:     a,b,c,d,e,f,count,fee
Step 2: com.asiainfo.ocdc.streaming.impl.StreamFilter
    Table name:       t1
    Key field:        cell
    Table fields:     lac,cell
    Output fields:    b,c,t1.cell,count,fee
    Filter condition: t1.cell!=null
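The StreamFilter step joins each incoming record against the static table t1 (keyed on cell) and discards records whose lookup fails, i.e. where t1.cell would be null. The class and data below are illustrative, not the project's actual API; a minimal sketch of that semantics:

```java
import java.util.*;

public class StreamFilterSketch {
    public static void main(String[] args) {
        // Static lookup table t1: key = cell (illustrative data)
        Set<String> t1 = new HashSet<>();
        t1.add("cell-01");

        // Incoming records; only the fields used by this step are shown
        List<Map<String, String>> input = new ArrayList<>();
        input.add(Map.of("b", "b1", "c", "c1", "cell", "cell-01", "count", "2", "fee", "10"));
        input.add(Map.of("b", "b2", "c", "c2", "cell", "unknown", "count", "1", "fee", "5"));

        for (Map<String, String> rec : input) {
            // Filter condition t1.cell != null: keep only records whose
            // cell value is found in the static table t1
            String cell = t1.contains(rec.get("cell")) ? rec.get("cell") : null;
            if (cell == null) continue;
            // Emit the configured output fields: b,c,t1.cell,count,fee
            System.out.println(String.join(",",
                rec.get("b"), rec.get("c"), cell, rec.get("count"), rec.get("fee")));
        }
    }
}
```

Only the first record survives the join; the second is dropped because its cell has no match in t1.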
Step 3: com.asiainfo.ocdc.streaming.impl.DynamicOperate
    Table name:         t2
    Key field:          b
    Dynamic fields:     Count,Fee
    Update expressions: t2.Count+count,t2.Fee+fee
    Output fields:      b,c,t1.cell,count,fee
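The DynamicOperate step keeps a dynamic table t2 keyed on field b and updates its running totals with each record, per the expressions t2.Count+count and t2.Fee+fee. The class and data below are illustrative, not the project's actual API; a minimal sketch of that keyed running aggregation:

```java
import java.util.*;

public class DynamicOperateSketch {
    public static void main(String[] args) {
        // Dynamic table t2: key = field b, value = running totals [Count, Fee]
        Map<String, long[]> t2 = new HashMap<>();

        // Incoming records as {b, count, fee} (illustrative data)
        String[][] input = { {"b1", "2", "10"}, {"b2", "1", "5"}, {"b1", "3", "7"} };

        for (String[] rec : input) {
            long[] totals = t2.computeIfAbsent(rec[0], k -> new long[2]);
            totals[0] += Long.parseLong(rec[1]);  // t2.Count = t2.Count + count
            totals[1] += Long.parseLong(rec[2]);  // t2.Fee   = t2.Fee   + fee
        }

        long[] b1 = t2.get("b1");
        System.out.println("b1: Count=" + b1[0] + " Fee=" + b1[1]);
    }
}
```

After the three records, key b1 has accumulated Count=5 and Fee=17.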
Step 4: com.asiainfo.ocdc.streaming.impl.KafkaOut
    Topic:            topicName
    Consumer name:    topic ConsumerName
    Broker list:      dev001:9092 (the port the socket server listens on; multiple
                      brokers can be given as hostname1:port1,hostname2:port2,hostname3:port3)
    Output fields:    b,c
......
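The XML markup of Sample.xml did not survive in this excerpt; only the element values remain. As a rough illustration of the shape such a pipeline definition might take, using the values from the first two steps above (all tag and attribute names here are invented, not the project's actual schema):

```xml
<steps>
  <step class="com.asiainfo.ocdc.streaming.impl.KafkaSource">
    <!-- ZooKeeper connection string: hostname:port, or
         hostname1:port1,hostname2:port2,hostname3:port3 for failover -->
    <zookeeper>cmbb3</zookeeper>
    <topic>topicProducername</topic>
    <group>test-consumer-group</group>
    <threads>3</threads>
    <fields>a,b,c,d,e,f,count,fee</fields>
  </step>
  <step class="com.asiainfo.ocdc.streaming.impl.StreamFilter">
    <table>t1</table>
    <key>cell</key>
    <columns>lac,cell</columns>
    <output>b,c,t1.cell,count,fee</output>
    <condition>t1.cell!=null</condition>
  </step>
  <!-- further steps omitted -->
</steps>
```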
mvn package
To start the streaming application, the command format is as follows:
./bin/start-streaming-app.sh streaming-app-name 2 conf/Sample.xml
Parameter 1: the launch script (start-streaming-app.sh)
Parameter 2: the streaming app name corresponding to the chosen configuration in the XML file
Parameter 3: the stream refresh (batch) interval, in seconds
Parameter 4: the configuration file
For test documentation, see the spark_dev project wiki.