streaming-app requires:
Scala 2.10.3
HBase 0.98.1
kafka_2.10-0.8.1
conf/spark-env.sh
export HADOOP_CONF_DIR=/home/ocdc/hadoop-2.3.0-cdh5.0.0-och3.1.0/etc/hadoop
export SPARK_YARN_APP_JAR=/home/ocdc/spark_dev/target/spark-dev-V00B01C00-SNAPSHOT-jar-with-dependencies.jar
export SPARK_JAR=/home/ocdc/spark-assembly-0.9.1-hadoop2.3.0-cdh5.0.0.jar
conf/Sample.xml
Sample.xml defines the processing flow: the filtering rules, the data judgment conditions, and the dynamic accumulation factors.
................
Step 1: com.asiainfo.ocdc.streaming.impl.KafkaSource
ZooKeeper quorum: cmbb3
(Specifies the ZooKeeper connection string in the form hostname:port, where hostname and port are the host and port of a node in your ZooKeeper cluster. To allow connecting through other ZooKeeper nodes when that host is down, you can also specify multiple hosts in the form hostname1:port1,hostname2:port2,hostname3:port3.)
Topic: topicProducername
Consumer group: test-consumer-group
Threads: 3
Output fields: a,b,c,d,e,f,count,fee
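Reassembled as XML, this step might look as follows (a sketch: the tag names are assumptions, only the class name and the values come from the sample):

<step class="com.asiainfo.ocdc.streaming.impl.KafkaSource">
  <!-- hypothetical tag names; values taken from the sample above -->
  <zookeeper>cmbb3</zookeeper>
  <topic>topicProducername</topic>
  <group>test-consumer-group</group>
  <threads>3</threads>
  <output>a,b,c,d,e,f,count,fee</output>
</step>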
Step 2: com.asiainfo.ocdc.streaming.impl.StreamFilter
Reference table: t1
Looked-up field: cell
Lookup keys: lac,cell
Output fields: b,c,t1.cell,count,fee
Filter condition: t1.cell!=null
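A sketch of the same step as XML (tag names are assumptions):

<step class="com.asiainfo.ocdc.streaming.impl.StreamFilter">
  <!-- hypothetical tag names; values taken from the sample above -->
  <table>t1</table>
  <field>cell</field>
  <keys>lac,cell</keys>
  <output>b,c,t1.cell,count,fee</output>
  <filter>t1.cell!=null</filter>
</step>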
Step 3: com.asiainfo.ocdc.streaming.impl.DynamicOperate
Accumulation table: t2
Key field: b
Accumulated fields: Count,Fee
Update expressions: t2.Count+count,t2.Fee+fee
Output fields: b,c,t1.cell,count,fee
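A sketch of the accumulation step as XML (tag names are assumptions):

<step class="com.asiainfo.ocdc.streaming.impl.DynamicOperate">
  <!-- hypothetical tag names; values taken from the sample above -->
  <table>t2</table>
  <key>b</key>
  <fields>Count,Fee</fields>
  <expressions>t2.Count+count,t2.Fee+fee</expressions>
  <output>b,c,t1.cell,count,fee</output>
</step>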
Step 4: com.asiainfo.ocdc.streaming.impl.KafkaOut
Topic: topicName
Consumer name: topicConsumerName
Broker list: dev001:9092
(The port is the one the Kafka socket server listens on; multiple brokers can be given in the form hostname1:port1,hostname2:port2,hostname3:port3.)
Output fields: b,c
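A sketch of the output step as XML (tag names are assumptions):

<step class="com.asiainfo.ocdc.streaming.impl.KafkaOut">
  <!-- hypothetical tag names; values taken from the sample above -->
  <topic>topicName</topic>
  <consumer>topicConsumerName</consumer>
  <brokers>dev001:9092</brokers>
  <output>b,c</output>
</step>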
......
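To make the flow concrete, here is a minimal Spark Streaming 0.9.1 / Kafka 0.8.1 sketch of what the four steps describe: consume CSV records from Kafka, filter against a reference lookup, accumulate running totals per key, and publish results back to Kafka. It is an illustration, not the project's actual implementation; the field indices, the ZooKeeper port, and the lookupCell stub are assumptions.

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SampleFlow {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("streaming-app-sample")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("checkpoint") // required by updateStateByKey

    // Step 1: KafkaSource - consume CSV records a,b,c,d,e,f,count,fee
    // (ZooKeeper port 2181 is an assumption; the sample only gives "cmbb3")
    val records = KafkaUtils
      .createStream(ssc, "cmbb3:2181", "test-consumer-group", Map("topicProducername" -> 3))
      .map(_._2.split(","))

    // Step 2: StreamFilter - keep records whose looked-up cell is present
    // (which fields hold lac and cell is an assumption)
    val filtered = records.filter(fields => lookupCell(fields(1), fields(2)) != null)

    // Step 3: DynamicOperate - accumulate count and fee per key b
    val totals = filtered
      .map(fields => (fields(1), (fields(6).toLong, fields(7).toLong)))
      .updateStateByKey[(Long, Long)] { (values: Seq[(Long, Long)], state: Option[(Long, Long)]) =>
        val (c0, f0) = state.getOrElse((0L, 0L))
        Some((c0 + values.map(_._1).sum, f0 + values.map(_._2).sum))
      }

    // Step 4: KafkaOut - publish the key and its running totals back to Kafka
    totals.foreachRDD { rdd =>
      rdd.foreachPartition { part =>
        val props = new Properties()
        props.put("metadata.broker.list", "dev001:9092")
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))
        part.foreach { case (b, (count, fee)) =>
          producer.send(new KeyedMessage[String, String]("topicName", s"$b,$count,$fee"))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Hypothetical stand-in for the t1 reference-table lookup (lac,cell -> cell)
  def lookupCell(lac: String, cell: String): String = cell
}

Note that the real StreamFilter joins a reference table (t1) and DynamicOperate keeps its totals in a table (t2); updateStateByKey is used here only as a stand-in for that stateful accumulation.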
Build the application (this produces the jar-with-dependencies artifact referenced by SPARK_YARN_APP_JAR above):
mvn package
To start the streaming-app application, use the following command format:
./bin/start-streaming-app.sh <streaming-app-name> <time> <file>
Parameter 1: the streaming-app name; each configuration XML file has its own corresponding app name.
Parameter 2: the flow refresh interval (seconds).
Parameter 3: the configuration file (e.g. conf/Sample.xml).
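For example, with the sample configuration and a 10-second interval (the app name and interval here are illustrative):
./bin/start-streaming-app.sh Sample 10 conf/Sample.xml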
For test documentation, see the streaming-app project wiki: https://github.com/asiainfo/streaming-app/wiki