ailkgithub/spark_dev

Asiainfo Spark_dev

Configuration

conf/spark-env.sh

  export HADOOP_CONF_DIR=/home/ocdc/hadoop-2.3.0-cdh5.0.0-och3.1.0/etc/hadoop
  export SPARK_YARN_APP_JAR=/home/ocdc/spark_0.9.1_streaming/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar
  export SPARK_JAR=/home/ocdc/spark_0.9.1_streaming/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.3.0-cdh5.0.0.jar
  

conf/Sample.xml

  ................
  
     
        com.asiainfo.ocdc.streaming.impl.KafkaSource
        
        Specifies the ZooKeeper connection string in the form hostname:port,
        where hostname and port are the host and port for a node in your ZooKeeper cluster.
        To allow connecting through other ZooKeeper nodes when that host is down
        you can also specify multiple hosts in the form 
        hostname1:port1,hostname2:port2,hostname3:port3
        cmbb3
        topicProducername
        test-consumer-group
        3
         
        a,b,c,d,e,f,count,fee
    

    
        com.asiainfo.ocdc.streaming.impl.StreamFilter
        t1
        cell
        lac,cell
        b,c,t1.cell,count,fee
        t1.cell!=null
    

    
        com.asiainfo.ocdc.streaming.impl.DynamicOperate
        t2
        b
        Count,Fee
        t2.Count+count,t2.Fee+fee
        b,c,t1.cell,count,fee
    

    
        com.asiainfo.ocdc.streaming.impl.KafkaOut
        topicName
        topic ConsumerName
        dev001:9092
        The port the socket server listens on,
        hostname1:port1,hostname2:port2,hostname3:port3
        b,c
    
    ......
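The ZooKeeper connection string described in the KafkaSource block above is a comma-separated list of hostname:port entries. As a minimal sketch of checking such a string before putting it into the configuration file, the following hypothetical helper (check_zk_connect is not part of spark_dev) splits the string and prints one "host port" pair per line. The port 2181 in the usage line is ZooKeeper's default client port and is only an example value.

```shell
#!/bin/sh
# Hypothetical helper, not part of spark_dev: validates a ZooKeeper
# connection string of the form hostname1:port1,hostname2:port2,...
# Prints "host port" for each entry; returns 1 on the first malformed entry.
check_zk_connect() {
    rest=$1
    # Reject empty input and stray commas (leading, trailing, or doubled).
    case $rest in
        ''|,*|*,|*,,*) return 1 ;;
    esac
    while [ -n "$rest" ]; do
        entry=${rest%%,*}              # text up to the first comma
        case $rest in
            *,*) rest=${rest#*,} ;;    # drop the entry just taken
            *)   rest= ;;              # that was the last entry
        esac
        case $entry in
            *:*) ;;                    # must contain a host:port separator
            *)   return 1 ;;
        esac
        host=${entry%:*}
        port=${entry##*:}
        [ -n "$host" ] || return 1
        case $port in
            ''|*[!0-9]*) return 1 ;;   # port must be all digits
        esac
        echo "$host $port"
    done
}

check_zk_connect "hostname1:2181,hostname2:2181,hostname3:2181"
```

A single-host string such as hostname1:2181 passes the same check; listing several hosts only matters for failover, as the description above notes.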
  

Building Spark

  mvn package
  

Start Spark Streaming

To start the streaming application, use the following command format:

./bin/start-streaming-app.sh streaming-app-name 2 conf/Sample.xml

Parameter 1: the script file to execute.
Parameter 2: the streaming app name. When a different configuration XML file is used, this name should correspond to it.
Parameter 3: the stream refresh interval, in seconds.
Parameter 4: the configuration file.
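The parameter order above can be sketched as a small wrapper that checks its arguments before assembling the command line. The function name build_start_command and its checks are assumptions made for illustration; only the script path and the argument order come from the command shown above. The function prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of spark_dev: assembles the start command
# from the app name, the interval in seconds, and the configuration file.
build_start_command() {
    if [ "$#" -ne 3 ]; then
        echo "usage: build_start_command <app-name> <interval-seconds> <config-xml>" >&2
        return 1
    fi
    app_name=$1     # parameter 2: streaming app name
    interval=$2     # parameter 3: refresh interval in seconds
    conf_file=$3    # parameter 4: configuration XML file

    # The interval must be a whole number of seconds.
    case $interval in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    # Parameter 1 is the script itself.
    printf '%s %s %s %s\n' ./bin/start-streaming-app.sh "$app_name" "$interval" "$conf_file"
}

build_start_command streaming-app-name 2 conf/Sample.xml
```

Printing the command keeps the sketch safe to try outside a Spark installation; piping the output to sh from the project root would run it.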

Running Tests

For test documentation, refer to the spark_dev project wiki.
