Closed
Changes from all commits (101 commits)
95348c1
Added sql/hbase module for 0.96.1.1-hadoop2
javadba Aug 15, 2014
47a2040
Fixed assembly for hbase
javadba Aug 22, 2014
8df54c3
Skeleton HBaseSQLContext working
javadba Aug 22, 2014
93c0fc8
incremental HBaseStrategies updates
javadba Aug 30, 2014
c72b036
Removed spurious manual merge error in pom.xml
sboeschhuawei Sep 2, 2014
6bb1e04
add HBaseSQLParser
Sep 2, 2014
fd63a33
Added files missed to checkin
sboeschhuawei Sep 2, 2014
8935714
change the variable names
Sep 4, 2014
0b23c56
add delete table method
Sep 4, 2014
910a5c0
add CREATE and DROP
Sep 4, 2014
acc4075
add ALTER and modify other part
Sep 5, 2014
530b034
Fix build/compile errors
sboeschhuawei Sep 6, 2014
ba7a14f
Fix build/compile errors
sboeschhuawei Sep 8, 2014
3fd298d
Another small step for stephen-kind along the path to an HBase Logica…
sboeschhuawei Sep 10, 2014
1f3d63d
Updated hbase project to 1.2.0 and fixed build issues
sboeschhuawei Sep 11, 2014
551ae16
Fixed assembly for hbase
javadba Aug 22, 2014
311fa39
small modify
Sep 12, 2014
cd1a215
add test case
Sep 15, 2014
576e98b
add create table support
Sep 15, 2014
2dcfc8d
Fix the problem that fails to get the keyword from HbaseSqlParser
Sep 16, 2014
1b91819
add logic plan
Sep 17, 2014
868e52b
Add more components for creating table. Remain issues in Analyzer and…
Sep 17, 2014
7e8b1bf
revise the code to use Scala list
Sep 18, 2014
7c14ed3
revise the code to use Scala list
Sep 18, 2014
12da966
Change the input parameter to catalog
Sep 19, 2014
80fe09d
Fix the remaining issue of analyzing the createTable logic plan
Sep 20, 2014
1a892f6
Add some comments
Sep 20, 2014
3b21472
Incremental updates before impl of HBaseRDD
sboeschhuawei Sep 22, 2014
468eba5
Incremental updates before impl of HBaseRDD
sboeschhuawei Sep 22, 2014
e945fbc
revise the code to use Scala list
Sep 22, 2014
30a8926
fix the compilation error
Sep 22, 2014
881ac64
Modify the code
Sep 23, 2014
c6e578d
change the string type to more meaningful type
Sep 25, 2014
0dc9e5e
fix the compilation error
Sep 25, 2014
488425b
Change hadoop default version; Change the input parameter to 'CreateT…
Sep 25, 2014
50064d2
Optimize the package imported
Sep 25, 2014
cebccac
Implemented basic end-to-end for HBase query
sboeschhuawei Sep 25, 2014
a40810a
Move some functions to correct files
Sep 25, 2014
48a8969
fix the error
Sep 26, 2014
e6dcc5b
Added in-memory multi Region Server HBase unit testing
sboeschhuawei Sep 28, 2014
f2338b2
Added in-memory multi Region Server HBase unit testing
sboeschhuawei Sep 28, 2014
8c22e09
Added in-memory multi Region Server HBase unit testing
sboeschhuawei Sep 29, 2014
3383d03
Updates to HBaseCatalog interface for columns
sboeschhuawei Sep 29, 2014
e3d87e0
fix the issues in create/get table method
Sep 30, 2014
4e46202
Optimize the workflow of creating table
Sep 30, 2014
efd2eef
Fix the code style
Sep 30, 2014
1309263
Fix the compilation error
Sep 30, 2014
0a69806
Small tweaks to HBaseStrategies
sboeschhuawei Sep 30, 2014
3bd2ef2
Small tweaks to HBaseStrategies
sboeschhuawei Sep 30, 2014
7ec3c0e
Logging fix and order of hbase/catalyst table tweak
sboeschhuawei Sep 30, 2014
07e8a2f
add namespace support
Sep 30, 2014
3f88fb0
Add namespce to Create Table syntax
Sep 30, 2014
4f16c36
Added InsertIntoHBase and updated RowKey logic
sboeschhuawei Oct 2, 2014
c4f3c21
Added InsertIntoHBase and updated RowKey logic
sboeschhuawei Oct 2, 2014
a97d132
add check exists functions
Oct 2, 2014
b557fc3
Fixed CreateTable testcase problem and updated RowKeyParser
sboeschhuawei Oct 2, 2014
401a64c
Fixed CreateTable testcase problem and updated RowKeyParser
sboeschhuawei Oct 2, 2014
7106713
add type conversion function
Oct 2, 2014
93af29f
Removed LogicalPlan and SchemaRDD from PhysicalPlans
sboeschhuawei Oct 2, 2014
5e792ad
Working through HBase Snappy issues and HBaseSQLParser resolution issue
sboeschhuawei Oct 3, 2014
a9c22ff
Add content to test
Oct 3, 2014
4321d7e
Additional work on partitioning
sboeschhuawei Oct 3, 2014
bb8dc68
Removed LogicalPlan and SchemaRDD from PhysicalPlans
sboeschhuawei Oct 3, 2014
b46140a
Modify the workflow of InsertIntoTable
Oct 3, 2014
093e164
Incremental query testing
sboeschhuawei Oct 4, 2014
de26141
Working through issues with Catalog integration
sboeschhuawei Oct 5, 2014
33fe712
Fixed Catalog bugs: namespace mixup (partial fix), RowKey in wrong or…
sboeschhuawei Oct 6, 2014
b768186
Disabled pred pushdown and able to reach ReaderRDD
sboeschhuawei Oct 6, 2014
d54fa22
Change the syntax of CreateTable
Oct 6, 2014
6871100
fix the namespace issues
Oct 6, 2014
ff9714e
add delete table function
Oct 7, 2014
96d0290
Add Drop
Oct 7, 2014
6d58edc
Fixed conn issues in HBaseSQLReaderRDD
sboeschhuawei Oct 9, 2014
d8444f7
Fixed conn issues in HBaseSQLReaderRDD
sboeschhuawei Oct 9, 2014
e959502
use catalyst data type instead of hbase data type
Oct 9, 2014
6b60109
remove hbase data type
Oct 9, 2014
a5fd662
Change the input to catalyst datatype
Oct 9, 2014
aee3401
RowKey and HBaseSQLReaderRDD fixes
sboeschhuawei Oct 9, 2014
bbec48e
fix the code style issue
Oct 10, 2014
c89bf27
Add verification to Hbase CreateTable
Oct 10, 2014
ecdfcb4
Basic query working
sboeschhuawei Oct 13, 2014
a6cbd95
Fixed conn issues in HBaseSQLReaderRDD
sboeschhuawei Oct 13, 2014
4913a8e
add logical table exist check
Oct 13, 2014
37b4387
fix the data type issue
Oct 13, 2014
af30223
Fixed RowKeyParser write path - used by InsertIntoTable
sboeschhuawei Oct 13, 2014
57bf401
Modify the verification and add HBaseAnalyzer for future development
Oct 13, 2014
eadc2b5
Ignore integration tests requiring external hbase access
sboeschhuawei Oct 13, 2014
407e97d
create hbase table required for testing
Oct 13, 2014
d8acba2
add more data type tests
Oct 14, 2014
4d82fe1
code formatting
Oct 14, 2014
59a4414
Fixed select * path but order is incorrect
sboeschhuawei Oct 15, 2014
45f799c
Fixed conn issues in HBaseSQLReaderRDD
sboeschhuawei Oct 15, 2014
823b91c
Small test tweaks for preds
sboeschhuawei Oct 15, 2014
f3afe35
Refactored according to Yan's designs
sboeschhuawei Oct 18, 2014
5d4df1a
Refactored according to Yan's designs
sboeschhuawei Oct 18, 2014
5dae756
Removed unused/unnecessary classes and code
sboeschhuawei Oct 18, 2014
c120105
mv hbase files to the new old dir
Oct 20, 2014
7f2f032
code format and minor fix
scwf Oct 20, 2014
04385a7
update with apache master and fix confilict
scwf Oct 20, 2014
3cd96cf
scala style fix
scwf Oct 20, 2014
a10f270
revert some diffs with apache master
scwf Oct 21, 2014
10 changes: 10 additions & 0 deletions assembly/pom.xml
@@ -204,6 +204,16 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>hbase</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hbase_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>spark-ganglia-lgpl</id>
<dependencies>
2 changes: 2 additions & 0 deletions bin/compute-classpath.cmd
@@ -81,6 +81,7 @@ set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%tools\target\scala-%SCALA_VERSION%\clas
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%sql\catalyst\target\scala-%SCALA_VERSION%\classes
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%sql\core\target\scala-%SCALA_VERSION%\classes
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%sql\hive\target\scala-%SCALA_VERSION%\classes
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%sql\hbase\target\scala-%SCALA_VERSION%\classes

set SPARK_TEST_CLASSES=%FWDIR%core\target\scala-%SCALA_VERSION%\test-classes
set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%repl\target\scala-%SCALA_VERSION%\test-classes
@@ -91,6 +92,7 @@ set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%streaming\target\scala-%SCALA
set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%sql\catalyst\target\scala-%SCALA_VERSION%\test-classes
set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%sql\core\target\scala-%SCALA_VERSION%\test-classes
set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%sql\hive\target\scala-%SCALA_VERSION%\test-classes
set SPARK_TEST_CLASSES=%SPARK_TEST_CLASSES%;%FWDIR%sql\hbase\target\scala-%SCALA_VERSION%\test-classes

if "x%SPARK_TESTING%"=="x1" (
rem Add test clases to path - note, add SPARK_CLASSES and SPARK_TEST_CLASSES before CLASSPATH
2 changes: 2 additions & 0 deletions bin/compute-classpath.sh
@@ -58,6 +58,7 @@ if [ -n "$SPARK_PREPEND_CLASSES" ]; then
CLASSPATH="$CLASSPATH:$FWDIR/tools/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hbase/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive-thriftserver/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
@@ -131,6 +132,7 @@ if [[ $SPARK_TESTING == 1 ]]; then
CLASSPATH="$CLASSPATH:$FWDIR/streaming/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hbase/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/test-classes"
fi

18 changes: 18 additions & 0 deletions core/pom.xml
@@ -391,6 +391,24 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
<execution>
<id>test-jar-on-test-compile</id>
<phase>test-compile</phase>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>

<resources>
core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -39,7 +39,13 @@ private[spark] class JavaSerializationStream(out: OutputStream, counterReset: In
* the stream 'resets' object class descriptions have to be re-written)
*/
def writeObject[T: ClassTag](t: T): SerializationStream = {
objOut.writeObject(t)
try {
objOut.writeObject(t)
} catch {
case e : Exception =>
System.err.println(s"serializable err on $t of type ${t.getClass.getName}")
Review comment (Owner Author): necessary?
e.printStackTrace
}
counter += 1
if (counterReset > 0 && counter >= counterReset) {
objOut.reset()
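
The hunk above wraps objOut.writeObject in a try/catch so that the first non-serializable object names itself on stderr before the stack trace is printed; the review comment questions whether this belongs in mainline code. Below is a minimal standalone sketch of the same name-the-culprit diagnostic in Python terms (illustrative only: this helper is not part of the PR, and unlike the Scala diff it re-raises instead of swallowing the error):

    import pickle
    import sys

    def probe_serializable(obj):
        """Name the offending object before a serialization failure propagates."""
        try:
            pickle.dumps(obj)
        except Exception:
            sys.stderr.write("serializable err on %r of type %s\n"
                             % (obj, type(obj).__name__))
            raise  # surface the original failure instead of swallowing it

    probe_serializable([1, 2, 3])     # fine: lists of ints pickle cleanly
    probe_serializable(lambda x: x)   # fails: lambdas are not picklable
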
86 changes: 84 additions & 2 deletions examples/pom.xml
@@ -122,7 +122,88 @@
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
<exclusion>
<groupId>org.jruby</groupId>
<artifactId>jruby-complete</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
<exclusion>
<groupId>org.jruby</groupId>
<artifactId>jruby-complete</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
</exclusion>
<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
<exclusion>
<groupId>org.jruby</groupId>
<artifactId>jruby-complete</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
@@ -219,7 +300,8 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<configuration>
<skip>true</skip>
<skip>false</skip>
<!--<skip>true</skip>-->
Review comment (Owner Author): ?
</configuration>
</plugin>
<plugin>
18 changes: 10 additions & 8 deletions pom.xml
@@ -94,6 +94,7 @@
<module>streaming</module>
<module>sql/catalyst</module>
<module>sql/core</module>
<module>sql/hbase</module>
<module>sql/hive</module>
<module>repl</module>
<module>assembly</module>
@@ -124,8 +125,8 @@
<hadoop.version>1.0.4</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.94.6</hbase.version>
<flume.version>1.4.0</flume.version>
<hbase.version>0.98.5-hadoop2</hbase.version>
Review comment (Owner Author): only support 0.98.5-hadoop2?
<zookeeper.version>3.4.5</zookeeper.version>
<hive.version>0.12.0-protobuf-2.5</hive.version>
<parquet.version>1.4.3</parquet.version>
@@ -855,13 +856,13 @@
<goal>testCompile</goal>
</goals>
</execution>
<execution>
<id>attach-scaladocs</id>
<phase>verify</phase>
<goals>
<goal>doc-jar</goal>
</goals>
</execution>
<!--<execution>-->
<!--<id>attach-scaladocs</id>-->
<!--<phase>verify</phase>-->
<!--<goals>-->
<!--<goal>doc-jar</goal>-->
<!--</goals>-->
<!--</execution>-->
Review comment (Owner Author): why?
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
@@ -902,6 +903,7 @@
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
<debug>true</debug>
Review comment (Owner Author): ?
<encoding>UTF-8</encoding>
<maxmem>1024m</maxmem>
<fork>true</fork>
54 changes: 54 additions & 0 deletions python/pyspark/sql.py
@@ -1417,6 +1417,52 @@ def _get_hive_ctx(self):
return self._jvm.TestHiveContext(self._jsc.sc())



class HBaseContext(SQLContext):

"""A variant of Spark SQL that integrates with data stored in HBase.

Configuration for HBase is read from hbase-site.xml on the classpath.
It supports running SQL queries against HBase-backed tables.
"""

def __init__(self, sparkContext, hbaseContext=None):
"""Create a new HBaseContext.

@param sparkContext: The SparkContext to wrap.
@param hbaseContext: An optional JVM Scala HBaseContext. If set, we do not instantiate a new
HBaseContext in the JVM; instead we make all calls to this object.
"""
SQLContext.__init__(self, sparkContext)

if hbaseContext:
self._scala_HBaseContext = hbaseContext

@property
def _ssql_ctx(self):
try:
if not hasattr(self, '_scala_HBaseContext'):
self._scala_HBaseContext = self._get_hbase_ctx()
return self._scala_HBaseContext
except Py4JError as e:
raise Exception("You must build Spark with HBase. "
"Export 'SPARK_HBASE=true' and run "
"sbt/sbt assembly", e)

def _get_hbase_ctx(self):
return self._jvm.HBaseContext(self._jsc.sc())


def sql(self, sqlQuery):
"""Runs a SQL query over HBase and returns the result as a L{HBaseSchemaRDD}."""
return HBaseSchemaRDD(self._ssql_ctx.sql(sqlQuery).toJavaSchemaRDD(), self)
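
A short usage sketch for the HBaseContext defined above, assuming an assembly built with HBase support (the _ssql_ctx error message says to export 'SPARK_HBASE=true' and run sbt/sbt assembly); the table name and query are hypothetical:

    from pyspark import SparkContext
    from pyspark.sql import HBaseContext

    sc = SparkContext(appName="HBaseSQLDemo")
    hbc = HBaseContext(sc)   # the JVM-side HBaseContext is created lazily by _ssql_ctx
    rdd = hbc.sql("SELECT col1, col2 FROM demo_table")  # returns an HBaseSchemaRDD
    print(rdd.collect())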


def _create_row(fields, values):
row = Row(*values)
row.__FIELDS__ = fields
@@ -1801,6 +1847,14 @@ def _test():
if failure_count:
exit(-1)

class HBaseSchemaRDD(SchemaRDD):
def createTable(self, tableName, overwrite=False):
"""Creates the specified table and writes the contents of this SchemaRDD into it.

Optionally overwrites any existing data.
"""
self._jschema_rdd.createTable(tableName, overwrite)
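
A companion sketch for HBaseSchemaRDD.createTable, continuing the hypothetical session above (the target table name is illustrative):

    rows = hbc.sql("SELECT * FROM demo_table")
    rows.createTable("demo_table_copy", overwrite=True)  # create the table, replacing any existing data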


if __name__ == "__main__":
_test()
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -17,6 +17,8 @@

package org.apache.spark.sql.catalyst

import java.lang.reflect.Method

import scala.language.implicitConversions

import org.apache.spark.sql.catalyst.analysis._
@@ -110,9 +112,9 @@ class SqlParser extends AbstractSparkSQLParser {
.getClass
.getMethods
.filter(_.getReturnType == classOf[Keyword])
.map(_.invoke(this).asInstanceOf[Keyword].str)

.map{_.invoke(this).asInstanceOf[Keyword].str}
override val lexical = new SqlLexical(reservedWords)
println(reservedWords)

protected def assignAliases(exprs: Seq[Expression]): Seq[NamedExpression] = {
exprs.zipWithIndex.map {
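
The SqlParser hunk above derives reservedWords by reflecting over every method of the parser whose return type is Keyword; the added println looks like a debugging leftover. Below is an illustrative Python rendering of the same collect-keywords-by-introspection idea (MiniParser and its members are hypothetical, not part of the PR):

    class Keyword(object):
        def __init__(self, s):
            self.str = s

    class MiniParser(object):
        SELECT = Keyword("SELECT")
        INSERT = Keyword("INSERT")

        @property
        def reserved_words(self):
            # scan class members for Keyword instances, as the Scala version
            # scans methods whose return type is classOf[Keyword]
            return sorted(v.str for v in vars(type(self)).values()
                          if isinstance(v, Keyword))

    print(MiniParser().reserved_words)   # ['INSERT', 'SELECT']
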
18 changes: 18 additions & 0 deletions sql/core/pom.xml
@@ -92,6 +92,24 @@
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
<execution>
<id>test-jar-on-test-compile</id>
<phase>test-compile</phase>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
@@ -18,6 +18,7 @@
package org.apache.spark.sql

import org.apache.spark.annotation.{DeveloperApi, Experimental}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.execution.LogicalRDD