@rxin
Contributor

@rxin rxin commented Jul 2, 2015

No description provided.

@rxin
Contributor Author

rxin commented Jul 2, 2015

This is based on #7174

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36327 has started for PR 7175 at commit daa849b.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36327 has finished for PR 7175 at commit daa849b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpectsInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • case class Md5(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Sha1(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Not(child: Expression) extends UnaryExpression with Predicate with ExpectsInputTypes
    • abstract class BinaryComparison extends BinaryOperator with Predicate
    • trait StringRegexExpression extends ExpectsInputTypes
    • trait CaseConversionExpression extends ExpectsInputTypes
    • trait StringComparison extends ExpectsInputTypes
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • abstract class DataType extends AbstractDataType

@AmplabJenkins

Merged build finished. Test FAILed.

@rxin rxin changed the title [SPARK-8772][SQL] Implement implicit type cast for expressions that defines input types. [SPARK-8772][SQL] Implement implicit type cast for expressions that define input types. Jul 2, 2015
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36332 has started for PR 7175 at commit efbfa42.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36336 has started for PR 7175 at commit f0ff97f.

Contributor

Most of the rules seem to be:
DecimalType => DecimalType
FractionalType => DoubleType
LongType => LongType
IntegralType => IntegerType
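A minimal Scala sketch of the mapping above, assuming each arrow means "expected input type => concrete type the input gets cast to". `DefaultCastTarget` and its case objects are a hypothetical toy model, not Spark's real `DataType` classes; the sketch only shows the shape of the rule.

```scala
object DefaultCastTarget {
  // Hypothetical toy type model; these are NOT Spark's real DataType classes.
  sealed trait AbstractDataType
  sealed trait DataType extends AbstractDataType

  // Concrete types.
  case object DecimalType extends DataType
  case object DoubleType  extends DataType
  case object LongType    extends DataType
  case object IntegerType extends DataType

  // Abstract categories that cover several concrete types.
  case object FractionalType extends AbstractDataType
  case object IntegralType   extends AbstractDataType

  /** Concrete type an input would be cast to when an expression expects `expected`. */
  def defaultConcreteType(expected: AbstractDataType): DataType = expected match {
    case DecimalType     => DecimalType // decimals stay decimals
    case FractionalType  => DoubleType  // any fractional input becomes a double
    case LongType        => LongType    // longs stay longs
    case IntegralType    => IntegerType // any integral input becomes an int
    case other: DataType => other       // other concrete types are used as-is
  }
}
```

Note that the abstract categories collapse to one representative concrete type each, while concrete expected types map to themselves.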

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36332 has finished for PR 7175 at commit efbfa42.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpectsInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • case class Md5(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Sha1(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Not(child: Expression) extends UnaryExpression with Predicate with ExpectsInputTypes
    • abstract class BinaryComparison extends BinaryOperator with Predicate
    • trait StringRegexExpression extends ExpectsInputTypes
    • trait CaseConversionExpression extends ExpectsInputTypes
    • trait StringComparison extends ExpectsInputTypes
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • protected[sql] abstract class AtomicType extends DataType
    • abstract class NumericType extends AtomicType
    • abstract class DataType extends AbstractDataType

@AmplabJenkins

Merged build finished. Test FAILed.

Contributor

When will we use this method?

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36336 has finished for PR 7175 at commit f0ff97f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpectsInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • case class Md5(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Sha1(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Not(child: Expression) extends UnaryExpression with Predicate with ExpectsInputTypes
    • abstract class BinaryComparison extends BinaryOperator with Predicate
    • trait StringRegexExpression extends ExpectsInputTypes
    • trait CaseConversionExpression extends ExpectsInputTypes
    • trait StringComparison extends ExpectsInputTypes
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • protected[sql] abstract class AtomicType extends DataType
    • abstract class NumericType extends AtomicType
    • abstract class DataType extends AbstractDataType

@AmplabJenkins

Merged build finished. Test FAILed.

Contributor

Crc32 should be able to work with StringType, but StringType cannot be implicitly cast to BinaryType, right?

Contributor Author

I need to think about whether we should support implicit casts from string to binary. SQL Server does support that. Hive doesn't, but Hive chose to make a lot of the UDFs work against both types.

Contributor

I am not sure we can always cast a string to binary correctly, since different encodings produce different bytes. This is really a case for letting an expression accept multiple DataTypes.
The same goes for Length, which supports both StringType and BinaryType.

We probably need another PR for this improvement.
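To make the encoding concern concrete, this plain-JVM sketch (no Spark involved; `EncodingDemo` is a hypothetical name) encodes the same one-character string under three charsets and gets three different byte sequences.

```scala
import java.nio.charset.StandardCharsets

object EncodingDemo {
  // One character: U+00E9, LATIN SMALL LETTER E WITH ACUTE.
  val s: String = "\u00e9"

  // Same string, three charsets, three different byte sequences.
  val utf8: Array[Byte]    = s.getBytes(StandardCharsets.UTF_8)      // 0xC3 0xA9
  val latin1: Array[Byte]  = s.getBytes(StandardCharsets.ISO_8859_1) // 0xE9
  val utf16be: Array[Byte] = s.getBytes(StandardCharsets.UTF_16BE)   // 0x00 0xE9
}
```

A generic string-to-binary cast is only deterministic once a single charset (for example UTF-8) is fixed.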

Contributor Author

I was thinking about having an AbstractDataType that's a TypeCollection, into which expressions can put arbitrary types. Basically similar to the Seq[Any] idea, but with better type safety.
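A minimal sketch of the TypeCollection idea, assuming a toy type model: `TypeCollectionSketch`, `acceptsType`, and the case objects are hypothetical names, not Spark's API. An expression such as Crc32 or Length could declare `TypeCollection(StringType, BinaryType)` as its expected input type.

```scala
object TypeCollectionSketch {
  // Hypothetical toy model of the idea; names are illustrative, not Spark's API.
  sealed trait AbstractDataType {
    /** Whether a value of concrete type `other` satisfies this (possibly abstract) type. */
    def acceptsType(other: DataType): Boolean
  }

  sealed trait DataType extends AbstractDataType {
    // A concrete type accepts only itself.
    def acceptsType(other: DataType): Boolean = this == other
  }

  case object StringType  extends DataType
  case object BinaryType  extends DataType
  case object IntegerType extends DataType

  /** Accepts any of the listed types, e.g. StringType or BinaryType for Crc32/Length. */
  final case class TypeCollection(types: AbstractDataType*) extends AbstractDataType {
    def acceptsType(other: DataType): Boolean = types.exists(_.acceptsType(other))
  }
}
```

Because the collection is itself an `AbstractDataType`, it can be used anywhere a single expected type is, which is where the type safety over a plain `Seq[Any]` comes from.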

Contributor

I think that's a good idea for this, but it probably makes auto casting more complicated. (Which data type should we cast to?)
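One possible answer to that question, sketched under assumptions: `CastTargetSketch`, its implicit-cast table, and the first-match policy are all hypothetical, chosen only to show why ordering becomes significant once an expression accepts several types.

```scala
object CastTargetSketch {
  // Hypothetical toy type model and cast table; not Spark's actual rules.
  sealed trait DataType
  case object StringType  extends DataType
  case object BinaryType  extends DataType
  case object IntegerType extends DataType
  case object DoubleType  extends DataType

  /** Assumed implicit-cast table: which (from, to) pairs are allowed. */
  private val implicitCasts: Set[(DataType, DataType)] = Set(
    IntegerType -> DoubleType,
    IntegerType -> StringType,
    DoubleType  -> StringType,
    StringType  -> BinaryType
  )

  /**
   * Picks the type to cast `input` to when an expression accepts any of `candidates`.
   * Policy (an assumption): keep the input type if it is already accepted, otherwise
   * take the first candidate, in declaration order, reachable by an implicit cast.
   */
  def chooseTarget(input: DataType, candidates: Seq[DataType]): Option[DataType] =
    if (candidates.contains(input)) Some(input)
    else candidates.find(target => implicitCasts.contains(input -> target))
}
```

With this policy, `chooseTarget(IntegerType, Seq(StringType, DoubleType))` picks `StringType`, while reversing the candidates picks `DoubleType`, which is exactly the ambiguity raised above.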

Contributor

+1 for StringType -> BinaryType (UTF8 will be used)
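A plain-JVM sketch of what UTF-8-based handling means for a checksum such as Crc32: the string is first converted to its UTF-8 bytes and then checksummed like any binary input. `Crc32OverUtf8` is a hypothetical helper built on `java.util.zip.CRC32`, not Spark's Crc32 expression.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

object Crc32OverUtf8 {
  /** CRC-32 of raw bytes; java.util.zip.CRC32 returns the unsigned 32-bit value in a Long. */
  def crc32(bytes: Array[Byte]): Long = {
    val checksum = new CRC32()
    checksum.update(bytes, 0, bytes.length)
    checksum.getValue
  }

  /** String input: cast to binary as UTF-8 bytes, then reuse the binary path. */
  def crc32(s: String): Long = crc32(s.getBytes(StandardCharsets.UTF_8))
}
```

For example, `crc32("123456789")` gives `0xCBF43926`, the standard CRC-32 check value, since that ASCII string's UTF-8 bytes are identical to its ASCII bytes.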

Contributor

I mean we'd better leave the cast (StringType -> BinaryType) to the Crc32 UDF itself, not to the generic auto-casting rule. From the user's perspective, the Crc32 UDF will support both StringType and BinaryType.

Contributor

Oh, sorry, I just checked Hive's code; it does convert StringType => BinaryType (as UTF-8 bytes), just like the generic rule. @davies +1

@AmplabJenkins

Merged build triggered.

@rxin
Contributor Author

rxin commented Jul 2, 2015

I added an implicit type cast from String to Binary.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36375 has started for PR 7175 at commit 88080a2.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36375 has finished for PR 7175 at commit 88080a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Md5(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Sha1(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • case class Not(child: Expression) extends UnaryExpression with Predicate with ExpectsInputTypes
    • trait StringRegexExpression extends ExpectsInputTypes
    • trait CaseConversionExpression extends ExpectsInputTypes
    • trait StringComparison extends ExpectsInputTypes
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • protected[sql] abstract class AtomicType extends DataType
    • abstract class NumericType extends AtomicType
    • abstract class DataType extends AbstractDataType

@AmplabJenkins

Merged build finished. Test PASSed.

Contributor

According to Hive and the discussion in #6551,
should we only allow implicit casts to string from atomic types (except boolean and binary)?
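A sketch of the policy proposed above, assuming a toy type model (`ToStringCastSketch` and its types are hypothetical, not Spark classes): atomic types other than boolean and binary may be implicitly cast to string, and complex types may not.

```scala
object ToStringCastSketch {
  // Hypothetical simplified type model; Spark's real hierarchy differs.
  sealed trait DataType
  sealed trait AtomicType extends DataType
  case object BooleanType extends AtomicType
  case object BinaryType  extends AtomicType
  case object StringType  extends AtomicType
  case object IntegerType extends AtomicType
  case object DoubleType  extends AtomicType
  case object DateType    extends AtomicType
  final case class ArrayType(elementType: DataType) extends DataType

  /** Proposed policy: only atomic types other than boolean and binary cast implicitly to string. */
  def canImplicitlyCastToString(from: DataType): Boolean = from match {
    case BooleanType | BinaryType => false // explicitly excluded
    case _: AtomicType            => true  // numbers, dates, strings, ...
    case _                        => false // complex types such as arrays
  }
}
```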

@cloud-fan
Contributor

lgtm

Contributor

Exist: why do we add unapply for it? Is it the same as Cast(child: NumericType, StringType)? It looks to me like we only need this object NumericType in ExpectsInputTypes, when an expression needs any kind of numeric input.
And should we add an object AtomicType too?

Contributor

child is an expression.
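A minimal sketch of why an extractor is needed here, assuming simplified hypothetical `Expression` and `Cast` case classes (Spark's real ones are richer). A type pattern like `child: NumericType` would test whether the child is itself a data type, which an expression never is; a companion object's `unapply` instead tests the expression's `dataType`.

```scala
object NumericExtractorSketch {
  // Hypothetical simplified model; Spark's Expression and DataType hierarchy are richer.
  sealed trait DataType
  sealed trait NumericType extends DataType
  case object IntegerType extends NumericType
  case object DoubleType  extends NumericType
  case object StringType  extends DataType

  final case class Expression(name: String, dataType: DataType)
  final case class Cast(child: Expression, to: DataType)

  /** Companion extractor: matches an expression whose result type is numeric. */
  object NumericType {
    def unapply(e: Expression): Boolean = e.dataType.isInstanceOf[NumericType]
  }

  /** `NumericType()` in a pattern calls NumericType.unapply on the child expression. */
  def isNumericToStringCast(c: Cast): Boolean = c match {
    case Cast(NumericType(), StringType) => true
    case _                               => false
  }
}
```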

Contributor

sorry didn't see that...

Contributor Author

BTW, this is old code; it just got copied around.

@rxin
Contributor Author

rxin commented Jul 2, 2015

@marmbrus I'm going to merge this one since it blocks @davies' PR. I will submit a follow-up PR to support type collections for inputs. Please continue to review and I will address the feedback together.

@asfgit asfgit closed this in 52508be Jul 2, 2015

7 participants