Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Jul 23, 2020

What changes were proposed in this pull request?

This PR comes from the comment: #29085 (comment)

  • Extract common Script IOSchema ScriptTransformationIOSchema
  • avoid repeated judgement extract process output row method createOutputIteratorWithoutSerde && createOutputIteratorWithSerde
  • add default no serde IO schemas ScriptTransformationIOSchema.defaultIOSchema

Why are the changes needed?

Refactor code

Does this PR introduce any user-facing change?

NO

How was this patch tested?

NO

@AngersZhuuuu
Copy link
Contributor Author

FYI @maropu

case NullType => voidTypeInfo
case dt =>
throw new AnalysisException(
s"${dt.getClass.getSimpleName.replace("$", "")} cannot be converted to Hive TypeInfo")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to this reafactoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to this reafactoring?

So make this change as a seperated jira?

Copy link
Member

@maropu maropu Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a testing issue (2.): #29085 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a testing issue (2.): #29085 (comment)

OK, change in next pr(2)

@maropu
Copy link
Member

maropu commented Jul 23, 2020

FYI: This PR comes from the comment: #29085 (comment)

@maropu maropu changed the title [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec… [SPARK-32105][SQL][FOLLOWUP] Refactor current ScriptTransformationExec Jul 23, 2020
@maropu
Copy link
Member

maropu commented Jul 23, 2020

Could you file a new JIRA?

@maropu
Copy link
Member

maropu commented Jul 23, 2020

Anyone could check this? @cloud-fan @viirya @wangyum

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-32105][SQL][FOLLOWUP] Refactor current ScriptTransformationExec [SPARK-32403][SQL] ScriptTransformExec extract common method from process row to avoid repeated judgement Jul 23, 2020
@SparkQA
Copy link

SparkQA commented Jul 23, 2020

Test build #126382 has finished for PR 29199 at commit 9b1f28a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 23, 2020

Test build #126381 has finished for PR 29199 at commit a007fa1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 23, 2020

Test build #126378 has finished for PR 29199 at commit 8364f1f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-32403][SQL] ScriptTransformExec extract common method from process row to avoid repeated judgement [SPARK-32403][SQL] Refactor current ScriptTransformationExec Jul 23, 2020
@AngersZhuuuu
Copy link
Contributor Author

Could you file a new JIRA?

Done

@SparkQA
Copy link

SparkQA commented Jul 23, 2020

Test build #126393 has finished for PR 29199 at commit 01bf8fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

Any update?

@cloud-fan
Copy link
Contributor

add default no serde IO schemas ScriptTransformationIOSchema.defaultIOSchema

I think we will have a default native serde. So for now we just need a fake one which just uses cast to string?

@AngersZhuuuu
Copy link
Contributor Author

add default no serde IO schemas ScriptTransformationIOSchema.defaultIOSchema

I think we will have a default native serde. So for now we just need a fake one which just uses cast to string?

@alfozan will add their spark serde , and will have a default one #29085 (comment)

@AngersZhuuuu
Copy link
Contributor Author

add default no serde IO schemas ScriptTransformationIOSchema.defaultIOSchema

I think we will have a default native serde. So for now we just need a fake one which just uses cast to string?

For @maropu's advice. more detail is :

  1. refactor more code about ScriptTransformationExec (this pr)
  2. add more test case for current HiveScriptTransformationExec (since we have too less UT for transform and there are many small point need to be clear)
  3. Implement SparkScriptTransformationExec in sql/core and add more detail UT
  4. Alfozan will add spark serde used in there prod, and then will have a default spark serde
  5. I will add some small PR to fix some small point , such as 1. support array/map/struct in noserde mode. 2. schema less data different form hive 's behavior etc....

@AngersZhuuuu
Copy link
Contributor Author

It‘s ok to merge this and start to work on follow step ?


// This nullability is a performance optimization in order to avoid an Option.foreach() call
// inside of a loop
@Nullable val (inputSerde, inputSoi) = initInputSerDe(ioschema, input).getOrElse((null, null))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we use @Nullable for local variables?

Copy link
Contributor Author

@AngersZhuuuu AngersZhuuuu Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we use @Nullable for local variables?

Seems first author add this to remind the value could be null and add judgement at next code.
Seems a good coding style ? Sure it's ok to remove these, so just remove all @nullable?
cc @maropu

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking performance numbers with/without it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking performance numbers with/without it?

Seems not need, since we have remove these unnecessary judgement in each row's loop.
It's ok to just remove it.


// This nullability is a performance optimization in order to avoid an Option.foreach() call
// inside of a loop
@Nullable val (outputSerde, outputSoi) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Removed

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127016 has finished for PR 29199 at commit 927f260.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127250 has finished for PR 29199 at commit 927f260.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127259 has finished for PR 29199 at commit 927f260.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127272 has finished for PR 29199 at commit 927f260.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in d251443 Aug 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants