[SPARK-38173][SQL] Quoted column cannot be recognized correctly when quotedRegexColumnNa… #35476

TongWei1105 · 2022-02-10T11:22:14Z

What changes were proposed in this pull request?

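Bug fix: when spark.sql.parser.quotedRegexColumnNames is true, a backquoted identifier is treated as a regex only if it contains a character other than letters, digits, and "_".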

Why are the changes needed?

When spark.sql.parser.quotedRegexColumnNames=true, the following query fails:

SELECT `(C3)?+.+`,`C1` * C2 FROM (SELECT 3 AS C1,2 AS C2,1 AS C3) T;

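The query above throws an exception, because the quoted `C1` is treated as a regex column pattern and expanded like `*` inside the multiplication: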

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'multiply'
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:370)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:266)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:44)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:266)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:275)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'multiply'
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:155)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$expandStarExpression$1.applyOrElse(Analyzer.scala:1700)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$expandStarExpression$1.applyOrElse(Analyzer.scala:1671)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:339)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:339)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.expandStarExpression(Analyzer.scala:1671)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.$anonfun$buildExpandedProjectList$1(Analyzer.scala:1656)

The same query works in Hive, because Hive treats a pattern consisting only of letters, digits, and "_" as a normal string rather than a regex:

  /**
   * Returns whether the pattern is a regex expression (instead of a normal
   * string). Normal string is a string with all alphabets/digits and "_".
   */
  static boolean isRegex(String pattern, HiveConf conf) {
    String qIdSupport = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_QUOTEDID_SUPPORT);
    if ( "column".equals(qIdSupport)) {
      return false;
    }
    for (int i = 0; i < pattern.length(); i++) {
      if (!Character.isLetterOrDigit(pattern.charAt(i))
          && pattern.charAt(i) != '_') {
        return true;
      }
    }
    return false;
  }

0: jdbc:hive2://hiveserver-inc.> set hive.support.quoted.identifiers=none;
No rows affected (0.003 seconds)
0: jdbc:hive2://hiveserver-inc.> SELECT `(C3)?+.+`,`C1` * C2 FROM (SELECT 3 AS C1,2 AS C2,1 AS C3) T;
22/02/10 19:01:43 INFO ql.Driver: OK
+-------+-------+------+
| t.c1  | t.c2  | _c1  |
+-------+-------+------+
| 3     | 2     | 6    |
+-------+-------+------+
1 row selected (0.136 seconds)

In this PR, we add an isRegex method to check whether the pattern is a regex expression.
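As a rough illustration of the rule described above, here is a minimal Java sketch of the character check without the HiveConf lookup. The class name `QuotedRegexCheck` is hypothetical, and this is not the code added to AstBuilder.scala by this PR; it only mirrors the Hive logic quoted earlier.

```java
// Hypothetical standalone helper, for illustration only.
final class QuotedRegexCheck {

    // Returns true if the pattern contains any character other than a
    // letter, a digit, or '_', i.e. it cannot be a plain column name.
    static boolean isRegex(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule, the identifier `(C3)?+.+` from the failing query is classified as a regex, while `C1` is classified as a plain column name.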

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

TongWei1105 · 2022-02-10T11:30:43Z

@AngersZhuuuu

AngersZhuuuu · 2022-02-11T02:30:06Z

Please change the title and enable GitHub Actions. You can also paste the Hive code in the PR description.

AngersZhuuuu · 2022-02-11T02:33:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Maybe we can make this more idiomatic Scala, for example by using

pattern.find() or pattern.forall()
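For comparison, the same check can be written declaratively instead of with an index loop. The sketch below uses Java's `String.chars()` and `anyMatch` as an analog of the Scala collection methods suggested here; the class name `RegexCheckStreams` is hypothetical and this is not the code that was merged.

```java
// Hypothetical helper, for illustration only.
final class RegexCheckStreams {

    // True if any character is not a letter, digit, or '_'.
    static boolean isRegex(String pattern) {
        return pattern.chars()
                .anyMatch(c -> !Character.isLetterOrDigit(c) && c != '_');
    }
}
```

The predicate is the same as in the loop form; only the iteration is delegated to the stream.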


AngersZhuuuu

LGTM

AngersZhuuuu · 2022-02-11T08:11:54Z

ping @cloud-fan

AmplabJenkins · 2022-02-11T18:03:06Z

Can one of the admins verify this patch?

wugx · 2022-02-14T15:59:58Z

  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
    // Check the text once up front; a non-regex identifier can never apply.
    if (!isRegex(ctx.getText)) return false

    var parent = ctx.getParent
    while (parent != null) {
      if (parent.isInstanceOf[NamedExpressionContext]) return true
      parent = parent.getParent
    }
    false
  }

The isRegex function only needs to be called once. If the text is not a regex, the method can return false directly.
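The early-return idea can be sketched in Java as follows. `Node` is a hypothetical stand-in for a parse-tree context (it is not the ANTLR `ParserRuleContext` API), and the `isNamedExpression` flag is an assumption that replaces the `isInstanceOf[NamedExpressionContext]` test.

```java
// Illustrative sketch only; names and types are assumptions.
final class RegexContextCheck {

    // Hypothetical minimal parse-tree node: its text, a flag marking
    // named-expression nodes, and a link to its parent (null at the root).
    static final class Node {
        final String text;
        final boolean isNamedExpression;
        final Node parent;

        Node(String text, boolean isNamedExpression, Node parent) {
            this.text = text;
            this.isNamedExpression = isNamedExpression;
            this.parent = parent;
        }
    }

    // Same character rule as the Hive method quoted in the PR description.
    static boolean isRegex(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return true;
            }
        }
        return false;
    }

    static boolean canApplyRegex(Node ctx) {
        // Cheap text check first, done exactly once.
        if (!isRegex(ctx.text)) {
            return false;
        }
        // Then walk the ancestors looking for a named-expression node.
        for (Node p = ctx.parent; p != null; p = p.parent) {
            if (p.isNamedExpression) {
                return true;
            }
        }
        return false;
    }
}
```

With this ordering, a plain identifier such as `C1` is rejected before any ancestor is visited, and the ancestor walk runs only for text that could be a regex.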

TongWei1105 · 2022-02-15T02:59:51Z

Thanks. Updated

AngersZhuuuu · 2022-02-15T06:27:22Z

It would be better to use:

if conf.supportQuotedRegexColumnName && canApplyRegex(ctx) && isRegex(columnNameRegex)

cloud-fan · 2022-02-16T04:40:57Z

thanks, merging to master!

…when quotedRegexColumnNames is true

### What changes were proposed in this pull request?
backporting #35476 to 3.2

### Why are the changes needed?
bug fixing in 3.2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
new UT

Closes #39473 from huaxingao/3.2.

Authored-by: huaxingao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

github-actions bot added the SQL label Feb 10, 2022

TongWei1105 changed the title ~~[SPARK-38173][SQL][WIP] Quoted column cannot be recognized correctly when quotedRegexColumnNa…~~ [SPARK-38173][SQL] Quoted column cannot be recognized correctly when quotedRegexColumnNa… Feb 10, 2022

AngersZhuuuu reviewed Feb 11, 2022


TongWei1105 added 4 commits February 11, 2022 15:14

Quoted column cannot be recognized correctly when quotedRegexColumnNa…

569f875

…mes is true

Adjust code

3201ae4

Adjust code

ca4c188

code style

465a9e8

AngersZhuuuu approved these changes Feb 11, 2022


cloud-fan approved these changes Feb 11, 2022


fix

8934ee2

Merge branch 'apache:master' into SPARK-38173

274b7d3

TongWei1105 added 2 commits February 15, 2022 14:42

bugfix

a49dce4

Merge branch 'SPARK-38173' of github.com:TongWeii/spark into SPARK-38173

cee3a00

TongWei1105 requested a review from cloud-fan February 15, 2022 13:27

cloud-fan closed this in 1ef5638 Feb 16, 2022

huaxingao mentioned this pull request Jan 10, 2023

[SPARK-38173][SQL][3.2] Quoted column cannot be recognized correctly when quotedRegexColumnNames is true #39473

Closed
