[SPARK-21261][DOCS] SQL Regex document fix #18477
Conversation
test this please
There is another example that needs the same change near the end of the file too.
Do we need to fix this? I remember in the doc, we use unescaped characters.
@viirya I'm not an expert here, but reading the docs on line 160, I think this needs to be escaped in order to be consistent with Spark 2's default behavior? My assumption was that this was just never updated.
Hmm, when I wrote the docs on line 160, it was suggested that I use unescaped characters.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match "\abc", a regular expression for regexp can be "^\abc$".
Actually, you need to write like this in spark-shell:
scala> sql("SELECT like('\\\\abc', '\\\\\\\\abc')").show
+---------------+
|\abc LIKE \\abc|
+---------------+
| true|
+---------------+
scala> sql("SELECT regexp_replace('100-200', '(\\\\d+)', 'num')").show
+-----------------------------------+
|regexp_replace(100-200, (\d+), num)|
+-----------------------------------+
| num-num|
+-----------------------------------+
In spark-shell, Spark 2's string-literal parsing means that '\\\\abc' in the Scala source ends up as \abc, and '(\\\\d+)' ends up as (\d+), by the time the regex engine sees them.
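To make the two unescaping layers explicit (an illustrative aside, not from the original comment), the spark-shell REPL echo already shows the first layer:

scala> val sqlText = "SELECT regexp_replace('100-200', '(\\\\d+)', 'num')"
sqlText: String = SELECT regexp_replace('100-200', '(\\d+)', 'num')

The four backslashes in the Scala source literal become two in the String value; Spark's SQL parser then unescapes the string literal once more, so the regex engine finally sees (\d+).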
But in spark-sql, you write the queries like this:
spark-sql> SELECT like('\\abc', '\\\\abc');
true
Time taken: 0.061 seconds, Fetched 1 row(s)
spark-sql> SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num
Time taken: 0.117 seconds, Fetched 1 row(s)
So depending on how the shell environment processes string escaping, the same query looks different. In the docs, it seems to me that writing in the unescaped style can avoid this confusion?
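As an aside (not from the original thread), one way to sidestep the Scala escaping layer in spark-shell is a triple-quoted string, so the query text matches what is typed at the spark-sql prompt. The output below is a sketch assuming the default Spark 2.x parser behavior and mirrors the earlier example:

scala> sql("""SELECT regexp_replace('100-200', '(\\d+)', 'num')""").show
+-----------------------------------+
|regexp_replace(100-200, (\d+), num)|
+-----------------------------------+
|                            num-num|
+-----------------------------------+

Triple quotes keep both backslashes literal in Scala, so the SQL parser receives (\\d+) and unescapes it to the regex (\d+), just as in the spark-sql session above.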
Is the better fix to make it clear that this example uses the unescaped style, @viirya?
Yeah, if we can.
I added spark-sql and scala prompts to make it clear.
@visaxin Could you address the comment?
Force-pushed 1850d87 to 8a7dd55
Force-pushed 8a7dd55 to 08adf17
@gatorsmile Done
spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
100
scala> SELECT _FUNC_('100-200', '(\\\\d+)-(\\\\d+)', 1);
scala> spark.sql("SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1)").collect()
100
scala> SELECT _FUNC_('100-200', '(\\\\d+)-(\\\\d+)', 1);
100
Array([100])
num-num
scala> SELECT _FUNC_('100-200', '(\\\\d+)', 'num');
num-num
scala> spark.sql("SELECT regexp_replace('100-200', '(\\d+)', 'num')").collect()
Array([num-num])
LGTM except the above three comments.
@visaxin this one is old but could you update it per the last review comments?
Can one of the admins verify this patch?
Ping @visaxin
Ping @visaxin
I took this over at #21808
## What changes were proposed in this pull request?

Fix regexes in spark-sql command examples. This takes over #18477.

## How was this patch tested?

Existing tests. I verified that the existing example doesn't work in spark-sql, but the new ones do.

Author: Sean Owen <[email protected]>

Closes #21808 from srowen/SPARK-21261.
SQL regex docs change:
SELECT _FUNC_('100-200', '(\d+)', 'num') => SELECT _FUNC_('100-200', '(\\d+)', 'num')
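For reference, a quick sanity check of the corrected examples at the spark-sql prompt (a sketch based on the outputs quoted in the review above; "Time taken" lines omitted and exact formatting may vary by version):

spark-sql> SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
100
spark-sql> SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num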