Skip to content

Conversation

@goldmedal
Copy link
Contributor

What changes were proposed in this pull request?

In previous work SPARK-21513, we has allowed MapType and ArrayType of MapTypes convert to a json string but only for Scala API. In this follow-up PR, we will make SparkSQL support it for PySpark and SparkR, too. We also fix some little bugs and comments of the previous work in this follow-up PR.

For PySpark

>>> data = [(1, {"name": "Alice"})]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(to_json(df.value).alias("json")).collect()
[Row(json=u'{"name":"Alice")']
>>> data = [(1, [{"name": "Alice"}, {"name": "Bob"}])]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(to_json(df.value).alias("json")).collect()
[Row(json=u'[{"name":"Alice"},{"name":"Bob"}]')]

For SparkR

# Converts a map into a JSON object
df2 <- sql("SELECT map('name', 'Bob')) as people")
df2 <- mutate(df2, people_json = to_json(df2$people))
# Converts an array of maps into a JSON array
df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people")
df2 <- mutate(df2, people_json = to_json(df2$people))

How was this patch tested?

Add unit test cases.

cc @viirya @HyukjinKwon

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Sep 13, 2017

Test build #81739 has finished for PR 19223 at commit 29e7323.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' \code{to_json}: Converts a column containing a \code{structType} or array of \code{structType}
#' into a Column of JSON string. Resolving the Column can fail if an unsupported type is encountered.
#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
# a \code{mapType} or array of \code{mapType} into a Column of JSON string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ' is missed at the first #.

JSON string. Throws an exception, in the case of an unsupported type.
Converts a column containing a [[StructType]], [[ArrayType]] of [[StructType]]s,
a [[MapType]] or [[ArrayType]] of [[MapType]] into a JSON string.
Throws an exception, in the case of an unsupported type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here, let's fix [[StructType]] to :class:`StructType` (and the same instances too) to make Python API documentation pretty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok Thanks.

#'
#' # Converts an array of maps into a JSON array
#' df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people")
#' df2 <- mutate(df2, people_json = to_json(df2$people))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And.. missing closing parentheses for \dontrun{.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... meaning }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok Thanks for careful review :)

> SELECT to_json(array(named_struct('a', 1, 'b', 2));
[{"a":1,"b":2}]
> SELECT to_json(map('a',named_struct('b',1)));
> SELECT to_json(map('a', named_struct('b', 1)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you committed unrelated change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you forget to commit json-functions.sql?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm. I modified ExpressionDescription of StructsToJson at @HyukjinKwon 's suggestions which didn't be merged in last PR. Here's the test for describe function extended to_json, so I needed to regenerate the golden file for it. So this change isn't from json-functions.sql.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I see.

"""
Converts a column containing a [[StructType]] or [[ArrayType]] of [[StructType]]s into a
JSON string. Throws an exception, in the case of an unsupported type.
Converts a column containing a :class:`StructType, :class:`ArrayType` of :class:`StructType`s,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a ` after StructType.

@viirya
Copy link
Member

viirya commented Sep 14, 2017

LGTM except for one comment left.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81766 has finished for PR 19223 at commit af8d941.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81761 has finished for PR 19223 at commit 158140e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Sep 14, 2017

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81769 has finished for PR 19223 at commit af8d941.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@goldmedal
Copy link
Contributor Author

@HyukjinKwon @felixcheung @viirya
I has finished those change at your suggestions for this PR and it also passed all tests. Please take a look when you are available. Thanks :)

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @felixcheung, do you maybe have more comments?

"""
Converts a column containing a [[StructType]] or [[ArrayType]] of [[StructType]]s into a
JSON string. Throws an exception, in the case of an unsupported type.
Converts a column containing a :class:`StructType`, :class:`ArrayType` of :class:`StructType`s,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe :class:`StructType`s should be :class:`StructType`\\s. Currently, this is rendered as below:

2017-09-14 7 12 52

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll modified it. Because I'm not really familiar with python, thanks for your suggestions. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine :).

Converts a column containing a [[StructType]] or [[ArrayType]] of [[StructType]]s into a
JSON string. Throws an exception, in the case of an unsupported type.
Converts a column containing a :class:`StructType`, :class:`ArrayType` of :class:`StructType`s,
a :class:`MapType` or :class:`ArrayType` of :class:`MapType` into a JSON string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`:class:`MapType` -> :class:`MapType`\\s for consistency.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81780 has finished for PR 19223 at commit 8a3a068.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81781 has finished for PR 19223 at commit 66bc5b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Sep 14, 2017

LGTM

@felixcheung
Copy link
Member

felixcheung commented Sep 14, 2017 via email

@HyukjinKwon
Copy link
Member

D'oh, yes. I wonder why it was not triggered. I manually triggered via my account:

Build started: [SparkR] ALL PR-19223
Diff: master...spark-test:8981A5F1-E2DC-4015-8266-12E9ADEE189B

@goldmedal
Copy link
Contributor Author

@HyukjinKwon Thanks for triggering AppVeyor. In normal case, will AppVeyor be triggered automatically?

@HyukjinKwon
Copy link
Member

Yes, when there are some changes in:

spark/appveyor.yml

Lines 29 to 35 in 828fab0

- appveyor.yml
- dev/appveyor-install-dependencies.ps1
- R/
- sql/core/src/main/scala/org/apache/spark/sql/api/r/
- core/src/main/scala/org/apache/spark/api/r/
- mllib/src/main/scala/org/apache/spark/ml/r/
- core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala

It should run the R tests on Windows via AppVeyor.

@goldmedal
Copy link
Contributor Author

ok. I got it. Thanks :)

@HyukjinKwon
Copy link
Member

Looks passed fine. Let me merge this one.

@viirya
Copy link
Member

viirya commented Sep 15, 2017

Thanks @felixcheung @HyukjinKwon

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in a28728a Sep 15, 2017
@goldmedal
Copy link
Contributor Author

Thanks @HyukjinKwon @felixcheung @viirya

@goldmedal goldmedal deleted the SPARK-21513-fp-PySaprkAndSparkR branch September 15, 2017 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants