
Commit bca4259

hyukjinkwon authored and srowen committed
[MINOR][DOCS] JSON APIs related documentation fixes
## What changes were proposed in this pull request?

This PR proposes corrections related to JSON APIs as below:

- Rendering links in the Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding the missing description about JSON Lines consistently to `DataFrameReader.json` in the Python API
- De-duplicating a little bit of `DataFrameReader.json` in the Scala/Java API

## How was this patch tested?

Manually built the documentation via `jekyll build`. Corresponding screenshots will be left as comments on the code.

Note that there are currently Javadoc 8 breaks in several places. These are proposed to be handled in #17477, so this PR does not fix those.

Author: hyukjinkwon <[email protected]>

Closes #17602 from HyukjinKwon/minor-json-documentation.
1 parent b938438 commit bca4259

File tree

6 files changed: +13 −11 lines


docs/sql-programming-guide.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 
 <div data-lang="scala" markdown="1">
 Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`,
 or a JSON file.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` option to `true`.
 
 <div data-lang="java" markdown="1">
 Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>`,
 or a JSON file.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
```
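The corrected guide text says `spark.read.json` accepts a `Dataset[String]` holding one JSON object per string. The core idea can be sketched without Spark using only the Python standard library; the helper name `records_from_json_strings` is made up for illustration and is not a Spark API.

```python
import json

def records_from_json_strings(json_strings):
    """Parse a list of strings, each holding one complete JSON object,
    into a list of dict records -- a rough stand-in for what
    spark.read.json does with a Dataset[String]."""
    return [json.loads(s) for s in json_strings]

# Same sample record the Scala/Java examples in this commit use.
json_data = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']
people = records_from_json_strings(json_data)
print(people[0]["address"]["city"])  # -> Columbus
```

In Spark itself the schema (here, a nested `address` struct) would additionally be inferred from the parsed objects.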

examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -215,7 +215,7 @@ private static void runJsonDatasetExample(SparkSession spark) {
     // +------+
 
     // Alternatively, a DataFrame can be created for a JSON dataset represented by
-    // an Dataset[String] storing one JSON object per string.
+    // a Dataset<String> storing one JSON object per string.
     List<String> jsonData = Arrays.asList(
         "{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}");
     Dataset<String> anotherPeopleDataset = spark.createDataset(jsonData, Encoders.STRING());
```

examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -139,7 +139,7 @@ object SQLDataSourceExample {
     // +------+
 
     // Alternatively, a DataFrame can be created for a JSON dataset represented by
-    // an Dataset[String] storing one JSON object per string
+    // a Dataset[String] storing one JSON object per string
     val otherPeopleDataset = spark.createDataset(
       """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
     val otherPeople = spark.read.json(otherPeopleDataset)
```

python/pyspark/sql/readwriter.py

Lines changed: 5 additions & 3 deletions
```diff
@@ -173,8 +173,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         """
         Loads JSON files and returns the results as a :class:`DataFrame`.
 
-        `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-        For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+        `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+        For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
 
         If the ``schema`` parameter is not specified, this function goes
         through the input once to determine the input schema.
@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)
 
     @since(1.4)
     def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None):
-        """Saves the content of the :class:`DataFrame` in JSON format at the specified path.
+        """Saves the content of the :class:`DataFrame` in JSON format
+        (`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
+        specified path.
 
         :param path: the path in any Hadoop supported file system
         :param mode: specifies the behavior of the save operation when data already exists.
```
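The distinction the docstring draws — JSON Lines by default versus whole-file JSON via the `wholeFile` parameter — comes down to how the input is split before parsing. A minimal standard-library sketch of the two parsing modes, not using Spark:

```python
import json

# JSON Lines (the default input format): one complete JSON object per line,
# so each line can be parsed independently (and in parallel, in Spark's case).
json_lines = '{"name":"Yin"}\n{"name":"Michael"}'
per_line = [json.loads(line) for line in json_lines.splitlines()]

# A regular multi-line JSON document (one record per file): no single line is
# valid on its own, so the file must be consumed as a whole -- the behavior
# the wholeFile option enables.
whole_file = '{\n  "name": "Yin"\n}'
single = json.loads(whole_file)

print([r["name"] for r in per_line])  # -> ['Yin', 'Michael']
print(single["name"])                 # -> Yin
```

Feeding the multi-line document through the line-by-line path would raise a parse error on the first line, which is why the two modes cannot be mixed silently.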

python/pyspark/sql/streaming.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -405,8 +405,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         """
         Loads a JSON file stream and returns the results as a :class:`DataFrame`.
 
-        `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-        For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+        `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+        For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
 
         If the ``schema`` parameter is not specified, this function goes
         through the input once to determine the input schema.
```

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

Lines changed: 2 additions & 2 deletions
```diff
@@ -268,8 +268,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }
 
   /**
-   * Loads a JSON file (<a href="http://jsonlines.org/">JSON Lines text format or
-   * newline-delimited JSON</a>) and returns the result as a `DataFrame`.
+   * Loads a JSON file and returns the results as a `DataFrame`.
+   *
    * See the documentation on the overloaded `json()` method with varargs for more details.
    *
    * @since 1.4.0
```
