[SPARK-16236] [SQL] Add Path Option back to Load API in DataFrameReader #13933
Test build #61333 has finished for PR 13933 at commit

Test build #61342 has finished for PR 13933 at commit
cc @rxin The code is ready for review. Thanks!

LGTM -- cc @tdas to take a look, since he wrote the original patch.

LGTM.

Merging in master/2.0.
#### What changes were proposed in this pull request?

koertkuipers identified that PR #13727 changed the behavior of the `load` API: after that change, `load` no longer adds the value of `path` to the `options`. Thank you!

This PR adds the `path` option back to the `load()` API in `DataFrameReader`, if and only if users specify exactly one `path` in the `load` call. For example, users can see the `path` option after the following API call:

```Scala
spark.read
  .format("parquet")
  .load("/test")
```

#### How was this patch tested?

Added test cases.

Author: gatorsmile
Closes #13933 from gatorsmile/optionPath.
(cherry picked from commit 25520e9)
Signed-off-by: Reynold Xin
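To make the rule above concrete without a Spark dependency, here is a minimal, hypothetical Scala model of it. `ReaderModel` and `LoadedSource` are illustrative names invented for this sketch, not Spark classes, and the model does not claim to reflect how `DataFrameReader` hands paths to a data source internally; it only captures the stated rule that the `path` option is set if and only if exactly one path is passed to `load`.

```scala
// Hypothetical, Spark-free model of the rule described in the commit message.
// ReaderModel and LoadedSource are illustrative names, not Spark classes.
final case class LoadedSource(options: Map[String, String], paths: Seq[String])

final case class ReaderModel(options: Map[String, String] = Map.empty) {
  // Builder-style option setter: returns a new reader with the extra option.
  def option(key: String, value: String): ReaderModel =
    copy(options = options + (key -> value))

  // Exactly one path: also record it under the "path" option.
  def load(path: String): LoadedSource =
    LoadedSource(options + ("path" -> path), Seq(path))

  // Zero or several paths: pass them through and leave the options untouched.
  def load(paths: String*): LoadedSource =
    LoadedSource(options, paths.toSeq)
}

object ReaderModelDemo {
  def main(args: Array[String]): Unit = {
    val single = ReaderModel().option("mergeSchema", "false").load("/test")
    println(single.options.get("path")) // prints Some(/test)

    val multi = ReaderModel().load("/a", "/b")
    println(multi.options.get("path")) // prints None
  }
}
```

Scala's overload resolution prefers the non-varargs `load(path: String)` for a single argument, mirroring the single-path and varargs `load` overloads on `DataFrameReader`, so `load("/a", "/b")` and `load()` fall through to the varargs version and leave the options unchanged.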

I noticed that the Python API is inconsistent here: spark/python/pyspark/sql/readwriter.py, line 147 (at commit 1aad8c6).
It always calls

@zsxwing Let me try to fix it. Thanks!

The problem also exists in the other APIs: spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala, lines 407 to 410 (at commit 25520e9).
Let me fix them in the same PR. Thanks!

@gatorsmile parquet, json, and other file formats support both

@zsxwing If we just provide one … Could you review the new PR I just submitted, #13965? Let me know if anything is not appropriate. Thanks!

For parquet, json, etc., `path` not being put in the options is not an issue, since

#### What changes were proposed in this pull request?

@koertkuipers identified that PR #13727 changed the behavior of the `load` API. After that change, the `load` API no longer adds the value of `path` to the `options`. Thank you!

This PR adds the `path` option back to the `load()` API in `DataFrameReader`, if and only if users specify exactly one `path` in the `load` call. For example, users can see the `path` option after calling `spark.read.format("parquet").load("/test")`.

#### How was this patch tested?

Added test cases.