[SPARK-29537][SQL] throw exception when user defined a wrong base path #26195

Test build #112393 has finished for PR 26195 at commit

Test build #112403 has finished for PR 26195 at commit

Failed test:

It's weird. I can't get it to pass even on the master branch, though only locally.

I've already filed https://issues.apache.org/jira/browse/SPARK-29538 for that issue and left a comment in #26157. Let's see how it goes.

retest this please.

Test build #112776 has started for PR 26195 at commit

cc @cloud-fan Please take a look, thanks.

When did we add

Test build #114520 has finished for PR 26195 at commit

After tracing the code history, I think it was introduced in #9651, in Spark 1.6. Here's the documentation for

"driver side must not be negative"))
}

test ("SPARK-29537: throw exception when user defined a wrong base path") {
let's also add an end-to-end test with DataFrameReader
Added 261b9ad
def qualifiedPath(path: Path): Path = path.makeQualified(fs.getUri, fs.getWorkingDirectory)

val qualifiedBasePath = qualifiedPath(userDefinedBasePath)
rootPaths.find(p => !qualifiedPath(p).toString.
Is there a way to check for a sub-path using FileSystem APIs instead of relying on the path string?
I didn't find one in either Path or FileSystem.
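The check in this revision compares qualified path strings with `startsWith`. One caveat of a raw prefix test is that a base path such as `/data/c` is accepted for a root path `/data/c1=1`, because the strings share a prefix even though `c` is not a parent directory. A minimal component-wise alternative is sketched below. It assumes the paths have already been qualified and uses `java.net.URI` as a stand-in for Hadoop's `Path`; the object and method names are hypothetical.

```scala
import java.net.URI

object AncestorCheckSketch {
  // Splits the path part of an (already qualified) URI into its non-empty components.
  private def components(uri: URI): Vector[String] =
    Option(uri.getPath).getOrElse("").split("/").filter(_.nonEmpty).toVector

  // True if `base` is `path` itself or one of its ancestor directories.
  // Scheme and authority must match, and path components are compared whole,
  // so "/data/c" is NOT treated as an ancestor of "/data/c1=1".
  def isAncestorOrSelf(base: URI, path: URI): Boolean = {
    val b = base.normalize()
    val p = path.normalize()
    b.getScheme == p.getScheme &&
      b.getAuthority == p.getAuthority &&
      components(p).startsWith(components(b))
  }
}
```

For example, `"file:/data/c1=1".startsWith("file:/data/c")` is `true`, while `isAncestorOrSelf(new URI("file:/data/c"), new URI("file:/data/c1=1"))` is `false`.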

val qualifiedBasePath = qualifiedPath(userDefinedBasePath)
rootPaths.find(p => !qualifiedPath(p).toString.
startsWith(qualifiedBasePath.toString)) match {
The indent is off here. But can you just use .find(...).foreach(rp => ...?
Or require(rootPaths.forall(p => qualifiedPath(p)...), "error message")
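The `require` alternative suggested here can be sketched as follows, assuming the base path and root paths are already qualified strings (the object and method names are hypothetical). Scala's `Predef.require` throws an `IllegalArgumentException` whose message is prefixed with `"requirement failed: "`.

```scala
object RequireVariantSketch {
  // One require over all (already qualified) root paths.
  // On failure, Predef.require throws IllegalArgumentException("requirement failed: " + message).
  def checkBasePath(qualifiedBase: String, qualifiedRoots: Seq[String]): Unit =
    require(
      qualifiedRoots.forall(_.startsWith(qualifiedBase)),
      s"Wrong basePath $qualifiedBase for the root paths: ${qualifiedRoots.mkString(", ")}")
}
```

The trade-off is in the error message: `forall` only knows that some root path failed, so the message lists all of them, whereas `.find(...).foreach(rp => ...)` can name the offending root path directly.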

Test build #114560 has finished for PR 26195 at commit

}
def qualifiedPath(path: Path): Path = path.makeQualified(fs.getUri, fs.getWorkingDirectory)

val qualifiedBasePath = qualifiedPath(userDefinedBasePath)
Let's call toString here to avoid calling toString many times later.
We can even call toString in qualifiedPath and remove the need to call .toString altogether.
throw new IllegalArgumentException(
s"Wrong basePath $userDefinedBasePath for the root path: $rp")
}
Set(fs.makeQualified(userDefinedBasePath))
This can simply be Set(qualifiedBasePath), since we now compute it beforehand; if we change qualifiedPath() to return a String, it would be Set(new Path(qualifiedBasePath)).
Good idea!
}
def qualifiedPath(path: Path): Path = path.makeQualified(fs.getUri, fs.getWorkingDirectory)

val qualifiedBasePath = qualifiedPath(userDefinedBasePath)
throw new IllegalArgumentException(
s"Wrong basePath $userDefinedBasePath for the root path: $rp")
}
Set(new Path(qualifiedBasePath))
We should reduce overhead as much as we can:
val qualifiedBasePath = fs.makeQualified(userDefinedBasePath)
val qualifiedBasePathStr = qualifiedBasePath.toString
rootPaths.find...
Set(qualifiedBasePath)
Ok, I see.
val qualifiedBasePath = fs.makeQualified(userDefinedBasePath)
val qualifiedBasePathStr = qualifiedBasePath.toString
rootPaths
.find(!fs.makeQualified(_).toString.startsWith(qualifiedBasePathStr))
Review note: I've inlined the qualified() function into the find() clause.

Test build #114729 has finished for PR 26195 at commit

Test build #114732 has finished for PR 26195 at commit

thanks, merging to master!

Thanks! @cloud-fan @HeartSaVioR @HyukjinKwon @srowen
### What changes were proposed in this pull request?
When a user-defined base path is not an ancestor directory of all the input paths,
throw an exception immediately.
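The shape of the check, as it evolves in the review thread (qualify the base path once, reject the first root path whose qualified form does not start with it, then return the qualified base path), can be sketched in plain Scala. This is an illustration, not the Spark code: paths are plain strings, and `qualify` is a hypothetical stand-in for `FileSystem.makeQualified`.

```scala
object BasePathValidationSketch {
  // `qualify` is a hypothetical stand-in for FileSystem.makeQualified: it should turn a
  // relative or scheme-less path into a fully qualified one.
  def validatedBasePaths(
      userDefinedBasePath: String,
      rootPaths: Seq[String],
      qualify: String => String): Set[String] = {
    // Qualify the base path once and reuse the result.
    val qualifiedBasePath = qualify(userDefinedBasePath)
    // Fail fast on the first root path that is not under the base path.
    rootPaths
      .find(p => !qualify(p).startsWith(qualifiedBasePath))
      .foreach { rp =>
        throw new IllegalArgumentException(
          s"Wrong basePath $userDefinedBasePath for the root path: $rp")
      }
    Set(qualifiedBasePath)
  }
}
```

With a toy `qualify` such as `p => "file:" + p` for absolute paths, `validatedBasePaths("/path/to/data", Seq("/path/to/data/c1=1"), qualify)` returns `Set("file:/path/to/data")`, while a base path of `/path/from` throws `IllegalArgumentException`.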
### Why are the changes needed?
Assume that we have a DataFrame[c1, c2] written out as Parquet and partitioned by c1.
When using `spark.read.parquet("/path/to/data/c1=1")` to read the data, we'll have a DataFrame with column c2 only.
But if we use `spark.read.option("basePath", "/path/from").parquet("/path/to/data/c1=1")` to
read the data, we'll have a DataFrame with columns c1 and c2.
This happens because a wrong base path has no effect in `parsePartition()`, so parsing continues until it reaches a directory name without "=".
I don't think the result of the second read makes sense.
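To make the failure mode concrete, here is a deliberately simplified model of partition-column inference. It is not Spark's `parsePartition()`, which handles more cases (for example, value type inference); it only walks upward from a leaf directory, collecting `column=value` names, and stops at a base path, at the root, or at the first name without "=". The object and method names are hypothetical.

```scala
object PartitionInferenceSketch {
  // Walks upward from `leafDir`, collecting `column=value` directory names.
  // Stops at any path in `basePaths`, at the root, or at the first name without '='.
  // Returns (column, value) pairs ordered from outermost to innermost directory.
  def inferPartitionColumns(leafDir: String, basePaths: Set[String]): List[(String, String)] = {
    @annotation.tailrec
    def loop(dir: String, acc: List[(String, String)]): List[(String, String)] =
      if (basePaths.contains(dir) || dir == "/" || dir.isEmpty) acc
      else {
        val idx = dir.lastIndexOf('/')
        val name = dir.substring(idx + 1)
        val parent = if (idx <= 0) "/" else dir.substring(0, idx)
        name.split("=", 2) match {
          case Array(column, value) => loop(parent, (column, value) :: acc)
          case _                    => acc // e.g. "data": parsing ends here
        }
      }
    loop(leafDir, Nil)
  }
}
```

In this model, using the input path itself as the base (`Set("/path/to/data/c1=1")`) yields no partition columns, matching the c2-only read. A correct base path `/path/to/data` yields `List(("c1", "1"))`, and so does the wrong `/path/from`, because the walk never meets the base path and only stops at the `data` directory.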
### Does this PR introduce any user-facing change?
Yes. With this change, users hit an `IllegalArgumentException` when they give a wrong base path, whereas previously no error was raised.
### How was this patch tested?
Added UT.
Closes apache#26195 from Ngone51/dev-wrong-basePath.
Lead-authored-by: wuyi <[email protected]>
Co-authored-by: wuyi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>