[SPARK-29101][SQL][2.4] Fix count API for csv file when DROPMALFORMED mode is selected #25843
ok to test
Thank you for backporting, @sandeep-katta.
cc @HyukjinKwon
Test build #110964 has finished for PR 25843 at commit
retest this please
Test build #110966 has finished for PR 25843 at commit
Retest this please.
Test build #110969 has finished for PR 25843 at commit
retest this please
Test build #110980 has finished for PR 25843 at commit
srowen
left a comment
Looks OK. It's worth noting that this is a backport of #25820.
dongjoon-hyun
left a comment
+1, LGTM. Merged to branch-2.4.
Thank you, @sandeep-katta, @HyukjinKwon, @srowen!
… mode is selected
### What changes were proposed in this pull request?
Dataset (`fruit.csv`):
fruit,color,price,quantity
apple,red,1,3
banana,yellow,2,4
orange,orange,3,5
xxx
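This PR aims to fix the behavior below, where `count` returns 4 and so still includes the malformed row `xxx` that `DROPMALFORMED` mode should have dropped: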
```
scala> spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
scala> spark.read.option("header", "true").option("mode", "DROPMALFORMED").csv("fruit.csv").count
res1: Long = 4
```
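To make the expected result concrete, here is a minimal sketch in plain Scala (no Spark dependency) of what dropping malformed rows means for this sample file. It is not Spark's parser: it assumes, as a simplification, that a row is malformed exactly when its field count differs from the header's. The names `DropMalformedSketch` and `countWellFormed` are hypothetical and exist only for illustration.

```scala
object DropMalformedSketch {
  // The sample file from this PR, header line included.
  val lines: Seq[String] = Seq(
    "fruit,color,price,quantity",
    "apple,red,1,3",
    "banana,yellow,2,4",
    "orange,orange,3,5",
    "xxx"
  )

  // Count data rows whose number of fields matches the header's.
  // Simplifying assumption: "malformed" means "wrong number of fields".
  def countWellFormed(csv: Seq[String]): Long = csv match {
    case header +: rows =>
      val width = header.split(",", -1).length
      rows.count(_.split(",", -1).length == width).toLong
    case _ => 0L
  }

  def main(args: Array[String]): Unit = {
    // Rows kept once the single-field "xxx" row is dropped: 3
    println(s"well-formed rows: ${countWellFormed(lines)}")
    // All data rows, malformed one included: 4 (the count reported above)
    println(s"all data rows: ${lines.tail.size}")
  }
}
```

Under that assumption the sample file has three well-formed rows, so a count of 4 means the `xxx` row was not filtered out.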
This regression was introduced by [SPARK-24645](https://issues.apache.org/jira/browse/SPARK-24645).
The problem that SPARK-24645 addressed is also solved by [SPARK-25387](https://issues.apache.org/jira/browse/SPARK-25387).
### Why are the changes needed?
SPARK-24645 caused this regression, so this PR reverts that change; the problem it originally addressed is also solved by SPARK-25387.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added a unit test, and also re-tested the original SPARK-24645 bug.
**SPARK-24645 regression**
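The unit test itself is not reproduced on this page. A standalone regression check along the same lines might look like the sketch below. It is not the test added by this PR: the object name `DropMalformedCountCheck` and the temporary-file handling are illustrative, it needs Spark SQL on the classpath, and the expected count of 3 assumes a build that contains this fix.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object DropMalformedCountCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-29101-check")
      .getOrCreate()
    try {
      // Write the sample data from the PR description to a temporary file.
      val path = Files.createTempFile("fruit", ".csv")
      val data =
        """fruit,color,price,quantity
          |apple,red,1,3
          |banana,yellow,2,4
          |orange,orange,3,5
          |xxx
          |""".stripMargin
      Files.write(path, data.getBytes(StandardCharsets.UTF_8))

      // Same settings as the reproduction in the PR description.
      spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
      val n = spark.read
        .option("header", "true")
        .option("mode", "DROPMALFORMED")
        .csv(path.toString)
        .count()

      // Assumption: on a fixed build the malformed "xxx" row is dropped.
      assert(n == 3, s"expected 3 well-formed rows, got $n")
    } finally {
      spark.stop()
    }
  }
}
```

On a build without the fix, the assertion is expected to fail with a count of 4, matching the output shown in the PR description.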

Closes #25843 from sandeep-katta/SPARK-29101_branch2.4.
Authored-by: sandeep katta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
