
Commit 13bedc0

MaxGekk authored and HyukjinKwon committed
[SPARK-24329][SQL] Test for skipping multi-space lines
## What changes were proposed in this pull request?

This PR is a continuation of #21380. It adds a test for cases handled by this code: https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L303-L304

The code skips empty lines, lines consisting of one or more whitespace characters, and comment lines (see [filterCommentAndEmpty](https://github.com/apache/spark/blob/e3de6ab30d52890eb08578e55eb4a5d2b4e7aa35/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala#L47)):

```scala
iter.filter { line =>
  line.trim.nonEmpty && !line.startsWith(options.comment.toString)
}
```

Closes #21380

## How was this patch tested?

Added a test for the case described above.

Author: Maxim Gekk <[email protected]>

Closes #21394 from MaxGekk/test-for-multi-space-lines.
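To make the predicate concrete, here is a minimal, Spark-free Scala sketch of the same filter. The names `CommentAndEmptyFilter` and `filterCommentAndEmptyLines`, and the `comment: Char` parameter, are hypothetical stand-ins for `CSVUtils.filterCommentAndEmpty` and `options.comment`. This is an illustration, not the Spark implementation itself.

```scala
// Hypothetical sketch mirroring the quoted filter; not Spark's actual CSVUtils code.
object CommentAndEmptyFilter {
  def filterCommentAndEmptyLines(iter: Iterator[String], comment: Char): Iterator[String] =
    iter.filter { line =>
      // String.trim strips leading/trailing chars <= ' ', so empty and
      // whitespace-only lines (spaces, tabs) yield an empty string and are dropped.
      // The kept line itself is NOT trimmed; trim is only used for the check.
      line.trim.nonEmpty && !line.startsWith(comment.toString)
    }

  def main(args: Array[String]): Unit = {
    val lines = Iterator("# a comment", "", "   ", "\t", "colA", " \"a\" ")
    // Only "colA" and the data line (with its surrounding spaces) survive.
    println(filterCommentAndEmptyLines(lines, '#').toList)
  }
}
```

Because `startsWith` is applied to the untrimmed line, this predicate keeps a comment line that begins with whitespace (for example `"  # note"`); only lines whose very first character is the comment character are skipped.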
1 parent 3469f5c commit 13bedc0


2 files changed (+23 additions, -0 deletions)

test-data/comments-whitespaces.csv (new test resource)

Lines changed: 8 additions & 0 deletions
```diff
@@ -0,0 +1,8 @@
+# The file contains comments, whitespaces and empty lines
+colA
+# empty line
+
+# the line with a few whitespaces
+
+# int value with leading and trailing whitespaces
+ "a" 
```

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala

Lines changed: 15 additions & 0 deletions
```diff
@@ -1368,4 +1368,19 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
       checkAnswer(computed, expected)
     }
   }
+
+  test("SPARK-24329: skip lines with comments, and one or multiple whitespaces") {
+    val schema = new StructType().add("colA", StringType)
+    val ds = spark
+      .read
+      .schema(schema)
+      .option("multiLine", false)
+      .option("header", true)
+      .option("comment", "#")
+      .option("ignoreLeadingWhiteSpace", false)
+      .option("ignoreTrailingWhiteSpace", false)
+      .csv(testFile("test-data/comments-whitespaces.csv"))
+
+    checkAnswer(ds, Seq(Row(""" "a" """)))
+  }
 }
```
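As a rough, Spark-free illustration of why the test expects the single row ` "a" `, the sketch below runs the lines of the test resource through the same comment/empty-line predicate and then drops the first surviving line, which `header = true` treats as the header. It does not model the Univocity parser. The names `CommentsWhitespacesTrace`, `keep`, and `dataLines` are hypothetical, and the exact whitespace on the whitespace-only line and around `"a"` is an assumption, because the diff view does not show it.

```scala
// Hypothetical trace of the test resource through the comment/empty-line filter.
// Assumption: whitespace-only line = three spaces; data line = one space on each
// side of "a", as implied by the expected Row(""" "a" """).
object CommentsWhitespacesTrace {
  val fileLines: List[String] = List(
    "# The file contains comments, whitespaces and empty lines",
    "colA",
    "# empty line",
    "",
    "# the line with a few whitespaces",
    "   ", // assumed number of spaces
    "# int value with leading and trailing whitespaces",
    " \"a\" "
  )

  // Same check as the quoted iter.filter predicate.
  def keep(line: String, comment: Char): Boolean =
    line.trim.nonEmpty && !line.startsWith(comment.toString)

  // Filter, then drop the first surviving line, which is used as the header.
  def dataLines(lines: List[String], comment: Char): List[String] =
    lines.filter(keep(_, comment)).drop(1)

  def main(args: Array[String]): Unit =
    // With ignoreLeading/TrailingWhiteSpace = false, the spaces stay in the value.
    println(dataLines(fileLines, '#'))
}
```

Only the header `colA` and the data line survive the filter, so a single data value, whitespace intact, reaches the parser. This matches the one-row `checkAnswer` expectation.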
