
Commit bcaa799

BryanCutler authored and rxin committed
[SPARK-17805][PYSPARK] Fix sqlContext.read.text when passing in a list of paths
## What changes were proposed in this pull request?

If given a list of paths, `DataFrameReader.text` (in `pyspark.sql.readwriter`) will attempt to use an undefined variable `path`. This change checks whether the param `paths` is a `basestring` and, if so, converts it to a list bound to the same name, so that the variable `paths` can be used for both cases.

## How was this patch tested?

Added a unit test for reading a list of files.

Author: Bryan Cutler <[email protected]>

Closes #15379 from BryanCutler/sql-readtext-paths-SPARK-17805.
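The bug can be reproduced without Spark. The sketch below uses two hypothetical functions, `text_before_fix` and `text_after_fix`, that copy only the control flow of `DataFrameReader.text`; it assumes Python 3 (`str` in place of `basestring`) and returns a plain list instead of calling `PythonUtils.toSeq` on the JVM. Because `path` is assigned inside the function, Python treats it as a local, so the list case fails with `UnboundLocalError` (a subclass of `NameError`).

```python
# Hypothetical stand-ins for the control flow of DataFrameReader.text before
# and after this commit. Python 3's `str` replaces Python 2's `basestring`,
# and returning a list replaces the JVM call PythonUtils.toSeq(...).


def text_before_fix(paths):
    if isinstance(paths, str):
        path = [paths]
    # If `paths` was already a list, `path` is never bound, so this line
    # raises UnboundLocalError (a subclass of NameError).
    return list(path)


def text_after_fix(paths):
    if isinstance(paths, str):
        # Rebind the parameter itself, so both branches converge on `paths`.
        paths = [paths]
    return list(paths)


if __name__ == "__main__":
    print(text_after_fix("a.txt"))             # ['a.txt']
    print(text_after_fix(["a.txt", "b.txt"]))  # ['a.txt', 'b.txt']
    try:
        text_before_fix(["a.txt", "b.txt"])
    except UnboundLocalError as exc:
        print("before fix:", type(exc).__name__)  # before fix: UnboundLocalError
```

Rebinding the parameter name, rather than introducing a second variable, leaves a single name that is valid on every path through the function.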
1 parent 3713bb1 commit bcaa799


2 files changed (+8 -2 lines changed)


python/pyspark/sql/readwriter.py

Lines changed: 2 additions & 2 deletions
@@ -289,8 +289,8 @@ def text(self, paths):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
 
     @since(2.0)
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,

python/pyspark/sql/tests.py

Lines changed: 6 additions & 0 deletions
@@ -1702,6 +1702,12 @@ def test_cache(self):
             "does_not_exist",
             lambda: spark.catalog.uncacheTable("does_not_exist"))
 
+    def test_read_text_file_list(self):
+        df = self.spark.read.text(['python/test_support/sql/text-test.txt',
+                                   'python/test_support/sql/text-test.txt'])
+        count = df.count()
+        self.assertEquals(count, 4)
+
 
 class HiveSparkSubmitTests(SparkSubmitTests):
 
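The test expects 4 rows. The docstring example in the `readwriter.py` hunk shows `Row(value=u'hello')` and `Row(value=u'this')`, which suggests `text-test.txt` has two lines, and the same file is listed twice, so each path contributes its rows independently. The sketch below mirrors that arithmetic in plain Python; `read_text_lines` is a hypothetical analogue of `spark.read.text` (one row per line), not the Spark implementation, and the fixture is written to a temporary directory.

```python
import os
import tempfile


def read_text_lines(paths):
    """Hypothetical plain-Python analogue of spark.read.text: one row per line."""
    if isinstance(paths, str):
        paths = [paths]
    rows = []
    for p in paths:
        # Each path is read independently; duplicate paths are not collapsed.
        with open(p, encoding="utf-8") as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two-line fixture mirroring the docstring output (hello, this).
        fixture = os.path.join(tmp, "text-test.txt")
        with open(fixture, "w", encoding="utf-8") as f:
            f.write("hello\nthis\n")
        print(len(read_text_lines(fixture)))             # 2
        print(len(read_text_lines([fixture, fixture])))  # 4
```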