4 changes: 2 additions & 2 deletions python/pyspark/sql/readwriter.py
@@ -289,8 +289,8 @@ def text(self, paths):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
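The rename fixes an actual bug, not just naming: before this change, passing a list of paths skipped the isinstance branch, so the return statement referenced path, which was never assigned, and raised a NameError. A minimal sketch of the fixed behavior, assuming a local SparkSession and the repo's test fixture:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # A single string path worked both before and after this change.
    df_one = spark.read.text("python/test_support/sql/text-test.txt")

    # A list of paths raised NameError before this fix, because only the
    # string branch assigned the variable used in the return statement.
    df_many = spark.read.text(["python/test_support/sql/text-test.txt",
                               "python/test_support/sql/text-test.txt"])
    print(df_many.count())  # twice the row count of df_one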
HyukjinKwon (Member) commented on Oct 7, 2016:

This is super minor, but I think it'd be nicer to match the variable name up to path, if that makes sense. parquet takes non-keyword arguments, so paths fits there, but the other readers seem to take a single argument, path.
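For context, a simplified sketch of the reader signatures being compared (based on Spark 2.0's DataFrameReader; docstrings and most keyword arguments omitted):

    class DataFrameReader(object):
        def parquet(self, *paths):
            # variadic: reader.parquet('a.parquet', 'b.parquet')
            ...

        def json(self, path, schema=None):
            # single positional argument named `path`
            ...

        def text(self, paths):
            # named `paths` -- the inconsistency discussed above
            ...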

holdenk (Contributor) commented on Oct 7, 2016:

So I agree that keeping path here kind of makes sense.

It's unfortunate we didn't catch the difference in the named parameters between these reader functions back during 2.0. At this point, changing the named parameter from paths to path is something we'd need to be careful with, in case people are using named params (if we did that, we would need to add a version-changed note). We could also (transitionally) have it accept kwargs and work with either name for a version (while updating the pydoc, of course).
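A rough sketch of that transitional approach (hypothetical, not part of this PR; the deprecation details are illustrative):

    import warnings

    def text(self, path=None, **kwargs):
        # Transitional alias: keep accepting the old 'paths' keyword for one
        # release, with a deprecation warning, as suggested above. The pydoc
        # would need a versionchanged note covering both spellings.
        if path is None and 'paths' in kwargs:
            warnings.warn("keyword 'paths' is deprecated; use 'path'",
                          DeprecationWarning)
            path = kwargs.pop('paths')
        if isinstance(path, basestring):
            path = [path]
        return self._df(self._jreader.text(
            self._spark._sc._jvm.PythonUtils.toSeq(path)))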


     @since(2.0)
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
6 changes: 6 additions & 0 deletions python/pyspark/sql/tests.py
@@ -1702,6 +1702,12 @@ def test_cache(self):
"does_not_exist",
lambda: spark.catalog.uncacheTable("does_not_exist"))

def test_read_text_file_list(self):
df = self.spark.read.text(['python/test_support/sql/text-test.txt',
'python/test_support/sql/text-test.txt'])
count = df.count()
self.assertEquals(count, 4)
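(The same fixture appears twice in the list and the expected count is 4, so text-test.txt presumably contains two lines; reading it twice yields four rows in total.)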


class HiveSparkSubmitTests(SparkSubmitTests):
