
Commit 1b2891d (parent: 504f27e)

[SPARK-2010] minor doc change and adding a TODO

File tree

2 files changed (+4, -3 lines)


python/pyspark/sql.py

Lines changed: 3 additions & 3 deletions
@@ -76,15 +76,15 @@ def inferSchema(self, rdd):
         """Infer and apply a schema to an RDD of L{dict}s.
 
         We peek at the first row of the RDD to determine the fields names
-        and types, and then use that to extract all the dictionaries.
+        and types, and then use that to extract all the dictionaries. Nested
+        collections are supported, which include array, dict, list, set, and
+        tuple.
 
         >>> srdd = sqlCtx.inferSchema(rdd)
         >>> srdd.collect() == [{"field1" : 1, "field2" : "row1"}, {"field1" : 2, "field2": "row2"},
         ... {"field1" : 3, "field2": "row3"}]
         True
 
-        Nested collections are supported, which include array, dict, list, set, and tuple.
-
         >>> from array import array
         >>> srdd = sqlCtx.inferSchema(nestedRdd1)
         >>> srdd.collect() == [{"f1" : array('i', [1, 2]), "f2" : {"row1" : 1.0}},
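The behavior the docstring describes — peeking at the first row to determine field names and types, including nested collections — can be sketched in plain Python. This is a hypothetical toy for illustration only, not PySpark's actual implementation; `infer_schema`, `_type_of`, and the returned type names are all made up here:

```python
from array import array

def infer_schema(rows):
    # Peek at the first row only, as the docstring describes,
    # and map each field name to an inferred type label.
    first = rows[0]
    return {name: _type_of(value) for name, value in first.items()}

def _type_of(value):
    # Handle the nested collections the docstring lists:
    # array, dict, list, set, and tuple.
    if isinstance(value, dict):
        return "map"
    if isinstance(value, (list, set, tuple, array)):
        return "seq"
    return type(value).__name__

rows = [{"f1": array('i', [1, 2]), "f2": {"row1": 1.0}},
        {"f1": array('i', [3, 4]), "f2": {"row2": 2.0}}]
print(infer_schema(rows))  # {'f1': 'seq', 'f2': 'map'}
```

Note that, like the real method, this sketch trusts the first row to be representative: a field that is None or missing in row one would be inferred wrongly, which is one motivation for the type-system consolidation flagged in the TODO below.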

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

Lines changed: 1 addition & 0 deletions
@@ -298,6 +298,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
 
   /**
    * Peek at the first row of the RDD and infer its schema.
+   * TODO: consolidate this with the type system developed in SPARK-2060.
    */
   private[sql] def inferSchema(rdd: RDD[Map[String, _]]): SchemaRDD = {
     import scala.collection.JavaConversions._

0 commit comments
