@@ -57,6 +57,77 @@ Schema Inference
5757
5858 .. include:: /scala/schema-inference.rst
5959
60+ .. _spark-schema-hint:
61+
62+ Specify Known Fields with Schema Hints
63+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
64+
65+ You can supply a schema of known fields and their types to use during
66+ schema inference by setting the ``schemaHint`` configuration option. You can
67+ express the ``schemaHint`` value in any of the following Spark formats:
68+
69+ .. list-table::
70+ :header-rows: 1
71+ :widths: 35 65
72+
73+ * - Type
74+ - Format
75+
76+ * - DDL
77+ - ``<field one name> <FIELD ONE TYPE>, <field two name> <FIELD TWO TYPE>``
78+
79+ * - SQL DDL
80+ - ``STRUCT<<field one name>: <FIELD ONE TYPE>, <field two name>: <FIELD TWO TYPE>>``
81+
82+ * - JSON
83+ - .. code-block:: json
84+ :copyable: false
85+
86+ { "type": "struct", "fields": [
87+ { "name": "<field name>", "type": "<field type>", "nullable": <true/false> },
88+ { "name": "<field name>", "type": "<field type>", "nullable": <true/false> }]}
89+
90+ The following example shows how to generate the ``schemaHint`` value in each
91+ format by using the Spark shell. The example defines a schema with a
92+ string-valued field named ``"value"`` and an integer-valued field named ``"count"``.
93+
94+ .. code-block:: scala
95+
96+ import org.apache.spark.sql.types._
97+
98+ val mySchema = StructType(Seq(
99+ StructField("value", StringType),
100+ StructField("count", IntegerType)))
101+
102+ // Generate DDL format
103+ mySchema.toDDL
104+
105+ // Generate SQL DDL format
106+ mySchema.sql
107+
108+ // Generate Simple String DDL format
109+ mySchema.simpleString
110+
111+ // Generate JSON format
112+ mySchema.json
113+
114+ You can also generate the ``schemaHint`` value in the Simple String DDL format
115+ or in JSON format by using PySpark, as shown in the following example:
116+
117+ .. code-block:: python
118+
119+ from pyspark.sql.types import StructType, StructField, StringType, IntegerType
120+
121+ mySchema = StructType([
122+ StructField('value', StringType(), True),
123+ StructField('count', IntegerType(), True)])
124+
125+ # Generate Simple String DDL format
126+ mySchema.simpleString()
127+
128+ # Generate JSON format
129+ mySchema.json()
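
If Spark is not at hand, the format templates in the preceding table can also be filled in with plain string handling. The following is a minimal sketch, not part of Spark or the connector: the helper names ``ddl_hint``, ``sql_ddl_hint``, and ``json_hint`` are hypothetical, and the sketch assumes flat fields with primitive types only. It uses the type name ``integer`` because that spelling is accepted in both Spark DDL and Spark's JSON schema representation.

```python
import json

# Hypothetical helpers (not part of Spark or the connector) that fill in
# the format templates from the table for flat, primitive-typed fields.
# Each field is a (name, type) pair.

def ddl_hint(fields):
    # DDL: <field one name> <FIELD ONE TYPE>, <field two name> <FIELD TWO TYPE>
    return ", ".join(f"{name} {dtype.upper()}" for name, dtype in fields)

def sql_ddl_hint(fields):
    # SQL DDL: STRUCT<<field one name>: <FIELD ONE TYPE>, ...>
    inner = ", ".join(f"{name}: {dtype.upper()}" for name, dtype in fields)
    return f"STRUCT<{inner}>"

def json_hint(fields, nullable=True):
    # JSON: {"type": "struct", "fields": [{"name": ..., "type": ..., "nullable": ...}]}
    return json.dumps({
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": nullable}
            for name, dtype in fields
        ],
    })

# Same fields as the examples above: a string "value" and an integer "count".
fields = [("value", "string"), ("count", "integer")]

print(ddl_hint(fields))
print(sql_ddl_hint(fields))
print(json_hint(fields))
```

Generating the strings from a ``StructType``, as in the Scala and PySpark examples, remains the safer choice for nested or complex types, because Spark then handles quoting and type naming itself.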
130+
60131Filters
61132-------
62133