docs/sql-programming-guide.md
5 additions & 3 deletions
@@ -605,7 +605,7 @@ You may also use the beeline script comes with Hive.
#### Reducer number
-In Shark, default reducer number is 1, and can be tuned by property `mapred.reduce.tasks`. In Spark SQL, reducer number is default to 200, and can be customized by the `spark.sql.shuffle.partitions`property:
+In Shark, default reducer number is 1 and is controlled by the property `mapred.reduce.tasks`. Spark SQL deprecates this property by a new property `spark.sql.shuffle.partitions`, whose default value is 200. Users may customize this property via `SET`:
```
SET spark.sql.shuffle.partitions=10;
@@ -615,6 +615,8 @@ GROUP BY page ORDER BY c DESC LIMIT 10;
You may also put this property in `hive-site.xml` to override the default value.
+For now, the `mapred.reduce.tasks` property is still recognized, and is converted to `spark.sql.shuffle.partitions` automatically.
+
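As a small illustration of the compatibility note above (a sketch, not part of the committed change): assuming the conversion behaves as described, a legacy Shark-style setting like the one below would be expected to have the same effect as `SET spark.sql.shuffle.partitions=10;`. The value 10 is an arbitrary example.

```sql
-- Legacy property name. Per the note above, it is assumed to be converted
-- to spark.sql.shuffle.partitions. The value 10 is only an example.
SET mapred.reduce.tasks=10;
```

New scripts would presumably be better off using `spark.sql.shuffle.partitions` directly, since the older name is described as recognized only "for now".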
#### Caching
The `shark.cache` table property no longer exists, and tables whose names end with `_cached` are no longer automatically cached. Instead, we provide `CACHE TABLE` and `UNCACHE TABLE` statements to let users control table caching explicitly:
@@ -697,7 +699,7 @@ Spark SQL supports the vast majority of Hive features, such as:
#### Unsupported Hive Functionality
-Below is a list of Hive features that we don't support yet. Most of these features are rarely used in Hive deployments.
+Below is a list of Hive features that we don't support yet. Most of these features are rarely used in Hive deployments.
**Major Hive Features**
@@ -723,7 +725,7 @@ A handful of Hive optimizations are not yet included in Spark. Some of these (su
* Block level bitmap indexes and virtual columns (used to build indexes)
* Automatically convert a join to map join: For joining a large table with multiple small tables, Hive automatically converts the join into a map join. We are adding this auto conversion in the next release.
-* Automatically determine the number of reducers for joins and groupbys: Currently in Spark SQL, you need to control the degree of parallelism post-shuffle using "set mapred.reduce.tasks=[num_tasks];". We are going to add auto-setting of parallelism in the next release.
+* Automatically determine the number of reducers for joins and groupbys: Currently in Spark SQL, you need to control the degree of parallelism post-shuffle using "SET spark.sql.shuffle.partitions=[num_tasks];". We are going to add auto-setting of parallelism in the next release.
* Meta-data only query: For queries that can be answered by using only meta data, Spark SQL still launches tasks to compute the result.
* Skew data flag: Spark SQL does not follow the skew data flags in Hive.
* `STREAMTABLE` hint in join: Spark SQL does not follow the `STREAMTABLE` hint.