[SPARK-16904] [SQL] Removal of Hive Built-in Hash Functions and TestHiveFunctionRegistry #14498

gatorsmile · 2016-08-04T21:55:10Z

What changes were proposed in this pull request?

The Hive built-in hash function has not been used in Spark since Spark 2.0. The public interface does not allow users to unregister Spark's built-in functions, so users can never reach Hive's built-in hash function.

The only exception here is TestHiveFunctionRegistry, which allows users to unregister the built-in functions. Thus, we can load Hive's hash function in the test cases. If we disable it, 10+ test cases will fail because the results are different from the Hive golden answer files.

This PR removes hash from the list of hiveFunctions in HiveSessionCatalog. It also removes TestHiveFunctionRegistry. This removal makes it easier to remove TestHiveSessionState in the future.
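For illustration, the lookup behaviour described above can be sketched roughly as follows. This is a simplified stand-in and not Spark's actual code: `SketchRegistry`, `TestSketchRegistry`, and `SketchCatalog` are hypothetical names, and the "built-in first, Hive as fallback" resolution order is an assumption inferred from the description.

```scala
import scala.collection.mutable

// Hypothetical, simplified registry: functions are modeled as Seq[Any] => Any.
class SketchRegistry {
  protected val functions = mutable.Map.empty[String, Seq[Any] => Any]

  def register(name: String, fn: Seq[Any] => Any): Unit = {
    functions.update(name.toLowerCase, fn)
  }

  def lookup(name: String): Option[Seq[Any] => Any] =
    functions.get(name.toLowerCase)
}

// Test-only variant: the one place where a built-in can be unregistered.
class TestSketchRegistry extends SketchRegistry {
  def unregister(name: String): Boolean =
    functions.remove(name.toLowerCase).isDefined
}

// Assumed resolution order: built-in registry first, external (Hive) list second.
class SketchCatalog(
    builtin: SketchRegistry,
    external: Map[String, Seq[Any] => Any]) {
  def resolve(name: String): Option[Seq[Any] => Any] =
    builtin.lookup(name).orElse(external.get(name.toLowerCase))
}

object RegistrySketch {
  def main(args: Array[String]): Unit = {
    val builtin = new TestSketchRegistry
    builtin.register("hash", (_: Seq[Any]) => "spark-hash")
    val catalog = new SketchCatalog(
      builtin, Map("hash" -> ((_: Seq[Any]) => "hive-hash")))

    // While the built-in is registered, the external entry is unreachable.
    println(catalog.resolve("hash").map(f => f(Seq(1))))
    // Only after unregistering (test-only) does the external entry win.
    builtin.unregister("hash")
    println(catalog.resolve("hash").map(f => f(Seq(1))))
  }
}
```

Under these assumptions, an external `hash` entry is dead weight unless something can unregister the built-in, which only the test registry permits.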

How was this patch tested?

N/A

SparkQA · 2016-08-04T22:00:31Z

Test build #63240 has finished for PR 14498 at commit 3b3f3c8.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-08-04T23:06:09Z

retest this please

SparkQA · 2016-08-05T00:33:35Z

Test build #63247 has finished for PR 14498 at commit 3b3f3c8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-08-05T10:45:42Z

This seems fair. Shouldn't we do something about all the disabled test cases though?

gatorsmile · 2016-08-05T18:15:02Z

Yeah, we should check all the disabled test cases and see if we can move them. I am waiting for #14472.

: )

gatorsmile · 2016-08-11T00:10:43Z

#14472 has been merged. Will work on it soon.

tejasapatil · 2016-08-18T20:11:55Z

Is Spark's hashing function semantically equivalent to Hive's? AFAIK, it's not. I think it would be better to have a mode that allows using Hive's hash method. An example of when this would be needed: users running a query in Hive want to switch to Spark. As this happens, they want to verify whether the data produced is the same. Also, for a brief time the pipeline would run in both engines. Upstream consumers of the generated data should not see differences caused by running in different engines.
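To make the concern concrete, here is a rough sketch of how two hash families can disagree on the same input. It rests on assumptions not stated in this thread: that a Hive-style string hash is a 31-multiplier rolling hash over the UTF-8 bytes, and that Spark's `hash` is Murmur3-based with seed 42. `scala.util.hashing.MurmurHash3` is only a stand-in and is not Spark's exact implementation, so the printed values are illustrative, not what either engine would return.

```scala
import java.nio.charset.StandardCharsets
import scala.util.hashing.MurmurHash3

object HashSketch {
  // Assumed Hive-style string hash: h = h * 31 + byte, over signed UTF-8 bytes.
  def hiveStyleHash(s: String): Int =
    s.getBytes(StandardCharsets.UTF_8).foldLeft(0)((h, b) => h * 31 + b)

  // Stand-in for a Murmur3-style hash; the seed 42 is an assumption.
  def murmurStyleHash(s: String): Int = MurmurHash3.stringHash(s, 42)

  def main(args: Array[String]): Unit = {
    for (s <- Seq("abc", "spark", "hive")) {
      println(s"$s hive-style=${hiveStyleHash(s)} murmur-style=${murmurStyleHash(s)}")
    }
  }
}
```

Any query that buckets, samples, or joins on `hash(...)` could therefore produce different rows per bucket across engines, which is the kind of difference a side-by-side pipeline comparison would surface.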

gatorsmile · 2016-08-19T00:02:23Z

@tejasapatil Thanks for the input! I understand your use cases; they look very reasonable.

Actually, we already removed it in Spark 2.0. You are unable to load or invoke Hive's hash function. This PR just cleans up the usage of the hash function in our testing package.

rxin · 2016-11-03T09:08:38Z

@gatorsmile can we take this to the finish line? This actually does not impact hash at all since it's only here for testing.

gatorsmile · 2016-11-03T16:47:36Z

@rxin Sure, will do it this weekend. Thanks!

cloud-fan · 2016-11-05T04:49:13Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala

 }

-
-private[hive] class TestHiveFunctionRegistry extends SimpleFunctionRegistry {


We can still remove this class if we manually add back the removed Spark built-in hash function, right?

Yeah, I think so

rxin · 2016-11-07T07:44:29Z

If this is no longer WIP, please update the title. Thanks.

gatorsmile · 2016-11-07T07:50:12Z

Based on the previous discussions in the other PRs, it sounds like these Hive-specific test cases are not very useful. Do we still need to rewrite them and add them to the other test suites?

    "auto_join19",
    "auto_join22",
    "auto_join25",
    "auto_join26",
    "auto_join27",
    "auto_join28",
    "auto_join30",
    "auto_join31",
    "auto_join_nulls",
    "auto_join_reordering_values",
    "correlationoptimizer1",
    "correlationoptimizer2",
    "correlationoptimizer3",
    "correlationoptimizer4",
    "multiMapJoin1",
    "orc_dictionary_threshold",
    "udf_hash"

BTW, I went over these disabled test cases. I think their value is low.

rxin · 2016-11-07T08:04:45Z

I'm ok with not having them.

SparkQA · 2016-11-07T09:11:40Z

Test build #68268 has finished for PR 14498 at commit 05390ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-11-07T09:15:44Z

Merging in master/branch-2.1.

gatorsmile added 2 commits August 3, 2016 21:06

remove hash

a26122a

Merge remote-tracking branch 'upstream/master' into removeHash

3b3f3c8

gatorsmile added 2 commits August 12, 2016 10:10

Merge remote-tracking branch 'upstream/master' into removeHash

790f9e6

Merge remote-tracking branch 'upstream/master' into removeHash

abdeadf

gatorsmile mentioned this pull request Aug 12, 2016

[SPARK-17045] [SQL] Build/move Join-related test cases in SQLQueryTestSuite #14625

Closed

gatorsmile mentioned this pull request Nov 5, 2016

[SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFunctions is redundant #15766

Closed

cloud-fan reviewed Nov 5, 2016

Merge remote-tracking branch 'upstream/master' into removeHash

05390ad

gatorsmile changed the title ~~[SPARK-16904] [SQL] Removal of Hive Built-in Hash Functions and TestHiveFunctionRegistry [WIP]~~ [SPARK-16904] [SQL] Removal of Hive Built-in Hash Functions and TestHiveFunctionRegistry Nov 7, 2016

asfgit closed this in 57626a5 Nov 7, 2016
