Skip to content

Conversation

@jzhuge
Copy link
Member

@jzhuge jzhuge commented May 23, 2019

What changes were proposed in this pull request?

Require the lookup function with interface LookupCatalog. Rationale is in the review comments below.

Make Analyzer abstract. BaseSessionStateBuilder and HiveSessionStateBuilder implements lookupCatalog with a call to SparkSession.catalog().

Existing test cases and those that don't need catalog lookup will use a newly added TestAnalyzer with a default lookup function that throws CatalogNotFoundException("No catalog lookup function").

Rewrote the unit test for LookupCatalog to demonstrate the interface can be used anywhere, not just Analyzer.

Removed Analyzer parameter lookupCatalog because we can override in the following manner:

new Analyzer() {
  override def lookupCatalog(name: String): CatalogPlugin = ???
}

How was this patch tested?

Existing unit tests.

@SparkQA
Copy link

SparkQA commented May 23, 2019

Test build #105734 has finished for PR 24689 at commit 59363b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented May 26, 2019

Test build #105799 has finished for PR 24689 at commit 59363b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jzhuge jzhuge force-pushed the SPARK-26946-follow branch from 59363b2 to b62182f Compare May 26, 2019 22:47
@jzhuge
Copy link
Member Author

jzhuge commented May 26, 2019

Removed commit [SPARK-26946][SQL][FOLLOWUP] Undo changes to Analyzer per review in #24686

@SparkQA
Copy link

SparkQA commented May 27, 2019

Test build #105806 has finished for PR 24689 at commit b62182f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @jzhuge, can we clarify what handling None means in PR description?

@SparkQA
Copy link

SparkQA commented May 27, 2019

Test build #105817 has finished for PR 24689 at commit d2085e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class LookupCatalogSuite extends SparkFunSuite with Inside

@rdblue
Copy link
Contributor

rdblue commented May 27, 2019

@jzhuge, these changes look okay, but I think the right path is to always require a lookup function and get rid of the option. There is never a time when these should be used without a catalog lookup function and not requiring that one is supplied leads to avoidable errors.

I've been testing a FunctionCatalog implementation and it was easy to forget to provide a catalog lookup. But you always need a catalog lookup that does something. You may want to default in some cases to throwing NoSuchCatalogException, but that depends on the context, not whether you forgot to supply the lookup.

@jzhuge
Copy link
Member Author

jzhuge commented May 28, 2019

Well said @rdblue. I will proceed with the requirement of a lookup function. Since the interface is not widely used (yet), not too many places need update.

@jzhuge jzhuge changed the title [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined [SPARK-26946][SQL][FOLLOWUP] Require lookup function May 28, 2019
@SparkQA
Copy link

SparkQA commented May 28, 2019

Test build #105857 has finished for PR 24689 at commit 0f4d9aa.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jzhuge
Copy link
Member Author

jzhuge commented May 28, 2019

Not sure what the following failure is in test build #105857:

MLlib regression algorithms, except for tree-based algorithms: Spark package found in SPARK_HOME: /home/jenkins/workspace/SparkPullRequestBuilder
........................................................[error] running /home/jenkins/workspace/SparkPullRequestBuilder/R/run-tests.sh ; process was 
terminated by signal 9

@SparkQA
Copy link

SparkQA commented May 28, 2019

Test build #105877 has finished for PR 24689 at commit 0f73998.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class Analyzer(

* [[UnresolvedRelation]]s into fully typed objects using information in a [[SessionCatalog]].
*/
class Analyzer(
abstract class Analyzer(
Copy link
Contributor

@rdblue rdblue May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not continue to pass a lookup function in some paths and default the analyzer to use one that throws CatalogNotFoundException? That would avoid a lot of these changes.

class Analyzer(..., catalogLookup: String => CatalogPlugin) {
  def this(...) = {
    this(..., _ => throw new CatalogNotFoundException("Analyzer does not support multiple catalogs"))
  }
}

override def lookupCatalog(name: String): CatalogPlugin = catalogLookup(name)

Copy link
Member Author

@jzhuge jzhuge May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted commit Make Analyzer abstract and add TestAnalyzer to minimize changes to test classes.

If you feel strongly, I can add the 4th parameter override val lookupCatalog: String => CatalogPlugin to Analyzer constructor. Although users can override in the following manner:

new Analyzer() {
  override def lookupCatalog(name: String): CatalogPlugin = ???
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@SparkQA
Copy link

SparkQA commented May 29, 2019

Test build #105884 has finished for PR 24689 at commit cecbea0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Analyzer(

@rdblue
Copy link
Contributor

rdblue commented May 29, 2019

Looks good to me. One minor comment: it doesn't look like Analyzer implementations created in BaseSessionStateBuilder actually override the lookup method to call the session's catalog lookup. I would probably include that in this PR, but we can also add it in later PRs when it is used.

+1, I think this is ready to merge either way.

@cloud-fan, @dongjoon-hyun, @HyukjinKwon, could you take a look at this DSv2 PR? Thanks!

@cloud-fan
Copy link
Contributor

LGTM, merging to master!

@cloud-fan cloud-fan closed this in 953b8e8 May 30, 2019
@jzhuge
Copy link
Member Author

jzhuge commented May 30, 2019

Thanks @HyukjinKwon @rdblue @cloud-fan !

mccheah pushed a commit to palantir/spark that referenced this pull request Jun 6, 2019
## What changes were proposed in this pull request?

Require the lookup function with interface LookupCatalog. Rationale is in the review comments below.

Make `Analyzer` abstract. BaseSessionStateBuilder and HiveSessionStateBuilder implements lookupCatalog with a call to SparkSession.catalog().

Existing test cases and those that don't need catalog lookup will use a newly added `TestAnalyzer` with a default lookup function that throws` CatalogNotFoundException("No catalog lookup function")`.

Rewrote the unit test for LookupCatalog to demonstrate the interface can be used anywhere, not just Analyzer.

Removed Analyzer parameter `lookupCatalog` because we can override in the following manner:
```
new Analyzer() {
  override def lookupCatalog(name: String): CatalogPlugin = ???
}
```

## How was this patch tested?

Existing unit tests.

Closes apache#24689 from jzhuge/SPARK-26946-follow.

Authored-by: John Zhuge <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants