Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented May 17, 2016

What changes were proposed in this pull request?

refreshTable was a method in HiveContext. It was deleted accidentally while we were migrating the APIs. This PR is to add it back to HiveContext.

In addition, in SparkSession, we put it under the catalog namespace (SparkSession.catalog.refreshTable).

How was this patch tested?

Changed the existing test cases to use the function refreshTable. Also added a test case for refreshTable in hivecontext-compatibility

*/
def refreshTable(tableName: String): Unit = {
sparkSession.catalog.refreshTable(tableName)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method did not exist in SQLContext. It's in HiveContext.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will remove it. Thanks!

@SparkQA
Copy link

SparkQA commented May 17, 2016

Test build #58725 has finished for PR 13156 at commit 9e6c4b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.saveAsTable("arrayInParquet")

sessionState.refreshTable("arrayInParquet")
sparkSession.catalog.refreshTable("arrayInParquet")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we don't call refreshTable through SessionState, do we still need to keep SessionState.refreshTable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I also want to remove invalidateTable, which is a duplicate name of refreshTable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, invalidateTable and refreshTable do have different meanings. The current implementation of HiveMetastoreCatalog.refreshTable is HiveMetastoreCatalog.invalidateTable (and then we retrieve the new metadata lazily). But, it does not mean that refreshTable and invalidateTable have the same semantic. If we should remove any of invalidateTable or refreshTable should be discussed in a different thread.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58740 has finished for PR 13156 at commit 4ac3b76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sparkSession.cacheManager.tryUncacheQuery(df, blocking = true)
// Cache it again.
sparkSession.cacheManager.cacheQuery(df, Some(tableIdent.table))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above logics are moved to sparkSession.catalog.refreshTable

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58805 has finished for PR 13156 at commit 7142ef5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58808 has finished for PR 13156 at commit 8a52ac6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val _hc = hc
import _hc.implicits._

withTempPath { tempDir =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me remove it. : )

@cloud-fan
Copy link
Contributor

LGTM except the test stuff, thanks for working on it! I agree that we should remove refreshTable in SessionState, but need someone to confirm, or we can do it in follow-up PR.

@SparkQA
Copy link

SparkQA commented May 19, 2016

Test build #58831 has finished for PR 13156 at commit 2b773b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEach

*
* @since 1.3.0
*/
def refreshTable(tableName: String): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if invalidateTable has different meaning than refreshTable, should we also add it to HiveContext? cc @yhuai

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is for the compatibility purpose. Let's leave it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@andrewor14
Copy link
Contributor

LGTM other than the renaming. We shouldn't have spark.catalog.refreshTable and spark.sessionState.refreshTable do different things. I would rename the latter to invalidateTable since that's what it's really doing.

@gatorsmile
Copy link
Member Author

I see. spark.sessionState.invalidateTable already exists. They have the same implementation. Thus, I will just remove spark.sessionState.refreshTable? Let me do it

@SparkQA
Copy link

SparkQA commented May 19, 2016

Test build #58905 has finished for PR 13156 at commit 20d5055.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented May 20, 2016

Test build #58934 has finished for PR 13156 at commit 20d5055.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@cloud-fan
Copy link
Contributor

LGTM, pending jenkins

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented May 20, 2016

Test build #58940 has finished for PR 13156 at commit 20d5055.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 20, 2016
#### What changes were proposed in this pull request?
`refreshTable` was a method in `HiveContext`. It was deleted accidentally while we were migrating the APIs. This PR is to add it back to `HiveContext`.

In addition, in `SparkSession`, we put it under the catalog namespace (`SparkSession.catalog.refreshTable`).

#### How was this patch tested?
Changed the existing test cases to use the function `refreshTable`. Also added a test case for refreshTable in `hivecontext-compatibility`

Author: gatorsmile <[email protected]>

Closes #13156 from gatorsmile/refreshTable.

(cherry picked from commit 39fd469)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

thanks, merging to master and 2.0!

@asfgit asfgit closed this in 39fd469 May 20, 2016
@gatorsmile
Copy link
Member Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants