Conversation

@cloud-fan
Contributor

What changes were proposed in this pull request?

Previously, TRUNCATE TABLE ... PARTITION always truncated the whole table for data source tables. This PR fixes that and improves InMemoryCatalog so that the command works with it.

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @ericl @yhuai @gatorsmile

Contributor

Mind renaming rectangles2 to rectPartitioned or similar? I think that would make the test simpler to read later.

@SparkQA

SparkQA commented Oct 30, 2016

Test build #67788 has finished for PR 15688 at commit a9eec51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl
Contributor

ericl commented Oct 30, 2016

This lgtm

@gatorsmile
Member

If we try to truncate a partition that does not exist, do we just do nothing?

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67847 has finished for PR 15688 at commit a9eec51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

@gatorsmile that's a good point. I checked with Hive, and its behaviour is:

  1. If the given partition spec specifies all partition columns, throw an exception if the partition does not exist.
  2. If the given partition spec is partial, do nothing if no matching partition exists.

I have updated this PR accordingly.
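
For readers following along, the two cases above can be sketched in plain Scala. This is only an illustration under stated assumptions, not the code in this PR: `PartialSpecSketch`, `isPartial`, `matches`, and `selectForTruncate` are hypothetical names, the `year`/`month` columns used in the note after the block are made up, and the spec is assumed to contain only partition columns.

```scala
object PartialSpecSketch {
  // A partition spec maps partition column names to values, e.g. Map("year" -> "2016").
  type PartitionSpec = Map[String, String]

  // Assumption: every key in the spec is a partition column, so a spec is
  // "partial" exactly when it names fewer columns than the table is partitioned by.
  def isPartial(spec: PartitionSpec, partitionColumns: Seq[String]): Boolean =
    spec.size < partitionColumns.size

  // A partition matches when it agrees with the spec on every column the spec names.
  def matches(spec: PartitionSpec, partition: PartitionSpec): Boolean =
    spec.forall { case (col, value) => partition.get(col).contains(value) }

  // Hive-style selection as described in the comment above (hypothetical helper):
  // a full spec with no match is an error, a partial spec with no match selects nothing.
  def selectForTruncate(
      spec: PartitionSpec,
      partitionColumns: Seq[String],
      existing: Seq[PartitionSpec]): Seq[PartitionSpec] = {
    val matched = existing.filter(p => matches(spec, p))
    if (matched.isEmpty && !isPartial(spec, partitionColumns)) {
      throw new NoSuchElementException(s"Partition not found: $spec")
    }
    matched
  }
}
```

With hypothetical partition columns `year` and `month`, a spec of `Map("year" -> "2016")` is partial and selects every 2016 partition, while a full spec such as `Map("year" -> "2015", "month" -> "1")` throws if no such partition exists.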

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67871 has finished for PR 15688 at commit 9045ccd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67882 has finished for PR 15688 at commit 9045ccd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67887 has finished for PR 15688 at commit 9045ccd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

what does partial mean here?

Contributor

we should document it.

Member

If the specified column names are not partitioning columns, we will get a strange error:

key not found: height
java.util.NoSuchElementException: key not found: height

Contributor Author

I don't think we will or should hit this at the ExternalCatalog level, as most partition-related methods assume that the column names in the partition spec are partition columns, e.g. loadPartition, loadDynamicPartitions, dropPartitions, etc.

If we do hit this, we should fix it at the caller side.

Member

The above message came from this statement:

sql("TRUNCATE TABLE partTable PARTITION (height =1)")

Member

Comparing the size is not enough. We also need to check whether all the columns are partitioning columns.
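
A minimal sketch of the kind of check being asked for, assuming the caller has the table's partition column names at hand. `SpecValidationSketch` and `validate` are hypothetical names rather than Spark APIs, and the `width`/`length` partition columns used in the note after the block are assumed for illustration only.

```scala
object SpecValidationSketch {
  // Returns Left with a readable message if the spec names any column that is
  // not a partition column, otherwise Right with the unchanged spec.
  def validate(
      spec: Map[String, String],
      partitionColumns: Seq[String]): Either[String, Map[String, String]] = {
    val unknown = spec.keys.filterNot(c => partitionColumns.contains(c)).toSeq.sorted
    if (unknown.nonEmpty) {
      Left(
        s"Not partition columns: ${unknown.mkString(", ")}. " +
          s"Partition columns are: ${partitionColumns.mkString(", ")}")
    } else {
      Right(spec)
    }
  }
}
```

For a table assumed to be partitioned by `width` and `length`, `validate(Map("height" -> "1"), Seq("width", "length"))` returns a `Left` that names `height`, which is easier to act on than `key not found: height`.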

@gatorsmile
Member

gatorsmile commented Nov 1, 2016

Could we add all the positive and negative cases in ExternalCatalogSuite, like the other APIs? Then, we can easily know whether the behaviors of InMemoryCatalog and HiveExternalCatalog are consistent.

@SparkQA

SparkQA commented Nov 5, 2016

Test build #68188 has finished for PR 15688 at commit 8941e10.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 5, 2016

Test build #68195 has finished for PR 15688 at commit 8941e10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// do nothing if no partition is matched for the given non-partial partition spec
// TODO: This behaviour is different from Hive, we should decide whether we need to follow
// Hive's behaviour or stick with our existing behaviour later.
Contributor

what's our and hive's behavior?

Contributor Author

Our behaviour: do nothing if no partition matches, regardless of whether the given partition spec is partial.

Hive's behaviour: when no partition matches, do nothing if the given partition spec is partial, and throw an exception if it is non-partial.

Member

Could we document the behavior difference in the code comments, if we decide to stick with our existing behavior?

Contributor

I actually think Hive's behavior makes sense here. If I'm giving you an exact match, you should warn me if there is an issue.

Contributor

The alternative, of course, is to provide "truncate if exists", which doesn't throw exceptions if that's desired.
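
To make the options in this thread concrete, here is a rough sketch of how an "if exists" switch could fold both behaviours into one decision. Everything in it is hypothetical: `TruncatePolicySketch`, `decide`, and the `ifExists` flag are illustrative names, and no such option is part of this PR.

```scala
object TruncatePolicySketch {
  sealed trait Outcome
  case object NoOp extends Outcome
  final case class Truncate(partitionCount: Int) extends Outcome

  // matchedPartitions: how many existing partitions match the given spec.
  // specIsPartial: whether the spec names only some of the partition columns.
  // ifExists: hypothetical flag that suppresses the error for a full spec with no match.
  def decide(matchedPartitions: Int, specIsPartial: Boolean, ifExists: Boolean): Outcome =
    if (matchedPartitions > 0) {
      Truncate(matchedPartitions)
    } else if (specIsPartial || ifExists) {
      NoOp
    } else {
      throw new IllegalArgumentException(
        "No partition matches the given non-partial partition spec")
    }
}
```

Under this sketch, always passing `ifExists = true` gives the do-nothing behaviour described above, and `ifExists = false` gives the Hive behaviour.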

@rxin
Contributor

rxin commented Nov 5, 2016

cc @gatorsmile

@gatorsmile
Member

Regarding the current code changes, LGTM

As @cloud-fan said in the code comment, we need to decide whether we should follow Hive.

@SparkQA

SparkQA commented Nov 6, 2016

Test build #68241 has finished for PR 15688 at commit 4726228.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 7, 2016

Thanks - merging in master/branch-2.1.

@rxin
Contributor

rxin commented Nov 7, 2016

@cloud-fan can you create a follow-up PR to switch over to Hive's behavior?

asfgit pushed a commit that referenced this pull request Nov 7, 2016
…tion

## What changes were proposed in this pull request?

Previously, `TRUNCATE TABLE ... PARTITION` always truncated the whole table for data source tables. This PR fixes that and improves `InMemoryCatalog` so that the command works with it.

## How was this patch tested?

existing tests

Author: Wenchen Fan

Closes #15688 from cloud-fan/truncate.

(cherry picked from commit 46b2e49)
Signed-off-by: Reynold Xin

asfgit closed this in 46b2e49 Nov 7, 2016
asfgit pushed a commit that referenced this pull request Nov 8, 2016
…hed for the given non-partial partition spec

## What changes were proposed in this pull request?

a follow up of #15688

## How was this patch tested?

updated test in `DDLSuite`

Author: Wenchen Fan

Closes #15805 from cloud-fan/truncate.

(cherry picked from commit 73feaa3)
Signed-off-by: Wenchen Fan
ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 8, 2016
…hed for the given non-partial partition spec
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…tion
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…hed for the given non-partial partition spec