Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Feb 27, 2015

The eq of DataType is not correct, class cache is not use correctly (created class can not be find by dataType), then it will create lots of classes (saved in _cached_cls), never released.

Also, all same DataType have same hash code, there will be many object in a dict with the same hash code, end with hash attach, it's very slow to access this dict (depends on the implementation of CPython).

This PR also improve the performance of inferSchema (avoid the unnecessary converter of object).

@davies davies changed the title [SPARK-6055] [PySpark] fix incorrect DataType.__eq__ [SPARK-6055] [PySpark] fix incorrect DataType.__eq__ (for 1.1/1.2) Feb 27, 2015
@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28054 has started for PR 4809 at commit 6c2909a.

  • This patch merges cleanly.

@davies davies changed the title [SPARK-6055] [PySpark] fix incorrect DataType.__eq__ (for 1.1/1.2) [SPARK-6055] [PySpark] fix incorrect DataType.__eq__ (for 1.2) Feb 27, 2015
@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28054 has finished for PR 4809 at commit 6c2909a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28054/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28074 has started for PR 4809 at commit b576107.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28074 has finished for PR 4809 at commit b576107.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28074/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28086 has started for PR 4809 at commit 9b4dadc.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28093 has started for PR 4809 at commit 65c222f.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the PR opened against master added typechecking asserts here. Should we also add them in this branch, or is there a reason why we should omit them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can not cherry-pick the patch from master, I need to re-do all the things on 1.2/1.1, so I'd like to keep the changes minimized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough; just wanted to check.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28086 has finished for PR 4809 at commit 9b4dadc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28086/
Test FAILed.

@JoshRosen
Copy link
Contributor

LGTM, since the changes here are a subset of the changes in the PR opened against master.

@SparkQA
Copy link

SparkQA commented Feb 27, 2015

Test build #28093 has finished for PR 4809 at commit 65c222f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28093/
Test PASSed.

@JoshRosen
Copy link
Contributor

I've merged this into branch-1.2 (1.2.2). Thanks!

asfgit pushed a commit that referenced this pull request Feb 28, 2015
The eq of DataType is not correct, class cache is not use correctly (created class can not be find by dataType), then it will create lots of classes (saved in _cached_cls), never released.

Also, all same DataType have same hash code, there will be many object in a dict with the same hash code, end with hash attach, it's very slow to access this dict (depends on the implementation of CPython).

This PR also improve the performance of inferSchema (avoid the unnecessary converter of object).

Author: Davies Liu <[email protected]>

Closes #4809 from davies/leak2 and squashes the following commits:

65c222f [Davies Liu] Update sql.py
9b4dadc [Davies Liu] fix __eq__ of singleton
b576107 [Davies Liu] fix tests
6c2909a [Davies Liu] fix incorrect DataType.__eq__
@pwendell
Copy link
Contributor

pwendell commented Mar 1, 2015

@davies can you close this? (auto close doesn't work for the backport commits).

@davies davies closed this Mar 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants