[SPARK-3580][CORE] Add Consistent Method To Get Number of RDD Partitions Across Different Languages #9767
Conversation
This is such a noisy change for really little gain that I don't think it's worth it. At best, just ensure there is a getNumPartitions in each language.
Although it would be nice to fix the inconsistencies in the usage of partitions.length vs partitions.size, I agree that it's better to stick to just adding getNumPartitions. I will remove the second commit from the PR. Should we also add a getNumPartitions method to JavaRDDLike for the Java API?
Yes, ideally. @JoshRosen am I right that we have to add the new method to the abstract class as well as the JavaRDDLike trait?
If I understand the discussion in SPARK-3266 correctly, the method should only be added to the JavaRDDLike trait and not the abstract class. I have updated the PR with this change.
Yeah, AFAIK you only need to add it to the trait. |
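For reference, a minimal sketch of what the two additions could look like, assuming the Scala method simply delegates to `partitions.length` and the Java wrapper delegates to the underlying RDD (doc comments are illustrative):

```scala
// core/src/main/scala/org/apache/spark/rdd/RDD.scala
/** Returns the number of partitions of this RDD. */
def getNumPartitions: Int = partitions.length

// core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
// (added to the trait only, per the discussion above)
/** Returns the number of partitions of this RDD. */
def getNumPartitions: Int = rdd.getNumPartitions
```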
Mind adding a @since tag here to say that this is new in Spark 1.6?
Not at all. Should I use @since or @Since, and also add it to RDD.scala? (I could not find any occurrence of this tag in spark-core).
I think you should use Spark's own @Since tag, since it's used in MLlib and SQL: https://github.com/apache/spark/blob/31921e0f0bd559d042148d1ea32f865fb3068f38/core/src/main/scala/org/apache/spark/annotation/Since.scala
Thanks, I added the @Since tags and updated this PR.
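For readers unfamiliar with the annotation, this is roughly how it ends up being applied; a sketch only, with the Scaladoc wording being illustrative:

```scala
import org.apache.spark.annotation.Since

/** Returns the number of partitions of this RDD. */
@Since("1.6.0")
def getNumPartitions: Int = partitions.length
```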
I think this looks good.
Test build #2094 has finished for PR 9767 at commit
@schot I think you will have to add the MiMa exclude that the error message in the output mentions. It's a 'false positive' but MiMa needs to be reassured.
@srowen Yes, I added a MiMa exclude.
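For context, a MiMa exclude for a new method on JavaRDDLike is a one-line entry in project/MimaExcludes.scala; the sketch below shows the general shape, though the exact problem class reported by MiMa is an assumption here:

```scala
// project/MimaExcludes.scala
// JavaRDDLike is not meant to be implemented outside of Spark, so the
// "missing method" report for the new getNumPartitions is a false positive.
ProblemFilters.exclude[MissingMethodProblem](
  "org.apache.spark.api.java.JavaRDDLike.getNumPartitions")
```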
Because I merged the fix into the previous commit, does the Jenkins retest not happen automatically?
Test build #2111 has finished for PR 9767 at commit
@schot this'll need a rebase. @JoshRosen are you OK with this for 1.6?
This patch adds a new method getNumPartitions to the Scala RDD and JavaRDDLike APIs as proposed in [SPARK-3580]. It brings the Scala and Java APIs in line with the Python API. For the Java API, a MiMa exclude was added.
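As a quick illustration of the end result, the Scala call now mirrors what PySpark already offered (a sketch assuming an existing SparkContext `sc`, e.g. in spark-shell; the sample RDD is just for demonstration):

```scala
val rdd = sc.parallelize(1 to 100, numSlices = 4)
rdd.getNumPartitions  // 4 -- same name as Python's rdd.getNumPartitions()
```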
@srowen @JoshRosen PR has been rebased to resolve the conflict on MimaExcludes.
Pinging @JoshRosen @pwendell for an opinion on slipping this into 1.6. I'm still inclined to put it in, but I'm aware there's a more conservative stance on 1.6 this week, so I wanted to wait a day before doing this.
Yeah, I think it's fine to pull in, but do it quickly because an RC will go out very soon!
[SPARK-3580][CORE] Add Consistent Method To Get Number of RDD Partitions Across Different Languages

I have tried to address all the comments in pull request #2447. Note that the second commit (using the new method in all internal code of all components) is quite intrusive and could be omitted.

Author: Jeroen Schot <[email protected]>

Closes #9767 from schot/master.

(cherry picked from commit 128c290)
Signed-off-by: Sean Owen <[email protected]>
merged to master / 1.6 |
I have tried to address all the comments in pull request #2447.
Note that the second commit (using the new method in all internal code of all components) is quite intrusive and could be omitted.