[SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in PySpark as action for a query executor listener #21060

HyukjinKwon · 2018-04-13T03:39:37Z

What changes were proposed in this pull request?

This PR proposes to add collect to a query executor as an action.

It seems collect and collect with Arrow are not recognised via QueryExecutionListener as actions. For example, if we have a custom listener as below:

package org.apache.spark.sql

import org.apache.spark.internal.Logging
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

class TestQueryExecutionListener extends QueryExecutionListener with Logging {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    logError("Look at me! I'm 'onSuccess'")
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = { }
}

and set spark.sql.queryExecutionListeners to org.apache.spark.sql.TestQueryExecutionListener

Other operations on the PySpark or Scala side seem fine:

>>> sql("SELECT * FROM range(1)").show()

18/04/09 17:02:04 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
+---+
| id|
+---+
|  0|
+---+

scala> sql("SELECT * FROM range(1)").collect()

18/04/09 16:58:41 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
res1: Array[org.apache.spark.sql.Row] = Array([0])

but ..

Before

>>> sql("SELECT * FROM range(1)").collect()

[Row(id=0)]

>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> sql("SELECT * FROM range(1)").toPandas()

   id
0   0

After

>>> sql("SELECT * FROM range(1)").collect()

18/04/09 16:57:58 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
[Row(id=0)]

>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> sql("SELECT * FROM range(1)").toPandas()

18/04/09 17:53:26 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
   id
0   0

How was this patch tested?

I have manually tested as described above, and a unit test was added.

Author: hyukjinkwon <[email protected]> Closes apache#21007 from HyukjinKwon/SPARK-23942. (cherry picked from commit ab7b961) Signed-off-by: hyukjinkwon <[email protected]>

HyukjinKwon · 2018-04-13T03:40:06Z

cc @BryanCutler

HyukjinKwon · 2018-04-13T03:41:24Z

python/pyspark/sql/tests.py

-        ReusedPySparkTestCase.tearDownClass()
-        cls.spark.stop()
-
-    def assertPandasEqual(self, expected, result):


This method causes a conflict, and I don't really understand why. I compared the two versions line by line, character by character, and they look identical.

SparkQA · 2018-04-13T07:05:02Z

Test build #89312 has finished for PR 21060 at commit 4656724.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-04-13T09:42:07Z

retest this please

SparkQA · 2018-04-13T11:32:28Z

Test build #89327 has finished for PR 21060 at commit 4656724.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

BryanCutler · 2018-04-13T16:36:38Z

retest this please

BryanCutler

LGTM, pending Jenkins

SparkQA · 2018-04-13T18:14:09Z

Test build #89352 has finished for PR 21060 at commit 4656724.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

BryanCutler · 2018-04-13T21:22:26Z

retest this please

SparkQA · 2018-04-14T00:47:04Z

Test build #89363 has finished for PR 21060 at commit 4656724.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-04-14T13:43:55Z

Merged to branch-2.3.

Thanks for reviewing this @BryanCutler.

Author: hyukjinkwon <[email protected]> Closes #21060 from HyukjinKwon/PR_TOOL_PICK_PR_21007_BRANCH-2.3.

gatorsmile · 2018-04-15T03:43:16Z

Since this is not a bug fix, I plan to revert this PR. WDYT? @HyukjinKwon @BryanCutler

HyukjinKwon · 2018-04-15T04:29:28Z

Hm, I would say it's a bug, since the action, which is supposed to trigger the callback, is not detected. The test is a bit complicated but the fix is relatively straightforward.

gatorsmile · 2018-04-15T04:32:29Z

This introduces a behavior change, and it is not a regression. The changes made in this PR could break external apps. We should not do this in a maintenance release.

HyukjinKwon · 2018-04-15T04:36:01Z

I guess the behaviour change here is that a custom query execution listener can now recognise the collect action in PySpark, which other APIs already detected. Mind explaining how it breaks external apps? If an app relies on the callback not being called specifically for collect, while it is called for other actions like show in PySpark, I would say the user's app is to blame.

gatorsmile · 2018-04-15T04:41:15Z

Users' apps should not be blamed in this case. If they want this change, they should upgrade to the newer release. Basically, we should not introduce any external behavior change in a maintenance release if possible.

HyukjinKwon · 2018-04-15T04:51:19Z

I am a bit puzzled because QueryExecutionListener should call the callback for actions, and collect triggers it in Scala and R but not in PySpark specifically. It sounds like a bug, and this fix is relatively straightforward. The previous behaviour was that the callback was not called, which didn't make sense.

I agree that it's discouraged to make a behaviour change in a maintenance release, sure. However, I was thinking it makes sense to backport if the fix is not complicated and it is quite clearly a bug. I think we shouldn't call it an improvement in this case.

Were actual apps or test cases broken somewhere?

gatorsmile · 2018-04-15T04:53:20Z

This is just the basic backport rule we follow for each PR. We should not make an exception for this PR.

HyukjinKwon · 2018-04-15T05:01:02Z

I agree that it's better to avoid a behaviour change, but this one is clearly a bug and the fix is straightforward. I am puzzled why this specifically prompted you. I wouldn't revert if there's no specific worry about this patch.

gatorsmile · 2018-04-15T05:08:23Z

If this can be treated as a bug fix to backport, then many behavior-change PRs could be backported too. We are building system software. We have to be more principled.

HyukjinKwon · 2018-04-15T05:17:57Z

How about we formally document that in the guide?

I have always put more importance on practice, and I personally think we are fine to make a backport if it's a bug and the fix is straightforward. IMHO, principle is a base, but we should put more importance on practice.

Even if I take your word for it, I would then like to make this an exception since this fixes actual use cases from our customers.

gatorsmile · 2018-04-15T05:22:03Z

I do think we should clearly document the rule for what we can backport.

I do not think we should make an exception for this PR. cc @mateiz @rxin @marmbrus @yhuai @cloud-fan @ueshin

HyukjinKwon · 2018-04-15T05:23:12Z

Yup, that should reduce some overhead like this. I would like to hear what you guys think too, cc @srowen, @vanzin, @felixcheung, @holdenk.

srowen · 2018-04-15T14:45:02Z

This certainly looks like a bug fix. I don't know this area well, but I don't see an argument here that the current behavior is correct. Right?

When we say we don't back-port behavior changes, we mean "changes in what is meant to be correct behavior". All bug fixes change behavior, but to restore correct behavior. So I don't see an argument against back-porting because it's a behavior change.

Of course, sometimes practical concerns override that. If we thought programs were relying on the 'wrong' behavior then we'd have to think twice about correcting it. I don't see that argument being made here, but, I'm not sure? There is evidence the 'wrong' behavior is impacting users though?

@gatorsmile I must say I don't understand your position here, can you clarify? So far standard practice here says this is a reasonable backport. What's different here?

gatorsmile · 2018-04-15T15:22:25Z

withCallback was added in Spark 1.6 release https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding is we never clearly define which should be part of withCallback. Thus, it is hard to say this is a bug fix.

We hit the similar issue in #18064. At that time, we did not backport the PR to the previous releases too. Thus, I do not think we should make an exception for this PR just because the customers of @HyukjinKwon hit this issue. If we make an exception, it becomes harder to decide which PRs are qualified for a backport.

We need to be very careful when backporting the PR with the behavior changes, especially when this is neither a critical issue nor a regression. Thus, I do not think we should backport this PR.

HyukjinKwon · 2018-04-15T15:34:16Z

> withCallback was added in Spark 1.6 release https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding is we never clearly define which should be part of withCallback. Thus, it is hard to say this is a bug fix.

The callback works for collect in R and Scala but not in Python. I think we should at least match the behaviour. I wonder why it's hard to call this a bug when collect is detected in some languages' APIs but not in others.

> We hit the similar issue in #18064. At that time, we did not backport the PR to the previous releases too.

That's because the change was big and invasive. I wouldn't backport it too; however, this fix is relatively small.

> Thus, I do not think we should make an exception for this PR just because the customers of @HyukjinKwon hit this issue

It's not because of my customers; I am saying it fixes an actual use case and affects actual users.

> If we make an exception, it becomes harder to decide which PRs are qualified for a backport.

I think we usually use committer's judgement when we make an exception. I already have been seeing many backports that actually causes behaviour changes and I did this because it looks being backported in general. This is the reason why we should formally document it if this is actually the rule.

What I am less sure about is why this one specifically prompted you.

HyukjinKwon · 2018-04-15T15:37:31Z

> We need to be very careful when backporting the PR with the behavior changes, especially when this is neither a critical issue nor a regression. Thus, I do not think we should backport this PR.

I am not saying we shouldn't be careful, but this affects an actual user group and actual scenarios.

gatorsmile · 2018-04-15T15:41:52Z

> The callback works for collect in R and Scala but Python doesn't. I think we should at least match the behaviour. I wonder why it's hard to say a bug when collect is detected in some APIs but not in some APIs.

The behavior inconsistency among Python/Scala/R/JAVA does not mean a bug, right?

> That's because the change was big and invasive. I wouldn't backport it too; however, this fix is relatively small.

Being too big and invasive is not the reason we did not backport that PR. We can still backport minimal changes to previous releases.

> I think we usually use committer's judgement when we make an exception. I already have been seeing many backports that actually causes behaviour changes and I did this because it looks being backported in general. This is the reason why we should formally document it if this is actually the rule.

I am not against this specific PR. All the committers need to be really careful when they make a decision to backport a behavior change. If any committer does it, we should jump in and stop the backport. This is what we should do.

HyukjinKwon · 2018-04-15T15:52:28Z

> The behavior inconsistency among Python/Scala/R/JAVA does not mean a bug, right?

This case specifically collect in PySpark doesn't work alone whereas all other actions like foreach, show and other cases in other languages works in all other APIs. Also, that's what a query execution listener describes. Do you believe you would make this exception for PySpark specifically in any case?

I am seeing collect is included in the original commit - 15ff85b

> I am not against this specific PR. All the committers need to be really careful when they make a decision to backport a behavior change. If any committer does it, we should jump in and stop the backport. This is what we should do.

Let's open a discussion in the mailing list and see if we can see the agreement. I think this was not the first time we talked about this and think it's better to open a proper discussion and make a decision - so you basically mean any behaviour changes shouldn't be backported?

gatorsmile · 2018-04-15T15:59:03Z

> This case specifically collect in PySpark doesn't work alone whereas all other actions like foreach, show and other cases in other languages works in all other APIs. Also, that's what a query execution listener describes. Do you believe you would make this exception for PySpark specifically in any case?

To improve the usability, we should change it in the master branch. My point is that we should not backport this PR to the 2.3 release.

> Let's open a discussion in the mailing list and see if we can see the agreement. I think this was not the first time we talked about this and think it's better to open a proper discussion and make a decision.

Sure, let me lead the discussion in the dev channel; you are welcome to add your input there. Next, we should also discuss the rule for which PRs can be backported to RC branches when we do a release. In the Spark 2.3 release, we backported many PRs that should not have been merged to the release candidate branches.

HyukjinKwon · 2018-04-15T16:04:40Z

This is not a new feature addition ... this fixes existing functionality so that it works as expected and consistently ...
Sure, that'd be great. Will join in the discussion.

gatorsmile · 2018-04-15T16:30:44Z

Fixing API inconsistency should not be treated as a bug fix.

Please give me a few days. I need to summarize the Spark 2.3 release and list all the PRs that were backported to the release candidate branches. Thanks!

HyukjinKwon · 2018-04-15T23:55:51Z

This is not just about inconsistency; it is a bug. The previous behaviour doesn't make sense, and I can't imagine how the fix would break external apps in principle. Also, it fixes actual use cases.

Sure, no need to rush.

gatorsmile · 2018-04-16T00:58:11Z

As I said above, we need to be very careful when backporting PRs with behavior changes, especially when the issue is neither critical nor a regression. Even if this is a bug based on your understanding, we should still not backport such PRs.

HyukjinKwon · 2018-04-16T01:19:49Z

I am not saying we shouldn't be careful. I am trying to be careful when I backport. So, your reasons are:

1. Any behaviour changes shouldn't be backported, and it's the basic backport rule.

   I disagree unless it's clearly documented as a rule. Even so, I would like to make this an exception because it's less invasive, looks like a bug, affects an actual user group and makes the case behave sensibly. That's what I have been used to so far.

2. The query execution listener is not clearly defined.

   I am seeing collect is included in the original commit - 15ff85b. I don't see a reason to specifically exclude PySpark's case since Scala and R also work. I don't think we would exclude this on purpose.

3. It's not a critical issue nor a regression.

   I don't think we should only make a backport for a critical issue or a regression. Those are strong reasons to backport, but there are still other cases that can be backported based on my understanding and observations. If it's quite clearly a bug and it affects an actual user group, I would guess a backport can be valuable. The fix is straightforward, less invasive and small.

HyukjinKwon · 2018-04-16T01:23:49Z

cc @rdblue and @steveloughran too who I guess should be interested in setting up a backporting policy.

steveloughran · 2018-04-16T09:39:58Z

This is one of those great problems in software engineering: no good answer. I think case-by-case is generally the best tactic, with a bias against feature backport, though my track record is a bit mixed.

Patches which fix security issues at the expense of compatibility are real problems here: they need to go in even knowing stuff will break, especially when you quietly push it out with an innocuous JIRA title until you actually do the releases. People start complaining that XML entity expansion has stopped working, or that REST APIs fail if unauthed, when that is the exact outcome intended.

Talk to @templedf for a good policy here.

BryanCutler · 2018-04-16T16:44:54Z

This was a bug fix from my perspective and looked to be low risk. I don't think this changes any behavior for the user, except if you do a collect from pyspark and have a QueryExecutionListener, then it will now get the expected callback instead of nothing.
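To make the described behaviour concrete, here is a minimal pure-Python model of the callback pattern under discussion. It is a sketch under stated assumptions, not Spark's implementation: `RecordingListener` and `with_action` are hypothetical names invented for illustration, and the "query" is just a lambda returning a list.

```python
import time


class RecordingListener:
    """Hypothetical stand-in for a QueryExecutionListener; it only records calls."""

    def __init__(self):
        self.events = []

    def on_success(self, func_name, duration_ns):
        self.events.append(("success", func_name))

    def on_failure(self, func_name, error):
        self.events.append(("failure", func_name))


def with_action(name, listeners, body):
    """Run body() and report the outcome to every listener (illustrative only)."""
    start = time.perf_counter_ns()
    try:
        result = body()
    except Exception as error:
        for listener in listeners:
            listener.on_failure(name, error)
        raise
    duration_ns = time.perf_counter_ns() - start
    for listener in listeners:
        listener.on_success(name, duration_ns)
    return result


listener = RecordingListener()

# An action routed through the wrapper notifies the listener.
rows = with_action("collect", [listener], lambda: [0])

# An action that bypasses the wrapper returns the same data but stays
# invisible to the listener, which is the symptom reported for PySpark's collect.
unwrapped_rows = (lambda: [0])()
```

In this model both calls return the same rows; only the wrapped one leaves a trace in `listener.events`, which mirrors the "expected callback instead of nothing" difference.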

rdblue · 2018-04-16T18:09:54Z

I agree with what @srowen said:

> When we say we don't back-port behavior changes, we mean "changes in what is meant to be correct behavior". All bug fixes change behavior, but to restore correct behavior. So I don't see an argument against back-porting because it's a behavior change.

I also think that this is definitely a bug fix and that it is worth backporting to 2.3.

@gatorsmile: it's a stretch to say that this isn't a bug fix because it isn't a regression or isn't critical. I would very reasonably expect this behavior, especially because it is the current behavior in Scala.

rxin · 2018-04-16T20:05:15Z

It looks to me like this is a bug fix that can merit backporting, as QueryExecutionListener is also marked as experimental.

In this case, I think @gatorsmile worries one might have written a listener that enumerates the possible function names, and that listener will fail now with a new action name. I feel this is quite unlikely, but I also appreciate @gatorsmile's concern for backward compatibility, and I've certainly been wrong before when our fixes break existing workloads.
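The compatibility concern can be illustrated with a small pure-Python sketch. Everything here is hypothetical: the set of known names, both handler functions, and in particular the name `collectToPython`, which is only an assumed example of a newly reported function name.

```python
# Function names a hypothetical listener author saw when the listener was written.
KNOWN_ACTIONS = {"show", "collect", "count"}


def strict_on_success(func_name):
    """Brittle: treats any unlisted function name as an error."""
    if func_name not in KNOWN_ACTIONS:
        raise ValueError(f"unexpected action: {func_name}")
    return f"handled {func_name}"


def tolerant_on_success(func_name):
    """Robust: handles known names specially and lets unknown ones through."""
    if func_name in KNOWN_ACTIONS:
        return f"handled {func_name}"
    return f"ignored {func_name}"


# "collectToPython" is an assumed name for a newly reported action.
NEW_ACTION = "collectToPython"

try:
    strict_on_success(NEW_ACTION)
    strict_failed = False
except ValueError:
    strict_failed = True

tolerant_result = tolerant_on_success(NEW_ACTION)
```

Only a listener written in the strict style would start failing once a previously unreported action begins to trigger callbacks; the tolerant style is unaffected.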

(On the spectrum of being extremely conservative to extremely liberal w.r.t. backward compatibility, I think I'm in general more on the middle, whereas @gatorsmile probably leans more to the conservative side. There isn't really anything wrong with this, and it's good to have balancing forces in a project.)

How about this, @HyukjinKwon -- for the 2.3.x backport, add a config so that it is possible to turn this off in production, if somebody actually has their job fail because of this? It's a small delta from what this PR already does, and that should alleviate the concerns @gatorsmile has. I'd also change the function doc for onSuccess/onFailure to make it clear that we will add new function names in the future, and users shouldn't expect a fixed list of function names.
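As a rough sketch of what such an escape hatch might look like, the snippet below gates the callback on a configuration value. It is a plain-Python model only: the key `example.pysparkCollectCallback.enabled` is hypothetical (no actual configuration is named in this thread), and `notify_listeners` is an invented helper.

```python
def notify_listeners(func_name, listeners, conf):
    """Invoke callbacks only when the (hypothetical) flag is enabled."""
    # Hypothetical key; the thread does not name an actual configuration.
    enabled = conf.get("example.pysparkCollectCallback.enabled", "true") == "true"
    if not enabled:
        return 0
    for callback in listeners:
        callback(func_name)
    return len(listeners)


seen = []

# Default: the new behaviour is on, so the callback fires.
on_by_default = notify_listeners("collect", [seen.append], {})

# Opt-out: setting the flag to "false" restores the old silent behaviour.
turned_off = notify_listeners(
    "collect", [seen.append], {"example.pysparkCollectCallback.enabled": "false"}
)
```

The design choice sketched here is "on by default, with an opt-out", which matches the suggestion of keeping the fix while letting an affected production job disable it.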

HyukjinKwon · 2018-04-17T00:34:40Z

I am okay if there's a specific reason. I think this is the point - if there's a specific reason, that should be mentioned and explained ahead. Actually, I (and @srowen did as well IIUC) asked this many times, see above.

I would have investigated or would have just said that I am okay with reverting. I don't usually get in the way if there's a specific reason. It would be great if we can have more open talks next time.

> for the 2.3.x backport, add a config so that it is possible to turn this off in production, if somebody actually has their job failed because of this? It's a small delta from what this PR already does, and that should alleviate the concerns @gatorsmile has.

I am personally fine with reverting or adding a configuration if that's what you guys feel strongly about; however, I should say it sounds unusual to have a config to control this behaviour in branch-2.3 alone, and it seems less worthwhile. The case you mention sounds really unlikely, and I wonder if that makes sense, though. It's also experimental, as you all said.

Also, I should note that I have been confused about the backporting policy and the bunch of configurations to control each behaviour. If those are just concerns to be addressed, that's fine, but so far it sounds like something people must follow. If this is true, I feel sure this should be documented so that we don't have such overhead next time. I am pretty sure this isn't the first time.

srowen · 2018-04-17T01:14:24Z

Adding a flag just in 2.3 is, at least, an unusual thing to do. By this logic lots of backports should be flag-protected, but they aren't. Why is this special?

I still don't see much argument against this backport. I count about 3-4 committers in favor and 1 against. Let's leave it.

gatorsmile · 2018-04-17T03:20:37Z

I might not have explained it well. Sorry for the misunderstanding. Thank you @rxin for helping me clarify my points. It sounds like many of you think this backport is fine. I am not against this specific PR. We do not need to revert the PR but just improve the documentation. That should be fine, although I still personally prefer adding the configuration.

As I said in the original PR #21007 that was merged to master, let me raise two points here too.

1. PR descriptions will be part of the commit log. We need to be very careful before merging the PR. In the past, I also missed a few when I did the merge. To be honest, I am not sure how native English speakers read it, but the first paragraph scared me when I was reading the PR commit log. @srowen WDYT?

   > This PR proposes to add collect to a query executor as an action.

2. Document the behavior changes that are visible to external users/developers. In Spark 2.3, we started to enforce this in every merged PR. I believe many of you got multiple similar comments in previous PRs. This PR should also update the migration guides. @HyukjinKwon Do you agree?

Before we finalize the backport policy, below is my input on the whitelist of what we can backport:

The critical/important bug fixes and security fixes.
The regression fixes.
The PRs that do not touch the production code, like test-only patches, documentation fixes, and the log message fixes.

Avoid backporting PRs that contain:

The new features
The minor bug fixes/improvements that have external behavior changes
The code refactoring
The code changes with the high/mid risk

In the OSS community, I believe no committer will be fired just because they merged or introduced a bug, right? If a user's application fails due to an upgrade, normally we blame our users, or say the bug was just accidentally introduced. However, this was not acceptable in my first team. Let me share what I experienced: various customer incidents in my related product teams.

One director got demoted (almost fired) due to a bad release. She is a very nice lady. We really like her. That release had many cool features but the quality was not well controlled. Many customers were not willing to upgrade.
There was a famous system upgrade failure a few years ago. The whole system became very slow after the upgrade. It took tens of hours to recover the system. After a few days, the GM went to the customer site and was blamed for the whole day. Multiple architects and VPs were forced to write apology letters. The customer planned to sue us. On the customer side, the CTO was later fired, and the upgrade incident was also on the national TV news because it affected many people.
A few directors were on call with me for 10+ nights to resolve one Japanese customer's data corruption issue. The client teams ran multiple systems at the same time to reproduce the issue. After a few weeks, it was finally resolved after reading the memory dump. The root cause was a code merge from one branch to another many years ago.

If all of the above people believe Spark is the best product in Big Data, we need to be more conservative. Our decisions could affect many people. This is not the first time I have argued with other committers/contributors about PR quality. In one previous PR, I left almost 100 comments just because the documents were not accurate.

If my above comments offend anyone, I apologize. Everyone has a different understanding of software development because we have different work experience. The whole community has already done a wonderful job compared with other open source projects. I still believe we can do a better job, right? Let us formalize the backport policy and enforce it in each release.

srowen · 2018-04-17T03:41:34Z

I do not see a problem with the commit message here. Is that really the issue? It accurately describes what changes. The why has always been documented in discussion, and it is here already. Sometimes the why is documented in comments too; I don't see a particular need for that here, but, if that's the issue, why isn't that what we're talking about?

You continue to portray this as a behavior change, and I think you mean "a change in what is considered correct behavior". However all the other comments suggest otherwise; the argument from consistency seems much stronger.

Your proposed criteria for backports sort of align with accepted practice, which is to follow semver.org semantics. I think semver is reasonably clear, in general and in this case. I see broad agreement for this backport, and people simply disagree with your interpretation. It is not a failure to understand criteria.

Believe me, people here have plenty of experience with software, versioning, and the impact of changes. I'd put more faith in the judgment of your peers. Your anecdotes are of a type that's familiar to many people, but, I also fail to see how they're relevant here.

You are adopting a 'conservative' position and I think in this case it's out of line with normal practice. I think you should accept that people disagree and move on.

gatorsmile · 2018-04-17T04:27:43Z

I am fine with accepting different opinions on this specific PR. Reverting this PR is not my goal here. This is a public community. It sounds like the commit message clearly conveys to you what this PR does: "This PR proposes to add collect to a query executor as an action." I still have a different opinion, though. We always need to collect and accept different opinions.

I am also glad you agree on the backport policy I proposed above. Hopefully, everyone is on the same page for avoiding unnecessary overhead.

> The minor bug fixes/improvements that have external behavior changes

I personally thought this PR fits this category. No matter whether the behavior changes are correct or not, we should still not backport a PR if the issue is neither critical nor a regression. That is what I emphasized in the above argument multiple times. API inconsistency is not rare among our APIs. We did not backport those PRs. Now, I am fine with backporting this one because it is an experimental API. Thus, we can say we do not guarantee backward compatibility. If it were a public API, I would insist on my original opinion.

I am also glad many community members have a lot of experience with mission-critical software development. This can help improve documentation, code quality and test coverage. Development of application/mobile software is completely different from development of system software. We are heading in the right direction. We need to enforce it with stricter discipline.

steveloughran · 2018-04-18T14:11:42Z

From the ASF process-police perspective, something like a versioning/backport policy should be decided on the ASF dev list... consider asking in user@ to see what people's preferences are. Worthwhile mentioning in the project report too.
From a personal perspective: it's good to have a policy, but really good to leave a little bit of wiggle-room, even if it's something like "a vote on the developer list can override any policy on a case-by-case basis". That is: you can do more than just fixes, but it's something where the decision is opened up. This makes clear the cost and avoids the "why did you cherry-pick this without asking me" conversations.

gatorsmile · 2018-04-18T16:26:02Z

@steveloughran Agree. We can always make an exception if most people need it. Normally, in these cases, we should make it configurable. That means users can turn the change off through the SQLConf.

For any external behavior change (even in experimental APIs), we need to document it and also mention it in the release notes. This can simplify version upgrades and let our users trust us.
We need more discussion regarding the backport policy (for the maintenance branches and release candidate branches). This should be discussed and finalized at least on the dev list. When the policy is completed, we should also send it to the user list, and the committers are responsible for enforcing it.

I am trying to summarize what we did in the Spark 2.3 release. It took almost 2 months to release it. I will send the postmortem to the community with some proposals about the backport policy.

BryanCutler approved these changes Apr 13, 2018

HyukjinKwon closed this Apr 14, 2018

HyukjinKwon deleted the PR_TOOL_PICK_PR_21007_BRANCH-2.3 branch October 16, 2018 12:44
