
Conversation

@nchammas
Contributor

@nchammas nchammas commented Mar 16, 2020

What changes were proposed in this pull request?

This PR (SPARK-31167) refactors how we track various Python test and build dependencies so they are pinned and internally consistent.

Why are the changes needed?

This should make it easier to bump dependencies (since they are specified in fewer places) and promote more consistent build behavior across Docker, Jenkins, GitHub, and local development environments.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I built the Docker image just far enough to confirm that the dependencies are installed correctly:

Step 16/31 : RUN pip install -r docs-requirements.txt
 ---> Running in b0c66cb2a543
Collecting sphinx==2.3.1
  Downloading Sphinx-2.3.1-py3-none-any.whl (2.7 MB)
Collecting mkdocs==1.0.4
  Downloading mkdocs-1.0.4-py2.py3-none-any.whl (1.2 MB)
Collecting numpy==1.18.1
  Downloading numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
...

The GitHub workflow will be tested as part of this PR.

@nchammas
Contributor Author

cc @HyukjinKwon. I moved the requirements changes from #27912 to this PR.

@SparkQA

SparkQA commented Mar 16, 2020

Test build #119884 has finished for PR 27928 at commit 9c3a7af.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119896 has finished for PR 27928 at commit 9c3a7af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Unidecode==0.04.19
sphinx
sphinx==2.3.1
numpy==1.18.1
Member

I quickly googled and skimmed how a few other projects handle their requirements (here and here). I still think it's more usual to specify a range rather than a specific version.

I am still not sure it's right to pin the versions yet. There's a trade-off to pinning. I think the Spark community is big enough to handle the issues that come from using the latest versions too. I would only pin a version when an issue that is difficult to fix is found.

Member

cc'ing a somewhat arbitrary set of committers who might be interested here: @srowen, @holdenk, @dongjoon-hyun, @BryanCutler. WDYT about this?

Member

I understand the trade-off mentioned by @HyukjinKwon. Actually, I tried to pin versions once before and dropped my PR because of that. +1 for @HyukjinKwon's advice: I would only pin the version when an issue difficult to fix is found.

The only exception is spark-rm/Dockerfile. We need to use specific versions and manage them explicitly there.

Contributor Author

@HyukjinKwon - When looking at project dependencies, there is an important distinction between projects that are used as libraries and projects that are used as stand-alone applications.

If your project is a library, then you know others are importing you alongside other dependencies too. To minimize the chance of transitive dependency conflicts, you want to be flexible in how you specify your dependencies.

When your project is a stand-alone application, you don't have to worry about such things. You can pin every dependency to a specific version to get the most predictable and reliable build and runtime behavior.

In our case, the Spark build environment is more akin to a stand-alone application than a library. We don't need to worry about downstream users struggling with dependency conflicts. We can get the most stable build behavior by pinning everything, and there is no downside as far as I can tell.

I'll use Trio as an example again to illustrate my point:

  • Trio is a library that others will typically import alongside many other dependencies. So in Trio's setup.py they are very flexible in how they specify their dependencies.
  • Trio's test environment, on the other hand, is only used by Trio contributors. So Trio locks down every test requirement using pip-tools.
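To make that concrete, a minimal sketch of the pip-tools workflow Trio uses (the file names here are illustrative, not something this PR adds):

# requirements.in holds the loose, human-edited constraints, e.g. sphinx>=2.3 and numpy>=1.7
pip install pip-tools
pip-compile requirements.in --output-file requirements.txt   # writes out fully pinned versions
pip-sync requirements.txt                                     # makes the active environment match the pins exactly

The .in file stays easy to read and bump, while the generated .txt gives CI and Docker a reproducible set of versions.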

Member

@HyukjinKwon HyukjinKwon Mar 19, 2020

There's a trade-off here as well: it's the most stable option, but it can keep us on bugs that have already been fixed upstream. It can still mess up a developer's environment. Arguably there are not many Python-dedicated developers in the Spark community who fluently use pyenv, virtualenv, or conda. I think most of them just use pip and a local installation.

I quickly skimmed the requirements.txt files for dev or docs here. Of the top 20 projects I skimmed, I found 6 instances:

  • 3 of them were not pinned.
  • 2 of them were partially pinned.
  • 1 of them was completely pinned.

I agree that dev environments tend to specify versions more often; however, I think not pinning is still the prevailing practice.

Contributor Author

Would it help then if we expanded dev/README.md to show how to set up a virtual environment? I'm willing to do that.

If we don't want to ask devs to use virtual environments at all, then perhaps we need to fork dev/requirements.txt and have a version that pins everything, for use in CI and releases, and a version that pins nothing, for use by devs who don't use virtual environments.

Another alternative is the compromise currently standing in this PR, with some versions specified as major.minor.*.

And yet another alternative (which I personally wouldn't favor, but I know it's common) is to Dockerize the whole development environment, but that's a lot of work.
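As a rough sketch of the forked-file alternative mentioned above (the file names and pinned versions are hypothetical, just to show the shape of it):

# dev/requirements.txt        - unpinned, for developers installing into their own environments
flake8
sphinx
# dev/requirements-pinned.txt - pinned, consumed by the Docker image and CI
flake8==3.7.9
sphinx==2.3.1

Developers would run pip install -r dev/requirements.txt, while CI and the release image would install from the pinned file.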

Member

I would suggest a note in the README to ensure people know how they can isolate this environment if needed, plus pinning the minor version only. Is that close enough to a consensus?

Contributor Author

OK, I'll do that. I'll recommend in dev/README.md that users create a virtual environment, and demonstrate how to do that. I'll also update the version specifiers so they are all in the form major.minor.*.
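For reference, the kind of snippet I have in mind for the README (a sketch, not final wording):

python3 -m venv .venv                  # create an isolated environment inside the repo
source .venv/bin/activate              # activate it for the current shell
pip install -r dev/requirements.txt    # install the dev/build dependencies into it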

Contributor Author

@nchammas nchammas Mar 20, 2020

Hmm, on second thought, perhaps the README should be left to a separate PR. We already have advice on setting up a development environment in at least a couple of places, like Useful Developer Tools and docs/README.md.

Perhaps we should consolidate that advice over on the Useful Developer Tools page, since it fits in with the information already there.

Either that, or let's agree on some other approach to take. But I think we can defer any dev documentation changes to a follow-up PR.

Member

I also would prefer not to pin specific versions and agree with #27928 (comment). It is good to have the community try the latest versions to surface any issues, but we should be very clear about which versions have been used in our CI (whether recorded via a virtualenv or just in a readme), so there is always an obvious fallback version.

@@ -1,5 +1,10 @@
flake8==3.5.0
flake8==3.7.*
Contributor Author

While I still think we should pin every version here, perhaps this approach is a compromise we can agree on.

==3.7.* means pip will install the latest bugfix release of 3.7. If you already have any 3.7.x version installed, even if it's not the latest one, pip will consider the requirement satisfied and won't do anything. To force pip to upgrade to the latest bugfix release of 3.7 when you already have a compatible version installed, request it via pip install --upgrade.
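Concretely, using flake8 as the example:

pip install 'flake8==3.7.*'             # installs the newest 3.7.x if nothing matching is present
pip install 'flake8==3.7.*'             # no-op if some 3.7.x is already installed, even an older one
pip install --upgrade 'flake8==3.7.*'   # forces an upgrade to the newest available 3.7.x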

Member

If we can assume these dependencies follow SemVer, we'd be better off using wildcards on the minor versions ...

Contributor Author

@nchammas nchammas Mar 23, 2020

I mentioned it elsewhere but I'll mention it again here: linters like flake8 and pycodestyle introduce new checks in minor/feature releases. There is a very high chance that every new check they introduce will flag new problems and fail the build.

In fact, we saw exactly that behavior with pydocstyle just before we removed it. And I experienced this with pycodestyle in Flintrock before pinning the version.

I don't understand the point of waiting for the build to break before pinning or severely limiting the versions for libraries like these.
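To illustrate the kind of constraint I have in mind for linters, either of these limits us to bugfix-only upgrades (the versions shown are examples, not necessarily what this PR uses):

flake8==3.7.*         # any 3.7.x bugfix release, never 3.8 with its new checks
pycodestyle~=2.4.0    # PEP 440 compatible-release form: >=2.4.0 and <2.5, same intent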

@srowen
Member

srowen commented Mar 17, 2020

I'm still pretty OK with pinning, but, sure, allow taking later maintenance releases?

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119934 has finished for PR 27928 at commit cd05e25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119950 has finished for PR 27928 at commit cd05e25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 20, 2020

Test build #120116 has finished for PR 27928 at commit 75682a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

Seems reasonable to me?

@HyukjinKwon
Member

HyukjinKwon commented Mar 23, 2020

@srowen, with this change, we will have to maintain dev/requirements.txt and keep it up to date. We will have to update that file to the latest versions from now on. Why don't we try not pinning first and see whether we face any issues from using the latest versions? We can start pinning dependencies one by one when we hit issues that are difficult to fix.

For the dependencies below, we have been testing via GitHub Actions. So far, I haven't found any issues related to the versions.

  • pycodestyle and flake8 are Python linters.
  • mkdocs is for the SQL documentation.
  • sphinx is for the Python documentation.

We shouldn't pin numpy, so that we encourage people to test the highest versions. It should ideally be numpy>=1.7, according to setup.py.

  • numpy is an explicit dependency for ML/MLlib in PySpark.

The dependencies below are release-specific and tricky to test. I suspect it's better to let new devs test them out and pin the versions later when needed?

  • PyGithub and Unidecode are release-specific dependencies.
  • jira is for release, JIRA <> PR sync, and PR merge. (Did we test this, BTW?)

FYI, from a cursory look there appear to be some more occurrences, such as pandas and numpy in the PIP sanity check at dev/run-pip-tests:

conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
pip install --upgrade pip wheel numpy

@nchammas
Contributor Author

With this change, we will have to maintain dev/requirements.txt and keep it up to date.

Maybe this is the disconnect between our points of view, because so far I haven't really been following your objections to pinning. Assuming we pin every library, why do we have to keep dev/requirements.txt up-to-date?

As long as we can build the docs, run tests, and do whatever else we need to do as part of regular development, that file can remain frozen as-is for years.

It's only when we specifically want to use some new feature of, say, Sphinx, that we need to bump versions. But that will happen very rarely, I imagine not more than once every couple of years.

Does that address your concern? Why do you think we'd need to touch that file more often than once in a long while?

We shouldn't pin numpy, so that we encourage people to test the highest versions. It should ideally be numpy>=1.7, according to setup.py.

  • numpy is an explicit dependency for ML/MLlib in PySpark.

But the specification of numpy in dev/requirements.txt is so that we can build our docs. (It seems strange, but yes, numpy is a requirement to build our Python API docs.)

Maybe we can improve this by replacing numpy in dev/requirements.txt with a reference to setup.py. That way we can track PySpark dependencies (whether for building the docs or for general execution) in one place. This will also pick up the Pandas requirement. How does that sound?
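One way that could look, assuming pip is run from the repo root and that python/setup.py exposes an extra (the name 'ml' here is an assumption) that carries the numpy requirement:

# dev/requirements.txt
-e ./python[ml]      # editable install of pyspark; pulls in numpy as declared in setup.py
sphinx==2.3.1
mkdocs==1.0.4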

A separate issue I raised earlier is that, if we choose not to pin our build/test dependencies, we need to figure out what to do about the Spark Docker image and CI. Either those will also source the unpinned requirements from the same file, or we go back to having the requirements specified in duplicate: pinned versions for Docker and CI, and unpinned versions for developers.

Obviously, I'd prefer to pin everything and keep it in one place, but if you want to go one of these routes I guess I'll do that. I just want to understand and try to address your objections before going there.

@srowen
Member

srowen commented Mar 29, 2020

@BryanCutler @HyukjinKwon I'm not sure where the consensus lands here. I think part of your argument is that you do want to accept some breakage that comes with automatic version updates. That seems a bit less valuable than relative stability, to me. I suppose I end up neutral, and we have some slight preferences for and against, then? Is there any change here with broader support?

@HyukjinKwon
Member

Yes. I want to accept the breakages that come with automatic version updates, especially given that we haven't faced many issues by doing so. Yes, it seems like we have somewhat different preferences.

Can we leave the versions untouched and keep just the refactoring in this PR for now? I like the part about putting the dependencies into requirements.txt.

@nchammas
Contributor Author

nchammas commented Apr 9, 2020

What does everyone think of the current approach taken in this PR?

Are we good to go now (pending a committer test of the release script to avoid a repeat of #27534)?

@nchammas
Contributor Author

nchammas commented Apr 16, 2020

Checking in again here. Is there something I can do to move this along?

@srowen
Member

srowen commented Apr 16, 2020

I'm still generally OK with pinning things "lightly"; I am not sure how to test it though as I'm also not an RM.

@nchammas
Contributor Author

I manually copied the pinned requirements to the required location and kicked off the Docker build as follows (basically, trying to mimic parts of do-release-docker.sh):

cp .../requirements-pinned.txt spark-rm/dev-requirements-pinned.txt
docker build --build-arg UID=$UID ./spark-rm/

It ran all the way through successfully on my laptop.

@HyukjinKwon and @cloud-fan - Are you still interested in this PR? Would either of you be able to test do-release-docker.sh?

If you'd prefer, I can back out any changes to the release machinery, just to be conservative. It's not ideal, but if that would help get this PR over the line, then I'll do it.

@SparkQA

SparkQA commented Apr 17, 2020

Test build #121414 has finished for PR 27928 at commit 9704c60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class JavaUserDefinedScalar
  • public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord>
  • static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord>
  • class FMClassifierWrapperWriter(instance: FMClassifierWrapper) extends MLWriter
  • class FMClassifierWrapperReader extends MLReader[FMClassifierWrapper]
  • class FMRegressorWrapperWriter(instance: FMRegressorWrapper) extends MLWriter
  • class FMRegressorWrapperReader extends MLReader[FMRegressorWrapper]
  • class LinearRegressionWrapperWriter(instance: LinearRegressionWrapper) extends MLWriter
  • class LinearRegressionWrapperReader extends MLReader[LinearRegressionWrapper]
  • case class LengthOfJsonArray(child: Expression) extends UnaryExpression
  • case class JsonObjectKeys(child: Expression) extends UnaryExpression with CodegenFallback
  • case class ShowViews(
  • case class ShowViewsCommand(
  • class JDBCConfiguration(

@SparkQA

SparkQA commented May 4, 2020

Test build #122250 has finished for PR 27928 at commit 9704c60.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class JavaUserDefinedScalar
  • public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord>
  • static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord>
  • class FMClassifierWrapperWriter(instance: FMClassifierWrapper) extends MLWriter
  • class FMClassifierWrapperReader extends MLReader[FMClassifierWrapper]
  • class FMRegressorWrapperWriter(instance: FMRegressorWrapper) extends MLWriter
  • class FMRegressorWrapperReader extends MLReader[FMRegressorWrapper]
  • class LinearRegressionWrapperWriter(instance: LinearRegressionWrapper) extends MLWriter
  • class LinearRegressionWrapperReader extends MLReader[LinearRegressionWrapper]
  • case class LengthOfJsonArray(child: Expression) extends UnaryExpression
  • case class JsonObjectKeys(child: Expression) extends UnaryExpression with CodegenFallback
  • case class ShowViews(
  • case class ShowViewsCommand(
  • class JDBCConfiguration(

@SparkQA

SparkQA commented May 4, 2020

Test build #122267 has finished for PR 27928 at commit 7df0040.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AuthRpcHandler extends AbstractAuthRpcHandler
  • public class SaslRpcHandler extends AbstractAuthRpcHandler
  • public abstract class AbstractAuthRpcHandler extends RpcHandler
  • class AbortableRpcFuture[T: ClassTag](val future: Future[T], onAbort: Throwable => Unit)
  • public class JavaFValueTestExample
  • class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType, rebaseDateTime: Boolean)
  • class AvroSerializer(
  • class ExecutorResourceRequest(object):
  • class ExecutorResourceRequests(object):
  • class ResourceProfile(object):
  • class ResourceProfileBuilder(object):
  • class TaskResourceRequest(object):
  • class TaskResourceRequests(object):
  • case class AggregateWithHaving(
  • abstract class CurrentTimestampLike() extends LeafExpression with CodegenFallback
  • case class CurrentTimestamp() extends CurrentTimestampLike
  • case class Now() extends CurrentTimestampLike
  • case class YearOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
  • case class DatetimeSub(
  • case class DateAddInterval(
  • case class Extract(field: Expression, source: Expression, child: Expression)
  • case class Rand(child: Expression, hideSeed: Boolean = false)
  • case class Randn(child: Expression, hideSeed: Boolean = false)
  • class CacheManager extends Logging with AdaptiveSparkPlanHelper
  • case class AdaptiveExecutionContext(session: SparkSession, qe: QueryExecution)
  • class ParquetReadSupport(

@SparkQA

SparkQA commented Jun 1, 2020

Test build #123391 has finished for PR 27928 at commit bae2c0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

nchammas commented Jul 1, 2020

Now that 3.0 has been released, do we want to revisit this? It's ready to go, as far as I'm concerned, save for a little testing by a maintainer just to be safe.

@SparkQA

SparkQA commented Jul 1, 2020

Test build #124788 has finished for PR 27928 at commit e21d97d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class MapOutputCommitMessage
  • class ExecutorResourceRequest(
  • class ExecutorResourceRequests() extends Serializable
  • class ResourceProfileBuilder()
  • class TaskResourceRequest(val resourceName: String, val amount: Double)
  • class TaskResourceRequests() extends Serializable
  • sealed trait LogisticRegressionSummary extends ClassificationSummary
  • sealed trait RandomForestClassificationSummary extends ClassificationSummary
  • class _ClassificationSummary(JavaWrapper):
  • class _TrainingSummary(JavaWrapper):
  • class _BinaryClassificationSummary(_ClassificationSummary):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable,
  • class LinearSVCSummary(_BinaryClassificationSummary):
  • class LinearSVCTrainingSummary(LinearSVCSummary, _TrainingSummary):
  • class LogisticRegressionSummary(_ClassificationSummary):
  • class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):
  • class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,
  • class RandomForestClassificationSummary(_ClassificationSummary):
  • class RandomForestClassificationTrainingSummary(RandomForestClassificationSummary,
  • class BinaryRandomForestClassificationSummary(_BinaryClassificationSummary):
  • class BinaryRandomForestClassificationTrainingSummary(BinaryRandomForestClassificationSummary,
  • trait TimestampFormatterHelper extends TimeZoneAwareExpression
  • case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant
  • case class DayOfYear(child: Expression) extends GetDateField
  • case class Year(child: Expression) extends GetDateField
  • case class YearOfWeek(child: Expression) extends GetDateField
  • case class Quarter(child: Expression) extends GetDateField
  • case class Month(child: Expression) extends GetDateField
  • case class DayOfMonth(child: Expression) extends GetDateField
  • case class DayOfWeek(child: Expression) extends GetDateField
  • case class WeekDay(child: Expression) extends GetDateField
  • case class WeekOfYear(child: Expression) extends GetDateField
  • case class CheckOverflowInSum(
  • case class WidthBucket(
  • trait PredicateHelper extends Logging
  • sealed abstract class MergeAction extends Expression with Unevaluable
  • case class DeleteAction(condition: Option[Expression]) extends MergeAction
  • case class TimeFormatters(date: DateFormatter, timestamp: TimestampFormatter)
  • case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan]
  • case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger
  • case class ContinuousTrigger(intervalMs: Long) extends Trigger
  • class StateStoreConf(

@nchammas
Contributor Author

nchammas commented Jul 1, 2020

Retest this please.

@SparkQA

SparkQA commented Jul 2, 2020

Test build #124818 has finished for PR 27928 at commit e21d97d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class MapOutputCommitMessage
  • class ExecutorResourceRequest(
  • class ExecutorResourceRequests() extends Serializable
  • class ResourceProfileBuilder()
  • class TaskResourceRequest(val resourceName: String, val amount: Double)
  • class TaskResourceRequests() extends Serializable
  • sealed trait LogisticRegressionSummary extends ClassificationSummary
  • sealed trait RandomForestClassificationSummary extends ClassificationSummary
  • class _ClassificationSummary(JavaWrapper):
  • class _TrainingSummary(JavaWrapper):
  • class _BinaryClassificationSummary(_ClassificationSummary):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable,
  • class LinearSVCSummary(_BinaryClassificationSummary):
  • class LinearSVCTrainingSummary(LinearSVCSummary, _TrainingSummary):
  • class LogisticRegressionSummary(_ClassificationSummary):
  • class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):
  • class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,
  • class RandomForestClassificationSummary(_ClassificationSummary):
  • class RandomForestClassificationTrainingSummary(RandomForestClassificationSummary,
  • class BinaryRandomForestClassificationSummary(_BinaryClassificationSummary):
  • class BinaryRandomForestClassificationTrainingSummary(BinaryRandomForestClassificationSummary,
  • trait TimestampFormatterHelper extends TimeZoneAwareExpression
  • case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant
  • case class DayOfYear(child: Expression) extends GetDateField
  • case class Year(child: Expression) extends GetDateField
  • case class YearOfWeek(child: Expression) extends GetDateField
  • case class Quarter(child: Expression) extends GetDateField
  • case class Month(child: Expression) extends GetDateField
  • case class DayOfMonth(child: Expression) extends GetDateField
  • case class DayOfWeek(child: Expression) extends GetDateField
  • case class WeekDay(child: Expression) extends GetDateField
  • case class WeekOfYear(child: Expression) extends GetDateField
  • case class CheckOverflowInSum(
  • case class WidthBucket(
  • trait PredicateHelper extends Logging
  • sealed abstract class MergeAction extends Expression with Unevaluable
  • case class DeleteAction(condition: Option[Expression]) extends MergeAction
  • case class TimeFormatters(date: DateFormatter, timestamp: TimestampFormatter)
  • case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan]
  • case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger
  • case class ContinuousTrigger(intervalMs: Long) extends Trigger
  • class StateStoreConf(

@nchammas
Contributor Author

nchammas commented Jul 2, 2020

Retest this please.

@SparkQA

SparkQA commented Jul 2, 2020

Test build #124917 has finished for PR 27928 at commit e21d97d.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class MapOutputCommitMessage
  • class ExecutorResourceRequest(
  • class ExecutorResourceRequests() extends Serializable
  • class ResourceProfileBuilder()
  • class TaskResourceRequest(val resourceName: String, val amount: Double)
  • class TaskResourceRequests() extends Serializable
  • sealed trait LogisticRegressionSummary extends ClassificationSummary
  • sealed trait RandomForestClassificationSummary extends ClassificationSummary
  • class _ClassificationSummary(JavaWrapper):
  • class _TrainingSummary(JavaWrapper):
  • class _BinaryClassificationSummary(_ClassificationSummary):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable,
  • class LinearSVCSummary(_BinaryClassificationSummary):
  • class LinearSVCTrainingSummary(LinearSVCSummary, _TrainingSummary):
  • class LogisticRegressionSummary(_ClassificationSummary):
  • class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):
  • class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,
  • class RandomForestClassificationSummary(_ClassificationSummary):
  • class RandomForestClassificationTrainingSummary(RandomForestClassificationSummary,
  • class BinaryRandomForestClassificationSummary(_BinaryClassificationSummary):
  • class BinaryRandomForestClassificationTrainingSummary(BinaryRandomForestClassificationSummary,
  • trait TimestampFormatterHelper extends TimeZoneAwareExpression
  • case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant
  • case class DayOfYear(child: Expression) extends GetDateField
  • case class Year(child: Expression) extends GetDateField
  • case class YearOfWeek(child: Expression) extends GetDateField
  • case class Quarter(child: Expression) extends GetDateField
  • case class Month(child: Expression) extends GetDateField
  • case class DayOfMonth(child: Expression) extends GetDateField
  • case class DayOfWeek(child: Expression) extends GetDateField
  • case class WeekDay(child: Expression) extends GetDateField
  • case class WeekOfYear(child: Expression) extends GetDateField
  • case class CheckOverflowInSum(
  • case class WidthBucket(
  • trait PredicateHelper extends Logging
  • sealed abstract class MergeAction extends Expression with Unevaluable
  • case class DeleteAction(condition: Option[Expression]) extends MergeAction
  • case class TimeFormatters(date: DateFormatter, timestamp: TimestampFormatter)
  • case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan]
  • case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger
  • case class ContinuousTrigger(intervalMs: Long) extends Trigger
  • class StateStoreConf(

@nchammas
Contributor Author

nchammas commented Jul 2, 2020

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on 
project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: 
Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: 
in epilog non whitespace content is not allowed but got t (position: END_TAG seen ...</metadata>\nt... @26:2) -> [Help 1]

Seems unrelated to this PR.

Retest this please.

@nchammas
Contributor Author

nchammas commented Jul 9, 2020

I think this PR is failing to garner enough interest. If there's nothing I can do at this time to make it more appealing, I'll close it. I think the idea is a good one, but it needs buy-in from a release manager.

@srowen
Member

srowen commented Jul 9, 2020

Hm, I'm mostly in favor of it. Hyukjin, if you're somewhat negative on it, OK, let's shelve this. If you're fairly neutral, I think we can do this.

@nchammas
Contributor Author

Happy to revisit this when there is more support from a release manager.
