[SPARK-31167][BUILD] Refactor how we track Python test/build dependencies #27928
Conversation
cc @HyukjinKwon. I moved the requirements changes from #27912 to this PR.

Test build #119884 has finished for PR 27928 at commit

Jenkins, retest this please.

Test build #119896 has finished for PR 27928 at commit
dev/requirements.txt (Outdated)
Unidecode==0.04.19
sphinx
sphinx==2.3.1
numpy==1.18.1
I quickly googled and skimmed how other projects handle their requirements here and here. I still think it's more usual to specify a range rather than a specific version.
I am still not sure it's right to pin the versions yet. There's a trade-off to pinning. I think the Spark community is big enough to handle the issues that come from using the latest versions too. I would only pin a version when an issue difficult to fix is found.
cc'ing a few committers, somewhat arbitrarily, who might be interested here: @srowen, @holdenk, @dongjoon-hyun, @BryanCutler. WDYT about this?
I understand the trade-off mentioned by @HyukjinKwon. Actually, I tried to pin once before and dropped my PR because of that. +1 for @HyukjinKwon's advice: I would only pin a version when an issue difficult to fix is found.
The only exception is spark-rm/Dockerfile. We need to use specific versions and manage them explicitly there.
@HyukjinKwon - When looking at project dependencies, there is an important distinction between projects that are used as libraries and projects that are used as stand-alone applications.
If your project is a library, then you know others are importing you alongside other dependencies too. To minimize the chance of transitive dependency conflicts, you want to be flexible in how you specify your dependencies.
When your project is a stand-alone application, you don't have to worry about such things. You can pin every dependency to a specific version to get the most predictable and reliable build and runtime behavior.
In our case, the Spark build environment is more akin to a stand-alone application than a library. We don't need to worry about downstream users struggling with dependency conflicts. We can get the most stable build behavior by pinning everything, and there is no downside as far as I can tell.
I'll use Trio as an example again to illustrate my point:
- Trio is a library that others will typically import alongside many other dependencies. So in Trio's setup.py they are very flexible in how they specify their dependencies.
- Trio's test environment, on the other hand, is only used by Trio contributors. So Trio locks down every test requirement using pip-tools.
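To make that contrast concrete, here is a rough sketch; the package names and version numbers are illustrative and not Trio's or Spark's actual dependency lists.

    # Library-style specifiers (flexible, to minimize transitive dependency conflicts):
    attrs >= 19.2.0
    sortedcontainers

    # Dev/application-style specifiers (fully pinned, e.g. as emitted by pip-compile from pip-tools):
    attrs==20.3.0
    sortedcontainers==2.3.0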
There's a trade-off here as well: pinning is the most stable, but bugs that have already been fixed upstream can linger in the pinned versions, and it can still mess up a developer's environment. Arguably there are not many Python-dedicated developers in the Spark community who fluently use pyenv, virtualenv, or conda; I think most of them just use pip and a local installation.
I quickly skimmed requirements.txt files for dev or docs here. I skimmed the top 20 projects and found 6 instances:
- 3 of them were not pinned.
- 2 of them were partially pinned.
- 1 of them was completely pinned.
I agree that dev environments tend more toward specifying versions; however, I think it's still more common not to pin.
Would it help then if we expanded dev/README.md to show how to set up a virtual environment? I'm willing to do that.
If we don't want to ask devs to use virtual environments at all, then perhaps we need to fork dev/requirements.txt and have a version that pins everything, for use in CI and releases, and a version that pins nothing, for use by devs who don't use virtual environments.
Another alternative is the compromise currently standing in this PR, with some versions specified as major.minor.*.
And yet another alternative (which I personally wouldn't favor, but I know it's common) is to Dockerize the whole development environment, but that's a lot of work.
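For what it's worth, a minimal sketch of the virtual-environment setup such a README section might describe, assuming Python 3's built-in venv module (the environment path is arbitrary):

    python3 -m venv ~/spark-dev-env           # create an isolated environment
    source ~/spark-dev-env/bin/activate       # activate it in the current shell
    pip install -r dev/requirements.txt       # install the build/test dependencies into it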
I would suggest a note in the README to ensure people know how they can isolate this env if needed, plus pinning the minor version only? Is that close enough for consensus?
OK, I'll do that. I'll recommend in dev/README.md that users create a virtual environment and demonstrate how to do that. I'll also update the version specifiers to all be of the form major.minor.*.
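As a rough illustration of the major.minor.* form, using packages already discussed in this thread (the exact versions are whatever the PR settles on):

    flake8==3.7.*     # any 3.7.x bugfix release satisfies this
    sphinx==2.3.*
    numpy==1.18.*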
Hmm, on second thought, perhaps the README should be left to a separate PR. We already have advice on setting up a development environment in at least a couple of places, like Useful Developer Tools and docs/README.md.
Perhaps we should consolidate that advice over on the Useful Developer Tools page, since it fits in with the information already there.
Either that, or let's agree on some other approach to take. But I think we can defer any dev documentation changes to a follow-up PR.
I also would prefer not to pin specific versions and agree with #27928 (comment). It is good to have the community try the latest versions to surface any issues, but we should be very clear about which versions have been used in our CI, whether that's captured in a virtualenv or just noted in a readme, so there is always an obvious fallback version.
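One possible way to make the CI versions obvious, assuming CI installs from dev/requirements.txt, would be to record the resolved versions after installation; the output file name here is hypothetical:

    pip install -r dev/requirements.txt
    pip freeze > dev/ci-resolved-versions.txt   # hypothetical file; records the exact versions CI ran with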
dev/requirements.txt (Outdated)
@@ -1,5 +1,10 @@
flake8==3.5.0
flake8==3.7.*
While I still think we should pin every version here, perhaps this approach is a compromise we can agree on.
==3.7.* means pip will install the latest bugfix release in the 3.7 series. If you already have any 3.7 version installed, even if it's not the latest one, pip will consider the requirement satisfied and won't do anything. To force pip to upgrade to the latest 3.7 bugfix release when you already have a compatible version installed, request it via pip install --upgrade.
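A short sketch of that behavior, using flake8 only as the example package:

    pip install 'flake8==3.7.*'             # installs the newest 3.7.x if no 3.7 version is present
    pip install 'flake8==3.7.*'             # no-op if some 3.7.x is already installed
    pip install --upgrade 'flake8==3.7.*'   # upgrades to the newest 3.7.x bugfix release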
If we can assume these dependencies follow SemVer, we would be better off using wildcards on minor versions ...
I mentioned it elsewhere, but I'll mention it again here: linters like flake8 and pycodestyle introduce new checks in minor/feature releases. There is a very high chance that every new check they introduce will flag new problems and fail the build.
In fact, we saw exactly that behavior with pydocstyle just before we removed it. And I experienced this with pycodestyle in Flintrock before pinning the version.
I don't understand the point of waiting for the build to break before pinning or severely limiting the versions for libraries like these.
I'm still pretty OK with pinning, but, sure, allow taking later maintenance releases?

Test build #119934 has finished for PR 27928 at commit

Jenkins, retest this please.

Test build #119950 has finished for PR 27928 at commit

Test build #120116 has finished for PR 27928 at commit
srowen left a comment
Seems reasonable to me?
@srowen, with this change, we will have to maintain and keep ...

For the dependencies below, we have been testing via GitHub Actions. So far, I couldn't find any issues related to the versions. We shouldn't pin ...

The dependencies below are release-specific and tricky to test. I suspect it's better to let new dev people test them out and pin the versions later when it's needed?

FYI, there look to be some more occurrences, such as:
conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
pip install --upgrade pip wheel numpy
Maybe this is the disconnect between our points of view, because so far I haven't really been following your objections to pinning. Assuming we pin every library, why do we have to keep dev/requirements.txt up to date? As long as we can build the docs, run tests, and do whatever else we need to do as part of regular development, that file can remain frozen as-is for years. It's only when we specifically want to use some new feature of, say, Sphinx, that we need to bump versions. But that will happen very rarely, I imagine not more than once every couple of years. Does that address your concern? Why do you think we'd need to touch that file more often than once in a long while?
But the specification of numpy in ... Maybe we can improve this by replacing numpy in ...

A separate issue I raised earlier is that, if we don't want to pin our build/test dependencies, we need to figure out what to do about the Spark Docker image and CI. Either those will also source the unpinned requirements from the same file, or we go back to having the requirements specified in duplicate: with pinned versions for Docker and CI, and without pinned versions for developers. Obviously, I'd prefer to pin everything and keep it in one place, but if you want to go one of these routes, I guess I'll do that. I just want to understand and try to address your objections before going there.
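For illustration only, the kind of consolidation being discussed might look roughly like this; the "before" line is quoted from the occurrences above, while the "after" form is just one possible sketch, not the PR's actual change:

    # Before: packages listed inline in a script
    pip install --upgrade pip wheel numpy

    # After: the script defers to the shared requirements file
    pip install --upgrade pip wheel
    pip install -r dev/requirements.txt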
@BryanCutler @HyukjinKwon I'm not sure where the consensus lands here. I think part of your argument is that you do want to take some breakage that comes with automatic version updates. That seems a bit less valuable than relative stability, to me. I suppose I end up neutral, and we have some slight preferences for and against then? Is there any change here with broader support?

Yes. I want to take some breakage that comes with automatic version updates, also given that we haven't faced many issues by doing that. And yes, it seems like there are some differing preferences. Can we not touch the versions here and just keep the refactoring alone in this PR for now? I like the part about putting dependencies into dev/requirements.txt.
What does everyone think of the current approach taken in this PR? Are we good to go now (pending a committer test of the release script to avoid a repeat of #27534)?

Checking in again here. Is there something I can do to move this along?

I'm still generally OK with pinning things "lightly"; I am not sure how to test it though, as I'm also not an RM.
I manually copied the pinned requirements to the required location and kicked off the Docker build, basically trying to mimic parts of the release process. It ran all the way through successfully on my laptop.

@HyukjinKwon and @cloud-fan, are you still interested in this PR? Would either of you be able to test the release script changes?

If you'd prefer, I can back out any changes to the release machinery, just to be conservative. It's not ideal, but if that would help get this PR over the line, then I'll do it.

Test build #121414 has finished for PR 27928 at commit

Test build #122250 has finished for PR 27928 at commit
Test build #122267 has finished for PR 27928 at commit
Test build #123391 has finished for PR 27928 at commit
Now that 3.0 has been released, do we want to revisit this? It's ready to go, as far as I'm concerned, save for a little testing by a maintainer just to be safe.
Test build #124788 has finished for PR 27928 at commit

Retest this please.

Test build #124818 has finished for PR 27928 at commit

Retest this please.

Test build #124917 has finished for PR 27928 at commit

Seems unrelated to this PR. Retest this please.

I think this PR is failing to garner enough interest. If there's nothing I can do at this time to make it more appealing, I'll close it. I think the idea is a good one, but it needs buy-in from a release manager.

Hm, I'm mostly in favor of it. Hyukjin, if you're somewhat negative on it, OK, let's shelve this. If you were fairly neutral, I think we can do this.

Happy to revisit this when there is more support from a release manager.
What changes were proposed in this pull request?
This PR (SPARK-31167) refactors how we track various Python test and build dependencies so they are pinned and internally consistent.
Why are the changes needed?
This should make it easier to bump dependencies (since they are specified in fewer places) and promote more consistent build behavior across Docker, Jenkins, GitHub, and local development environments.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
I built the Docker image just far enough to confirm that the dependencies are installed correctly:
The GitHub workflow will be tested as part of this PR.