@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Jul 23, 2018

What changes were proposed in this pull request?

Currently, it looks like we hit the time limit from time to time. It seems better to increase the timeout a bit.

For instance, please see #21822

For clarification, the current Jenkins timeout is 400m. This PR only proposes to fix the test script to increase its timeout correspondingly.

This PR does not aim to change the build configuration.
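The constraint described above (the script's timeout must stay below the one configured on Jenkins) can be sketched as a small check. This is only an illustration: `parse_duration` and `check_script_timeout` are hypothetical helpers, not part of the Spark test script, and the duration syntax is assumed to follow timeout(1) (a number with an optional `s`, `m`, `h`, or `d` suffix).

```python
import re

# Seconds per unit, following the suffixes accepted by timeout(1).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a timeout(1)-style duration such as "340m" to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd]?)", text.strip())
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    value, unit = match.groups()
    # A bare number means seconds.
    return float(value) * _UNITS[unit or "s"]


def check_script_timeout(script_timeout, jenkins_timeout):
    """Return the headroom in minutes, or raise if there is none."""
    headroom = parse_duration(jenkins_timeout) - parse_duration(script_timeout)
    if headroom <= 0:
        raise ValueError(
            "script timeout %s must be less than Jenkins timeout %s"
            % (script_timeout, jenkins_timeout)
        )
    return headroom / 60


# Values taken from this PR: 330m in the script, 400m on Jenkins.
print(check_script_timeout("330m", "400m"))  # 70.0
```

Keeping some headroom means the script's own timeout fires first, so the script still has a chance to report the timeout before Jenkins stops the job.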

How was this patch tested?

Jenkins tests.

@HyukjinKwon
Member Author

cc @rxin

@HyukjinKwon HyukjinKwon changed the title [SPARK-24886][INFRA] Fix the testing script to increase timeout for Jenkins build (from 300m to 350m) [SPARK-24886][INFRA] Fix the testing script to increase timeout for Jenkins build (from 300m to 330m) Jul 23, 2018
@SparkQA

SparkQA commented Jul 23, 2018

Test build #93429 has finished for PR 21845 at commit c57b745.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
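For readers unfamiliar with the "-9" above: when a command is launched through Python's `subprocess` module, a negative return code means the child was terminated by that signal number, so -9 is consistent with a SIGKILL. The sketch below only demonstrates that convention on a POSIX system; whether the kill came from the test script's own timeout or from Jenkins is not stated in this thread, and `describe` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def describe(code):
    """Explain a subprocess return code (POSIX convention)."""
    if code < 0:
        # Negative values mean "terminated by signal -code".
        return "killed by signal %d (%s)" % (-code, signal.Signals(-code).name)
    return "exited with status %d" % code


# Start a child that would sleep for a minute, then kill it the way a
# hard timeout would: with signal 9 (SIGKILL).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.send_signal(signal.SIGKILL)
return_code = child.wait()

print(return_code, describe(return_code))
```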

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93430 has finished for PR 21845 at commit 7afc5c5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@hvanhovell
Contributor

@HyukjinKwon do we have any idea why we are hitting a timeout?

@HyukjinKwon
Member Author

I am not really sure about that. I asked the same question before and got no answer. My rough guess is that something is wrong in the Jenkins cluster: I have been keeping an eye on build times, and to me they seem to have suddenly increased (in some cases or on some machines(?)).

@HyukjinKwon
Member Author

Just from observing the builds in #21822, most of the timeouts seem to have happened on the amp-jenkins-worker-06 machine, FWIW.

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93436 has finished for PR 21845 at commit 7afc5c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 23, 2018

This helps, but it is not sustainable to keep increasing the threshold. What we need to do is to look at test time distribution and figure out what test suites are unnecessarily long and actually cut down the time there. @HyukjinKwon Would you be interested in doing that?
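A minimal sketch of what "looking at the test time distribution" could mean in practice: total the per-test durations by suite and rank the suites by their share of the overall time. The record layout, the function `slowest_suites`, and the sample data are all hypothetical; nothing here is taken from the actual Spark build.

```python
from collections import defaultdict


def slowest_suites(records, top=3):
    """Rank suites by total time.

    records: iterable of (suite, test, seconds) tuples (assumed layout).
    Returns a list of (suite, total_seconds, share_of_all_time).
    """
    totals = defaultdict(float)
    for suite, _test, seconds in records:
        totals[suite] += seconds
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (suite, secs, secs / grand_total if grand_total else 0.0)
        for suite, secs in ranked[:top]
    ]


# Made-up sample data, for illustration only.
sample = [
    ("SuiteA", "test 1", 600.0),
    ("SuiteA", "test 2", 100.0),
    ("SuiteB", "test 1", 200.0),
    ("SuiteC", "test 1", 100.0),
]

for suite, secs, share in slowest_suites(sample):
    print("%-8s %7.1fs %5.1f%%" % (suite, secs, share * 100))
```

Ranking by share rather than by absolute time makes it easier to see whether a few suites dominate the run or whether the time is spread thinly across many tests.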

@HyukjinKwon
Member Author

Of course I am, as usual. I have actually already been taking care of it. The thing is that tests keep being added even when they duplicate something existing. I feel like it's a bit excessive so far. In general, I don't think there are particular tests that take especially long, IMHO. What we should do is put some effort into deduplicating the tests.

@HyukjinKwon
Member Author

@rxin, btw do you want me to close this one or get this in? I will take another look at the build and tests during this week for sure anyway.

@rxin
Contributor

rxin commented Jul 24, 2018 via email

@HyukjinKwon
Member Author

Yup, looks so in your PR #21822 (comment)

@rxin
Contributor

rxin commented Jul 24, 2018 via email

@HyukjinKwon
Member Author

Hm, yea then. I actually opened this PR to make the tests pass in your PR. Let me leave this closed then and reopen it when we hit the issue next time.

@dilipbiswal
Contributor

@HyukjinKwon I saw the following test run for 11 minutes on Jenkins for one of my PRs. Not sure if it's a transient problem. Just thought I should let you know. On the nightly runs, should we have a test that runs for that long?

SPARK-22499: Least and greatest should not generate codes beyond 64KB (11 minutes, 38 seconds)
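Lines like the one above can be checked mechanically. The sketch below parses a test name followed by a parenthesized duration and flags tests over a limit. It assumes exactly the "(N minutes, M seconds)" style shown here; other duration formats a test report may use are not handled, and `parse_test_line`, `slow_tests`, and the 300-second limit are hypothetical.

```python
import re

# "<test name> (<duration>)", where the duration is the last parenthesized group.
_LINE = re.compile(r"^(?P<name>.*)\s\((?P<duration>[^()]*)\)\s*$")
_PART = re.compile(r"(\d+)\s+(hour|minute|second)s?")
_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def parse_test_line(line):
    """Return (test_name, seconds), or None if the line does not match."""
    match = _LINE.match(line)
    if match is None:
        return None
    parts = _PART.findall(match.group("duration"))
    if not parts:
        return None
    seconds = sum(int(count) * _SECONDS[unit] for count, unit in parts)
    return match.group("name").strip(), seconds


def slow_tests(lines, limit_seconds=300):
    """Yield (name, seconds) for tests running longer than the limit."""
    for line in lines:
        parsed = parse_test_line(line)
        if parsed is not None and parsed[1] > limit_seconds:
            yield parsed


report = [
    "SPARK-22499: Least and greatest should not generate codes beyond 64KB "
    "(11 minutes, 38 seconds)",
]
for name, seconds in slow_tests(report):
    print("%d s: %s" % (seconds, name))
```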

@HyukjinKwon
Member Author

HyukjinKwon commented Jul 24, 2018

Ah, of course we shouldn't. I roughly checked a few builds in #21822 and made a fix for the test in #21855.

Thanks for letting me know.

@dilipbiswal
Contributor

@HyukjinKwon Super. Thanks a lot for fixing.

@HyukjinKwon
Member Author

I am reopening this per #21898 (comment)

cc @cloud-fan, @rxin and @shaneknapp

@cloud-fan
Contributor

Ideally we should figure out which tests take an abnormally long time and fix them. But I'd like to increase the timeout first if #21898 keeps hitting the timeout. #21898 is an important feature for Spark 2.4, and we should not block it on infra problems. cc @rxin

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94345 has finished for PR 21845 at commit 7afc5c5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94349 has finished for PR 21845 at commit 7afc5c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor

i'm also more than happy to bump the timeout in the PRB build, but i think that's just putting duct tape on a band-aid and spray painting it to hide the layers of tape.

the builds and tests just take too long. i know that solving this problem is far beyond the scope of this PR, but build duration really needs some attention.

  # format: http://linux.die.net/man/1/timeout
- # must be less than the timeout configured on Jenkins (currently 350m)
- tests_timeout = "300m"
+ # must be less than the timeout configured on Jenkins (currently 400m)

@HyukjinKwon
Member Author

Let me push this in late tonight or early tomorrow.

@SparkQA

SparkQA commented Aug 8, 2018

Test build #94396 has finished for PR 21845 at commit 08b4ebe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

I am getting this in. We are seeing more - #22011 (comment)

@HyukjinKwon HyukjinKwon changed the title [SPARK-24886][INFRA] Fix the testing script to increase timeout for Jenkins build (from 300m to 330m) [SPARK-24886][INFRA] Fix the testing script to increase timeout for Jenkins build (from 300m to 340m) Aug 10, 2018
@HyukjinKwon
Member Author

Increased 330 -> 340 since even 330 does not look like enough.

@HyukjinKwon
Member Author

Merged to master.

@asfgit asfgit closed this in 6c7bb57 Aug 10, 2018
@SparkQA

SparkQA commented Aug 10, 2018

Test build #94533 has finished for PR 21845 at commit 51f8792.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- # must be less than the timeout configured on Jenkins (currently 350m)
- tests_timeout = "300m"
+ # must be less than the timeout configured on Jenkins (currently 400m)
+ tests_timeout = "340m"
Contributor

we're STILL seeing test timeouts. let's bump this to 400m and i'll up the timeout in jenkins to 430m.

Member Author

Woah .. got it ..

asfgit pushed a commit that referenced this pull request Aug 18, 2018
…enkins build (from 340m to 400m)

## What changes were proposed in this pull request?

This PR aims to increase the timeout from 340m to 400m. Please also see #21845 (comment)

## How was this patch tested?

N/A

Closes #22098 from HyukjinKwon/SPARK-24886-1.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-24886 branch October 16, 2018 12:45