RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet fails on master #53188

@jbaiera

This failure came in today on the master branch. I was able to reproduce it locally in 100% of the runs I tried:

org.elasticsearch.xpack.ml.integration.RegressionIT > testTwoJobsWithSameRandomizeSeedUseSameTrainingSet {seed=[FC6B2FF692026E68:60F2C02B52CBCBBF]} FAILED
    org.elasticsearch.ElasticsearchStatusException: Could not start data frame analytics task, allocation explanation [Not opening data frame analytics job [regression_two_jobs_with_same_randomize_seed_2], because not all primary shards are active for the following indices [.ml-state-000001]]
        at __randomizedtesting.SeedInfo.seed([FC6B2FF692026E68:60F2C02B52CBCBBF]:0)
        at org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$AnalyticsPredicate.test(TransportStartDataFrameAnalyticsAction.java:457)
        at org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$AnalyticsPredicate.test(TransportStartDataFrameAnalyticsAction.java:436)
        at org.elasticsearch.persistent.PersistentTasksService.lambda$waitForPersistentTaskCondition$1(PersistentTasksService.java:153)
        at org.elasticsearch.persistent.PersistentTasksService.waitForPersistentTaskCondition(PersistentTasksService.java:157)
        at org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction.waitForAnalyticsStarted(TransportStartDataFrameAnalyticsAction.java:390)
        at org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$1.onResponse(TransportStartDataFrameAnalyticsAction.java:167)
        at org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$1.onResponse(TransportStartDataFrameAnalyticsAction.java:164)
        at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:163)
        at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:163)
        at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)
        at org.elasticsearch.client.node.NodeClient.lambda$executeLocally$0(NodeClient.java:97)
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:144)
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:138)
        at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)
        at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:89)
        at org.elasticsearch.persistent.StartPersistentTaskAction$TransportAction.lambda$masterOperation$0(StartPersistentTaskAction.java:213)
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113)
        at org.elasticsearch.persistent.PersistentTasksClusterService$1.clusterStateProcessed(PersistentTasksClusterService.java:121)
        at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:529)
        at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:416)
        at java.util.ArrayList.forEach(ArrayList.java:1540)
        at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:416)
        at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:276)
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:268)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:245)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(Thread.java:834)

https://gradle-enterprise.elastic.co/s/6akvzoza2fmuc

Since it reproduces so consistently in my local runs, I'm going to go ahead and mute the test.

Labels

:ml (Machine learning), >test-failure (Triaged test failures from CI)
