RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet fails on 7.x #55807

@przemekwitek

Description

Build 20200427110658-3288FBEC
Log https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=ubuntu-18.04&&immutable/715/console
Build Scans
[7.7.0] https://gradle-enterprise.elastic.co/s/wod4ndjqklih6
[7.6.3] https://gradle-enterprise.elastic.co/s/vrpyflwqudi36
[6.8.9] https://gradle-enterprise.elastic.co/s/mppjhffgu3jg2
[6.8.9] https://gradle-enterprise.elastic.co/s/mwcdcz4fqnbem
[7.7.0] https://gradle-enterprise.elastic.co/s/pu2t7imvl6ygg
[7.6.3] https://gradle-enterprise.elastic.co/s/mn5lnrsmd77sm
https://gradle-enterprise.elastic.co/s/es3gj7z7irvuc

Repro lines:

REPRODUCE WITH: ./gradlew ':x-pack:plugin:ml:qa:native-multi-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.ml.integration.RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet" \
  -Dtests.seed=2EAE3FA5F6E57FDA \
  -Dtests.security.manager=true \
  -Dtests.locale=is \
  -Dtests.timezone=Australia/Darwin \
  -Dcompiler.java=14 \
  -Druntime.java=8

Logs excerpt:

ElasticsearchException[all shards failed]; nested: SearchPhaseExecutionException[all shards failed];
	at __randomizedtesting.SeedInfo.seed([2EAE3FA5F6E57FDA:6BA53EC32EE23820]:0)
	at org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.serverError(ExceptionsHelper.java:55)
	at org.elasticsearch.xpack.ml.action.TransportGetDataFrameAnalyticsStatsAction.lambda$searchStats$8(TransportGetDataFrameAnalyticsStatsAction.java:223)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:76)
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)
	at org.elasticsearch.action.search.TransportMultiSearchAction$1.finish(TransportMultiSearchAction.java:178)
	at org.elasticsearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:164)
	at org.elasticsearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:157)
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:92)
	at org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:50)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:571)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:580)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:68)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:245)
	at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73)
	at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59)
	at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:402)
	at org.elasticsearch.transport.TransportService$6.handleException(TransportService.java:639)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1171)
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:244)
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226)
	at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:242)
	at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:234)
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:131)
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:94)
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:698)
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:227)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:748)
Caused by: Failed to execute phase [query], all shards failed
	... 50 more

testSetUpgradeMode_ExistingTaskGetsUnassigned suffered from a similar failure (see #55221). The workaround there was to wrap the stats action call in assertBusy so that intermittent shard failures are retried. It was also suggested that the call may fail because one of the .ml-state* or .ml-stats* indices is unavailable.
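
For readers unfamiliar with the retry idea behind that workaround, here is a minimal standalone sketch of it. This is not the Elasticsearch test framework's assertBusy, and `RetrySketch`, `CheckedRunnable` and `retryUntilPasses` are hypothetical names chosen for illustration. It assumes the check should be retried on any exception as well as on assertion errors, with a capped exponential backoff between attempts.

```java
import java.util.concurrent.TimeUnit;

public class RetrySketch {

    /** A check that may throw, for example because a stats call hit unavailable shards. */
    @FunctionalInterface
    public interface CheckedRunnable {
        void run() throws Exception;
    }

    /**
     * Runs the check repeatedly until it passes or the time budget is used up.
     * If the deadline has passed when an attempt fails, that failure is rethrown.
     */
    public static void retryUntilPasses(CheckedRunnable check, long maxWait, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the check passed, stop retrying
            } catch (AssertionError | Exception e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // out of time: surface the most recent failure
                }
            }
            Thread.sleep(sleepMillis);
            // Back off exponentially, capped (arbitrarily, for this sketch) at 500 ms.
            sleepMillis = Math.min(sleepMillis * 2, 500);
        }
    }
}
```

In a test, the stats request and the assertions on its response would go inside the lambda passed to `retryUntilPasses`, so that a transient "all shards failed" response triggers another attempt instead of failing the test immediately.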

Labels

:ml (Machine learning)
>test-failure (Triaged test failures from CI)
