Skip to content

Conversation

felixbarny
Copy link
Member

Removes the feature flag for the tsid hashing improvements added in #132566.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine added v9.2.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

@felixbarny
Copy link
Member Author

Bubbling up a good question @henningandersen has raised:

I wonder about the case where we add a dynamic template that invalidates the index.dimensions field or vice versa. Would the tsid generation still be the same in the two cases?

In that case, we’d indeed create a new tsid.

So far, we’ve been considering that to be acceptable as it’s only happening in an edge case. But we didn’t think about the implications on translog replay:

Scenario A
A document might be put in the translog that has an id which is based on the index.dimensions-based tsid. If a dynamic template that adds dimension fields gets added, the index.dimensions setting gets removed and sets index.routing_path setting is added instead. That’s happening here:

if (matchesAllDimensions == false) {
// If the new dimensions don't match all potential dimension fields, we need to unset index.dimensions
// and set index.routing_path instead.
// This can happen if a new dynamic template with time_series_dimension: true is added to an existing index.
additionalSettings.putList(INDEX_DIMENSIONS.getKey(), List.of());
additionalSettings.putList(INDEX_ROUTING_PATH.getKey(), newIndexDimensions);

When the translog operation is replayed, it would then create a tsid based on index.routing_path and therefore end up with a different id. This would trip this check and fail the ingestion:

if (context.sourceToParse().id() != null && false == context.sourceToParse().id().equals(id)) {
throw new IllegalArgumentException(
String.format(
Locale.ROOT,
"_id must be unset or set to [%s] but was [%s] because [%s] is in time_series mode",
id,
context.sourceToParse().id(),
context.indexSettings().getIndexMetadata().getIndex().getName()
)
);
}
context.id(id);


Scenario B
I think there are other cases where a similar thing can happen today, even without the new index.dimensions-based tsid generation. The scenario involves starting with non-dynamic mapping (dynamic: false) and then adding a dynamic sub field that defines dimensions.

Click to expand ...

creates a time series index with a single dimension and disables dynamic mappings

PUT tsdb-test
{
  "settings": {
    "index.mode": "time_series",
    "index.routing_path": "dim1",
    "index.time_series.start_time": "2025-01-01T00:00:00",
    "index.time_series.end_time": "2025-01-02T00:00:00"
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "dim1": {
        "type": "keyword",
        "time_series_dimension": true
      }
    }
  }
}

# adds a document with a labels.* object, which gets ignored, because dynamic mappings are disabled
POST tsdb-test/_doc
{
  "@timestamp": "2025",
  "dim1": "foo",
  "labels": {
    "foo": "bar"
  }
}

# document gets added to the translog, which stores the id (which is derived from the tsid) but not the tsid

# enable dynamic mappings for labels.* and map them as dimensions
PUT tsdb-test/_mapping
{
  "properties": {
    "labels": {
      "dynamic": true,
      "type": "passthrough",
      "priority": 1,
      "time_series_dimension": true
    }
  }
}

# if the translog gets replayed now, the tsid is re-computed based on dim1 and labels.foo
# the tsid and therefore the id will be different than the one from the translog, causing ingestion to fail

# re-ingesting the same document succeeds and isn't rejected as a duplicate as the tsid and id is based not only on dim1 but also on labels.foo
POST tsdb-test/_doc
{
  "@timestamp": "2025",
  "dim1": "foo",
  "labels": {
    "foo": "bar"
  }
}

To me, the bottom line is that we should also store the tsid in the translog. If we don't, there can be weird situations where the re-computed id differs from the one in the translog as the dimension mappings have changed. As a side-effect, translog operations will be more efficient, as we don't need to re-compute the tsid. However, it sounds like this would take some time and that we likely won't be able to do this for 9.2.

Removing this feature flag as-is would expose us to a new failure scenario compared to what we have today (scenario A).

A potential short-term alternative is to disallow adding dynamic template for dimensions to the mappings so that the change can only happing after a rollover. However, this would be a breaking change and fix only scenario A

@felixbarny
Copy link
Member Author

Here's a draft PR that avoids changing the tsid creation strategy from index.dimensions to index.routing_path: #135514

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :StorageEngine/TSDB You know, for Metrics Team:StorageEngine v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants