Closed
Labels
:StorageEngine/Mapping (The storage related side of mappings), >enhancement, Meta, Team:StorageEngine
Description
This shrinks the index by implementing a "synthetic" _source field. Instead of saving the field to disk we reconstruct it on the fly using our column store, doc values.
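As a rough illustration of the idea above, here is a minimal Python sketch that stores a document only as per-field columns and rebuilds a `_source`-like object on demand. It is not Elasticsearch code: `index_doc`, `synthesize_source`, and the in-memory `columns` dict are hypothetical names, and modelling each column as sorted, de-duplicated values is an assumption of the sketch.

```python
import json


def index_doc(columns, doc_id, doc, prefix=""):
    """Flatten a document into per-field columns instead of storing it whole."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become dotted field paths such as "host.name".
            index_doc(columns, doc_id, value, prefix=f"{path}.")
        else:
            values = value if isinstance(value, list) else [value]
            # Assumption: the column keeps values sorted and de-duplicated.
            columns.setdefault(path, {})[doc_id] = sorted(set(values))


def synthesize_source(columns, doc_id):
    """Rebuild a _source-like dict for one document from the columns."""
    source = {}
    for path in sorted(columns):
        values = columns[path].get(doc_id)
        if not values:
            continue
        node = source
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        # Single values are emitted as scalars, multiple values as arrays.
        node[leaf] = values[0] if len(values) == 1 else values
    return source


columns = {}
index_doc(columns, 1, {"host": {"name": "web-1", "ip": "10.0.0.7"}, "tags": ["b", "a", "b"]})
print(json.dumps(synthesize_source(columns, 1), sort_keys=True))
```

In this sketch the rebuilt document is not byte-for-byte identical to the input: the `tags` array comes back sorted and without the duplicate. That kind of difference is why round trip tests matter for an approach like this.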
Before removing the feature flag
- Initial implementation Synthetic source #85649
- Figure out how much performance we'd get from using synthetic source for recovery - thus removing the `_recovery_source` field. Synthetic source #85649 (comment)
- Figure out final API to turn this on (@qhoxie)
- If it stays `synthetic: true` then we'll have to fail mappings that contain `enabled: false, synthetic: true` Synthetic source: don't allow disabling source #87270
- Flip API from `synthetic: true` to `synthetic: strict` so we have room to add more later. We totally will. @romseygeek
- Rally track Support synthetic source in tsdb and nyc_taxis tracks rally-tracks#268 + Add fetch operations for synthetic source rally-tracks#270
- Resolve remaining round trip tests Synethetic Source: Fix scaled float #86760
- Realtime GET should synthesize on load to return consistent results. (Tests for synthetic _source from translog #87578)
- Add an option to _search to simulate synthetic source (Add an option to _search to force synthetic source #87068)
- Support the simulate option on GET and MGET (Add force_synthetic_source to GET #87536 + Add force_synthetic_source to mget #87574)
- Make sure we throw an error if you try to enable or disable synthetic source on an index Synthetic source: paranoid tests for configuration #87182
- Support "subobjects" Add support for dots in field names for metrics usecases #86166 (Synthetic source: tests for disabling subobjects #87261)
- Docs Docs for synthetic source #87416
- Figure out highlighting Fixup highlighting with synthetic source #87667
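The mapping rule mentioned in the list above (fail mappings that combine `enabled: false` with `synthetic: true`) can be sketched as a small check. This is only an illustration: `validate_source_mapping` is a hypothetical helper, and the `enabled` and `synthetic` keys are taken from the wording of this issue, not from a confirmed final API.

```python
def validate_source_mapping(source_config):
    """Reject a _source config that is both disabled and synthetic.

    Hypothetical helper: key names follow the issue text, defaults are assumed.
    """
    enabled = source_config.get("enabled", True)
    # Any truthy value counts here, so `synthetic: "strict"` is covered too.
    synthetic = source_config.get("synthetic", False)
    if synthetic and not enabled:
        raise ValueError("_source cannot be both disabled and synthetic")
    return source_config


validate_source_mapping({"synthetic": True})  # accepted

try:
    validate_source_mapping({"enabled": False, "synthetic": True})
except ValueError as error:
    print(f"rejected: {error}")
```

Failing at mapping time keeps the two settings from silently contradicting each other, since a disabled `_source` leaves nothing to return and a synthetic one promises to rebuild it.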
Later
- Make sure there is a nice error message when scripts try to access synthetic source - it won't be there. They should use doc values or the fancy new fields API. @tmgordeeva Synthetic source error on script loads #88334
- Add support for more field types
  - `aggregate_metric_double` field type Add `synthetic_source` support to `aggregate_metric_double` fields #88909
  - `constant_keyword` Enable synthetic source support on constant keyword fields #88603
  - `dense_vector` Synthetic _source: support dense_vector #89840
  - `histogram` Synthetic _source: support histogram field #89833
  - `keyword` fields with `ignore_above` (Synthetic source: load text from stored fields #87480 + Synthetic _source: support ignore_above #89466)
  - `match_only_text` Synthetic _source: support match_only_text #89516
  - `version` Synthetic _source: support version field type #89706
  - `_doc_count` Support synthetic _source for _doc_count field #91465
- Support `fields` in runtime fields scripts
  - Numbers REST tests fetching fields with synthetic _source #89888
  - `ip` REST tests fetching fields with synthetic _source #89888
  - `text` (Synthetic _source: support `field` in many cases #89950 + more)
  - `keyword` (Synthetic _source: support `field` in many cases #89950)
  - `match_only_text` (Synthetic _source: support `field` in many cases #89950)
- Support loading from stored fields (text would love it!) Synthetic source: load text from stored fields #87480
- Rally tests for random fetch
- Look into the `enrich` processor (More tests for enrich processor #89554)
- Improve performance of synthesis
  - General Speed up synthetic source #87882
  - Load column-wise Speed up synthetic keyword, ip, and text fields #87930 + Synthetic source numbers in columns #88025
  - Parallel loading?
- Make `fields` API aware of synthetic-ness and go to doc values rather than rebuilding `_source` if `_source` isn't separately needed.
- Document best practices for load over synthetic source
- Support for `ignore_malformed` Synthetic _source: support ignore_malformed #90007
  - `ip` Synthetic `_source`: `ignore_malformed` for `ip` #90038
  - `numeric` Support malformed numbers in synthetic _source #90428
  - `scaled_float` Support synthetic source for scaled_float and unsigned_long when ignore_malformed is used #109506
  - `date`/`date_nanos` Support synthetic source for date fields when ignore_malformed is used #109410
  - `geo_point` Support synthetic source for geo_point when ignore_malformed is used #109651
  - `histogram` Support synthetic source together with ignore_malformed in histogram fields #109882
  - `aggregate_metric_double` Support synthetic source for aggregate_metric_double when ignore_malformed is used #108746 + Simplify ignore_malformed handling for synthetic souce in aggregate_metric_double #109888
  - `text` type family Text fields are stored by default in TSDB indices #106338
  - `keyword` Synthetic _source: support ignore_above #89466
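To make the `ignore_malformed` items above more concrete, here is a minimal Python sketch of one plausible approach: well-formed numbers go into a sorted column, malformed values are kept in a separate side list, and both are combined again when the field is synthesized. The function names and the side list are hypothetical, and this issue does not confirm that the linked PRs work this way.

```python
def index_values(raw_values):
    """Split raw values into a numeric column and a side list of malformed values."""
    doc_values, malformed = [], []
    for raw in raw_values:
        try:
            doc_values.append(int(raw))
        except (TypeError, ValueError):
            # Assumption: malformed input is kept aside rather than dropped.
            malformed.append(raw)
    return sorted(doc_values), malformed


def synthesize_field(doc_values, malformed):
    """Rebuild the field: well-formed values first, then the malformed ones."""
    values = doc_values + malformed
    if not values:
        return None
    return values[0] if len(values) == 1 else values


doc_values, malformed = index_values([3, "oops", 1])
print(synthesize_field(doc_values, malformed))
```

Keeping the malformed values separately is what lets the synthesized field still show what was sent, even though only the well-formed values are available in the column.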
Much later
- Synthesize instead of using `_recovery_source` - we find that it'd improve write performance by ~11%. We'd have to synthesize on load instead. That's pretty slow. We'd love the 11% but we have to be careful here.
