Description
Elasticsearch version: 6.2.4
Plugins installed: []
JVM version: 1.8.0_172
OS version: MacOS (Darwin Kernel Version 15.6.0)
Description of the problem including expected versus actual behavior:
Over the past few months, we've been seeing completely identical documents appear that share the same id, type and routing value. We use custom routing to get parent-child joins working correctly, and when re-indexing a document we make sure to delete the existing copy first, to avoid ending up with two copies of the same document on the same shard. Both the delete and the index operations are sent through Bulk API calls. The indexTime field below is set by the service that indexes the document into ES and, as you can see, the two documents were indexed about 1 second apart. The problem only seems to happen on our production server, which has more traffic and 1 read replica, and it is only ever 2 documents that are duplicated, on what I believe to be a single shard.
The problem can be fixed by deleting the existing documents with that id and re-indexing the document, which is odd, since that is exactly what the indexing service does in the first place.
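For reference, the delete-then-index pattern described above might look roughly like the following when built by hand. This is only a sketch, not our actual indexing code: the helper name `build_reindex_bulk_body` is hypothetical, and it assumes the 6.x bulk metadata keys `_index`, `_type`, `_id` and `routing`. It only builds the newline-delimited request body and does not send anything to a cluster.

```python
import json


def build_reindex_bulk_body(index, doc_type, doc_id, routing, source):
    """Build an NDJSON bulk body that deletes a document and then re-indexes it.

    Hypothetical helper for illustration; both actions carry the same routing
    value so that they are sent to the same shard.
    """
    meta = {"_index": index, "_type": doc_type, "_id": doc_id, "routing": routing}
    lines = [
        json.dumps({"delete": meta}),  # delete action has no source line
        json.dumps({"index": meta}),   # index action metadata
        json.dumps(source),            # document source for the index action
    ]
    # The bulk endpoint expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


body = build_reindex_bulk_body(
    "my-index",
    "ce",
    "746004ff8168bbe5672605fad34704a5",
    "746004ff8168bbe5672605fad34704a5",
    {"indexTime": 1531249623788},
)
print(body)
```

The resulting string would be posted to `/_bulk` with the `application/x-ndjson` content type.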
Queries:
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "id",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
{
  "took": 2588,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 15430904,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicateCount": {
      "doc_count_error_upper_bound": 4,
      "sum_other_doc_count": 15430801,
      "buckets": [
        {
          "key": "746004ff8168bbe5672605fad34704a5",
          "doc_count": 2,
          "duplicateDocuments": {
            "hits": {
              "total": 2,
              "max_score": 1,
              "hits": [
                {
                  "_index": "my-index",
                  "_type": "ce",
                  "_id": "746004ff8168bbe5672605fad34704a5",
                  "_score": 1,
                  "_routing": "746004ff8168bbe5672605fad34704a5",
                  "_source": {
                    "indexTime": 1531249623788
                  }
                },
                {
                  "_index": "my-index",
                  "_type": "ce",
                  "_id": "746004ff8168bbe5672605fad34704a5",
                  "_score": 1,
                  "_routing": "746004ff8168bbe5672605fad34704a5",
                  "_source": {
                    "indexTime": 1531249622605
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
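To make the response above easier to read at a glance, a small script along these lines could summarize each duplicated id and how far apart its copies were indexed. This is a sketch under assumptions: the function name `summarize_duplicates` is hypothetical, the aggregation names are the ones used in the query above, and the sample response is trimmed to the fields the function reads. Note that `top_hits` returns at most 3 hits per bucket by default, so this only sees all copies when there are 3 or fewer.

```python
def summarize_duplicates(response, agg_name="duplicateCount",
                         hits_name="duplicateDocuments"):
    """Map each duplicated id to its copy count, indexTime values and spread in ms."""
    summary = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        times = sorted(hit["_source"]["indexTime"] for hit in hits)
        summary[bucket["key"]] = {
            "copies": bucket["doc_count"],
            "index_times": times,
            # Difference between the newest and oldest copy, in milliseconds.
            "spread_ms": times[-1] - times[0] if times else 0,
        }
    return summary


# Trimmed copy of the response shown above (only the fields used here).
sample_response = {
    "aggregations": {
        "duplicateCount": {
            "buckets": [
                {
                    "key": "746004ff8168bbe5672605fad34704a5",
                    "doc_count": 2,
                    "duplicateDocuments": {
                        "hits": {
                            "hits": [
                                {"_source": {"indexTime": 1531249623788}},
                                {"_source": {"indexTime": 1531249622605}},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

result = summarize_duplicates(sample_response)
print(result["746004ff8168bbe5672605fad34704a5"]["spread_ms"])  # prints 1183
```

For the bucket above, the two copies are 1183 ms apart, which matches the "about 1 second" gap mentioned in the description.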