Reindex from remote doesn't work with master's join module #25363

@nik9000

Run something like this:

DELETE /source

PUT /source
{
  "mappings": {
    "doc": {
      "properties": {
        "join_field": {
          "type": "join",
          "relations": {
            "parent": "child",
            "child": "grand_child"
          }
        }
      }
    }
  }
}

DELETE /dest

PUT /dest
{
  "mappings": {
    "doc": {
      "properties": {
        "join_field": {
          "type": "join",
          "relations": {
            "parent": "child",
            "child": "grand_child"
          }
        }
      }
    }
  }
}


PUT /source/doc/1
{ "join_field": { "name": "parent" } }

PUT /source/doc/2?routing=1
{ "join_field": { "name": "child", "parent": "1" } }

PUT /source/doc/3?routing=1
{ "join_field": { "name": "grand_child", "parent": "2" } }

POST /_refresh

POST /_reindex?refresh
{
  "source": {
    "index": "source",
    "remote": {
      "host": "http://127.0.0.1:9200"
    }
  },
  "dest": {
    "index": "dest"
  }
}

The reindex request fails with a long stack trace that boils down to:

            "reason": "[fields] unknown field [join_field], parser not found"

This is because join always returns the join field in the "fields" section of every hit, whether you ask for it or not:

POST /source/_search

returns:

{
  ...
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "source",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "join_field": {
            "name": "parent"
          }
        },
        "fields": {
          "join_field": [    <---- This
            "parent"
          ]
        }
      },
      {
        "_index": "source",
        "_type": "doc",
        "_id": "2",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "join_field": {
            "name": "child",
            "parent": "1"
          }
        },
        "fields": {
          "join_field#parent": [    <---- This
            "1"
          ],
          "join_field": [    <---- This
            "child"
          ]
        }
      },
      {
        "_index": "source",
        "_type": "doc",
        "_id": "3",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "join_field": {
            "name": "grand_child",
            "parent": "2"
          }
        },
        "fields": {
          "join_field": [    <---- This
            "grand_child"
          ],
          "join_field#child": [    <---- This
            "2"
          ]
        }
      }
    ]
  }
}
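
Judging only from the response above, the extra keys look predictable from the mapping: the join field's own name, plus one "name#relation" key per parent relation. The sketch below derives that set from an index body. The naming rule is inferred from this single response rather than from the join module itself, the helper name expected_join_keys is made up for illustration, and only top-level properties are inspected.

```python
# Index body copied from the reproduction steps in this issue.
SOURCE_INDEX = {
    "mappings": {
        "doc": {
            "properties": {
                "join_field": {
                    "type": "join",
                    "relations": {"parent": "child", "child": "grand_child"},
                }
            }
        }
    }
}


def expected_join_keys(index_body):
    """Return the "fields" keys that join fields appear to add to each hit.

    Assumption: the rule "<field>" plus "<field>#<parent relation>" is
    inferred from the search response in this issue, nothing more.
    """
    keys = set()
    for type_mapping in index_body.get("mappings", {}).values():
        for name, spec in type_mapping.get("properties", {}).items():
            if spec.get("type") != "join":
                continue
            keys.add(name)
            # Each key of "relations" is a parent-side relation name.
            for parent_relation in spec.get("relations", {}):
                keys.add(f"{name}#{parent_relation}")
    return keys
```

For the mapping in this issue, the helper yields join_field, join_field#parent and join_field#child, which matches the three distinct keys marked in the response.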

Reindex can fix this on its side, either by ignoring fields it doesn't know about or by doing something more complex like checking the source mapping first. I wonder if a better solution is for join not to return the field unless it was asked for. I don't believe reindex needs the field to do its job.
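
To make the first option concrete, here is a toy model of a hit parser in a strict mode and a lenient mode. It is not the actual reindex code: parse_hit, KNOWN_FIELDS and the lenient flag are hypothetical names, and the set of known metadata keys is an assumption. The error text simply mirrors the message quoted above.

```python
# Assumption: metadata keys that a remote-hit parser would expect under "fields".
KNOWN_FIELDS = {"_routing", "_parent"}

# Second hit from the search response in this issue.
SAMPLE_HIT = {
    "_index": "source",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_routing": "1",
    "_source": {"join_field": {"name": "child", "parent": "1"}},
    "fields": {"join_field#parent": ["1"], "join_field": ["child"]},
}


def parse_hit(hit, lenient=False):
    """Extract what is needed to index the hit into the destination."""
    parsed = {
        "index": hit["_index"],
        "type": hit["_type"],
        "id": hit["_id"],
        "routing": hit.get("_routing"),
        "source": hit["_source"],
        "meta": {},
    }
    for name, values in hit.get("fields", {}).items():
        if name in KNOWN_FIELDS:
            parsed["meta"][name] = values
        elif not lenient:
            # Strict mode fails on the first key it does not recognise.
            raise ValueError(f"[fields] unknown field [{name}], parser not found")
        # Lenient mode skips unknown keys; "_source" already holds the data.
    return parsed
```

In this model, parse_hit(SAMPLE_HIT) raises, while parse_hit(SAMPLE_HIT, lenient=True) returns the routing and source untouched, which is all the destination write needs here.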

Labels: :Distributed Indexing/CRUD, :Search/Search, >bug, blocker, v6.0.0-beta1
