Skip to content

ES|QL NULL check in CASE clause breaks later WHERE #112704

@tomhe

Description

@tomhe

Elasticsearch Version

8.15.1

Installed Plugins

No response

Java Version

bundled

OS Version

Docker

Problem Description

Using a WHERE statement after a CASE clause gives very strange and unexpected behavior. It's somewhat hard to explain, but the steps to reproduce should explain the situation well.

Steps to Reproduce

  1. Bring up a fresh Docker cluster from scratch, running 8.15.1
  2. Ingest a few documents:
    POST _bulk
    {"index":{"_index":"test"}}
    {"val": 1}
    {"index":{"_index":"test"}}
    {"val": 2}
    {"index":{"_index":"test"}}
    {"val": 3}
    {"index":{"_index":"test"}}
    {"val": 4}
    {"index":{"_index":"test"}}
    {} // This doc has no value for val
    
  3. Run a query that includes a NULL check in the CASE clause:
    POST /_query?format=txt
    {
    "query": """
        FROM test
        | KEEP val
        | EVAL eval = CASE(
            val == 0, null,
            val % 2 == 0, "val is even",
            val % 2 != 0, "val is odd",
            val IS NULL, "val is null")
        | SORT val
    """
    }
    
  4. Observe that the output includes all documents and that the eval column has a value for all rows but the first:
    #! No limit defined, adding default limit of [1000]
        val      |     eval
    ---------------+---------------
    0              |null
    1              |val is odd
    2              |val is even
    3              |val is odd
    4              |val is even
    null           |val is null
    
  5. Add a WHERE clause that operates on the eval column to only include rows where eval is not null:
    POST /_query?format=txt
    {
    "query": """
        FROM test
        | KEEP val
        | EVAL eval = CASE(
            val == 0, null,
            val % 2 == 0, "val is even",
            val % 2 != 0, "val is odd",
            val IS NULL, "val is null")
        | SORT val
        | WHERE eval IS NOT NULL
    """
    }
    
  6. Observe that the row where val is 0 and eval is null is correctly excluded, but that the row where val is null and eval is val is null is (incorrectly) missing:
    #! No limit defined, adding default limit of [1000]
        val      |     eval
    ---------------+---------------
    1              |val is odd
    2              |val is even
    3              |val is odd
    4              |val is even
    
  7. Change the WHERE clause that operates on the eval column to use LIKE:
    POST /_query?format=txt
    {
    "query": """
        FROM test
        | KEEP val
        | EVAL eval = CASE(
            val == 0, null,
            val % 2 == 0, "val is even",
            val % 2 != 0, "val is odd",
            val IS NULL, "val is null")
        | SORT val
        | WHERE eval LIKE "*"
    """
    }
    
  8. Observe that the row for eval with the value val is null is still (incorrectly) missing:
    #! No limit defined, adding default limit of [1000]
        val      |     eval
    ---------------+---------------
    1              |val is odd
    2              |val is even
    3              |val is odd
    4              |val is even
    
  9. Change the WHERE clause that operates on the eval column to use a different LIKE pattern:
    POST /_query?format=txt
    {
    "query": """
        FROM test
        | KEEP val
        | EVAL eval = CASE(
            val == 0, null,
            val % 2 == 0, "val is even",
            val % 2 != 0, "val is odd",
            val IS NULL, "val is null")
        | SORT val
        | WHERE eval LIKE "value*"
    """
    }
    
  10. Observe that the row for eval with the value val is null now shows correctly:
    #! No limit defined, adding default limit of [1000]
        val      |     eval
    ---------------+---------------
    1              |val is odd
    2              |val is even
    3              |val is odd
    4              |val is even
    null           |val is null
    
  11. Revert back to the query used in step 5, but add a LIMIT statement somewhere before the WHERE statement:
    POST /_query?format=txt
    {
    "query": """
        FROM test
        | KEEP val
        | EVAL eval = CASE(
            val == 0, null,
            val % 2 == 0, "val is even",
            val % 2 != 0, "val is odd",
            val IS NULL, "val is null")
        | SORT val
        | LIMIT 10
        | WHERE eval IS NOT NULL
    """
    }
    
  12. Observe that the row for eval with the value val is null now shows correctly (where it did not show in step 6):
        val      |     eval
    ---------------+---------------
    1              |val is odd
    2              |val is even
    3              |val is odd
    4              |val is even
    null           |val is null
    

Conclusion:

The best workaround seems to be to use a LIKE statement with some pattern other than "*" that matches all expected values (which is not always possible). Using the LIMIT fix will not work for large datasets.

Logs (if relevant)

No response

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions