The Wayback Machine - https://web.archive.org/web/20220426201027/https://github.com/elastic/elasticsearch/issues/83957

Add ability to override keyed parameter in filters aggregation #83957

Open
tilman opened this issue Feb 15, 2022 · 3 comments
Labels
:Analytics/Aggregations >bug good first issue Team:Analytics

Comments


@tilman tilman commented Feb 15, 2022

See #83957 (comment) for the reproduction and the description of the issue.

Original bug report:

Elasticsearch Version

6.4.3

Installed Plugins

/

Java Version

bundled

OS Version

linux

Problem Description

Using bucket_sort inside a filters aggregation results in no sorting at all.

Steps to Reproduce

Example query:

{
  "aggs": {
    "Category": {
      "filters": {
        "other_bucket": false,
        "other_bucket_key": "No Category",
        "filters": {
          "165422": {
            "term": {
              "product.categories.id": 165422
            }
          },
          "171229": {
            "term": {
              "product.categories.id": 171229
            }
          },
          "165523": {
            "term": {
              "product.categories.id": 165523
            }
          },
          "165530": {
            "term": {
              "product.categories.id": 165530
            }
          }
        }
      },
      "aggs": {
        "cancelation_count": {
          "sum": {
            "script": {
              "inline": "if(doc['order_cancellation_id'].value > 0){return 1} return 0;",
              "lang": "painless"
            }
          }
        },
        "sort_cancelation_count": {
          "bucket_sort": {
            "sort": [ "cancelation_count" ],
            "size": 4
          }
        }
      }
    }
  },
  "size": 0
}

Logs (if relevant)

Example result (the buckets of the filters aggregation are ordered by bucket key rather than by the cancelation_count sum):

{
  "took": 620,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9580162,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Category": {
      "buckets": {
        "165422": {
          "doc_count": 5835212,
          "cancelation_count": {
            "value": 145488
          }
        },
        "165523": {
          "doc_count": 1974874,
          "cancelation_count": {
            "value": 43586
          }
        },
        "165530": {
          "doc_count": 1231949,
          "cancelation_count": {
            "value": 33821
          }
        },
        "171229": {
          "doc_count": 193478,
          "cancelation_count": {
            "value": 5939
          }
        }
      }
    }
  }
}
@elasticmachine elasticmachine added the Team:Analytics label Mar 1, 2022
Collaborator

@elasticmachine elasticmachine commented Mar 1, 2022

Pinging @elastic/es-analytics-geo (Team:Analytics)

Member

@imotov imotov commented Mar 7, 2022

Did you get this result directly from Elasticsearch or through some client? I tried to reproduce it, and it looks like Elasticsearch returns a correctly ordered list. But since the order of fields in a JSON object is basically undefined, sorting named buckets is a bit of a moot point, and any JSON post-processing will scramble the order anyway.

You can reliably sort buckets if they are in an array, but in the case of the filters aggregation this is only possible for anonymous filters, and you will have to do something redundant, like adding an additional terms agg, to figure out which bucket corresponds to which key:

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "key": {
        "type": "keyword"
      }
    }
  }
}

PUT test/_bulk?refresh
{ "index": {} }
{ "key": "A", "val": 10}
{ "index": {} }
{ "key": "B", "val": 15}
{ "index": {} }
{ "key": "C", "val": 0}


GET test/_search
{
  "size": 0,
  "aggs": {
    "by_key": {
      "filters": {
        "filters": [
          {
            "term": {
              "key": "A"
            }
          },
          {
            "term": {
              "key": "B"
            }
          },
          {
            "term": {
              "key": "C"
            }
          }
        ]
      },
      "aggs": {
        "term": {
          "terms": {
            "field": "key"
          }
        },
        "max_val": {
          "max": {
            "field": "val"
          }
        },
        "sort_by_val": {
          "bucket_sort": {
            "sort": [
              { "max_val": "asc" }
            ]
          }
        }
      }
    }
  }
}

@elastic/es-analytics-geo I am marking this for team discussion to see if it makes sense to make the keyed option user-configurable for cases like this.

@wchaparro wchaparro added good first issue and removed team-discuss labels Mar 30, 2022
@imotov imotov changed the title from "bucket_sort has no effect on filters aggregation" to "Add ability to override keyed parameter in filters aggregation" Mar 30, 2022
Member

@imotov imotov commented Mar 30, 2022

We have discussed it and decided that we should add the ability to override the "keyed" parameter on this aggregation. In the case of named filters, we only support the keyed output format, whose ordering cannot be reliably preserved in JSON because JSON doesn't guarantee the order of the elements in an object. If filters are anonymous, we return them as an array, but then sorting changes the order of the elements, making it impossible to know which result came from which filter. In this particular case we want named filters with a non-keyed result.
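
For illustration only, a request using the proposed option might look roughly like the sketch below, reusing the test index from the earlier comment. The placement and spelling of "keyed": false inside the filters body is an assumption based on the wording of the proposal. At the time of this discussion the filters aggregation did not accept such a parameter, so this request is not expected to work on the version in the original report.

```json
GET test/_search
{
  "size": 0,
  "aggs": {
    "by_key": {
      "filters": {
        "keyed": false,
        "filters": {
          "A": { "term": { "key": "A" } },
          "B": { "term": { "key": "B" } },
          "C": { "term": { "key": "C" } }
        }
      },
      "aggs": {
        "max_val": {
          "max": { "field": "val" }
        },
        "sort_by_val": {
          "bucket_sort": {
            "sort": [
              { "max_val": "asc" }
            ]
          }
        }
      }
    }
  }
}
```

The intent would be for the buckets to come back as an array, so that the order produced by bucket_sort survives JSON serialization, with each entry still carrying the name of its filter. This would make the extra terms sub-aggregation from the workaround unnecessary.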
