Elasticsearch: aggregation on a copy_to field does not behave as expected

2024-03-03

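I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing at another field named all_fields. all_fields is indexed as "not_analyzed".

When I run a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets in which the field1 and field2 values are not concatenated.

Example mapping: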

  {
    "mappings": {
      "myobject": {
        "properties": {
          "field1": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "field2": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "all_fields": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }

Data in:

  {
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
  }

and

  {
    "field1": "fish chicken something",
    "field2": "dinner"
  }

Aggregation:

{
  "aggs": {
    "t": {
      "terms": {
        "field": "all_fields"
      }
    }
  }
}

Results:

...
"aggregations": {
    "t": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "dinner",
                "doc_count": 1
            },
            {
                "key": "dinner carrot potato broccoli",
                "doc_count": 1
            },
            {
                "key": "fish chicken something",
                "doc_count": 1
            },
            {
                "key": "something here",
                "doc_count": 1
            }
        ]
    }
}

I was expecting only 2 buckets: "fish chicken somethingdinner" and "dinner carrot potato broccolisomething here".

What am I doing wrong?


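What you are looking for is the concatenation of the two strings. copy_to may look like it does that, but it does not. Conceptually, copy_to builds a set of values from field1 and field2; it does not concatenate them. Because all_fields is not_analyzed, each copied value is indexed as its own term, so the terms aggregation returns one bucket per original value.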
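To make the difference concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) that models a terms aggregation over the two sample documents from the question. The helper names copy_to_values, concatenated_value and terms_buckets are made up for this illustration, and the model is a simplification of what the index actually does.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

def copy_to_values(doc):
    """Simplified model of copy_to: all_fields holds each source value separately."""
    return [doc["field1"], doc["field2"]]

def concatenated_value(doc):
    """What the question expected: a single joined value per document."""
    return [doc["field1"] + doc["field2"]]

def terms_buckets(docs, values_of):
    """Count documents per distinct key; a document counts once per key."""
    counts = Counter()
    for doc in docs:
        counts.update(set(values_of(doc)))
    return dict(sorted(counts.items()))

copy_to_buckets = terms_buckets(docs, copy_to_values)
concat_buckets = terms_buckets(docs, concatenated_value)

print(copy_to_buckets)  # four keys, matching the buckets in the question
print(concat_buckets)   # two keys, one per document
```

With the set-of-values model you get the four buckets shown in the question; only an explicit concatenation yields one bucket per document.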

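For your use case you have two options:

  1. use a _source transform: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-transform.html#mapping-transform
  2. run a script aggregation

I would recommend the _source transform, because I think it is more efficient than scripting. You pay a small price once at index time instead of running a heavy script aggregation at search time.

For the _source transform: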

PUT /lastseen
{
  "mappings": {
    "test": {
      "transform": {
        "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
      }, 
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

And the query:

GET /lastseen/test/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "all_fields",
        "size": 10
      }
    }
  }
}
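As a rough illustration of what the transform script above computes, here is a small Python sketch. The function apply_transform is hypothetical and only mirrors the one-line script in the mapping. It also assumes, as I understand the linked documentation, that the transform affects what gets indexed and leaves the original source document as it was.

```python
def apply_transform(source):
    """Return the document as it would be indexed; the input stays unchanged."""
    indexed = dict(source)  # shallow copy is enough for flat string fields
    # Same expression as the transform script: field1 + ' ' + field2
    indexed["all_fields"] = source["field1"] + " " + source["field2"]
    return indexed

doc = {"field1": "fish chicken something", "field2": "dinner"}
indexed_doc = apply_transform(doc)

print(indexed_doc["all_fields"])
print("all_fields" in doc)
```

Since all_fields is not_analyzed, that single joined string becomes one term, so the terms aggregation returns one bucket per distinct field1/field2 combination. The same function could also be applied in the client before indexing if you prefer not to rely on a mapping transform.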

For the script aggregation, it is easier (meaning you can use doc['field'].value instead of the more expensive _source.field) to add .raw sub-fields to field1 and field2:

PUT /lastseen
{
  "mappings": {
    "test": { 
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}

The script then uses these .raw sub-fields:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value", 
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}

Without the .raw sub-fields (deliberately created as not_analyzed), you would need to do something like this, which is more expensive:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "_source.field1 + ' ' + _source.field2", 
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}
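The two script variants differ only in how they read the field values, so the request body can be generated from one helper. The sketch below builds both bodies as Python dicts. The function name concat_terms_agg and its parameters are assumptions made for this example, and it only constructs the JSON; sending it to the cluster is left to whatever client you use.

```python
import json

def concat_terms_agg(field_a, field_b, use_doc_values=True, size=10):
    """Build a terms aggregation whose key is '<field_a> <field_b>'."""
    if use_doc_values:
        # Cheaper variant: reads the not_analyzed .raw sub-fields.
        script = "doc['%s.raw'].value + ' ' + doc['%s.raw'].value" % (field_a, field_b)
    else:
        # More expensive variant: reads the values from _source.
        script = "_source.%s + ' ' + _source.%s" % (field_a, field_b)
    return {
        "aggs": {
            "NAME": {
                "terms": {"script": script, "size": size, "lang": "groovy"}
            }
        }
    }

fast_body = concat_terms_agg("field1", "field2")
slow_body = concat_terms_agg("field1", "field2", use_doc_values=False)

print(json.dumps(fast_body, indent=2))
print(json.dumps(slow_body, indent=2))
```

The printed bodies match the two aggregation requests shown above, so either one can be pasted into a search request against the lastseen index.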