Elasticsearch - 每个文档的匹配数

2024-03-05

我使用此查询来搜索字段中出现的短语。

"query": {
    "match_phrase": {
       "content": "my test phrase"
  }
 }

我需要计算每个文档的每个短语发生了多少次匹配(如果这可能的话?)

我考虑过聚合器,但认为它们不满足要求,因为它们会给我整个索引而不是每个文档的匹配数。

Thanks.


这可以通过使用来实现脚本字段 https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html /painless script.

您可以计算每个字段的出现次数并将其添加到文档中。

Example:

## Here's my test index with some sample values

POST t1/doc/1  <-- this has one occurence
{
  "content" : "my test phrase"
}

POST t1/doc/2    <-- this document has 5 occurences
{
   "content": "my test phrase ",
   "content1" : "this is my test phrase 1",
   "content2" : "this is my test phrase 2",
   "content3" : "this is my test phrase 3",
   "content4" : "this is my test phrase 4"

}

POST t1/doc/3
{
  "content" : "my test new phrase"
}

现在使用脚本我可以计算每个字段的短语匹配数。我对每个字段计数一次,但您可以修改脚本以每个字段进行多次匹配。

显然,这里的缺点是您需要提及脚本中文档中的每个字段,除非有一种方法可以循环我不知道的文档字段。

POST t1/_search
{
  "script_fields": {
    "phrase_Count": {
      "script": {
        "lang": "painless",
        "source": """
                             int count = 0;

                            if(doc['content.keyword'].size() > 0 && doc['content.keyword'].value.indexOf(params.phrase)!=-1) count++;
                            if(doc['content1.keyword'].size() > 0 && doc['content1.keyword'].value.indexOf(params.phrase)!=-1) count++;
                            if(doc['content2.keyword'].size() > 0 && doc['content2.keyword'].value.indexOf(params.phrase)!=-1) count++;
                            if(doc['content3.keyword'].size() > 0 && doc['content3.keyword'].value.indexOf(params.phrase)!=-1) count++;
                            if(doc['content4.keyword'].size() > 0 && doc['content4.keyword'].value.indexOf(params.phrase)!=-1) count++;

                            return count;
""",
        "params": {
          "phrase": "my test phrase"
        }
      }
    }
  }
}

这将为我提供每个文档的短语计数作为脚本字段

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            5                 <--- count of occurrences of the phrase in the document
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            1
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            0
          ]
        }
      }
    ]
  }
}
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

Elasticsearch - 每个文档的匹配数 的相关文章

随机推荐