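This can be achieved with script fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html) and a Painless script.
You can count the phrase matches in each field and return the total with each hit in the search response.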
Example:
## Here's my test index with some sample values
## This document has one occurrence
POST t1/doc/1
{
"content" : "my test phrase"
}
## This document has five occurrences, one per field
POST t1/doc/2
{
"content": "my test phrase ",
"content1" : "this is my test phrase 1",
"content2" : "this is my test phrase 2",
"content3" : "this is my test phrase 3",
"content4" : "this is my test phrase 4"
}
POST t1/doc/3
{
"content" : "my test new phrase"
}
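Now, with a script, I can count the phrase matches in each field. I count at most one match per field, but you can modify the script to count multiple matches within a field.
The obvious drawback is that every field has to be named explicitly in the script, unless there is a way to loop over a document's fields that I'm not aware of.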
POST t1/_search
{
"script_fields": {
"phrase_Count": {
"script": {
"lang": "painless",
"source": """
int count = 0;
if(doc['content.keyword'].size() > 0 && doc['content.keyword'].value.indexOf(params.phrase)!=-1) count++;
if(doc['content1.keyword'].size() > 0 && doc['content1.keyword'].value.indexOf(params.phrase)!=-1) count++;
if(doc['content2.keyword'].size() > 0 && doc['content2.keyword'].value.indexOf(params.phrase)!=-1) count++;
if(doc['content3.keyword'].size() > 0 && doc['content3.keyword'].value.indexOf(params.phrase)!=-1) count++;
if(doc['content4.keyword'].size() > 0 && doc['content4.keyword'].value.indexOf(params.phrase)!=-1) count++;
return count;
""",
"params": {
"phrase": "my test phrase"
}
}
}
}
}
This gives me the phrase count for each document as a script field:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "t1",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
5 <--- number of fields in the document that contain the phrase
]
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
1
]
}
},
{
"_index" : "t1",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"fields" : {
"phrase_Count" : [
0
]
}
}
]
}
}
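To sanity-check these numbers outside Elasticsearch, and to try the two variations mentioned earlier (counting every occurrence within a field, and looping over fields instead of naming each one), here is a minimal local sketch in Python. It is not Elasticsearch or Painless code: it assumes each document is available as a plain dict shaped like its `_source`, and the function name `count_phrase` and its `per_field_once` flag are hypothetical. Within Elasticsearch itself, a script field can read the hit's source through `params['_source']`, which may make a similar loop possible in Painless, at the cost of loading `_source` for every hit.

```python
from typing import Any, Mapping


def count_phrase(source: Mapping[str, Any], phrase: str, per_field_once: bool = True) -> int:
    """Count phrase matches across all top-level string fields of a document.

    Hypothetical helper. per_field_once=True mirrors the Painless script above:
    each field containing the phrase adds 1 (like indexOf(phrase) != -1).
    per_field_once=False counts every non-overlapping occurrence in each field.
    """
    total = 0
    # Loop over the fields instead of naming each one explicitly.
    for value in source.values():
        # Only plain string fields are inspected; numbers, lists and objects are skipped.
        if not isinstance(value, str):
            continue
        if per_field_once:
            if phrase in value:
                total += 1
        else:
            total += value.count(phrase)
    return total


# The sample documents indexed into t1 above.
docs = {
    "1": {"content": "my test phrase"},
    "2": {
        "content": "my test phrase ",
        "content1": "this is my test phrase 1",
        "content2": "this is my test phrase 2",
        "content3": "this is my test phrase 3",
        "content4": "this is my test phrase 4",
    },
    "3": {"content": "my test new phrase"},
}

for doc_id, source in docs.items():
    print(doc_id, count_phrase(source, "my test phrase"))
# prints:
# 1 1
# 2 5
# 3 0
```

With `per_field_once=False`, a field such as `"my test phrase, my test phrase"` contributes 2 instead of 1.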