Elasticsearch: using a shingle filter with synonyms

2024-03-01

I have the following documents:

  • south africa
  • north africa

I want my "south africa" document to be found by each of the following queries:

  • s africa (a)
  • southafrica (b)
  • safrica (c)

I defined the following filters and analyzers:

PUT test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, south africa is indexed as south, southafrica, africa

2) With my_shingle_synonym, south africa is indexed as south, s, southafrica, africa

3) With my_synonym_shingle, south africa is indexed as south, souths, southsafrica, s, safrica, africa
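As a rough illustration of why the filter order matters, here is a minimal Python model of the two filters. It is a simplification under stated assumptions, not Lucene's implementation: the standard tokenizer is approximated by splitting on whitespace, the synonym filter emits each synonym right after the original token, and the shingle filter (with output_unigrams at its default of true) joins consecutive tokens in the stream, so a synonym stacked by an earlier filter is treated as if it were the next word. That last assumption is what reproduces the "souths" token in case 3. All function names are hypothetical.

```python
# Simplified model of the analyzers above (NOT Lucene's implementation):
# token positions are ignored and stacked synonyms are treated as sequential.

SYNONYMS = [["south", "s"], ["north", "n"]]  # mirrors "south,s" and "north,n"


def tokenize(text):
    # Rough stand-in for the standard tokenizer: split on whitespace, no lowercasing.
    return text.split()


def synonym_filter(tokens):
    # Emit each token, then any equivalent synonyms right after it.
    out = []
    for tok in tokens:
        out.append(tok)
        for group in SYNONYMS:
            if tok in group:
                out.extend(t for t in group if t != tok)
    return out


def shingle_filter(tokens, min_size=2, max_size=3, sep=""):
    # For each token: the unigram, then shingles of 2..3 consecutive tokens.
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


ANALYZERS = {
    "my_shingle": [shingle_filter],
    "my_shingle_synonym": [shingle_filter, synonym_filter],
    "my_synonym_shingle": [synonym_filter, shingle_filter],
}


def analyze(analyzer, text):
    tokens = tokenize(text)
    for token_filter in ANALYZERS[analyzer]:
        tokens = token_filter(tokens)
    return tokens


for name in ANALYZERS:
    print(name, analyze(name, "south africa"))
```

Under these assumptions the script prints the same three token lists as 1) to 3), in the same order: putting the synonym filter first is what creates safrica, but it also produces the odd souths and southsafrica tokens and loses southafrica.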

So, with each analyzer:

  • (1) I would find b

  • (2) I would find a and b

  • (3) I would find a and c

I want south africa to be indexed as: south, s, southafrica, safrica, africa


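You do not have to output all possible tokens in a single field to meet your requirements. Your problem can be solved by using different analyzers on multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html.

Define the mapping for your field like this: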

"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}

An example document to index:

PUT test_index/your_mapping/1
{
  "name" : "south africa"
}

Then you can query all variants of the name field at once by using a wildcard expression in the fields list: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_multi_field.

GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
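The idea behind the multi-field answer can be checked with a simplified Python model. This is a sketch under assumptions, not Elasticsearch code: whitespace splitting stands in for the standard tokenizer, the synonym filter emits each synonym right after its token, and the shingle filter joins consecutive tokens in the stream (unigrams included, token_separator ""). The field names name and name.synonym follow the mapping in the answer; the helper functions are hypothetical. "Matching" here means that any query term appears among a field's indexed terms; positions and the query's default operator are ignored, which is why multi-word queries such as s africa are left out.

```python
# Hypothetical model of a multi-field: "name" (my_shingle) and
# "name.synonym" (my_synonym_shingle). Not Elasticsearch's implementation.

SYNONYMS = [{"south", "s"}, {"north", "n"}]


def synonyms(tokens):
    # Emit each token followed by its synonyms (stacked tokens treated as sequential).
    out = []
    for tok in tokens:
        out.append(tok)
        for group in SYNONYMS:
            if tok in group:
                out.extend(sorted(group - {tok}))
    return out


def shingles(tokens, min_size=2, max_size=3):
    # Unigrams plus concatenated 2- and 3-token shingles.
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append("".join(tokens[i:i + size]))
    return out


FIELDS = {
    "name": lambda text: shingles(text.split()),                    # my_shingle
    "name.synonym": lambda text: shingles(synonyms(text.split())),  # my_synonym_shingle
}


def index(doc):
    # Each sub-field stores the terms produced by its own analyzer.
    return {field: set(analyzer(doc)) for field, analyzer in FIELDS.items()}


def matching_fields(indexed, query):
    # Like "fields": ["name*"]: analyze the query per field, keep fields with a hit.
    return [f for f, analyzer in FIELDS.items() if set(analyzer(query)) & indexed[f]]


doc = index("south africa")
wanted = {"south", "s", "southafrica", "safrica", "africa"}
print(wanted <= doc["name"] | doc["name.synonym"])
for q in ["southafrica", "safrica"]:
    print(q, matching_fields(doc, q))
```

In this model the two sub-fields together hold every token the question asked for, even though neither field holds all of them: southafrica is found through name, and safrica through name.synonym.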