Is there a specific reason you are using ngrams? Elasticsearch applies the same analyzer to the query as to the text you indexed, unless a search_analyzer is explicitly specified, as @Adam mentioned in his answer. In your case, the standard tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-tokenizer.html) with a lowercase filter may be sufficient.
I created an index with the following settings and mappings:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "typehere": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "custom_analyzer"
        },
        "description": {
          "type": "string",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
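You can check which tokens custom_analyzer actually produces with the _analyze API (a quick sketch; the exact request format varies between Elasticsearch versions, and the query-string form below is the one used on older releases that still have the string type):

GET /test_index/_analyze?analyzer=custom_analyzer&text=Sara+Connor

Both "Sara" and "sara" should come back as the same lowercase token, which is why the match query further down finds the document regardless of case.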
Index two documents:
Doc 1
PUT /test_index/test_mapping/1
{
  "name" : "Sara Connor",
  "Description" : "My real name is Sarah Connor."
}
Doc 2
PUT /test_index/test_mapping/2
{
  "name" : "John Connor",
  "Description" : "I might save humanity someday."
}
Do a simple search:
POST /test_index/_search
{
  "query" : {
    "match" : {
      "name" : "SARA"
    }
  }
}
and get back only the first document. I also tried "sara" and "Sara", with the same result:
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178301,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_mapping",
        "_id": "1",
        "_score": 0.19178301,
        "_source": {
          "name": "Sara Connor",
          "Description": "My real name is Sarah Connor."
        }
      }
    ]
  }
}
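Because the standard tokenizer splits "Sara Connor" into the tokens sara and connor, a query for the other token should work just as well, in any case. Both documents contain "Connor" in the name field, so a search like the following sketch should return both of them:

POST /test_index/_search
{
  "query" : {
    "match" : {
      "name" : "connor"
    }
  }
}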