You can use the phonetic analysis plugin (https://github.com/elastic/elasticsearch-analysis-phonetic) for that task.
Let's create an index with a custom analyzer that uses that plugin:
curl -XPUT localhost:9200/phonetic -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_metaphone"
          ]
        }
      },
      "filter": {
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": true
        }
      }
    }
  }
}'
Now let's analyze your examples with the new analyzer. As you can see, both plain and plane produce the single token PLN:
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plane'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plain'
{
  "tokens" : [ {
    "token" : "PLN",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
The same goes for mail and male, which produce the single token ML:
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'mail'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'male'
{
  "tokens" : [ {
    "token" : "ML",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
I've used the metaphone encoder, but you're free to use any other supported encoder. You can find more information on all supported encoders:
- in the Apache Commons Codec documentation (https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/package-summary.html) for metaphone, double_metaphone, soundex, caverphone, caverphone1, caverphone2, refined_soundex, cologne, beider_morse
- in the additional encoders (https://github.com/elastic/elasticsearch-analysis-phonetic/tree/master/src/main/java/org/elasticsearch/index/analysis/phonetic) for koelnerphonetik, haasephonetik and nysiis
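If you want a feel for what these encoders do without a running cluster, here is a rough Python sketch of one of them, Soundex. It is a simplified version of the classic American Soundex rules written for illustration only: it is not the plugin's code, it is not the Metaphone encoder used above, and the Commons Codec implementation may differ on edge cases. The function name soundex and the CODES table are my own.

```python
# Simplified American Soundex, for illustration only.
# Assumption: this mirrors the textbook rules, not the plugin's exact behavior.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters sharing a code
        code = CODES.get(ch, "")  # vowels (and Y) map to ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]  # pad with zeros, keep four characters


print(soundex("plain"), soundex("plane"))  # P450 P450
print(soundex("mail"), soundex("male"))    # M400 M400
```

The codes differ from the PLN and ML tokens that metaphone produces, but the effect is the same: words that sound alike collapse to one token, so a query for one will match the other.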