I suggest making the following two changes. The first directly addresses the problem you asked about; the second is a recommendation.
First, instead of expanding one word into multiple synonyms, do the reverse: make all the synonyms point to a single word. So change "suco => suco, refresco, bebida de soja" to "suco, refresco, bebida de soja => suco".
Second, change the order of the filters in the synonyms analyzer: place lowercase before synonym_br. This ensures that letter case does not affect the synonym_br token filter.
So the final settings will be:
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_br": {
          "type": "synonym",
          "synonyms": [
            "suco, refresco, bebida de soja => suco"
          ]
        },
        "brazilian_stop": {
          "type": "stop",
          "stopwords": "_brazilian_"
        }
      },
      "analyzer": {
        "synonyms": {
          "filter": [
            "lowercase",
            "synonym_br",
            "brazilian_stop",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
How does this work?
For the input bebida de soja, the filters are applied in the following order:
Filter           Result tokens
====================================
lowercase        bebida, de, soja
synonym_br       suco                <------- all of the tokens above (including their positions) exactly match a synonym
brazilian_stop   suco
asciifolding     suco
Now let's see brazilian_stop in action. For that we need an input that does not match any synonym but does contain de, for example de soja:
Filter           Result tokens
====================================
lowercase        de, soja
synonym_br       de, soja            <------- none of the tokens (on their own or combined, including positions) match any synonym
brazilian_stop   soja                <------- de is removed because it is a stopword
asciifolding     soja
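
The two traces above can be reproduced with a small toy model of the filter chain. This is only a sketch in plain Python, not Elasticsearch code: the whitespace split stands in for the standard tokenizer, STOPWORDS is a one-word stand-in for the _brazilian_ list, and the function names simply mirror the filter names from the settings. It assumes the synonym filter prefers the longest matching rule and does not model token positions.

```python
import unicodedata

# Toy model of the analyzer chain; an illustration only, NOT Elasticsearch code.
# Each left-hand side of "suco, refresco, bebida de soja => suco" maps to "suco".
SYNONYM_RULES = {
    ("suco",): ["suco"],
    ("refresco",): ["suco"],
    ("bebida", "de", "soja"): ["suco"],
}
STOPWORDS = {"de"}  # assumed one-word stand-in for the _brazilian_ stopword list


def lowercase(tokens):
    return [t.lower() for t in tokens]


def synonym_br(tokens):
    """Replace any token run that matches a rule, trying the longest run first."""
    out, i = [], 0
    max_len = max(len(key) for key in SYNONYM_RULES)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYM_RULES:
                out.extend(SYNONYM_RULES[key])
                i += n
                break
        else:
            # No rule starts at this token, so it passes through unchanged.
            out.append(tokens[i])
            i += 1
    return out


def brazilian_stop(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def asciifolding(tokens):
    """Strip combining accents, a rough approximation of ASCII folding."""
    return [
        "".join(c for c in unicodedata.normalize("NFKD", t)
                if not unicodedata.combining(c))
        for t in tokens
    ]


RECOMMENDED_ORDER = (lowercase, synonym_br, brazilian_stop, asciifolding)


def analyze(text, filters=RECOMMENDED_ORDER):
    tokens = text.split()  # whitespace split stands in for the standard tokenizer
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("bebida de soja"))
print(analyze("de soja"))
# Running synonym_br before lowercase lets capitalised input slip past the rules.
print(analyze("Bebida de Soja",
              filters=(synonym_br, lowercase, brazilian_stop, asciifolding)))
```

In this toy model the last call shows why lowercase belongs before synonym_br: with the order swapped, the capitalised input no longer matches the multi-word rule, so de is later dropped as a stopword and bebida and soja are left behind instead of suco.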