Hibernate Search: finding partial matches of a phrase

2024-02-21

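In my project we use Hibernate Search 4.5 with a Lucene analyzer and Solr. I give my clients a single text field. When they type in a phrase, I want to find all `User` entities whose name contains the given phrase.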

For example, consider a list of entries in the database with the following names:

[ Alan Smith, John Cane, Juno Taylor, Tom Caner Junior ]

`jun` should return Juno Taylor and Tom Caner Junior.

`an` should return Alan Smith, John Cane and Tom Caner Junior.
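To pin down the requirement, here is a naive, database-free sketch of the expected results: a case-insensitive substring match on the full name. The class name `NameFilter` is hypothetical, and it scans every name linearly, which is exactly what a full-text index is meant to avoid; treat it only as a reference for what the index should return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical reference implementation of the desired behaviour:
// a name matches when it contains the typed phrase, ignoring case.
// It scans all names, so it documents expected results only.
final class NameFilter {

    static List<String> matching(List<String> names, String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alan Smith", "John Cane", "Juno Taylor", "Tom Caner Junior");
        System.out.println(matching(names, "jun")); // [Juno Taylor, Tom Caner Junior]
        System.out.println(matching(names, "an"));  // [Alan Smith, John Cane, Tom Caner Junior]
    }
}
```

Note that `an` is expected to match inside words (Alan, Cane, Caner), not only at their start.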

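```java
import java.io.Serializable;
import javax.persistence.Column;
// Solr analysis factories used by Hibernate Search 4.x (Lucene 3.6)
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;
import org.hibernate.search.annotations.*;

@AnalyzerDef(name = "customanalyzer",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                        params = { @Parameter(name = "language", value = "English") })
        })
@Analyzer(definition = "customanalyzer")
public class Student implements Serializable {

    @Column(name = "Fname")
    @Field(index = Index.YES, store = Store.YES, analyze = Analyze.YES)
    private String fname;

    @Column(name = "Lname")
    @Field(index = Index.YES, store = Store.YES, analyze = Analyze.YES)
    private String lname;
}
```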

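I tried a wildcard search, but the documentation says:

> Wildcard queries do not apply the analyzer on the matching terms. Otherwise the risk of * or ? being mangled is too high. https://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#section-creating-faceting-request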

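```java
// mythQB is a QueryBuilder for the entity; the wildcard pattern is not analyzed,
// so it is compared as-is with the (lowercased) indexed terms
Query luceneQuery = mythQB
    .keyword()
      .wildcard()
    .onFields("fname")
    .matching("ju*")
    .createQuery();
```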
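The documentation quote can be illustrated with a small simulation: terms are lowercased at index time (as `LowerCaseFilterFactory` does), but the wildcard pattern is used raw. The class `WildcardSketch` and its simplified `*`/`?` matcher are hypothetical; Lucene's real wildcard query walks the term dictionary rather than using `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical simulation: index terms are lowercased, the pattern is not analyzed.
final class WildcardSketch {

    // Convert a Lucene-style pattern ('*' = any run of chars, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Index-time analysis: whitespace tokenizer + lowercase filter.
    static List<String> indexTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // True if any indexed term matches the raw, unanalyzed pattern.
    static boolean wildcardMatches(String indexedText, String pattern) {
        Pattern regex = toRegex(pattern);
        for (String term : indexTerms(indexedText)) {
            if (regex.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("Juno", "ju*")); // true
        System.out.println(wildcardMatches("Juno", "Ju*")); // false: "Ju" is never lowercased
    }
}
```

Lowercasing the user's input before building the wildcard query works around the case problem, but no other filter (stemming, for instance) is applied, and patterns with a leading `*` are expensive in Lucene.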

How can I achieve this?


First, you did not assign the analyzer to your fields, so it is currently not used. You should use `@Field.analyzer`.

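Second, to answer your question: this kind of text is best analyzed with an `EdgeNGramFilter`, which indexes the leading fragments (prefixes) of each word so that a plain keyword query can match them. You should add this filter to your analyzer definition.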
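To show what such a filter emits, here is a hypothetical helper (`EdgeNGrams`, not Lucene's implementation) that returns every prefix of a token between `minGram` and `maxGram` characters long. It assumes front-edge grams and a `minGramSize` left at its default of 1, with `maxGramSize` = 15 as in the definition below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of front-edge n-grams for a single token:
// all prefixes whose length lies in [minGram, maxGram].
final class EdgeNGrams {

    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("juno", 1, 15)); // [j, ju, jun, juno]
        System.out.println(edgeNGrams("alan", 1, 15).contains("an")); // false: prefixes only
    }
}
```

Because only prefixes are indexed, `jun` finds Juno and Junior, but `an` does not find Alan or Cane. Matching inside words, as in the question's second example, would need a plain n-gram filter (`NGramFilterFactory`) instead, at the cost of a much larger index.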

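EDIT: Also, to prevent queries such as "sathya" from matching "sanchana", you should use a different analyzer at query time. If the n-gram analyzer is applied to the query as well, "sathya" is turned into "s", "sa", "sat", ..., and "sa" is also a prefix of "sanchana".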
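The difference between the two query-time strategies can be sketched for a single lowercase token indexed with edge n-grams. The class and method names are hypothetical, `minGram` = 1 is assumed, and "any shared term is a hit" is a simplification of how a keyword query combines the terms produced by the analyzer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical comparison: n-grams applied to the query vs. a plain query analyzer.
final class QueryAnalyzerSketch {

    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Same analyzer at query time: the query is split into n-grams too,
    // so any shared gram ("s", "sa") produces a hit.
    static boolean matchesWithNGramQuery(String indexed, String query) {
        Set<String> indexedTerms = new HashSet<>(edgeNGrams(indexed, 1, 15));
        for (String gram : edgeNGrams(query, 1, 15)) {
            if (indexedTerms.contains(gram)) {
                return true;
            }
        }
        return false;
    }

    // Separate query analyzer without n-grams: the whole query token
    // must be one of the indexed prefixes.
    static boolean matchesWithPlainQuery(String indexed, String query) {
        return edgeNGrams(indexed, 1, 15).contains(query);
    }

    public static void main(String[] args) {
        System.out.println(matchesWithNGramQuery("sanchana", "sathya")); // true (unwanted)
        System.out.println(matchesWithPlainQuery("sanchana", "sathya")); // false
        System.out.println(matchesWithPlainQuery("sathya", "sat"));      // true
    }
}
```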

Here is a complete example:

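```java
import java.io.Serializable;
import javax.persistence.Column;
import org.apache.solr.analysis.EdgeNGramFilterFactory;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;
import org.hibernate.search.annotations.*;

// @AnalyzerDef is not repeatable in Hibernate Search 4.x: group definitions in @AnalyzerDefs
@AnalyzerDefs({
    // Index time: lowercase, stem, then emit prefixes of each word
    @AnalyzerDef(name = "customanalyzer",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                params = { @Parameter(name = "language", value = "English") }),
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class,
                params = { @Parameter(name = "maxGramSize", value = "15") })
        }),
    // Query time: same chain without the n-gram filter
    @AnalyzerDef(name = "customanalyzer_query",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                params = { @Parameter(name = "language", value = "English") })
        })
})
public class Student implements Serializable {

    @Column(name = "Fname")
    @Field(index = Index.YES, store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = "customanalyzer"))
    private String fname;

    @Column(name = "Lname")
    @Field(index = Index.YES, store = Store.YES, analyze = Analyze.YES, analyzer = @Analyzer(definition = "customanalyzer"))
    private String lname;
}
```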

Then explicitly specify that you want to use this "query" analyzer when building the query:

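```java
import org.apache.lucene.search.Query;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

// entityManager: your existing JPA EntityManager
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Student.class)
    // Here come the assignments of "query" analyzers
    .overridesForField("fname", "customanalyzer_query")
    .overridesForField("lname", "customanalyzer_query")
    .get();
// Then it's business as usual
Query luceneQuery = queryBuilder.keyword().onFields("fname", "lname").matching("sathya").createQuery();
FullTextQuery query = fullTextEntityManager.createFullTextQuery(luceneQuery, Student.class);
```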
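As a dependency-free model of the whole setup (edge n-grams at index time, no n-grams at query time), here is a tiny in-memory inverted index. `NameIndexSketch` is hypothetical and is not Hibernate Search's API; stemming is left out for simplicity, `minGram` = 1 and `maxGram` = 15 are assumed, and a name matches when any query term is among its indexed prefixes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: index-time analysis = whitespace + lowercase + edge n-grams,
// query-time analysis = whitespace + lowercase; results are OR-ed over query terms.
final class NameIndexSketch {

    private final Map<String, Set<String>> postings = new HashMap<>();

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    void add(String fname, String lname) {
        String display = fname + " " + lname;
        for (String token : tokens(display)) {
            // Index every prefix of the token, from 1 to 15 characters.
            for (int len = 1; len <= Math.min(15, token.length()); len++) {
                postings.computeIfAbsent(token.substring(0, len), k -> new TreeSet<>()).add(display);
            }
        }
    }

    Set<String> search(String phrase) {
        Set<String> hits = new TreeSet<>();
        for (String term : tokens(phrase)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        NameIndexSketch index = new NameIndexSketch();
        index.add("Alan", "Smith");
        index.add("John", "Cane");
        index.add("Juno", "Taylor");
        index.add("Tom", "Caner Junior");
        System.out.println(index.search("jun")); // [Juno Taylor, Tom Caner Junior]
        System.out.println(index.search("Ca"));  // [John Cane, Tom Caner Junior]
        System.out.println(index.search("an"));  // [] (no word starts with "an")
    }
}
```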

See also: https://stackoverflow.com/a/43047342/6692043


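By the way, if your data consists only of first and last names, you should not use stemming (`SnowballPorterFilterFactory`): it will only reduce the accuracy of your searches for no benefit.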
