Solr is slow on the first facet query but quite fast on subsequent queries

2023-12-07

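I am trying to work out why my Solr (4.1) instance is so slow on facet queries. The index holds about 200M documents and the server has 64 GB of RAM.

My query looks like this: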

q=CampaignId:1462%0ASourceDateUtc:[2014-01-01T00:00:00.000Z TO 2014-01-30T00:00:00.000Z]
&wt=xml&indent=true&rows=0
&facet=true&facet.field=UserName&facet.mincount=10&facet.method=fc

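The first hit takes about 6 minutes. Once the results come back, searching again with the same query, or with a slightly different SourceDateUtc range, runs quite fast.

Here is the query section of my solrconfig.xml: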

<query>
  <!-- Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For LRUCache,
         the autowarmed items will be the most recently accessed items.
       Parameters:
         class - the SolrCache implementation (currently only LRUCache)
         size - the maximum number of entries in the cache
         initialSize - the initial capacity (number of entries) of
           the cache.  (see java.util.HashMap)
         autowarmCount - the number of entries to prepopulate from
           an old cache.

    <filterCache
      class="solr.LRUCache"
      size="1024"
      initialSize="512"
      autowarmCount="0"/>-->

   <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested.  -->
    <queryResultCache
      class="solr.LRUCache"
      size="10000"
      initialSize="512"
      autowarmCount="0"/>

  <!-- documentCache caches Lucene Document objects (the stored fields for each document).
       Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->
    <documentCache
      class="solr.LRUCache"
      size="1024"
      initialSize="512"
      autowarmCount="0"/>

    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache().cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.  -->
    <!--
    <cache name="myUserCache"
      class="solr.LRUCache"
      size="4096"
      initialSize="1024"
      autowarmCount="1024"
      regenerator="org.mycompany.mypackage.MyRegenerator"
      />
    -->

    <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include a score, then the filterCache
         will be checked for a filter matching the query.  If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
      -->
    <useFilterForSortedQuery>true</useFilterForSortedQuery>

    <!-- An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryResultWindowSize is 50,
         then documents 0 through 49 will be collected and cached. Any further
         requests in that range can be satisfied via the cache.
    -->
    <queryResultWindowSize>100</queryResultWindowSize>

    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize. For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.
     -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>


    <!-- boolToFilterOptimizer converts boolean clauses with zero boost
         into cached filters if the number of docs selected by the clause exceeds the
         threshold (represented as a fraction of the total index)
    -->
    <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>

    <!-- Lazy field loading will attempt to read only parts of documents on disk that are
         requested.  Enabling should be faster if you aren't retrieving all stored fields.
    -->
    <enableLazyFieldLoading>false</enableLazyFieldLoading>

    <!-- Use Cold Searcher

         If a search request comes in and there is no current
         registered searcher, then immediately register the still
         warming searcher and use it.  If "false" then all requests
         will block until the first searcher is done warming.
    -->
    <useColdSearcher>true</useColdSearcher>

</query>

I also tried enabling the filterCache, but it did not help.

Thanks.

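This is probably a warming issue. Warming the field cache (which facet.method=fc relies on) is essential for Solr to perform well. If you have not configured warming queries, consider adding your facet query, like the one in the example below, to the newSearcher and firstSearcher sections of solrconfig.xml.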

http://wiki.apache.org/solr/SolrConfigXml#A.22Query.22_Related_Event_Listeners

<listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">*:*</str>
              <str name="start">0</str>
              <str name="rows">10</str>
              <str name="facet">true</str>
              <str name="facet.field">UserName</str>
              <str name="facet.mincount">10</str>
              <str name="facet.method">fc</str>
        </lst>
      </arr>
</listener>
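
The example above covers only the firstSearcher event, which fires when Solr starts. A matching newSearcher listener might look like the sketch below. It assumes the same UserName facet field and sits inside the <query> section of solrconfig.xml. Setting rows to 0 is an assumption made here, since the warming query only needs to populate the field cache and does not need to return documents.

```xml
<!-- Sketch only: re-warm the UserName field cache whenever a new searcher
     is opened (for example after a commit). The field name and facet
     parameters are copied from the question; rows=0 is an assumption. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">UserName</str>
      <str name="facet.mincount">10</str>
      <str name="facet.method">fc</str>
    </lst>
  </arr>
</listener>
```

With 200M documents, each warm-up can itself take minutes, so frequent commits would trigger this work repeatedly.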

You may also want to turn off useColdSearcher, so that requests wait until the first searcher has finished warming:

<useColdSearcher>false</useColdSearcher>

Further reading:

What are the ingredients of a good autowarming query in Solr, and how do they work?

http://wiki.apache.org/solr/SolrFacetingOverview
