[How To] Remove deleted records from Main Index in Sphinx

2023-05-16

http://www.sanisoft.com/blog/2013/06/28/remove-deleted-records-in-sphinx/


The problem

You probably landed here searching for “delete records sphinx”. You are indeed at the right place.

In an earlier post, Index merging in Sphinx we saw how index merging is useful in sphinx. While merging we keep main index as archive, index only new records using delta index and merge it to main.

But what if one deletes an old record from the database?

Lets take an example

Assume that we have 10 records in the main index from doc_id 1 to 10, and we add 2 new records say 11, 12. After index merging, we will have main index with 12 records (doc_id: 1 to 12).

But what if we have also deleted record 2, 4 from the database? Index merging will not remove the garbage data from main index, and hence when we will search associated keywords sphinx will return these ids too.

The solution

Solution for the above issue is ‘sql_query_killlist‘. It is a data source configuration option which accepts a column of doc_ids we need to suppress in the main index.

How to implement this thing?

Create a table in your database ‘deleted_posts’

1
2
3
4
5
CREATE TABLE `deleted_posts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `post_id` int(11),
   PRIMARY KEY (`id`)
);

Now whenever we delete a post from database its id must be inserted in the above table

In your delta source add options

1
sql_query_killlist = SELECT post_id FROM deleted_posts

Ok, so that is it

Now unnecessary records will remove themselves from main index whenever we use the index merging technique :)

Sample conf file

This is how your configuration file will look after above substitutions

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
source main
{
   # Connection settings
   type = mysql
   sql_host = localhost
   sql_user = root
   sql_pass =
   sql_db = blog
   sql_query_pre = SET NAMES utf8
 
   # pre query to update the counter with max id from posts table
   sql_query_pre = REPLACE INTO posts_counter SELECT 1 , MAX ( id ) FROM posts
 
   # query to fetch the records for indexing.
   # We are fetching only those records whose id is less than the max id from the counter
   sql_query = SELECT id , title , description FROM posts \
   WHERE id <= ( SELECT max_id FROM posts_counter WHERE id = 1 ) }
 
# Define delta source. We are extending delta from main
source source delta : main {
   sql_query_pre = SET NAMES utf8
   # fetch only those records whose id is greater than max_id i.e. those records which are not   already indexed.
   sql_query = SELECT id , title , description FROM posts \
   WHERE id > ( SELECT max_id FROM posts_counter WHERE id = 1 )
 
   # parameter containing doc_ids to be removed from main index
   sql_query_killlist = SELECT post_id FROM deleted_posts
 
   #optional - used to empty table after record is removed from main index
   sql_query_post_index = DELETE FROM deleted_posts
}
 
index main
{
   source = main
   path = / path / to / mainindex
   charset_type = utf8
}
 
index delta : main
{
   source = delta
   path = / path / to / deltaindex
}
 
searchd
{
   # Port to listen on
   port = 3312
 
   # Next few are the paths to log files
   log = / usr / local / sphinx / var / log / searchd .log
   query_log = / usr / local / sphinx / var / log / query .log
 
   # Maximum amount of concurrent searches to run - 0 for unlimited
   max_children = 30
 
   # Path to pid file
   pid_file = / usr / local / sphinx / var / log / searchd .pid
}

Share this:


本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

[How To] Remove deleted records from Main Index in Sphinx 的相关文章

随机推荐