http://www.sanisoft.com/blog/2013/06/28/remove-deleted-records-in-sphinx/
The problem
You probably landed here searching for “delete records sphinx”. You are indeed at the right place.
In an earlier post, Index merging in Sphinx we saw how index merging is useful in sphinx. While merging we keep main index as archive, index only new records using delta index and merge it to main.
But what if one deletes an old record from the database?
Lets take an example
Assume that we have 10 records in the main index from doc_id 1 to 10, and we add 2 new records say 11, 12. After index merging, we will have main index with 12 records (doc_id: 1 to 12).
But what if we have also deleted record 2, 4 from the database? Index merging will not remove the garbage data from main index, and hence when we will search associated keywords sphinx will return these ids too.
The solution
Solution for the above issue is ‘sql_query_killlist‘. It is a data source configuration option which accepts a column of doc_ids we need to suppress in the main index.
How to implement this thing?
Create a table in your database ‘deleted_posts’
|
CREATE
TABLE
`deleted_posts`
(
`id`
int(11)
NOT NULL
AUTO_INCREMENT,
`post_id`
int(11),
PRIMARY
KEY
(`id`)
);
|
Now whenever we delete a post from database its id must be inserted in the above table
In your delta source add options
|
sql_query_killlist
=
SELECT
post_id
FROM
deleted_posts
|
Ok, so that is it
Now unnecessary records will remove themselves from main index whenever we use the index merging technique
Sample conf file
This is how your configuration file will look after above substitutions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
|
source
main
{
# Connection settings
type
=
mysql
sql_host
=
localhost
sql_user
=
root
sql_pass
=
sql_db
=
blog
sql_query_pre
=
SET
NAMES
utf8
# pre query to update the counter with max id from posts table
sql_query_pre
=
REPLACE
INTO
posts_counter
SELECT
1
,
MAX
(
id
)
FROM
posts
# query to fetch the records for indexing.
# We are fetching only those records whose id is less than the max id from the counter
sql_query
=
SELECT
id
,
title
,
description
FROM
posts
\
WHERE
id
<=
(
SELECT
max_id
FROM
posts_counter
WHERE
id
=
1
)
}
# Define delta source. We are extending delta from main
source
source
delta
:
main
{
sql_query_pre
=
SET
NAMES
utf8
# fetch only those records whose id is greater than max_id i.e. those records which are not already indexed.
sql_query
=
SELECT
id
,
title
,
description
FROM
posts
\
WHERE
id
>
(
SELECT
max_id
FROM
posts_counter
WHERE
id
=
1
)
# parameter containing doc_ids to be removed from main index
sql_query_killlist
=
SELECT
post_id
FROM
deleted_posts
#optional - used to empty table after record is removed from main index
sql_query_post_index
=
DELETE
FROM
deleted_posts
}
index
main
{
source
=
main
path
=
/
path
/
to
/
mainindex
charset_type
=
utf8
}
index
delta
:
main
{
source
=
delta
path
=
/
path
/
to
/
deltaindex
}
searchd
{
# Port to listen on
port
=
3312
# Next few are the paths to log files
log
=
/
usr
/
local
/
sphinx
/
var
/
log
/
searchd
.log
query_log
=
/
usr
/
local
/
sphinx
/
var
/
log
/
query
.log
# Maximum amount of concurrent searches to run - 0 for unlimited
max_children
=
30
# Path to pid file
pid_file
=
/
usr
/
local
/
sphinx
/
var
/
log
/
searchd
.pid
}
|
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)