MySQL 布尔全文搜索中的“~”(波形符)运算符的行为与 MySQL 开发人员网站中所述不同

2023-12-27

我创建了下表fruits -

CREATE TABLE `fruits` (
  `id` tinyint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

然后我在表中输入以下值fruits -

SELECT * FROM fruits;
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  3 | mango, kiwi   |
|  4 | mango, guava  |
|  5 | apple, banana |
+----+---------------+

现在我运行以下三个 SQL 查询 -

Query 1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

Query 2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple -orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

Query 3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

根据 MySQL 开发者网站,以下是以下功能~“布尔全文搜索”中的(波形符)运算符

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

  • '+苹果 ~macintosh'

查找包含单词“apple”的行,但如果该行还包含单词“macintosh”,则将其评分低于如果行不包含单词的情况。这比搜索“+apple -macintosh”“更柔和”,“+apple -macintosh”的存在导致根本不返回该行。

我已经尝试过~(波形符)运算符出现在“查询 3”中,但输出肯定不是预期的结果。在这里,预期的行为是 row withid = 1终于来了。

附: 我正在使用 MySQL 版本 - 8.0.26-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))


尽管没有任何关于我的答案的文档,但经过彻底的实验,我得出了这个最合乎逻辑的结论 -

“+”运算符的存在会使“~”运算符的任何效果无效

我已经更新了我的表格fruits具有以下值 -

SELECT * FROM fruits;
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
+----+-------------------------+

Query 1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  1 | apple orange watermelon |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

Query 2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

Query 3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

这里,在查询 1、2 和 3 中,值前面没有运算符apple and mango and ~运算符位于值之前pomegranate。这确保了具有单词的行pomegranate排名低于其他人。

Query 4:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
+----+-------------------------+

Query 5:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
+----+-------------------------+

Query 6:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
+----+-------------------------+

这里,在查询 4、5 和 6 中+运算符位于值之前apple and mango and ~运算符位于值之前pomegranate。显然存在+运算符使任何效果无效~操作员。

Query 7:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  2 | apple mango pomegranate |
+----+-------------------------+

Query 8:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

Query 9:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

这里,在查询 7、8 和 9 中+运算符位于值之前apple and mango and <运算符位于值之前pomegranate。这确保了具有单词的行pomegranate排名低于其他人。

因此,从这里可以推断出——if +操作员存在,使用<运算符而不是~操作员


UPDATE

经过大量计算,我创建了表格fruits_score_count这表明score每个fruit当完成布尔全文搜索时。

SELECT * FROM fruits_score_count;
+----+-------------+---------------------+----------------------+
| id | fruit_name  | row_numbers_matched | score                |
+----+-------------+---------------------+----------------------+
|  1 | apple       |                   4 | 0.031008131802082062 |
|  2 | banana      |                   2 |  0.22764469683170319 |
|  3 | guava       |                   1 |   0.6055193543434143 |
|  4 | kiwi        |                   2 |  0.22764469683170319 |
|  5 | mango       |                   4 | 0.031008131802082062 |
|  6 | orange      |                   1 |   0.6055193543434143 |
|  7 | pomegranate |                   2 |  0.22764469683170319 |
|  8 | watermelon  |                   2 |  0.22764469683170319 |
+----+-------------+---------------------+----------------------+

Query 1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  1 | apple orange watermelon | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

Query 2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

Query 3:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

Query 4:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.062016263604164124 |
|  3 | apple mango banana      | 0.062016263604164124 |
+----+-------------------------+----------------------+

Query 5:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
+----+-------------------------+----------------------+

Query 6:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
+----+-------------------------+----------------------+

Query 7:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+

Query 8:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

Query 9:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

此处,查询 1、查询 2、查询 3、查询 7、查询 8、查询 9 的行为符合预期。

但从查询 4、查询 5、查询 6 可以清楚地看出 -

在......的存在下+运算符在值前面加上~运算符基本上使值不可见。

仔细观察还发现——

x ~y and +x <y是等价的


进一步的实验

Query 1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+
  • 第 1 行有id = 3得到最大分数,它是分数的总和mango and apple.
  • 第 2 行与id = 4获得第二个最大分数,即mango。的存在+前面的运算符mango makes ~pomegranate搜索短语不相关。
  • 第 3 行与id = 5与第 2 行的分数相同。但它的位置低于第 2 行,因为当分数相同时,行按升序排列primary key, here id is primary key.
  • 第 4 行与id = 2得分最低,因此排在最后。自从这个词出现以来apple存在并且在搜索短语中没有+前面的运算符apple, hence ~pomegranate考虑到搜索短语,这会显着降低分数。

Query 2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

这再次说明了<即使存在,运算符也会生效+操作员。

这进一步强化了我之前的观察——

if +操作员存在,使用<运算符而不是~操作员

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

MySQL 布尔全文搜索中的“~”(波形符)运算符的行为与 MySQL 开发人员网站中所述不同 的相关文章

随机推荐