Suppose I have a column family:
CREATE TABLE update_audit (
scopeid bigint,
formid bigint,
time timestamp,
record_link_id bigint,
ipaddress text,
user_zuid bigint,
value text,
PRIMARY KEY ((scopeid, formid), time)
) WITH CLUSTERING ORDER BY (time DESC);
It has two secondary indexes. One of them is on record_link_id, which is a high-cardinality column:
CREATE INDEX update_audit_id_idx ON update_audit (record_link_id);
CREATE INDEX update_audit_user_zuid_idx ON update_audit (user_zuid);
As far as I know, Cassandra will create two hidden column families, roughly like this:
CREATE TABLE update_audit_id_idx(
record_link_id bigint,
scopeid bigint,
formid bigint,
time timestamp,
PRIMARY KEY ((record_link_id), scopeid, formid, time)
);
CREATE TABLE update_audit_user_zuid_idx(
user_zuid bigint,
scopeid bigint,
formid bigint,
time timestamp,
PRIMARY KEY ((user_zuid), scopeid, formid, time)
);
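Cassandra secondary indexes are implemented as local indexes rather than being distributed across the cluster like normal tables. Each node stores index entries only for the data that it holds itself.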
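To make my mental model concrete, here is a minimal Python sketch of that local-index idea. It is a toy model, not Cassandra's actual read path: the Node and Cluster classes, the owner() placement rule, and the query() routing are all hypothetical names and assumptions of mine, with no replication and no real partitioner.

```python
from collections import defaultdict


class Node:
    """Toy node: stores rows for the partitions it owns, plus an index
    that covers only those locally stored rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                        # (scopeid, formid, time) -> row
        self.local_index = defaultdict(set)   # record_link_id -> primary keys

    def write(self, row):
        key = (row["scopeid"], row["formid"], row["time"])
        self.rows[key] = row
        self.local_index[row["record_link_id"]].add(key)

    def lookup(self, record_link_id, partition=None):
        keys = self.local_index.get(record_link_id, set())
        if partition is not None:
            # Keep only index entries that belong to the requested partition.
            keys = {k for k in keys if k[:2] == partition}
        return [self.rows[k] for k in sorted(keys)]


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"node{i}") for i in range(node_count)]

    def owner(self, scopeid, formid):
        # Assumption: a stand-in for token-based placement, not the real partitioner.
        return self.nodes[(scopeid * 31 + formid) % len(self.nodes)]

    def write(self, row):
        self.owner(row["scopeid"], row["formid"]).write(row)

    def query(self, record_link_id, scopeid=None, formid=None):
        """Return (matching rows, names of the nodes that were consulted)."""
        if scopeid is not None and formid is not None:
            targets = [self.owner(scopeid, formid)]
            partition = (scopeid, formid)
        else:
            # Without the partition key, no node can be ruled out in this model.
            targets = self.nodes
            partition = None
        rows = []
        for node in targets:
            rows.extend(node.lookup(record_link_id, partition))
        return rows, [node.name for node in targets]


cluster = Cluster(3)
cluster.write({"scopeid": 35, "formid": 78005, "time": 1, "record_link_id": 9897})
cluster.write({"scopeid": 35, "formid": 78005, "time": 2, "record_link_id": 1111})
cluster.write({"scopeid": 36, "formid": 11, "time": 1, "record_link_id": 9897})

print(cluster.query(9897, scopeid=35, formid=78005))
print(cluster.query(9897))
```

In this toy model, a lookup that supplies the partition key consults only the owning node, while a lookup on record_link_id alone has to ask every node. Whether Cassandra really behaves this way is what I am trying to confirm.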
Consider the following query:
select * from update_audit where scopeid=35 and formid=78005 and record_link_id=9897;
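- How will this query be executed "under the hood" in Cassandra?
- Will the index on the high-cardinality column (record_link_id) affect its performance?
- Will Cassandra touch all nodes for the above query? Why?
- Which condition is evaluated first: the base table partition key or the secondary index partition key? How will Cassandra intersect the two results?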