没有魔法。如果某些行已连接而某些行未连接,则必须表示此信息somehow,而“关系”方式是“连接”(又名“链接”)表。是的,连接表可能会变得很大,但幸运的是,数据库非常有能力处理大量数据。
使用联结表与逗号分隔列表(或类似列表)相比有充分的理由,包括:
设计连接表时,请提出以下问题:
- Do I need to query in only one direction or both?1
- If 一个方向,只需在两个外键上创建一个复合主键(我们称它们为 PARENT_ID 和 CHILD_ID)。顺序很重要:如果从父级到子级查询,PK 应该是:{PARENT_ID, CHILD_ID}。
- If 两个方向,还以相反的顺序创建一个复合索引,在本例中为 {CHILD_ID, PARENT_ID}。
- Is the "extra" data small?
- If yes, cluster http://use-the-index-luke.com/sql/clustering/index-organized-clustered-index the table and cover http://use-the-index-luke.com/sql/glossary/covering-index the extra data in the secondary index as necessary.2
- I no, don't cluster the table and don't cover the extra data in the secondary index.3
- Are there any additional tables for which the junction table acts as a parent?
- If yes,考虑添加代理键是否值得让子 FK 保持苗条。但请注意,如果添加代理键,这可能会消除聚类的机会。
在许多情况下,这些问题的答案是:两者,是和否,在这种情况下,您的表将类似于此(Oracle 语法如下):
CREATE TABLE JUNCTION_TABLE (
PARENT_ID INT,
CHILD_ID INT,
EXTRA_DATA VARCHAR2(50),
PRIMARY KEY (PARENT_ID, CHILD_ID),
FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
) ORGANIZATION INDEX COMPRESS;
CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
JUNCTION_TABLE (CHILD_ID, PARENT_ID, EXTRA_DATA) COMPRESS;
注意事项:
-
ORGANIZATION INDEX
:大多数 DBMS 称之为集群的 Oracle 特定语法。其他 DBMS 有自己的语法,有些(MySQL/InnoDB)暗示集群,用户无法将其关闭。
-
COMPRESS
:一些 DBMS 支持领先的索引压缩 http://richardfoote.wordpress.com/2008/02/17/index-compression-part-i-low/。由于聚集表本质上是一个索引,因此也可以对其应用压缩。
-
JUNCTION_TABLE_IE1
, EXTRA_DATA
:由于二级索引覆盖了额外的数据,因此DBMS在从子级到父级的方向查询时,无需触表即可获取。主键充当集群键,因此从父级到子级查询时,额外的数据自然会被覆盖。
从物理上讲,您只有两棵 B 树(一棵是聚集表,另一棵是二级索引),根本没有表堆。这意味着良好的查询性能(通过简单的索引范围扫描即可满足父到子和子到父方向)以及插入/删除行时相当小的开销。
以下是等效的 MS SQL Server 语法(无索引压缩):
CREATE TABLE JUNCTION_TABLE (
PARENT_ID INT,
CHILD_ID INT,
EXTRA_DATA VARCHAR(50),
PRIMARY KEY (PARENT_ID, CHILD_ID),
FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
);
CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
JUNCTION_TABLE (CHILD_ID, PARENT_ID) INCLUDE (EXTRA_DATA);
请注意,MS SQL Server 自动对表进行集群,除非 PRIMARY KEY非聚集已指定。
1 In other words, do you only need to get "children" of given "parent", or you might also need to get parents of given child.
2 Covering allows the query to be satisfied from the index alone, and avoids expensive double-lookup that would otherwise be necessary when accessing data through a secondary index in the clustered table.
3 This way, the extra data is not repeated (which would be expensive, since it's big), yet you avoid the double-lookup and replace it with (cheaper) table heap access. But, beware of clustering factor http://oracle-online-help.blogspot.com/2006/11/clustering-factor_28.html that can destroy the performance of range scans in heap-based tables!