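The easiest way to figure out what is happening in a case like this is to look at the sstable2json (cassandra/bin) representation of your data. It shows you what actually ends up being saved on disk.
Here is an example for your situation: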
[
{"key": "4b6579","columns": [
["rid1:ssid1:","",1401469033325000],
["rid1:ssid1:end_date","2004-10-03 00:00:00-0700",1401469033325000],
["rid1:ssid1:report_date","2004-10-03 00:00:00-0700",1401469033325000],
["rid1:ssid1:start_date","2004-10-03 00:00:00-0700",1401469033325000],
["rid1:ssid1:subset_descr","descr",1401469033325000],
["rid1:ssid1:x","1",1401469033325000],
["rid1:ssid1:y","5.5",1401469033325000],
["rid1:ssid1:z","1",1401469033325000],
["rid2:ssid2:","",1401469938599000],
["rid2:ssid2:end_date", "2004-10-03 00:00:00-0700",1401469938599000],
["rid2:ssid2:report_date","2004-10-03 00:00:00-0700",1401469938599000],
["rid2:ssid2:start_date","2004-10-03 00:00:00-0700",1401469938599000],
["rid2:ssid2:subset_descr","descr",1401469938599000],
["rid2:ssid2:x","1",1401469938599000],
["rid2:ssid2:y","5.5",1401469938599000],
["rid2:ssid2:z","1",1401469938599000]
]}
]
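As you can see above, the value of the partition key is saved once per partition (per sstable); here it is "4b6579", the hex encoding of 'Key'. Its column name does not matter at all in this case, since it is implicit given the table. The column names of the clustering columns are not present either: only their values (rid1, ssid1) appear, as the prefix of each cell name. This works because C* does not allow an insert that does not specify all parts of the key.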
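To make that layout concrete, here is a minimal Python sketch that regroups a few cells from the dump above into CQL-style rows. The helper names `decode_key` and `group_rows` are made up for this illustration, and splitting the cell name on ":" is an assumption that only holds while the clustering values contain no ":" themselves. The cell with an empty column name is treated as a row marker rather than a real column.

```python
from collections import OrderedDict

# A subset of the cells from the dump above: [cell name, value, write timestamp].
cells = [
    ["rid1:ssid1:", "", 1401469033325000],
    ["rid1:ssid1:x", "1", 1401469033325000],
    ["rid1:ssid1:y", "5.5", 1401469033325000],
    ["rid2:ssid2:", "", 1401469938599000],
    ["rid2:ssid2:x", "1", 1401469938599000],
]


def decode_key(hex_key):
    """Turn the hex partition key printed by sstable2json back into text."""
    return bytes.fromhex(hex_key).decode("utf-8")


def group_rows(cells, n_clustering=2):
    """Group cells by their clustering prefix (report_id, subset_id here).

    Assumption: clustering values contain no ':' so a plain split is enough.
    """
    rows = OrderedDict()
    for name, value, _timestamp in cells:
        parts = name.split(":", n_clustering)
        clustering = tuple(parts[:n_clustering])
        column = parts[n_clustering]
        row = rows.setdefault(clustering, {})
        if column:  # skip the empty-named marker cell
            row[column] = value
    return rows


print(decode_key("4b6579"))
for clustering, columns in group_rows(cells).items():
    print(clustering, columns)
```

Each distinct (report_id, subset_id) prefix becomes one row, and the partition key is decoded only once for the whole partition.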
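The remaining cells do carry their column names. This is needed for partial updates to a row, so that an update can be saved without the rest of the row's information. Imagine an update to a single column of a row: to indicate which field it applies to, C* currently uses the column name, but there are tickets to change this to a smaller representation: https://issues.apache.org/jira/browse/CASSANDRA-4175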
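As a rough illustration of why the name has to travel with each cell, the sketch below merges a full row with a later single-column update. This is a simplified model (highest timestamp wins per cell name), not Cassandra's actual reconciliation code. The `merge` helper, the updated value 7, and its timestamp are all hypothetical.

```python
def merge(*sstables):
    """Keep, for every cell name, the value with the highest timestamp."""
    latest = {}
    for cells in sstables:
        for name, value, timestamp in cells:
            if name not in latest or timestamp > latest[name][1]:
                latest[name] = (value, timestamp)
    return {name: value for name, (value, _ts) in sorted(latest.items())}


# Cells written by the original INSERT (subset of the dump above).
full_row = [
    ["rid1:ssid1:", "", 1401469033325000],
    ["rid1:ssid1:x", "1", 1401469033325000],
    ["rid1:ssid1:y", "5.5", 1401469033325000],
]

# Hypothetical later write touching only column x, e.g.
# UPDATE mykeyspace.mytable SET x = 7
#   WHERE id = 'Key' AND report_id = 'rid1' AND subset_id = 'ssid1';
partial_update = [["rid1:ssid1:x", "7", 1401470000000000]]

merged = merge(full_row, partial_update)
print(merged)
```

The update consists of a single cell. Without the column name in it there would be no way to tell that it replaces x rather than y.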
To generate this:
cqlsh
CREATE TABLE mykeyspace.mytable( id text, report_id text, subset_id text, report_date timestamp, start_date timestamp, end_date timestamp, subset_descr text, x int, y double, z int, PRIMARY KEY (id, report_id, subset_id) );
INSERT INTO mykeyspace.mytable (id, report_id , subset_id , report_date , start_date , end_date , subset_descr ,x, y, z) VALUES ( 'Key', 'rid1','ssid1', '2004-10-03','2004-10-03','2004-10-03','descr',1,5.5,1);
INSERT INTO mykeyspace.mytable (id, report_id , subset_id , report_date , start_date , end_date , subset_descr ,x, y, z) VALUES ( 'Key', 'rid2','ssid2', '2004-10-03','2004-10-03','2004-10-03','descr',1,5.5,1);
exit;
nodetool flush
bin/sstable2json $DATA_DIR/mytable/mykeyspace-mytable-jb-1-Data.db