3.比较direct方式使用并行和非并行选项的性能差异。
数据文件:leo_test.data 100000记录
控制文件:leo_test.ctl
日志文件:leo_test.log
串行直接加载sqlload : sqlldr userid=ls/ls control=leo_test.ctl data=leo_test.data log=leo_test.log direct=true
并行直接加载sqlload : sqlldr userid=ls/ls control=leo_test.ctl data=leo_test.data log=leo_test.log direct=true parallel=true
1.现在演示“串行直接加载”
LS@LEO> select count(*) from leo_test_sqlload; 加载之前
COUNT(*)
----------
0
Sun Sep 9 10:58:24 CST 2012 58:24
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test.ctl data=leo_test.data log=leo_test.log direct=true
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 10:58:26 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 100000.
[oracle@secdb1 ~]$ date
Sun Sep 9 10:58:45 CST 2012 58:45-58:24=21秒 即用时21秒 我这是手算的,一会看看日志里记录的是多少
LS@LEO> select count(*) from leo_test_sqlload; 加载之后
COUNT(*)
----------
100000
[oracle@secdb1 ~]$ more leo_test.log
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 10:58:26 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Control File: leo_test.ctl
Data File: leo_test.data
Bad File: leo_test.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation: none specified
Path used: Direct 【sql*loader采用串行直接加载方式加载数据】
Table LEO_TEST_SQLLOAD, loaded from every logical record. 加载的表名
Insert option in effect for this table: APPEND 追加方式加载数据
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype 列名
------------------------------ ---------- ----- ---- ---- ---------------------
START_TIME FIRST * | DATE YYYY-MM-DD HH24:MI:SS
END_TIME NEXT * | DATE YYYY-MM-DD HH24:MI:SS
PROTOCOL NEXT * | CHARACTER
PRIVATE_IP NEXT * | CHARACTER
PRIVATE_PORT NEXT * | CHARACTER
SRC_IP NEXT * | CHARACTER
SRC_PORT NEXT * | CHARACTER
DEST_IP NEXT * | CHARACTER
DEST_PORT NEXT * | CHARACTER
The following index(es) on table LEO_TEST_SQLLOAD were processed: 并把索引也加载了10万个索引键值
index LS.LEO_INDEX1 loaded successfully with 100000 keys
Table LEO_TEST_SQLLOAD:
100000 Rows successfully loaded. 成功加载10万行记录
0 Rows not loaded due to data errors. 由于数据错误,0行没有加载
0 Rows not loaded because all WHEN clauses were failed. 因为所有条款都失败的时候,0行没有加载
0 Rows not loaded because all fields were null. 因为所有字段都是空的,0行没有加载
Date cache:
Max Size: 1000
Entries : 65
Hits : 199935
Misses : 0
Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576
Total logical records skipped: 0 跳过逻辑记录数总和0
Total logical records read: 100000 读取逻辑记录数总和100000
Total logical records rejected: 0 拒绝逻辑记录数总和0
Total logical records discarded: 0 丢弃逻辑记录数总和0
Total stream buffers loaded by SQL*Loader main thread: 26
Total stream buffers loaded by SQL*Loader load thread: 17
Run began on Sun Sep 09 10:58:26 2012 开始的时间
Run ended on Sun Sep 09 10:58:41 2012 结束的时间
Elapsed time was: 00:00:14.70 即用时14.7秒 这是机器算的
CPU time was: 00:00:00.38 CPU占用0.38秒
2.现在演示“并行直接加载”
并行加载
(1)并行加载和并行insert机制差不多
(2)并行加载可以启动多个并行进程,同时加载多个文件
(3)并行加载可以启动多个并行进程,分拆一个文件加载
LS@LEO> select count(*) from leo_test_sqlload; 加载之前
COUNT(*)
----------
0
Sun Sep 9 11:28:13 CST 2012 28:13
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test.ctl data=leo_test.data log=leo_test.log direct=true parallel=true
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 11:28:14 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 100000.
[oracle@secdb1 ~]$ date
Sun Sep 9 11:28:28 CST 2012 28:28-28:13=15秒 即用时15秒 我这是手算的,一会看看日志里记录的是多少
LS@LEO> select count(*) from leo_test_sqlload; 加载之后
COUNT(*)
----------
100000
[oracle@secdb1 ~]$ more leo_test.log
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 11:28:14 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Control File: leo_test.ctl
Data File: leo_test.data
Bad File: leo_test.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation: none specified
Path used: Direct - with parallel option. 【sql*loader采用并行直接加载方式加载数据】
Table LEO_TEST_SQLLOAD, loaded from every logical record. 加载的表名
Insert option in effect for this table: APPEND 追加的方式加载数据
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype 列名
------------------------------ ---------- ----- ---- ---- ---------------------
START_TIME FIRST * | DATE YYYY-MM-DD HH24:MI:SS
END_TIME NEXT * | DATE YYYY-MM-DD HH24:MI:SS
PROTOCOL NEXT * | CHARACTER
PRIVATE_IP NEXT * | CHARACTER
PRIVATE_PORT NEXT * | CHARACTER
SRC_IP NEXT * | CHARACTER
SRC_PORT NEXT * | CHARACTER
DEST_IP NEXT * | CHARACTER
DEST_PORT NEXT * | CHARACTER
Table LEO_TEST_SQLLOAD:
100000 Rows successfully loaded. 成功加载10万行记录
0 Rows not loaded due to data errors. 由于数据错误,0行没有加载
0 Rows not loaded because all WHEN clauses were failed. 因为所有条款都失败的时候,0行没有加载
0 Rows not loaded because all fields were null. 因为所有字段都是空的,0行没有加载
Date cache:
Max Size: 1000
Entries : 65
Hits : 199935
Misses : 0
Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576
Total logical records skipped: 0 跳过逻辑记录数总和0
Total logical records read: 100000 读取逻辑记录数总和100000
Total logical records rejected: 0 拒绝逻辑记录数总和0
Total logical records discarded: 0 丢弃逻辑记录数总和0
Total stream buffers loaded by SQL*Loader main thread: 26
Total stream buffers loaded by SQL*Loader load thread: 17
Run began on Sun Sep 09 11:28:14 2012 开始的时间
Run ended on Sun Sep 09 11:28:24 2012 结束的时间
Elapsed time was: 00:00:09.18 即用时9.18秒 这是机器算的
CPU time was: 00:00:00.42 CPU占用0.38秒
小结:从时间效率上看“并行直接加载”比“串行直接加载”的效率要高,但我们都知道“并行直接加载”如果表中有索引会导致加载失败,因为oracle不能一边并行一边维护索引的完整性。
4.直接加载对约束性索引和非约束型索引的影响。
一、条件
(1)现在我们使用sql*load来加载100000条数据
(2)加载的leo_test_sqlload表上有索引
二、实验
1.非约束性索引,直接加载在加载完成后会维护索引的完整性
$ cat leo_test.data | wc -l 检查数据文件内的记录数10万条,这就是我们要加载的数据
100000
$ cat leo_test.ctl 控制文件内容
LOAD DATA
INFILE '/home/oracle/leo_test.data' 数据源
APPEND INTO TABLE leo_test_sqlload 加载数据的表
FIELDS TERMINATED BY '|' 文本数据的分隔符
TRAILING NULLCOLS
(START_TIME DATE 'YYYY-MM-DD HH24:MI:SS',END_TIME DATE 'YYYY-MM-DD HH24:MI:SS',PROTOCOL,PRIVATE_IP,PRIVATE_PORT
,SRC_IP,SRC_PORT,DEST_IP,DEST_PORT) 表中字段的名
LS@LEO> desc leo_test_sqlload; 表的结构
Name Null? Type
----------------------------------------- -------- ----------------------------
START_TIME DATE
END_TIME DATE
PROTOCOL VARCHAR2(20)
PRIVATE_IP VARCHAR2(20)
PRIVATE_PORT VARCHAR2(20)
SRC_IP VARCHAR2(20)
SRC_PORT VARCHAR2(20)
DEST_IP VARCHAR2(20)
DEST_PORT VARCHAR2(20)
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD'; 现在表上没有索引,我们定义一个索引
no rows selected
LS@LEO> create index leo_index1 on leo_test_sqlload(start_time); 我们在start_time字段添加了索引
Index created.
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD'; 索引已经生效
INDEX_NAME STATUS
------------------------------ --------
LEO_INDEX1 VALID
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test.ctl data=leo_test.data log=leo_test.log direct=true 直接加载
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 09:18:13 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 100000.
LS@LEO> select count(*) from leo_test_sqlload; 已经加载了10万条记录
COUNT(*)
----------
100000
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD'; 索引状态还是有效的
INDEX_NAME STATUS
------------------------------ --------
LEO_INDEX1 VALID
小结:非约束索引下,直接加载会维护索引的完整性,在数据加载入库后索引还是有效的。
2.约束性索引【主键】,直接加载依然会把数据加载入库,但索引会失效unusable,并且在日志中没有提示,必须手工rebuild重新建立
数据文件:leo_test1.data
控制文件:leo_test1.ctl
日志文件:leo_test1.log
sqlload : sqlldr userid=ls/ls control=leo_test1.ctl data=leo_test1.data log=leo_test1.log direct=true
LS@LEO> desc leo_test_sqlload1 表结构
Name Null? Type
----------------------------------------- -------- ----------------------------
START_TIME DATE
END_TIME DATE
PROTOCOL VARCHAR2(20)
PRIVATE_IP VARCHAR2(20)
PRIVATE_PORT VARCHAR2(20)
SRC_IP VARCHAR2(20)
SRC_PORT VARCHAR2(20)
DEST_IP VARCHAR2(20)
DEST_PORT VARCHAR2(20)
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD1'; 没有主键,我们现在建一个
no rows selected
LS@LEO> alter table leo_test_sqlload1 add constraint pk_leo_test1 primary key(dest_port); 创建主键
Table altered.
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD1'; 主键已经生效
INDEX_NAME STATUS
------------------------------ --------
PK_LEO_TEST1 VALID
LS@LEO> select count(*) from leo_test_sqlload1; 表中没有数据我们开始加载
COUNT(*)
----------
0
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test1.ctl data=leo_test1.data log=leo_test1.log direct=true
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 10:07:49 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 100.
LS@LEO> select * from leo_test_sqlload1 where rownum <= 6;
START_TIME END_TIME PROTOCOL PRIVATE_IP PRIVATE_PORT SRC_IP SRC_PORT DEST_IP DEST_PORT
------------------- ------------------- -------------------- ------------ ------------ -------------------- ----------
2012-08-08 20:59:54 2012-08-08 21:00:28 6 2886756061 1111 3395517721 45031 3419418065 80
2012-08-08 21:00:26 2012-08-08 21:00:28 6 2886900807 1111 395507143 51733 3658060738 80
2012-08-08 20:59:55 2012-08-08 21:00:28 6 2886917742 1111 3395507159 43516 2071873572 80
2012-08-08 20:59:55 2012-08-08 21:00:28 6 2886917742 1111 3395507159 43534 2071873572 80
2012-08-08 20:59:55 2012-08-08 21:00:28 6 2886917742 1111 3395507159 43523 2071873572 80
2012-08-08 21:00:14 2012-08-08 21:00:28 6 2886832065 1111 3395507109 51442 2099718013 80
6 rows selected.
我们已经加载了100条记录,dest_port列值都是80,违背了主键约束,说明索引失效
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD1'; 索引已经失效
INDEX_NAME STATUS
------------------------------ --------
PK_LEO_TEST1 UNUSABLE
结论:在OLAP系统中实时表不应该使用约束【因为是海量数据重复是正常的】,在维度表中可以使用约束。
3.如果使用并行+直接加载数据的话,如果表中有索引,会导致加载失败,可以使用skip_index_maintenance选项“跳过索引维护”,来到达数据加载的目的,但是此时索引会无效unusable,必须手工rebuild重新建立
重新搭建环境
LS@LEO> truncate table leo_test_sqlload1;
Table truncated.
LS@LEO> alter table leo_test_sqlload1 drop constraint pk_leo_test1;
Table altered.
LS@LEO> alter table leo_test_sqlload1 add constraint pk_leo_test1 primary key(dest_port);
Table altered.
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test1.ctl data=leo_test1.data log=leo_test1.log direct=true parallel=true
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 10:29:42 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL*Loader-951: Error calling once/load initialization
ORA-26002: Table LS.LEO_TEST_SQLLOAD1 has index defined upon it. 表上有索引定义导致加载失败
[oracle@secdb1 ~]$ sqlldr userid=ls/ls control=leo_test1.ctl data=leo_test1.data log=leo_test1.log direct=true parallel=true
skip_index_maintenance=true
SQL*Loader: Release 10.2.0.1.0 - Production on Sun Sep 9 10:32:51 2012 我们跳过索引维护成功加载100条记录
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 100.
LS@LEO> select count(*) from leo_test_sqlload1;
COUNT(*)
----------
100
LS@LEO> select index_name,status from user_indexes where table_name='LEO_TEST_SQLLOAD1'; 索引此时是无效状态
INDEX_NAME STATUS
------------------------------ --------
PK_LEO_TEST1 UNUSABLE
小结:这就告诉我们了并行就是切片,一边切片一边维护索引完整性是做不到的,我们只能在加载数据后重新建立索引。