If your data is already in Python, then use executemany() http://cx-oracle.readthedocs.io/en/latest/cursor.html#Cursor.executemany. With as many rows as you have, you would probably still make multiple calls to insert batches of records.
Update: see the cx_Oracle documentation on Batch Statement Execution and Bulk Loading https://cx-oracle.readthedocs.io/en/latest/user_guide/batch_statement.html.
Update 2: recent versions of cx_Oracle (which has been renamed to python-oracledb https://cjones-oracle.medium.com/open-source-python-thin-driver-for-oracle-database-e82aac7ecf5a) run by default in a "Thin" mode that bypasses the Oracle Client libraries. This means data loading is faster in many cases. The use and functionality of executemany() are the same in the new release. Install it with something like python -m pip install oracledb. The current documentation is Executing Batch Statements and Bulk Loading https://python-oracledb.readthedocs.io/en/latest/user_guide/batch_statement.html. Also see the upgrading documentation https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_c.html#upgrading-from-cx-oracle-8-3-to-python-oracledb.
Here is an example using the python-oracledb namespace. If you are still using cx_Oracle, then change the import to be import cx_Oracle as oracledb:
import oracledb
import csv
...
Connect and open a cursor here...
...
# Predefine the memory areas to match the table definition.
# This can improve performance by avoiding memory reallocations.
# Here, one parameter is passed for each of the columns.
# "None" is used for the ID column, since the size of NUMBER isn't
# variable. The "25" matches the maximum expected data size for the
# NAME column
cursor.setinputsizes(None, 25)
# Adjust the number of rows to be inserted in each iteration
# to meet your memory and performance requirements
batch_size = 10000
with open('testsp.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    sql = "insert into test (id,name) values (:1, :2)"
    data = []
    for line in csv_reader:
        data.append((line[0], line[1]))
        if len(data) % batch_size == 0:
            cursor.executemany(sql, data)
            data = []
    if data:
        cursor.executemany(sql, data)

con.commit()
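The flush-every-N-rows pattern in the loop above can be checked on its own, without a database. This is a minimal sketch of that logic; the generator name and the sample numbers are my own, not from the answer, and the "flush" here just yields the batch instead of calling executemany():

```python
def batches(rows, batch_size):
    # Accumulate rows and yield a full batch every batch_size rows,
    # then yield whatever remains at the end (the final partial batch).
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# 25 rows with a batch size of 10 gives batches of 10, 10 and 5
sizes = [len(b) for b in batches(range(25), 10)]
print(sizes)  # [10, 10, 5]
```

In the real loop each yielded batch would be passed to cursor.executemany(sql, batch), with a single commit at the end.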
As others have pointed out:
- Avoid using string interpolation in statements, because it is a security risk. It is also often a scalability problem. Use bind variables. Where you need to use string interpolation for things like column names, make sure you sanitize any values.
- If the data is already on disk, then using something like SQL*Loader or Data Pump will be better than reading it into cx_Oracle and then sending it to the database.
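To illustrate the first bullet, here is a small sketch of the difference between interpolating a value into the statement text and binding it. The table name test and the input string are made up for the example, and no database call is made:

```python
# Hypothetical user-supplied value containing an injection attempt
user_input = "x'); drop table test; --"

# BAD: interpolating the value makes it part of the SQL text itself
bad_sql = f"insert into test (name) values ('{user_input}')"

# GOOD: the statement text stays fixed; the driver binds the value
good_sql = "insert into test (name) values (:1)"
# cursor.execute(good_sql, (user_input,))  # the value never alters the SQL

print("drop table" in bad_sql)   # True: the payload reached the SQL text
print("drop table" in good_sql)  # False: the statement is unchanged
```

Bind variables also help performance, because the database can reuse the parsed statement across executemany() batches instead of hard-parsing a new statement per row.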