peewee.OperationalError:仅更新插入 150 行 * 8 列时 SQL 变量太多

2024-01-08

下面的例子,在我的机器上,设置range(150)导致错误,同时range(100)才不是:

from peewee import *

database = SqliteDatabase(None)

class Base(Model):
    class Meta:
        database = database


colnames = ["A", "B", "C", "D", "E", "F", "G", "H"]
cols = {x: TextField() for x in colnames}

table = type('mytable', (Base,), cols)
database.init('test.db')
database.create_tables([table])

data = []
for x in range(150):
    data.append({x: 1 for x in colnames})


with database.atomic() as txn:
    table.insert_many(data).upsert().execute()

导致:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3213, in execute
    cursor = self._execute()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 2628, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3461, in execute_sql
    self.commit()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3285, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3454, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: too many SQL variables

这对我来说似乎很低。我正在尝试使用peewee取代现有的pandas基于 SQL 构造,因为pandas缺乏对主键的支持。每个循环只能插入约 100 条记录的能力非常低,而且如果有一天列数增加的话,就会很脆弱。

我怎样才能让这项工作做得更好?是否可以?


经过一番调查,问题似乎与最大参数数量 https://www.sqlite.org/limits.html#max_variable_numberSQL 查询可以具有:SQLITE_MAX_VARIABLE_NUMBER。

为了能够进行大批量插入,我首先估计 SQLITE_MAX_VARIABLE_NUMBER,然后使用它在我想要插入的字典列表中创建块。

为了估计我使用这个函数的值受到这个答案的启发 https://stackoverflow.com/a/17872934/1860757:


def max_sql_variables():
    """Get the maximum number of arguments allowed in a query by the current
    sqlite3 implementation. Based on `this question
    `_

    Returns
    -------
    int
        inferred SQLITE_MAX_VARIABLE_NUMBER
    """
    import sqlite3
    db = sqlite3.connect(':memory:')
    cur = db.cursor()
    cur.execute('CREATE TABLE t (test)')
    low, high = 0, 100000
    while (high - 1) > low: 
        guess = (high + low) // 2
        query = 'INSERT INTO t VALUES ' + ','.join(['(?)' for _ in
                                                    range(guess)])
        args = [str(i) for i in range(guess)]
        try:
            cur.execute(query, args)
        except sqlite3.OperationalError as e:
            if "too many SQL variables" in str(e):
                high = guess
            else:
                raise
        else:
            low = guess
    cur.close()
    db.close()
    return low

SQLITE_MAX_VARIABLE_NUMBER = max_sql_variables()

然后我使用上面的变量来切片data


with database.atomic() as txn:
    size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) -1
    # remove one to avoid issue if peewee adds some variable
    for i in range(0, len(data), size):
        table.insert_many(data[i:i+size]).upsert().execute()

关于执行速度的更新max_sql_variables.

在具有 3 年历史的 4 核和 4 Gb RAM 的 Intel 机器上,运行 OpenSUSE tumbleweed,并将 SQLITE_MAX_VARIABLE_NUMBER 设置为 999,该函数的运行时间不到 100 毫秒。如果我设置high = 1000000,执行时间变为300ms左右。

在一台具有 8 核和 8Gb RAM 的较新的 Intel 机器上,运行 Kubuntu,将 SQLITE_MAX_VARIABLE_NUMBER 设置为 250000,该函数运行大约 2.6 秒并返回 99999。如果我设置high = 1000000,执行时间约为 4.5 秒。

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

peewee.OperationalError:仅更新插入 150 行 * 8 列时 SQL 变量太多 的相关文章

随机推荐