我已经使用 SQL Alchemy 几个月了,到目前为止,它给我留下了深刻的印象。
我现在遇到了一个问题,这似乎是一个错误,但我不确定我是否在做正确的事情。我们在这里使用 MS SQL,通过表反射来定义表类,但是我可以使用内存 SQLite 数据库(我在此处包含的代码)来复制该问题。
我正在做的是使用两个表之间的链接表来定义两个表之间的多对多关系。链接表包含一条额外的信息,我想用它来过滤链接,需要在关系上使用 Primaryjoin 语句。这非常适合延迟加载,但是出于性能原因,我们需要急切加载,这就是一切失败的地方。
如果我定义与延迟加载的关系:
activefunds = relationship('Fund', secondary='fundbenchmarklink',
primaryjoin='and_(FundBenchmarkLink.isactive==True,'
'Benchmark.id==FundBenchmarkLink.benchmarkid,'
'Fund.id==FundBenchmarkLink.fundid)')
并正常查询DB:
query = session.query(Benchmark)
我需要的行为正是我想要的,尽管性能非常糟糕,因为在迭代所有基准测试及其各自的基金时需要额外的 SQL 查询。
如果我定义与预加载的关系:
activefunds = relationship('Fund', secondary='fundbenchmarklink',
primaryjoin='and_(FundBenchmarkLink.isactive==True,'
'Benchmark.id==FundBenchmarkLink.benchmarkid,'
'Fund.id==FundBenchmarkLink.fundid)',
lazy='joined')
并正常查询DB:
query = session.query(Benchmark)
它在我脸上爆炸:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: fund.id
[SQL: 'SELECT benchmark.id AS benchmark_id,
benchmark.name AS benchmark_name,
fund_1.id AS fund_1_id,
fund_1.name AS fund_1_name,
fund_2.id AS fund_2_id,
fund_2.name AS fund_2_name
FROM benchmark
LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_1
JOIN fund AS fund_1 ON fund_1.id = fundbenchmarklink_1.fundid) ON benchmark.id = fundbenchmarklink_1.benchmarkid
LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_2
JOIN fund AS fund_2 ON fund_2.id = fundbenchmarklink_2.fundid) ON fundbenchmarklink_2.isactive = 1
AND benchmark.id = fundbenchmarklink_2.benchmarkid
AND fund.id = fundbenchmarklink_2.fundid']
上面的 SQL 清楚地表明,在尝试访问链接表中的列之前,链接表并未被连接。
如果我查询数据库,特别是连接链接表:
query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)
它有效,但这意味着我现在必须确保每当查询基准表时,我总是必须定义联接以添加两个额外的表。
我是否遗漏了一些东西,这是一个潜在的错误,还是这只是库的工作方式?
用于复制问题的完整工作示例代码:
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('sqlalchemy.engine.base').setLevel(logging.INFO)
from sqlalchemy import Column, DateTime, String, Integer, Boolean, ForeignKey, create_engine
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class FundBenchmarkLink(Base):
__tablename__ = 'fundbenchmarklink'
fundid = Column(Integer, ForeignKey('fund.id'), primary_key=True, autoincrement=False)
benchmarkid = Column(Integer, ForeignKey('benchmark.id'), primary_key=True, autoincrement=False)
isactive = Column(Boolean, nullable=False, default=True)
fund = relationship('Fund')
benchmark = relationship('Benchmark')
def __repr__(self):
return "<FundBenchmarkLink(fundid='{}', benchmarkid='{}', isactive='{}')>".format(self.fundid, self.benchmarkid, self.isactive)
class Benchmark(Base):
__tablename__ = 'benchmark'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
funds = relationship('Fund', secondary='fundbenchmarklink', lazy='joined')
# activefunds has additional filtering on the secondary table, requiring a primaryjoin statement.
activefunds = relationship('Fund', secondary='fundbenchmarklink',
primaryjoin='and_(FundBenchmarkLink.isactive==True,'
'Benchmark.id==FundBenchmarkLink.benchmarkid,'
'Fund.id==FundBenchmarkLink.fundid)',
lazy='joined')
def __repr__(self):
return "<Benchmark(id='{}', name='{}')>".format(self.id, self.name)
class Fund(Base):
__tablename__ = 'fund'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
def __repr__(self):
return "<Fund(id='{}', name='{}')>".format(self.id, self.name)
if '__main__' == __name__:
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
maker = sessionmaker(bind=engine)
session = maker()
# Create some data
for bmkname in ['foo', 'bar', 'baz']:
bmk = Benchmark(name=bmkname)
session.add(bmk)
for fname in ['fund1', 'fund2', 'fund3']:
fnd = Fund(name=fname)
session.add(fnd)
session.add(FundBenchmarkLink(fundid=1, benchmarkid=1))
session.add(FundBenchmarkLink(fundid=2, benchmarkid=1))
session.add(FundBenchmarkLink(fundid=1, benchmarkid=2))
session.add(FundBenchmarkLink(fundid=2, benchmarkid=2, isactive=False))
session.commit()
# This code snippet works when activefunds doesn't exist, or doesn't use eager loading
# query = session.query(Benchmark)
# print(query)
# for bmk in query:
# print(bmk)
# for fund in bmk.funds:
# print('\t{}'.format(fund))
# This code snippet works for activefunds with eager loading
query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)
print(query)
for bmk in query:
print(bmk)
for fund in bmk.activefunds:
print('\t{}'.format(fund))