I have a schema similar to this in a Datomic database:
; --- tenant
{:db/id #db/id[:db.part/db]
 :db/ident :tenant/guid
 :db/unique :db.unique/identity
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :tenant/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :tenant/tasks
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db.install/_attribute :db.part/db}
; --- task
{:db/id #db/id[:db.part/db]
 :db/ident :task/guid
 :db/unique :db.unique/identity
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :task/createdAt
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :task/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :task/subtasks
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db.install/_attribute :db.part/db}
; --- subtask
{:db/id #db/id[:db.part/db]
 :db/ident :subtask/guid
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :subtask/type
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :subtask/startedAt
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :subtask/completedAt
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :subtask/participants
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db.install/_attribute :db.part/db}
; --- participant
{:db/id #db/id[:db.part/db]
 :db/ident :participant/guid
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db.install/_attribute :db.part/db}
{:db/id #db/id[:db.part/db]
 :db/ident :participant/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db.install/_attribute :db.part/db}
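Tasks are fairly static over time, but on average subtasks are added to and removed from each task about once every 5 minutes. I would say that at any given time each task has roughly 40 subtasks on average, each with (almost always, but with some exceptions) one participant. My only reason for using Datomic is to be able to see how tasks evolve over time, i.e. I want to see what a task looked like at a given point in time. To achieve this, I am currently doing something like this: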
(require '[datomic.api :as d])
(import 'java.util.Date)

(defn find-tasks-by-tenant-at-time
  [conn tenant-guid ^long time-epoch]
  (let [;; database value as of the requested point in time
        db-conn (-> conn d/db (d/as-of (Date. time-epoch)))
        ;; entity ids of all tasks that belong to the tenant
        task-ids (->> (d/q '[:find ?taskIds
                             :in $ ?tenantGuid
                             :where
                             [?tenantId :tenant/guid ?tenantGuid]
                             [?tenantId :tenant/tasks ?taskIds]]
                           db-conn tenant-guid)
                      vec flatten)
        ;; one lazy entity per task id
        task-entities (map #(d/entity db-conn %) task-ids)
        dtos (map (fn [task]
                    (letfn [(participant-dto [participant]
                              {:id (:participant/guid participant)
                               :name (:participant/name participant)})
                            (subtask-dto [subtask]
                              {:id (:subtask/guid subtask)
                               :type (:subtask/type subtask)
                               :participants (map participant-dto (:subtask/participants subtask))})]
                      {:id (:task/guid task)
                       :name (:task/name task)
                       :subtasks (map subtask-dto (:task/subtasks task))}))
                  task-entities)]
    dtos))
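Unfortunately, this is very slow. If a tenant has many tasks (say 20), each containing about 40 subtasks, it can take nearly 60 seconds for this function to return. Am I doing something obviously wrong here? Is it possible to speed this up?
Update:
The whole dataset is about 2 GB. The peer has 3.5 GB of memory (though reducing it to 1.5 GB does not seem to make any difference), and the transactor has 1 GB of memory. I am using the free edition of Datomic.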
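Before you start profiling and so on, you could replace
[:find ?taskIds ...]
with
[:find (pull ?task-entity [*]) ...]
This reduces the number of round trips to the peer and so removes the need for the map statement that builds task-entities. As a second step, replace [*] with the appropriate set of keys that you actually want to pull for each entity.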
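The suggestion above could look roughly like the following. This is a minimal sketch, not a measured fix: it assumes the `datomic.api` peer library aliased as `d`, the `:tenant/tasks` attribute used in the question's query, and a pull pattern passed in as a query input. The names `task-pattern`, `participant->dto`, `subtask->dto` and `task->dto` are made up for illustration.

```clojure
(ns example.tasks
  (:require [datomic.api :as d])
  (:import (java.util Date)))

;; Pull pattern listing only the attributes the DTOs need. The nested maps
;; follow the ref attributes :task/subtasks and :subtask/participants.
(def task-pattern
  [:task/guid
   :task/name
   {:task/subtasks
    [:subtask/guid
     :subtask/type
     {:subtask/participants [:participant/guid :participant/name]}]}])

;; Pure functions that turn pulled maps into the DTO shape from the question.
(defn participant->dto [participant]
  {:id   (:participant/guid participant)
   :name (:participant/name participant)})

(defn subtask->dto [subtask]
  {:id           (:subtask/guid subtask)
   :type         (:subtask/type subtask)
   :participants (mapv participant->dto (:subtask/participants subtask))})

(defn task->dto [task]
  {:id       (:task/guid task)
   :name     (:task/name task)
   :subtasks (mapv subtask->dto (:task/subtasks task))})

(defn find-tasks-by-tenant-at-time
  "Returns one DTO per task of the tenant, as of time-epoch (milliseconds)."
  [conn tenant-guid ^long time-epoch]
  (let [db (d/as-of (d/db conn) (Date. time-epoch))]
    ;; The collection find spec [(pull ...) ...] returns a flat collection
    ;; of pulled maps, so no flatten or d/entity step is needed.
    (->> (d/q '[:find [(pull ?task pattern) ...]
                :in $ ?tenant-guid pattern
                :where
                [?tenant :tenant/guid ?tenant-guid]
                [?tenant :tenant/tasks ?task]]
              db tenant-guid task-pattern)
         (mapv task->dto))))
```

Because the DTO functions only read plain maps, they can be checked at the REPL without a database, for example with `(task->dto {:task/guid "t1" :task/name "Task"})`. Whether the pull-based query actually removes the 60-second delay still has to be measured against the real dataset.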