实体框架多重聚合性能

2024-04-17

我有一个关于实体框架查询构建的问题。

Schema

我有一个这样的表结构:

CREATE TABLE [dbo].[DataLogger](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProjectID] [bigint] NULL,
    CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

CREATE TABLE [dbo].[DCDistributionBox](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DataLoggerID] [bigint] NOT NULL,
    CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCDistributionBox]
    ADD CONSTRAINT [FK_DCDistributionBox_DataLogger] FOREIGN KEY([DataLoggerID])
    REFERENCES [dbo].[DataLogger] ([ID])

CREATE TABLE [dbo].[DCString] (
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DCDistributionBoxID] [bigint] NOT NULL,
    [CurrentMPP] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCString]
    ADD CONSTRAINT [FK_DCString_DCDistributionBox] FOREIGN KEY([DCDistributionBoxID])
    REFERENCES [dbo].[DCDistributionBox] ([ID])

CREATE TABLE [dbo].[StringData](
    [DCStringID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,
    [DCCurrent] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
)

The [StringData]表具有以下存储统计信息:

  • 数据空间:26,901.86 MB
  • 行数:131,827,749
  • 分区:true
  • 分区数:62

Usage

我现在想将数据分组[StringData]表并进行一些聚合。

在纯 SQL 中,它看起来像这样:

declare @projectID bigint = 20686;
declare @from datetime = '06.02.2016';
declare @till datetime = '07.02.2016';
declare @interval int = 15;

SELECT 
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [StringData].[TimeStamp] ) / @interval * @interval, 0) AS [TimeStamp]
, AVG([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentAvg]
, MIN([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentMin]
, MAX([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentMax]
, STDEV([StringData].[DCCurrent] / [DCString].[CurrentMPP]) AS [DCCurrentStDev]
, COUNT(*) AS [Count]

FROM [StringData]
JOIN [DCString] ON [DCString].[ID] = [StringData].[DCStringID]
JOIN [DCDistributionBox] ON [DCDistributionBox].[ID] = [DCString].[DCDistributionBoxID]
JOIN [DataLogger] ON [DataLogger].[ID] = [DCDistributionBox].[DataLoggerID]

WHERE [DataLogger].[ProjectID] = @projectID
AND [StringData].[TimeStamp] >= @from
AND [StringData].[TimeStamp] < @till

GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [StringData].[TimeStamp] ) / @interval * @interval, 0)

执行时间跨度: 653ms

现在我创建了一个等效的实体框架(我认为):

var compareData = model.StringDatas
    AsNoTracking()
    .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= from && p.TimeStamp < till)                    
    .Select(d => new
    {
        TimeStamp = d.Key,
        DCCurrentMin = d.Min(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentMax = d.Max(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentAvg = d.Average(v => v.DCCurrent / v.DCString.CurrentMPP),
        DCCurrentStDev = DbFunctions.StandardDeviation(d.Select(v => v.DCCurrent / v.DCString.CurrentMPP))
    })
    .ToList();

而且执行的结果是超时(超过30秒)!?

Attempts

我现在查看了实体框架生成的 SQL 查询,如下所示:

SELECT 
1 AS [C1], 
[Project10].[C1] AS [C2], 
[Project10].[C2] AS [C3], 
[Project10].[C3] AS [C4], 
[Project10].[C4] AS [C5], 
[Project10].[C5] AS [C6]
FROM ( SELECT 
    [Project8].[C1] AS [C1], 
    [Project8].[C2] AS [C2], 
    [Project8].[C3] AS [C3], 
    [Project8].[C4] AS [C4], 
    (SELECT 
        STDEV([Project9].[A1]) AS [A1]
        FROM ( SELECT 
            [Project9].[DCCurrent] / [Project9].[CurrentMPP] AS [A1]
            FROM ( SELECT 
                [Extent17].[DCStringID] AS [DCStringID], 
                [Extent17].[DCCurrent] AS [DCCurrent], 
                [Extent18].[ID] AS [ID], 
                [Extent18].[CurrentMPP] AS [CurrentMPP]
                FROM    [dbo].[StringData] AS [Extent17]
                INNER JOIN [dbo].[DCString] AS [Extent18] ON [Extent17].[DCStringID] = [Extent18].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent19] ON [Extent18].[DCDistributionBoxID] = [Extent19].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent20] ON [Extent19].[DataLoggerID] = [Extent20].[ID]
                WHERE (([Extent20].[ProjectID] = @p__linq__0) OR (([Extent20].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent17].[TimeStamp] >= @p__linq__1) AND ([Extent17].[TimeStamp] < @p__linq__2) AND (([Project8].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent17].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project8].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent17].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
            )  AS [Project9]
        )  AS [Project9]) AS [C5]
    FROM ( SELECT 
        [Project6].[C1] AS [C1], 
        [Project6].[C2] AS [C2], 
        [Project6].[C3] AS [C3], 
        (SELECT 
            AVG([Project7].[A1]) AS [A1]
            FROM ( SELECT 
                [Project7].[DCCurrent] / [Project7].[CurrentMPP] AS [A1]
                FROM ( SELECT 
                    [Extent13].[DCStringID] AS [DCStringID], 
                    [Extent13].[DCCurrent] AS [DCCurrent], 
                    [Extent14].[ID] AS [ID], 
                    [Extent14].[CurrentMPP] AS [CurrentMPP]
                    FROM    [dbo].[StringData] AS [Extent13]
                    INNER JOIN [dbo].[DCString] AS [Extent14] ON [Extent13].[DCStringID] = [Extent14].[ID]
                    INNER JOIN [dbo].[DCDistributionBox] AS [Extent15] ON [Extent14].[DCDistributionBoxID] = [Extent15].[ID]
                    INNER JOIN [dbo].[DataLogger] AS [Extent16] ON [Extent15].[DataLoggerID] = [Extent16].[ID]
                    WHERE (([Extent16].[ProjectID] = @p__linq__0) OR (([Extent16].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent13].[TimeStamp] >= @p__linq__1) AND ([Extent13].[TimeStamp] < @p__linq__2) AND (([Project6].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent13].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project6].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent13].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                )  AS [Project7]
            )  AS [Project7]) AS [C4]
        FROM ( SELECT 
            [Project4].[C1] AS [C1], 
            [Project4].[C2] AS [C2], 
            (SELECT 
                MAX([Project5].[A1]) AS [A1]
                FROM ( SELECT 
                    [Project5].[DCCurrent] / [Project5].[CurrentMPP] AS [A1]
                    FROM ( SELECT 
                        [Extent9].[DCStringID] AS [DCStringID], 
                        [Extent9].[DCCurrent] AS [DCCurrent], 
                        [Extent10].[ID] AS [ID], 
                        [Extent10].[CurrentMPP] AS [CurrentMPP]
                        FROM    [dbo].[StringData] AS [Extent9]
                        INNER JOIN [dbo].[DCString] AS [Extent10] ON [Extent9].[DCStringID] = [Extent10].[ID]
                        INNER JOIN [dbo].[DCDistributionBox] AS [Extent11] ON [Extent10].[DCDistributionBoxID] = [Extent11].[ID]
                        INNER JOIN [dbo].[DataLogger] AS [Extent12] ON [Extent11].[DataLoggerID] = [Extent12].[ID]
                        WHERE (([Extent12].[ProjectID] = @p__linq__0) OR (([Extent12].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent9].[TimeStamp] >= @p__linq__1) AND ([Extent9].[TimeStamp] < @p__linq__2) AND (([Project4].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent9].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project4].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent9].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                    )  AS [Project5]
                )  AS [Project5]) AS [C3]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                (SELECT 
                    MIN([Project3].[A1]) AS [A1]
                    FROM ( SELECT 
                        [Project3].[DCCurrent] / [Project3].[CurrentMPP] AS [A1]
                        FROM ( SELECT 
                            [Extent5].[DCStringID] AS [DCStringID], 
                            [Extent5].[DCCurrent] AS [DCCurrent], 
                            [Extent6].[ID] AS [ID], 
                            [Extent6].[CurrentMPP] AS [CurrentMPP]
                            FROM    [dbo].[StringData] AS [Extent5]
                            INNER JOIN [dbo].[DCString] AS [Extent6] ON [Extent5].[DCStringID] = [Extent6].[ID]
                            INNER JOIN [dbo].[DCDistributionBox] AS [Extent7] ON [Extent6].[DCDistributionBoxID] = [Extent7].[ID]
                            INNER JOIN [dbo].[DataLogger] AS [Extent8] ON [Extent7].[DataLoggerID] = [Extent8].[ID]
                            WHERE (([Extent8].[ProjectID] = @p__linq__0) OR (([Extent8].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent5].[TimeStamp] >= @p__linq__1) AND ([Extent5].[TimeStamp] < @p__linq__2) AND (([Project2].[C1] = (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent5].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3))) OR (([Project2].[C1] IS NULL) AND (DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent5].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) IS NULL)))
                        )  AS [Project3]
                    )  AS [Project3]) AS [C2]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1]
                    FROM ( SELECT DISTINCT 
                        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Extent1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [C1]
                        FROM    [dbo].[StringData] AS [Extent1]
                        INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                        INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                        INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                        WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
            )  AS [Project4]
        )  AS [Project6]
    )  AS [Project8]
)  AS [Project10]

Question

为什么实体框架将每个聚合分离为单个子选择,以及如何避免这种情况以获得接近原始 SQL 查询的性能?

Update 1

这具有完全相同的 SQL 查询输出和超时结果:

var query = from d in model.StringDatas
        where d.DCString.DCDistributionBox.DataLogger.ProjectID == projectID
        where d.TimeStamp >= fromDate
        where d.TimeStamp < tillDate
        group d by DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval) into g
        select new
        {
            TimeStamp = g.Key,
            DCCurrentMin = g.Min(v => v.DCCurrent / v.DCString.CurrentMPP),
            DCCurrentMax = g.Max(v => v.DCCurrent / v.DCString.CurrentMPP),
            DCCurrentAvg = g.Average(v => v.DCCurrent / v.DCString.CurrentMPP),
            DCCurrentStDev = DbFunctions.StandardDeviation(g.Select(v => v.DCCurrent / v.DCString.CurrentMPP))
        };

var queryResult= query.ToList();

我在回答时注意到了这种行为(错误?)如何让 EF6 生成包含多个聚合列的高效 SQL? https://stackoverflow.com/questions/35320252/how-do-i-get-ef6-to-generate-efficient-sql-containing-mulitple-aggregate-columns/35320905#35320905.

我能够解决这个问题的唯一方法是引入临时投影before the group by操作,这同样适用于您的情况:

var query =
    from e in (from d in db.StringDatas.AsNoTracking()
               where d.DCString.DCDistributionBox.DataLogger.ProjectID == projectID
                   && d.TimeStamp >= fromDate && d.TimeStamp < tillDate
               select new { d, s = d.DCString })
    group e by DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, e.d.TimeStamp) / minuteInterval * minuteInterval) into g
    let ratio = g.Select(e => e.d.DCCurrent / e.s.CurrentMPP)
    select new
    {
        TimeStamp = g.Key,
        DCCurrentMin = ratio.Min(),
        DCCurrentMax = ratio.Max(),
        DCCurrentAvg = ratio.Average(),
        DCCurrentStDev = DbFunctions.StandardDeviation(ratio)
    };

EF 生成的 SQL:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEV([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A1], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A2], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A3], 
            [Project1].[DCCurrent] / [Project1].[CurrentMPP] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] AS [DCCurrent], 
                [Extent2].[CurrentMPP] AS [CurrentMPP]
                FROM    [dbo].[StringDatas] AS [Extent1]
                INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

实体框架多重聚合性能 的相关文章

  • C free() 是如何工作的? [复制]

    这个问题在这里已经有答案了 可能的重复 malloc 和 free 如何工作 https stackoverflow com questions 1119134 how malloc and free work include
  • 异常堆栈跟踪不显示抛出异常的位置

    通常 当我抛出异常 捕获它并打印出堆栈跟踪时 我会看到抛出异常的调用 导致该异常的调用 导致该异常的调用that 依此类推回到整个程序的根 现在它只向我显示异常所在的调用caught 而不是它所在的地方thrown 我不明白是什么改变导致了
  • 将 OpenCV Mat 转换为数组(可能是 NSArray)

    我的 C C 技能很生疏 OpenCV 的文档也相当晦涩难懂 有没有办法获得cv Mat data属性转换为数组 NSArray 我想将其序列化为 JSON 我知道我可以使用 FileStorage 实用程序转换为 YAML XML 但这不
  • C# 处理标准输入

    我目前正在尝试通过命令行断开与网络文件夹的连接 并使用以下代码 System Diagnostics Process process2 new System Diagnostics Process System Diagnostics Pr
  • 如何以编程方式播放 16 位 pcm 数组 [关闭]

    Closed 这个问题不符合堆栈溢出指南 help closed questions 目前不接受答案 我有一个包含 16 位 pcm 值的短 数组 我希望能够在不添加任何标题 也不将任何文件保存到内存的情况下播放它 我知道我可能需要一个提供
  • 有什么方法可以重载 C# 中的扩展方法吗?

    我有以下模型模式 public abstract class PARENTCLASS public class CHILD A CLASS PARENTCLASS public static class EXTENSION public s
  • 静态类与类的实例

    我有一个静态类 用于访问我的公共属性 整个应用程序的全局属性 和我在应用程序运行期间使用的方法 例如 我在静态类中设置了一些属性 并且在应用程序运行时我可以从属性中获取值 但我可以使用单例模式创建非静态类并以相同的方式使用它 问题 对于我的
  • 你好,我最近正在开发我的新游戏,我遇到了*无限跳跃*的问题

    所以基本上当我按跳跃 空格键时我会跳跃但是如果我连续按空格键它 只是跳啊跳啊跳等等 我不想要我只想它跳一次 code if Input GetKeyDown space isGrounded velocity y Mathf Sqrt ju
  • 通过 C# Mailkit / Mimekit 发送电子邮件,但出现服务器证书错误

    Visual Studio 2015 中的 0 代码 1 我正在使用 Mailkit 最新版本 1 18 1 1 从我自己的电子邮件服务器发送电子邮件 2 电子邮件服务器具有不受信任的自签名证书 3 我在代码中添加了以下两行 以忽略服务器证
  • Xamarin - SignalR 挂在连接上

    我正在尝试将我的 Xamarin 应用程序连接到托管在 Azure 上的 SignalR 后端 我遇到的问题是每次我在 HubConnection 上调用 StartAsync 时 它都会挂起客户端并且请求永远不会完成 我尝试通过应用程序进
  • 为什么不能调用带有 auto& 参数的 const mutable lambda?

    include
  • C++ 到 C# 事件处理

    所以我有我的C WinForm 应用程序 我从中调用我的C CLI MFC dll图书馆 但也有一些events在我的 C 库上 甚至此事件也发生在该库的本机 非 CLI 部分 我需要从我的 C 应用程序调用一些代码 并获取一些有关此事件的
  • 使用多线程进行矩阵乘法?

    我应该使用线程将两个矩阵相乘 有两件事 当我运行程序时 我不断得到 0 我还收到消息错误 对于每个错误 它在粗体行上显示 警告 从不兼容的指针类型传递 printMatrix 的参数1 我尝试打印输出 还要注意 第一个粗体块 这是我解决问题
  • C# 中的 C/C++ 代码编译器

    在 C 中 我可以使用下面的代码编译 VB 和 C 代码 但无法编译 C C 代码 有什么办法可以做到这一点吗 C 编译器 public void Compile string ToCompile string Result null st
  • 从 NumPy 数组到 Mat 的 C++ 转换 (OpenCV)

    我正在围绕 ArUco 增强现实库 基于 OpenCV 编写一个薄包装器 我试图构建的界面非常简单 Python 将图像传递给 C 代码 C 代码检测标记并将其位置和其他信息作为字典元组返回给 Python 但是 我不知道如何在 Pytho
  • 如何在dll级别读取app.config? [复制]

    这个问题在这里已经有答案了 我在一个解决方案中有一个控制台应用程序项目和库项目 dll The 图书馆项目有 app config 文件 我在其中存储我在库中使用的一些键值对 控制台应用程序引用此 dll 我有另一个 app config
  • C 语言中的 Alpha 混合 2 RGBA 颜色[重复]

    这个问题在这里已经有答案了 可能的重复 如何快速进行阿尔法混合 https stackoverflow com questions 1102692 how to do alpha blend fast 对 2 个 RGBA 整数 颜色进行
  • 根据最大值连接表

    这是我正在谈论的内容的一个简化示例 Table students exam results id name id student id score date 1 Jim 1 1 73 8 1 09 2 Joe 2 1 67 9 2 09 3
  • 将 char 绑定到枚举类型

    我有一段与此非常相似的代码 class someclass public enum Section START MID END vector section Full void ex for int i 0 i section
  • 嵌入式二进制资源 - 如何枚举嵌入的图像文件?

    我按照中的说明进行操作这本书 http www apress com book view 9781430225492 关于资源等的章节 我不太明白的是 如何替换它 images Add new BitmapImage new Uri Ima

随机推荐