从30秒到30毫秒:EF Core查询性能优化实战全记录

作者:微信公众号:【架构师老卢】
7-1 8:48
17

性能对任何应用程序都至关重要。

开发者经常在慢速数据库查询上添加缓存层,这只是在掩盖症状而非解决问题。本文将挑战如何优化一个真实场景中的EF Core慢查询。EF Core提供了强大的工具,但使用不当会导致查询缓慢。

我将一步步展示如何将EF Core查询从难以接受的30秒优化到惊人的30毫秒。

挑战与慢查询分析

我们以一个社交媒体平台为例,包含以下实体:

public class User
{
    public int Id { get; set; }
    public string Username { get; set; } = null!;
    public ICollection<Post> Posts { get; set; } = new List<Post>();
    public ICollection<Comment> Comments { get; set; } = new List<Comment>();
}

public class Post
{
    public int Id { get; set; }
    public string Content { get; set; } = null!;
    public int UserId { get; set; }
    public User User { get; set; } = null!;
    public int CategoryId { get; set; }
    public Category Category { get; set; } = null!;
    public ICollection<Like> Likes { get; set; } = new List<Like>();
    public ICollection<Comment> Comments { get; set; } = new List<Comment>();
}

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; } = null!;
    public ICollection<Post> Posts { get; set; } = new List<Post>();
}

public class Comment
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public User User { get; set; } = null!;
    public int PostId { get; set; }
    public Post Post { get; set; } = null!;
    public string Text { get; set; } = null!;
    public DateTime CreatedAt { get; set; }
}

public class Like
{
    public int Id { get; set; }
    public int PostId { get; set; }
    public Post Post { get; set; } = null!;
}

实体关系如下:

  • 用户:每个用户有多篇帖子和多条评论
  • 评论:每条评论属于一个用户并关联一篇帖子
  • 分类:帖子按分类组织
  • 帖子:每篇帖子有一个分类并可获得多个赞
  • 点赞:每个赞关联一篇帖子

挑战需求: 查询过去7天在".NET"分类下评论最多的5位用户,每位用户返回:

  • 用户ID
  • 用户名
  • 评论数量(仅统计".NET"分类且7天内)
  • 该用户评论过的最热门3篇".NET"帖子(按点赞数排序,返回帖子ID和点赞数)

初始慢查询实现:

public List<ActiveUserDto> GetTopCommenters_Slow()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 1) 急切加载所有用户及其评论→帖子→分类→点赞
    var users = _dbContext.Users
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Category)
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Likes)
        .ToList();

    var result = new List<ActiveUserDto>();

    foreach (var u in users)
    {
        // 2) 筛选近期".NET"评论
        var comments = u.Comments
            .Where(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET")
            .ToList();

        var commentsCount = comments.Count;
        if (commentsCount == 0)
        {
            continue;
        }

        // 3) 按点赞数获取前3篇帖子
        var topPosts = comments
            .GroupBy(c => c.Post)
            .Select(g => new PostDto(
                g.Key.Id,
                _dbContext.Likes.Count(l => l.PostId == g.Key.Id)))
            .OrderByDescending(p => p.LikesCount)
            .Take(3)
            .ToList();

        // 4) 在这些热门帖子上获取最新2条评论
        var topPostIds = topPosts.Select(p => p.PostId).ToList();
        
        var recentTexts = _dbContext.Comments
            .Where(c =>
                c.UserId == u.Id &&
                c.CreatedAt >= since &&
                topPostIds.Contains(c.PostId))
            .OrderByDescending(c => c.CreatedAt)
            .Take(2)
            .Select(c => c.Text)
            .ToList();

        result.Add(new ActiveUserDto(
            u.Id, u.Username,
            commentsCount,
            topPosts,
            recentTexts));
    }

    // 5) 内存中筛选最终前5名
    return result
        .OrderByDescending(x => x.CommentsCount)
        .Take(5)
        .ToList();
}

此实现将所有数据急切加载到内存中,然后在客户端进行大量筛选、排序和聚合,导致从数据库传输大量数据,消耗大量内存,性能极慢。

在PostgreSQL中此查询耗时29-30秒。

优化1:预筛选用户

我们加载了所有用户及其评论,包括那些没有在".NET"分类下评论过的用户。添加筛选条件只返回实际在".NET"分类下评论过的用户:

public List<ActiveUserDto> GetTopCommenters_Optimization1_PreFilter()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 只筛选有近期.NET评论的用户
    var users = _dbContext.Users
        .Where(u => u.Comments
            .Any(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET"))
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Category)
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Likes)
        .ToList();

    // 其余代码与慢查询相同
}

优化后节省了1秒,数据量越大效果越明显。

优化2:限制前5名用户

在筛选用户后,进一步限制只返回评论最多的5位用户:

public List<ActiveUserDto> GetTopCommenters_Optimization2_LimitUsers()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 在数据库中计算评论数,然后取前5
    var users = _dbContext.Users
        .Where(u => u.Comments
            .Any(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET"))
        .OrderByDescending(u => u.Comments
            .Count(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET")
        )
        .Take(5)
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Category)
        .Include(u => u.Comments)
            .ThenInclude(c => c.Post)
                .ThenInclude(p => p.Likes)
        .ToList();

    // 其余代码与慢查询相同
}

优化后从27秒降至17秒,性能显著提升。

优化3:减少JOIN数量

只连接必要的数据表:

public List<ActiveUserDto> GetTopCommenters_Optimization3_FilterComments()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 只包含符合筛选条件的评论
    var users = _dbContext.Users
        .Include(u => u.Comments.Where(c =>
            c.CreatedAt >= since &&
            c.Post.Category.Name == ".NET"))
        .ThenInclude(c => c.Post)
            .ThenInclude(p => p.Likes)
        .Where(u => u.Comments.Any())
        .OrderByDescending(u => u.Comments
            .Count(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET")
        )
        .Take(5)
        .ToList();

    // 其余处理逻辑
}

优化后降至10秒,内存消耗减少近半。

优化4:仅选择所需列

分两阶段获取数据:

  1. 获取前5名用户及其评论总数
  2. 循环中执行两个小查询:
    • 获取他们评论过的前3篇热门帖子
    • 获取这些帖子上的最新2条评论
public List<ActiveUserDto> GetTopCommenters_Optimization4_Projection()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 获取用户及其评论数
    var topUsers = _dbContext.Users
        .AsNoTracking() // 只读操作使用AsNoTracking
        .Where(u => u.Comments.Any(c =>
            c.CreatedAt >= since &&
            c.Post.Category.Name == ".NET"))
        .Select(u => new
        {
            u.Id,
            u.Username,
            CommentsCount = u.Comments.Count(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET")
        })
        .OrderByDescending(u => u.CommentsCount)
        .Take(5)
        .ToList();

    // 其余处理逻辑
}

尽管有N+1查询,但时间从10秒降至40毫秒,共发送11次简单数据库调用。

优化5:单查询投影

尝试将所有筛选、计数、分组、排序和分页放入单个LINQ语句:

public List<ActiveUserDto> GetTopCommenters_Optimization5_OneQuery()
{
    var since = DateTime.UtcNow.AddDays(-7);

    var projected = _dbContext.Users
        // 1) 筛选有.NET评论的用户
        .Where(u => u.Comments
            .Any(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET"))

        // 2) 一次性投影所有数据
        .Select(u => new ActiveUserDto(
            u.Id,
            u.Username,
            u.Comments.Count(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET"),
            u.Comments
                .Where(c =>
                    c.CreatedAt >= since &&
                    c.Post.Category.Name == ".NET")
                .GroupBy(c => c.Post)
                .Select(g => new PostDto(
                    g.Key.Id, 
                    g.Key.Likes.Count))
                .OrderByDescending(p => p.LikesCount)
                .Take(3)
                .ToList(),
            u.Comments
                .Where(c =>
                    c.CreatedAt >= since &&
                    c.Post.Category.Name == ".NET" &&
                    u.Comments
                        .Where(d =>
                            d.CreatedAt >= since &&
                            d.Post.Category.Name == ".NET")
                        .GroupBy(d => d.Post)
                        .OrderByDescending(g => g.Key.Likes.Count)
                        .Take(3)
                        .Select(g => g.Key.Id)
                        .Contains(c.PostId))
                .OrderByDescending(c => c.CreatedAt)
                .Take(2)
                .Select(c => c.Text)
                .ToList()
        ))
        .OrderByDescending(x => x.CommentsCount)
        .Take(5)
        .ToList();

    return projected;
}

相比11次小查询,单查询仅快3毫秒,但SQL语句复杂且存在笛卡尔积爆炸风险。

优化6:拆分查询(AsSplitQuery)

当投影深层对象图时,EF Core常生成包含多个JOIN的大型SQL,导致数据重复和性能下降。使用.AsSplitQuery()将大查询拆分为多个简单SQL语句:

public List<ActiveUserDto> GetTopCommenters_Optimization6_SplitQuery()
{
    var since = DateTime.UtcNow.AddDays(-7);

    var projected = _dbContext.Users
        // 与优化5相同代码
        .AsSplitQuery()
        .ToList();

    return projected;
}

结果出乎意料:耗时51毫秒,比11次查询的方案更慢。

优化7:三阶段查询

分三个阶段获取数据:

  1. 获取前5名用户及其评论数和热门帖子
  2. 获取这些用户按点赞数排序的热门帖子
  3. 获取每位用户在其热门帖子上的最新评论
public List<ActiveUserDto> GetTopCommenters_Optimization7_ThreePhaseOptimized()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 阶段1:获取前5名用户
    var topUsers = _dbContext.Users
        .Where(u => u.Comments.Any(c =>
            c.CreatedAt >= since &&
            c.Post.Category.Name == ".NET"))
        .Select(u => new
        {
            u.Id,
            u.Username,
            CommentsCount = u.Comments.Count(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET")
        })
        .OrderByDescending(x => x.CommentsCount)
        .Take(5)
        .ToList();

    // 阶段2:获取热门帖子
    var topPostsPerUser = _dbContext.Comments
        .Where(c =>
            userIds.Contains(c.UserId) &&
            c.CreatedAt >= since &&
            c.Post.Category.Name == ".NET")
        .GroupBy(c => new { c.UserId, c.PostId })
        .Select(g => new { g.Key.UserId, g.Key.PostId, LikesCount = g.First().Post.Likes.Count })
        .ToList()
        .GroupBy(x => x.UserId)
        .ToDictionary(
            g => g.Key,
            g => g.OrderByDescending(x => x.LikesCount)
                .Take(3)
                .Select(x => new PostDto(x.PostId, x.LikesCount))
                .ToList()
        );

    // 阶段3:获取最新评论
    var recentCommentsPerUser = _dbContext.Comments
        .Where(c =>
            userIds.Contains(c.UserId) &&
            c.CreatedAt >= since &&
            allTopPostIds.Contains(c.PostId))
        .OrderByDescending(c => c.CreatedAt)
        .Select(c => new { c.UserId, c.Text, c.CreatedAt })
        .ToList()
        .GroupBy(c => c.UserId)
        .ToDictionary(
            g => g.Key,
            g => g.OrderByDescending(c => c.CreatedAt)
                .Take(2)
                .Select(c => c.Text)
                .ToList()
        );

    // 合并最终结果
    return topUsers
        .Select(u => new ActiveUserDto(
            u.Id,
            u.Username,
            u.CommentsCount,
            topPostsPerUser.TryGetValue(u.Id, out var posts) ? posts : [],
            recentCommentsPerUser.TryGetValue(u.Id, out var comments) ? comments : []
        ))
        .ToList();
}

优化后降至31毫秒。

优化8:两阶段查询

分两个阶段:

  1. 获取前5名用户及其热门3篇.NET帖子和评论数
  2. 获取每位用户的最新2条评论
public List<ActiveUserDto> GetTopCommenters_Optimization8_TwoPhaseOptimized()
{
    var since = DateTime.UtcNow.AddDays(-7);

    // 阶段1:获取用户摘要
    var summaries = _dbContext.Users
        .Where(u => u.Comments
            .Any(c =>
                c.CreatedAt >= since &&
                c.Post.Category.Name == ".NET"))
        .Select(u => new
        {
            u.Id,
            u.Username,
            CommentsCount = u.Comments
                .Count(c =>
                    c.CreatedAt >= since &&
                    c.Post.Category.Name == ".NET"),
            TopPosts = u.Comments
                .Where(c =>
                    c.CreatedAt >= since &&
                    c.Post.Category.Name == ".NET")
                .GroupBy(c => c.PostId)
                .Select(g => new
                {
                    PostId = g.Key,
                    LikesCount = _dbContext.Likes
                        .Count(l => l.PostId == g.Key)
                })
                .OrderByDescending(p => p.LikesCount)
                .Take(3)
                .ToList()
        })
        .OrderByDescending(x => x.CommentsCount)
        .Take(5)
        .ToList();

    // 阶段2:获取最新评论
    var recentCommentsLookup = _dbContext.Comments
        .Where(c =>
            c.CreatedAt >= since &&
            c.Post.Category.Name == ".NET" &&
            userIds.Contains(c.UserId) &&
            postIds.Contains(c.PostId))
        .GroupBy(c => new { c.UserId, c.PostId })
        .Select(g => new
        {
            g.Key.UserId,
            g.Key.PostId,
            LatestTwo = g
                .OrderByDescending(c => c.CreatedAt)
                .Take(2)
                .Select(c => c.Text)
                .ToList()
        })
        .ToList()
        .ToLookup(x => x.UserId, x => x);

    // 组合最终DTO
    return summaries
        .Select(s => new ActiveUserDto(
            s.Id,
            s.Username,
            s.CommentsCount,
            s.TopPosts
                .Select(tp => new PostDto(tp.PostId, tp.LikesCount))
                .ToList(),
            recentCommentsLookup[s.Id]
                .SelectMany(x => x.LatestTwo)
                .Take(2)
                .ToList()
        ))
        .ToList();
}

优化后稳定在29-30毫秒,是目前最快的方案。

各优化方案对比

| 优化方案 | 执行时间 | 内存消耗 | |---------|---------|---------| | 原始方案 | 30秒 | - | | 优化1 | 29秒 | - | | 优化2 | 17秒 | - | | 优化3 | 10秒 | - | | 优化4 | 40毫秒 | - | | 优化5 | 38毫秒 | 104KB | | 优化6 | 51毫秒 | - | | 优化7 | 31毫秒 | - | | 优化8 | 30毫秒 | 228KB |

优化8最快(~30毫秒)但消耗~228KB内存;优化5("单查询")耗时~38毫秒(慢9毫秒)但仅分配~104KB,内存消耗不到优化8的一半。

实践中建议从优化5开始,它在速度、内存和复杂度间取得了最佳平衡。随着数据增长,可考虑切换到多阶段方案(优化7/8)。

本次EF Core查询优化挑战中,我们将查询从30秒优化到约30毫秒。关键收获:

  1. 数据筛选(优化1-3):只加载必要数据
  2. 两阶段投影(优化4,7,8):将大工作拆分为小查询
  3. 单查询投影(优化5):平衡简单性、速度和低内存
  4. 拆分查询(优化6):投影深层图时避免笛卡尔积爆炸

专业建议:考虑为查询中频繁访问的字段添加数据库索引以进一步提升性能。

记住黄金法则:先测量,再优化。通常你会发现查询已经足够快,无需优化。使用日志或性能分析器识别实际瓶颈,应用本文的针对性改进,并验证每次优化的效果。

相关留言评论
昵称:
邮箱:
阅读排行