std::mutex performance compared to win32 CRITICAL_SECTION

2024-01-02

How does std::mutex perform compared to CRITICAL_SECTION? Is it on the same level?

I need a lightweight synchronization object (it does not need to be an inter-process object). Is there any STL class other than std::mutex that comes close to CRITICAL_SECTION?


Please see my updates at the end of this answer; the situation has changed dramatically since Visual Studio 2015. The original answer is below.

I made a very simple test, and according to my measurements std::mutex is roughly 50-70 times slower than CRITICAL_SECTION.

std::mutex:       18140574us
CRITICAL_SECTION: 296874us

Edit: After more testing it turns out that the gap depends on the number of threads (contention) and the number of CPU cores. In general std::mutex is slower, but how much slower depends on usage. Here are updated test results (tested on a MacBook Pro with a Core i5-4258U, Windows 10 under Boot Camp):

Iterations: 1000000
Thread count: 1
std::mutex:       78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex:       687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex:       1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex:       86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex:       172916124us
CRITICAL_SECTION: 3390895us
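
For reference, the single-thread numbers work out to roughly 78 ns per lock/unlock pair for std::mutex versus roughly 31 ns for CRITICAL_SECTION (78132 us and 31252 us over 1,000,000 iterations).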

Below is the code that produced this output, compiled with Visual Studio 2012, default project settings, Win32 Release configuration. Note that this test may not be entirely correct, but it made me think twice before switching my code from CRITICAL_SECTION to std::mutex.

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <cmath>    // for sqrt() used in sharedFunc

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

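// Shared state protected by the locks below; the work done on it is
// deliberately tiny so that lock overhead dominates the measurement.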
double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

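// Small amount of floating-point work performed while holding a lock.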
void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

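// Worker that protects each iteration with the Win32 CRITICAL_SECTION.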
void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

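// Worker that protects each iteration with std::mutex.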
void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

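// Runs g_cRepeatCount locked iterations on threadCount threads,
// first with std::mutex, then with CRITICAL_SECTION, and prints the timings.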
void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection( &g_critSec );

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n\r";
        testRound(i);
        Sleep(1000);
    }

    DeleteCriticalSection( &g_critSec );

    // Added 10/27/2017 to try to prevent the compiler from completely
    // optimizing out the code around g_shmem if it weren't used anywhere.
    std::cout << "Shared variable value: " << g_shmem << std::endl;
    getchar();
    return 0;
}

Update 10/27/2017 (1): Some answers suggest that this is not a realistic test or does not represent a "real world" scenario. That is true; this test tries to measure the overhead of std::mutex, it is not trying to prove that the difference is negligible for 99% of applications.

Update 10/27/2017 (2): It appears the situation has changed in favor of std::mutex since Visual Studio 2015 (VC140). I used the VS2017 IDE, exactly the same code as above, x64 Release configuration, optimizations disabled, and simply switched the "Platform Toolset" for each test. The results are quite surprising and I am really curious what has changed in VC140.

Update 02/25/2020 (3): Re-running the test with Visual Studio 2019 (toolset v142), the situation is still the same: std::mutex is two to three times faster than CRITICAL_SECTION.
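
As an aside on the original question (an STL-style alternative to CRITICAL_SECTION): a CRITICAL_SECTION can be wrapped so that it satisfies the standard Lockable requirements and works with std::lock_guard or std::unique_lock. The sketch below is purely illustrative; the class name and details are assumptions, not part of the benchmark above.

#include <Windows.h>
#include <mutex>

// Illustrative sketch (not from the original post): a thin wrapper that
// gives CRITICAL_SECTION the lock()/unlock()/try_lock() interface of
// std::mutex, so it can be used with std::lock_guard / std::unique_lock.
class CritSecMutex {
public:
    CritSecMutex()  { InitializeCriticalSection( &m_cs ); }
    ~CritSecMutex() { DeleteCriticalSection( &m_cs ); }
    CritSecMutex( const CritSecMutex& ) = delete;
    CritSecMutex& operator=( const CritSecMutex& ) = delete;

    void lock()     { EnterCriticalSection( &m_cs ); }
    void unlock()   { LeaveCriticalSection( &m_cs ); }
    bool try_lock() { return TryEnterCriticalSection( &m_cs ) != 0; }

private:
    CRITICAL_SECTION m_cs;
};

// Usage example: generic code can now take either std::mutex or
// CritSecMutex as its lock type.
CritSecMutex g_wrappedCritSec;

void lockedWork()
{
    std::lock_guard<CritSecMutex> guard( g_wrappedCritSec );
    // ... work on shared data ...
}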
