Mapping non-contiguous blocks from a file into contiguous memory addresses

2024-04-11

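I am interested in the prospect of using memory-mapped IO, preferably exploiting the facilities in boost::interprocess for cross-platform support, to map non-contiguous system-page-size blocks in a file into a contiguous address space in memory.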

A simplified concrete scenario:

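I have a number of "plain old data" structures, each of a fixed length (less than the system page size). These structures are concatenated into a (very long) stream, with the type and location of each structure determined by the values of the structures that precede it in the stream. I'm aiming to minimize latency and maximize throughput in a demanding concurrent environment.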

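I can read this data very effectively by memory-mapping it in blocks of at least twice the system page size, and establishing a new mapping immediately after reading a structure that extends beyond the penultimate system page boundary. This allows the code that interacts with the plain-old-data structures to be blissfully unaware that these structures are memory mapped and, for example, to compare two different structures directly using memcmp() without having to care about page boundaries.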
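The windowed read described above might be sketched roughly as follows. This is only an illustration under assumptions: it uses plain POSIX mmap() rather than boost::interprocess, and `struct window`, `window_at` and `demo_window` are hypothetical names, not part of any library. In this simplified form a returned pointer is only valid until the next call that moves the window.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical two-page read window over a file. */
struct window {
    int fd;
    long page;           /* system page size */
    off_t base;          /* file offset of the first mapped page */
    unsigned char *ptr;  /* NULL until the first mapping is made */
};

/* Returns a pointer to file offset `off`. Because the window always starts at
 * the page containing `off` and is two pages long, any object shorter than one
 * page that starts at `off` is contiguous in memory. Returns NULL on failure. */
const void *window_at(struct window *w, off_t off)
{
    assert(w->page > 0);
    off_t aligned = off - (off % w->page);
    if (w->ptr == NULL || aligned != w->base) {
        /* The offset left the first page of the window: move the window. */
        if (w->ptr != NULL)
            munmap(w->ptr, 2 * (size_t)w->page);
        void *p = mmap(NULL, 2 * (size_t)w->page, PROT_READ, MAP_SHARED,
                       w->fd, aligned);
        if (p == MAP_FAILED) {
            w->ptr = NULL;
            return NULL;
        }
        w->ptr = p;
        w->base = aligned;
    }
    return w->ptr + (off - w->base);
}

/* Writes the same record across two different page boundaries of a scratch
 * file (hypothetical name) and reads both back through the window.
 * Returns 0 on success, -1 on any failure. */
int demo_window(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/window-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, 3 * (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    const char rec[] = "record straddling a boundary";
    off_t offs[2] = { page - 4, 2 * (off_t)page - 10 };
    for (int i = 0; i < 2; i++) {
        if (pwrite(fd, rec, sizeof rec, offs[i]) != (ssize_t)sizeof rec) {
            close(fd);
            return -1;
        }
    }

    struct window w = { fd, page, 0, NULL };
    int ok = 1;
    for (int i = 0; i < 2; i++) {
        const void *p = window_at(&w, offs[i]);
        ok = ok && p != NULL && memcmp(p, rec, sizeof rec) == 0;
    }
    if (w.ptr != NULL)
        munmap(w.ptr, 2 * (size_t)page);
    close(fd);
    return ok ? 0 : -1;
}
```

Comparing two structures that live in different windows would need two such windows (or a larger one), since moving a window unmaps the previous range.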

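Where things get interesting is with respect to updating these data streams while they are being (concurrently) read. The strategy I'd like to use is inspired by "Copy On Write" at system-page-size granularity: essentially writing "overlay pages", allowing one process to read the old data while another reads the updated data.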

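While managing which overlay pages to use, and when, isn't necessarily trivial, that's not my main concern. My main concern is that I may have a structure spanning pages 4 and 5, then update a structure wholly contained in page 5, writing the new page in location 6 and leaving page 5 to be "garbage collected" once it is determined to be no longer reachable. This means that, if I map page 4 into location M, I need to map page 6 into memory location M+page_size in order to reliably process structures that cross page boundaries using existing (non-memory-mapping-aware) functions.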

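I'm trying to establish the best strategy, and I'm hampered by documentation I feel is incomplete. Essentially, I need to decouple the allocation of address space from the memory mapping into that address space. With mmap(), I'm aware that I can use MAP_FIXED if I wish to control the mapping location explicitly, but I'm unclear how I should reserve address space in order to do this safely. Can I map /dev/zero for two pages without MAP_FIXED, then use MAP_FIXED twice to map two pages into that allocated space at explicit VM addresses? If so, should I call munmap() three times too? Will it leak resources and/or have any other untoward overhead? To make the issue even more complex, I'd like comparable behaviour on Windows. Is there any way to do this? Are there neat solutions if I were to compromise my cross-platform ambitions?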

--

Thanks for your answer, Mahmoud. I've read, and think I've understood, that code. I've compiled it under Linux and it behaves as you suggest.

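My main concern is with line 62, which uses MAP_FIXED. It makes some assumptions about mmap that I've been unable to confirm from the documentation I can find. You're mapping the "update" page into the same address space that mmap() returned initially. I assume this is "correct", i.e. not something that just happens to work on Linux? I'd also need to assume that it works cross-platform for file mappings as well as anonymous mappings.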

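The sample definitely moves me forward, documenting that what I ultimately need is probably achievable with mmap(), on Linux at least. What I'd really like is a pointer to documentation that shows the MAP_FIXED line will work as the sample demonstrates and, ideally, a transformation from the Linux/Unix-specific mmap() to a platform-independent (Boost::interprocess) approach.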


Your question is a little confusing. From what I understand, this code will do what you need:

#define PAGESIZE 4096

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <errno.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>

struct StoredObject
{
    int IntVal;
    char StrVal[25];
};

int main(int argc, char **argv)
{
    int fd = open("mmapfile", O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600);
    //Set the file to the size of our data (2 pages)
    lseek(fd, PAGESIZE*2 - 1, SEEK_SET);
    write(fd, "", 1); //The final byte

    unsigned char *mapPtr = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    struct StoredObject controlObject;
    controlObject.IntVal = 12;
    strcpy(controlObject.StrVal, "Mary had a little lamb.\n");

    struct StoredObject *mary1;
    mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
    memcpy(mary1, &controlObject, sizeof(struct StoredObject));

    printf("%d, %s", mary1->IntVal, mary1->StrVal);
    //Should print "12, Mary had a little lamb.\n"

    struct StoredObject *john1;
    john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
    memcpy(john1, &controlObject, sizeof(struct StoredObject));

    john1->IntVal = 42;
    strcpy(john1->StrVal, "John had a little lamb.\n");

    printf("%d, %s", john1->IntVal, john1->StrVal);
    //Should print "42, John had a little lamb.\n"

    //Make sure the data's on the disk, as this is the initial, "read-only" data
    msync(mapPtr, PAGESIZE * 2, MS_SYNC);

    //This is the initial data set, now in memory, loaded across two pages
    //At this point, someone could be reading from there. We don't know or care.
    //We want to modify john1, but don't want to write over the existing data
    //Easy as pie.

    //This is the shadow map. COW-like optimization will take place: 
    //we'll map the entire address space from the shared source, then overlap with a new map to modify
    //This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
    unsigned char *mapPtr2 = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    //Map an anonymous page on top of the second page of the shadow mapping; this is the one that we're modifying. It is *not* backed by disk
    //fd is -1 because some implementations require that for anonymous mappings
    unsigned char *temp = (unsigned char *) mmap(mapPtr2 + PAGESIZE, PAGESIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_ANON, -1, 0);
    if (temp == MAP_FAILED)
    {
        printf("Fixed map failed. %s", strerror(errno));
    }
    assert(temp == mapPtr2 + PAGESIZE);

    //Make a copy of the old data that will later be changed
    memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);

    //The two address spaces should still be identical until this point
    assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);

    //We can now make our changes to the second page as needed
    struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
    struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);

    john2->IntVal = 52;
    strcpy(john2->StrVal, "Mike had a little lamb.\n");

    //Test that everything worked OK
    assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
    printf("%d, %s", john2->IntVal, john2->StrVal);
    //Should print "52, Mike had a little lamb.\n"

    //Now assume our garbage collection routine has detected that no one is using the original copy of the data
    munmap(mapPtr, PAGESIZE * 2);

    mapPtr = mapPtr2;

    //Now we're done with all our work and want to completely clean up
    munmap(mapPtr2, PAGESIZE * 2);

    close(fd);

    return 0;
}

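My edited answer should address your safety concerns. Only use MAP_FIXED on the second mmap call (as I do above). The cool thing about MAP_FIXED is that it lets you overwrite part of an existing mmap address range. It will unmap the range you overlap and replace it with the new mapping: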

 MAP_FIXED
              [...] If the memory
              region specified by addr and len overlaps pages of any existing
              mapping(s), then the overlapped part of the existing mapping(s) will be
              discarded. [...]

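This way, you can let the OS take care of finding a contiguous memory block of hundreds of megabytes for you (never call MAP_FIXED on an address range unless you are sure it is safe to overwrite). Then you call MAP_FIXED on a subsection of that now-mapped huge space, the part containing the data you will be modifying. Tada.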
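Applied to the original question (file page 4 at M, file page 6 at M+page_size), the same idea could be sketched as below. This is a minimal sketch under assumptions, not a definitive implementation: it reserves the address range with an inaccessible anonymous mapping (PROT_NONE) instead of mapping /dev/zero, it queries the page size with sysconf() rather than hard-coding it, and `map_two_pages` and `demo_noncontiguous` are hypothetical names. The demo uses file pages 0 and 2 of a small scratch file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps file page `first` at M and file page `second`
 * at M + page, and returns M (or MAP_FAILED). */
unsigned char *map_two_pages(int fd, long page, long first, long second)
{
    assert(page > 0);
    size_t len = (size_t)page;

    /* Step 1: let the kernel pick and reserve two contiguous pages.
     * PROT_NONE makes them inaccessible placeholders. */
    unsigned char *base = mmap(NULL, 2 * len, PROT_NONE,
                               MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    /* Step 2: replace each placeholder with a file page. MAP_FIXED is safe
     * here because the whole range already belongs to this reservation. */
    if (mmap(base, len, PROT_READ, MAP_SHARED | MAP_FIXED,
             fd, (off_t)first * page) == MAP_FAILED ||
        mmap(base + len, len, PROT_READ, MAP_SHARED | MAP_FIXED,
             fd, (off_t)second * page) == MAP_FAILED) {
        munmap(base, 2 * len);
        return MAP_FAILED;
    }
    return base;
}

/* Builds a three-page scratch file (hypothetical name) whose page i is filled
 * with the byte 'A' + i, then maps pages 0 and 2 side by side.
 * Returns 0 on success, -1 on any failure. */
int demo_noncontiguous(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/noncontig-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    unsigned char *buf = malloc((size_t)page);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < 3; i++) {
        memset(buf, 'A' + i, (size_t)page);
        if (write(fd, buf, (size_t)page) != (ssize_t)page) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);

    unsigned char *m = map_two_pages(fd, page, 0, 2);
    close(fd); /* the mappings stay valid after the descriptor is closed */
    if (m == MAP_FAILED)
        return -1;

    /* The last byte of file page 0 is directly followed by the first byte
     * of file page 2; file page 1 is skipped. */
    int ok = (m[page - 1] == 'A' && m[page] == 'C');

    /* One munmap() over the whole range removes both file mappings. */
    if (munmap(m, 2 * (size_t)page) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

Since munmap() removes whatever mappings lie in the given range, a single call covering the reserved region should be enough regardless of how many mmap() calls populated it.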


On Windows, something like this should work (I'm on a Mac at the moment, so untested):

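#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

//PAGESIZE and struct StoredObject are defined as in the POSIX version above

int main(int argc, char **argv)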
{
    HANDLE hFile = CreateFile(L"mmapfile", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    //Set the file to the size of our data (2 pages)
    SetFilePointer(hFile, PAGESIZE*2 - 1, 0, FILE_BEGIN);
    DWORD bytesWritten = -1;
    WriteFile(hFile, "", 1, &bytesWritten, NULL);

    HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE * 2, NULL);
    unsigned char *mapPtr = (unsigned char *) MapViewOfFile(hMap, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE * 2);

    struct StoredObject controlObject;
    controlObject.IntVal = 12;
    strcpy(controlObject.StrVal, "Mary had a little lamb.\n");

    struct StoredObject *mary1;
    mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
    memcpy(mary1, &controlObject, sizeof(struct StoredObject));

    printf("%d, %s", mary1->IntVal, mary1->StrVal);
    //Should print "12, Mary had a little lamb.\n"

    struct StoredObject *john1;
    john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
    memcpy(john1, &controlObject, sizeof(struct StoredObject));

    john1->IntVal = 42;
    strcpy(john1->StrVal, "John had a little lamb.\n");

    printf("%d, %s", john1->IntVal, john1->StrVal);
    //Should print "42, John had a little lamb.\n"

    //Make sure the data's on the disk, as this is the initial, "read-only" data
    FlushViewOfFile(mapPtr, PAGESIZE * 2);

    //This is the initial data set, now in memory, loaded across two pages
    //At this point, someone could be reading from there. We don't know or care.
    //We want to modify john1, but don't want to write over the existing data
    //Easy as pie.

    //This is the shadow map. COW-like optimization will take place: 
    //we'll map the entire address space from the shared source, then overlap with a new map to modify
    //This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
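    //NOTE (untested): MapViewOfFileEx is documented to fail if the requested range is already in use,
    //including a range reserved with VirtualAlloc, and its base address must be a multiple of the
    //allocation granularity (see GetSystemInfo; typically 64 KB), not merely of the page size.
    //The reservation may therefore need to be released (VirtualFree with MEM_RELEASE) before mapping.
    unsigned char *reservedMem = (unsigned char *) VirtualAlloc(NULL, PAGESIZE * 2, MEM_RESERVE, PAGE_READWRITE);
    HANDLE hMap2 = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE, NULL);
    unsigned char *mapPtr2 = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem);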

    //Map the second page on top of the first mapping; this is the one that we're modifying. It is *not* backed by disk
    unsigned char *temp = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem + PAGESIZE);
    if (temp == NULL)
    {
        printf("Fixed map failed. 0x%lx\n", GetLastError());
        return -1;
    }
    assert(temp == mapPtr2 + PAGESIZE);

    //Make a copy of the old data that will later be changed
    memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);

    //The two address spaces should still be identical until this point
    assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);

    //We can now make our changes to the second page as needed
    struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
    struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);

    john2->IntVal = 52;
    strcpy(john2->StrVal, "Mike had a little lamb.\n");

    //Test that everything worked OK
    assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
    printf("%d, %s", john2->IntVal, john2->StrVal);
    //Should print "52, Mike had a little lamb.\n"

    //Now assume our garbage collection routine has detected that no one is using the original copy of the data
    UnmapViewOfFile(mapPtr);

    mapPtr = mapPtr2;

    //Now we're done with all our work and want to completely clean up
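    //Each view is unmapped separately on Windows
    UnmapViewOfFile(temp);
    UnmapViewOfFile(mapPtr2);

    CloseHandle(hMap2);
    CloseHandle(hMap);
    CloseHandle(hFile);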

    return 0;
}