使用 LRU 缓存提升您的 Python 代码

2023-10-23

LRU 缓存或“最近最少使用”缓存是一种缓存，当缓存达到其大小限制时，最近最少使用的条目将被丢弃。

关键思想是通过重用以前的结果来加快对相同数据的后续请求。这种技术称为记忆化。

在本教程中，我们将深入研究 LRU 缓存的概念，探索其底层机制，以及如何使用 Python 来实现它functools.lru_cache装饰器，甚至如何为大型应用程序创建基于磁盘的 LRU 缓存。

目录 hide

1 functools.lru_cache 装饰器
2 自定义最大尺寸
3 从头开始实现 LRU 缓存
4 插入/检索逻辑
5 弹出逻辑
6 LRU缓存的持久化（基于磁盘的LRU缓存）
7 线程安全
8 绩效分析
9 Profiling and Monitoring LRU Cache
- 9.1 可视化缓存统计信息
10 LRU 缓存的实际用例
11 LRU 缓存的限制和缺点
12 进一步阅读

functools.lru_cache 装饰器

Python提供了内置的LRU缓存装饰器functools.lru_cache in the functools模块，提供记忆功能。让我们将其应用到斐波那契功能：


import functools

@functools.lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
print(fib(10))

Output:

The @functools.lru_cache(maxsize=128)是一个装饰器，它包装我们的函数，使其能够记住或缓存函数调用的结果。

The maxsize参数用于指定缓存的最大大小。

自定义最大尺寸

您可以使用maxsize的参数functools.lru_cache装饰器允许您自定义缓存中的最大条目数，以防您想增加缓存的大小：


@functools.lru_cache(maxsize=1024)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
print(fib(100))

Output:


354224848179261915075

在这里，我们增加了maxsize至 1024。

这允许fib函数来记住更多以前的函数调用。明智地使用此参数。较大的缓存大小将使用更多的内存。

If maxsize被设定为None，缓存可以无限增长。

从头开始实现 LRU 缓存

我们可以使用Python字典和双向链表的组合来手动实现LRU缓存。

字典将充当缓存，而双向链表将跟踪元素的顺序。
让我们首先定义一个Node我们的双向链表的类：


class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

接下来，让我们定义 LRU 缓存。这LRUCache类将初始化缓存及其大小。

The get and put方法将分别处理缓存中项目的检索和插入。


class LRUCache:
    def __init__(self, capacity: int):
        self.cache = dict()
        # dummy nodes
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)
            return node.value
        return -1

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.capacity:
            node = self.head.next
            self._remove(node)
            del self.cache[node.key]

    def _remove(self, node):
        p = node.prev
        n = node.next
        p.next = n
        n.prev = p

    def _add(self, node):
        p = self.tail.prev
        p.next = node
        self.tail.prev = node
        node.prev = p
        node.next = self.tail

在这里，我们实现了 LRU 缓存，其中缓存中的每个项目都是双向链表中的一个节点。这_remove(node)方法从列表中删除一个节点，并且_add(node)方法将一个节点添加到列表的末尾。

当缓存已满（超出容量）时，最近最少使用的项目（列表的头部）将被删除。

插入/检索逻辑

在 LRU 缓存中，项目以键值对的形式插入到缓存中，类似于字典。每次插入时，缓存都会检查它是否已满。

如果缓存已满，它会在插入新项目之前删除最近最少使用的项目。这个插入操作的复杂度是O(1)。
对于数据检索，当访问特定键时，缓存会将访问的键值对移动到缓存的末尾，因为它是最近使用的项。

检索操作也是 O(1)。
让我们使用手动实现的 LRU 缓存来说明这一点：


# Initialize our cache with the capacity of 2
lru_cache = LRUCache(2)

# Insert some items
lru_cache.put(1, 'a')  # cache is {1='a'}
lru_cache.put(2, 'b')  # cache is {1='a', 2='b'}
print(lru_cache.get(1))  # return 'a'

# This will make the key 1 as recently used so cache will be {2='b', 1='a'}
lru_cache.put(3, 'c')  # Removes key 2 and insert a new entry. Now cache is {1='a', 3='c'}
print(lru_cache.get(2))  # returns -1 (not found)

Output:


a
-1

正如我们所看到的，缓存有效地存储和检索数据，同时保持最近最少使用的顺序。

弹出逻辑

弹出逻辑决定当缓存已满时要删除哪个项目。在 LRU 缓存中，这将是最近最少使用的项目。

此策略的权衡是，它需要跟踪最近使用的内容，但它的好处是确保只要频繁访问某个项目，它就会保留在缓存中。
In our LRUCache实施，将_remove(node)方法处理弹出逻辑。

当超出缓存容量时，它会删除头节点旁边的节点（即最近最少使用的节点）。我们来说明一下弹出逻辑：


# Continuing from our previous example
lru_cache = LRUCache(2)

# Insert some items
lru_cache.put(1, 'a')  # cache is {1='a'}
lru_cache.put(2, 'b')  # cache is {1='a', 2='b'}
lru_cache.put(3, 'c')  # Removes key 1 (least recently used) and insert a new entry. Now cache is {2='b', 3='c'}

print(lru_cache.get(1))  # returns -1 (not found as it is popped out)
print(lru_cache.get(2))  # return 'b'

Output:


-1
b

如图所示，当缓存达到最大容量时，它会弹出最近最少使用的项，即带有 key 的项1.

当我们尝试获取键的值时1，它返回了-1表示在缓存中未找到该项目。

LRU缓存的持久化（基于磁盘的LRU缓存）

通常，LRU 缓存中存储的数据位于内存中，当程序完成执行.

您可以使用Python的diskcache模块将 LRU 缓存保存在磁盘上。这是一个基本示例：


from diskcache import Cache
cache = Cache('my_cache_directory')

@cache.memoize()
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
print(fib(10))

Output:

在此示例中，我们使用diskcache.Cache类来创建基于磁盘的持久缓存。

The @cache.memoize()装饰器的工作方式就像functools.lru_cache，但它将缓存数据存储在磁盘上，允许其在程序运行之间保留。

线程安全

The functools.lru_cache装饰器是线程安全的。这意味着它可以在多线程程序中使用，而不需要锁或其他同步方法。


import functools
import threading

@functools.lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

def print_fib(n):
    print(f"Thread {n}: {fib(20)}")

threads = []
for i in range(5):
    t = threading.Thread(target=print_fib, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Output:


Thread 0: 6765
Thread 1: 6765
Thread 2: 6765
Thread 3: 6765
Thread 4: 6765

在上面的例子中，多个线程可以安全地调用memoizedfib同时发挥作用。

每个线程打印第 20 个斐波那契数，使用我们的方法计算fib功能。

绩效分析

我们使用Python的time模块来比较一下执行时间fib有和没有 LRU 缓存的函数：


import time
import functools

def fib_no_cache(n):
    if n < 2:
        return n
    return fib_no_cache(n-1) + fib_no_cache(n-2)

@functools.lru_cache(maxsize=128)
def fib_with_cache(n):
    if n < 2:
        return n
    return fib_with_cache(n-1) + fib_with_cache(n-2)

# Calculate time taken by fib_no_cache
start = time.time()
fib_no_cache(30)
end = time.time()
print(f"Without cache: {end - start} seconds")

# Calculate time taken by fib_with_cache
start = time.time()
fib_with_cache(30)
end = time.time()
print(f"With cache: {end - start} seconds")

输出（可能因系统而异）：


Without cache: 0.6722159385681152 seconds
With cache: 5.0067901611328125e-05 seconds

如您所见，fib功能与LRU 缓存比没有缓存要快得多.

这清楚地展示了使用 LRU 缓存的性能优势。

分析和监控 LRU 缓存

Python 的内置cProfile模块可用于测量缓存命中/未命中率及其对性能的影响。

此外，functools.lru_cache装饰器提供了一些有用的统计信息，我们可以使用它们来监控缓存的性能。
下面是一个小例子来说明这一点：


import cProfile
from functools import lru_cache
import time

# Implement a slow Fibonacci function with a small delay
@lru_cache(maxsize=None)
def slow_fib(n):
    time.sleep(0.01)  # a delay of 0.01 seconds
    if n < 2:
        return n
    return slow_fib(n-1) + slow_fib(n-2)

# Define a function to call slow_fib() for a range of numbers
def test_slow_fib(n):
    for i in range(n+1):
        slow_fib(i)

# Use cProfile to profile the test_slow_fib() function
profiler = cProfile.Profile()
profiler.runcall(test_slow_fib, 50)

# Print the profiling results
profiler.print_stats()
print("\nCache Info:", slow_fib.cache_info())

Output:


         104 function calls in 0.796 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.796    0.796 lru_cache.py:14(test_slow_fib)
       51    0.002    0.000    0.795    0.016 lur_cache.py:6(slow_fib)
       51    0.793    0.016    0.793    0.016 {built-in method time.sleep}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Cache Info: CacheInfo(hits=98, misses=51, maxsize=None, currsize=51)

The time.sleep我们用来引入延迟的函数也被调用了 51 次，几乎占据了总时间。

缓存信息告诉我们有 98 次缓存命中和 51 次缓存未命中。

可视化缓存统计信息

我们可以使用以下命令可视化缓存统计信息matplotlib 库了解缓存命中率如何增长。
我们将增强我们的功能以在每次递归调用时记录缓存统计信息：


from functools import lru_cache
import matplotlib.pyplot as plt

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

# Run the function for a range of numbers and record the number of cache hits and misses after each call
cache_hits = []
cache_misses = []
for i in range(51):
    fib(i)
    info = fib.cache_info()
    cache_hits.append(info.hits)
    cache_misses.append(info.misses)

# Plot the cache hits and misses
plt.figure(figsize=(10,6))
plt.plot(cache_hits, label='Cache Hits')
plt.plot(cache_misses, label='Cache Misses')
plt.xlabel('Fibonacci Input')
plt.ylabel('Count')
plt.legend()
plt.title('LRU Cache Hits and Misses for Fibonacci Numbers')
plt.show()

Output:

在此代码中，我们在每次递归调用时记录cache_info()fib函数，将结果存储在cache_hits and cache_misses list.

然后，我们绘制这些调用过程中缓存命中和未命中的数量。
该图将显示每次递归调用的缓存命中和未命中数fib功能。

正如您所看到的，缓存命中的数量随着时间的推移而增加，而缓存未命中的数量则趋于平稳，这表明我们的缓存正在有效地减少冗余计算的数量。

LRU 缓存的实际用例

LRU 缓存可以优化许多实际应用程序。

这是一个简单的示例，说明您可以如何使用lru_cache对于数据库查询优化：


import functools
import sqlite3

@functools.lru_cache(maxsize=1024)
def get_employee_details(db_path, emp_id):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM employees WHERE id=?", (emp_id,))
    details = cursor.fetchall()
    conn.close()
    return details

# employee.db is our database and it has a table 'employees'
details = get_employee_details('employee.db', 123)

在上面的示例中，当您获取员工的详细信息时，它会缓存结果。

下次您获取同一员工 ID 的详细信息时，它将从缓存中返回结果，而不是再次查询数据库。

LRU 缓存的限制和缺点

虽然 LRU 缓存为缓存管理提供了一种简单而有效的策略，但它并非没有局限性：

空间使用：如果不限制，LRU 缓存将随着每个唯一请求无限增长。为缓存设置适当的大小以确保高效的内存使用至关重要。
缓存踩踏：当多个线程尝试从刚刚弹出的缓存中读取值时，可能会发生缓存踩踏。这可能会导致高负载并显着降低性能。
某些数据访问模式无效：如果数据访问模式使得每个请求都是唯一的或频繁更新，则 LRU 缓存将无效，因为它将无法为来自缓存的任何请求提供服务。

LRU 缓存的替代方案：

最不常用 (LFU)：LFU 缓存首先弹出最不常用的项目。如果经常访问的项目也经常更新，这可能比 LRU 缓存更有效。
随机替换（RR）：一个简单的策略，随机选择一个候选项目进行驱逐。如果访问模式是完全随机的，这可能是有效的。
分段LRU（SLRU）：这是 LRU 的扩展，它有两个 LRU 列表（一个试用段和一个受保护段），提供比普通 LRU 缓存更好的命中率。

进一步阅读

https://docs.python.org/3/library/functools.html

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)

python