In Google App Engine, how can I reduce memory consumption as I write a file out to the blobstore, without exceeding the soft memory limit?

2024-04-26

I'm using the blobstore to back up and restore entities in csv format. The process works fine for all of my smaller models. However, once I start working on models with more than 2K entities, I exceed the soft memory limit. I'm only fetching 50 entities at a time and then writing the results out to the blobstore, so it's not clear to me why my memory usage keeps growing. I can reliably make the method fail just by increasing the "limit" value passed in below, which makes the method run longer and export more entities.

  1. Any suggestions on how to optimize this process to reduce memory consumption?

  2. Also, the size of the file that gets produced is only

Simplified example:

import csv
from google.appengine.api import files

# models, backup and properties are defined elsewhere in the app's backup module.
file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    writer = csv.DictWriter(f, fieldnames=properties)
    for entity in models.Player.all():
        row = backup.get_dict_for_entity(entity)
        writer.writerow(row)

Produces the error: Exceeded soft private memory limit with 150.957 MB after servicing 7 requests total

Simplified example 2:

The problem seems to be with using files and the with statement in Python 2.5. Factoring out the csv handling, I can reproduce almost exactly the same error simply by trying to write a 4000-line text file to the blobstore.

from __future__ import with_statement

import StringIO

from google.appengine.api import files
from google.appengine.ext.blobstore import blobstore

file_name = files.blobstore.create(mime_type='application/octet-stream')
myBuffer = StringIO.StringIO()

#Put 4000 lines of text in myBuffer

with files.open(file_name, 'a') as f:
    for line in myBuffer.getvalue().splitlines():
        f.write(line)

files.finalize(file_name)
blob_key = files.blobstore.get_blob_key(file_name)

Produces the error: Exceeded soft private memory limit with 154.977 MB after servicing 24 requests total

The original method:

def backup_model_to_blobstore(model, limit=None, batch_size=None):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    # Open the file and write to it
    with files.open(file_name, 'a') as f:
      #Get the fieldnames for the csv file.
      query = model.all().fetch(1)
      entity = query[0]
      properties = entity.__class__.properties()
      #Add ID as a property
      properties['ID'] = entity.key().id()

      #For debugging rather than try and catch
      if True:
        writer = csv.DictWriter(f, fieldnames=properties)
        #Write out a header row
        headers = dict( (n,n) for n in properties )
        writer.writerow(headers)

        numBatches = int(limit/batch_size)
        if numBatches == 0:
            numBatches = 1

        for x in range(numBatches):
          logging.info("************** querying with offset %s and limit %s", x*batch_size, batch_size)
          query = model.all().fetch(limit=batch_size, offset=x*batch_size)
          for entity in query:
            #This just returns a small dictionary with the key-value pairs
            row = get_dict_for_entity(entity)
            #write out a row for each entity.
            writer.writerow(row)

    # Finalize the file. Do this before attempting to read it.
    files.finalize(file_name)

    blob_key = files.blobstore.get_blob_key(file_name)
    return blob_key

The error looks like this in the log:

......
2012-02-02 21:59:19.063
************** querying with offset 2050 and limit 50
I 2012-02-02 21:59:20.076
************** querying with offset 2100 and limit 50
I 2012-02-02 21:59:20.781
************** querying with offset 2150 and limit 50
I 2012-02-02 21:59:21.508
Exception for: Chris (202.161.57.167)

err:
Traceback (most recent call last):
  .....
    blob_key = backup_model_to_blobstore(model, limit=limit, batch_size=batch_size)
  File "/base/data/home/apps/singpath/163.356548765202135434/singpath/backup.py", line 125, in backup_model_to_blobstore
    writer.writerow(row)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
FileNotOpenedError

C 2012-02-02 21:59:23.009
Exceeded soft private memory limit with 149.426 MB after servicing 14 requests total

You'd be better off not doing the batching yourself, but just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate:

q = model.all()
for entity in q:
    row = get_dict_for_entity(entity)
    writer.writerow(row)

This avoids re-running the query with ever-increasing offsets, which is slow and causes quadratic behavior in the datastore.
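Putting that advice together, a minimal sketch of the backup method built around plain query iteration might look like the following (untested; it reuses the get_dict_for_entity helper and the files API from the code above, and drops the extra "ID" column and the offset/limit bookkeeping for brevity):

from __future__ import with_statement

import csv

from google.appengine.api import files


def backup_model_to_blobstore(model):
    file_name = files.blobstore.create(mime_type='application/octet-stream')

    # Use a single entity only to discover the csv column names.
    sample = model.all().fetch(1)[0]
    properties = sample.__class__.properties()

    with files.open(file_name, 'a') as f:
        writer = csv.DictWriter(f, fieldnames=properties)
        # Header row.
        writer.writerow(dict((n, n) for n in properties))

        # Let the query iterator fetch entities in its own small batches;
        # no offsets and no per-batch fetch() calls.
        for entity in model.all():
            writer.writerow(get_dict_for_entity(entity))

    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)

The only state held across iterations is the open file handle and the csv writer, so each entity can be garbage-collected as soon as its row has been written.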

An oft-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times the RAM of the serialized form of that entity; e.g. an entity that is 3KB on disk might use 100KB in RAM. (The exact blow-up factor depends on many things; it's worse if you have lots of properties with long names and small values, and worse still for repeated properties with long names.)
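To get a rough feel for the serialized size being referred to here, one option (my own sketch, not something from the original thread) is to encode a sample entity with db.model_to_protobuf and log the length of the encoded bytes; the live Python object for the same entity will typically be much larger:

import logging

from google.appengine.ext import db


def log_serialized_size(model):
    # Serialize one entity to its protocol-buffer form and log its size.
    # The in-memory representation of the same entity can be 30-50x larger.
    entity = model.all().fetch(1)[0]
    encoded = db.model_to_protobuf(entity).Encode()
    logging.info("One %s entity is %d bytes serialized", model.kind(), len(encoded))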
