I use the following code to download all the files in an S3 bucket:
```python
import os

import boto3


def main(bucket_name, destination_dir):
    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.all():
        # Skip zero-byte "folder" placeholder keys
        if obj.key.endswith('/'):
            continue
        # Mirror the key's prefix structure under destination_dir
        destination = os.path.join(destination_dir, obj.key)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        bucket.download_file(obj.key, destination)
```
If possible, I'd like to know how to make this asynchronous.
Thanks in advance.
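If all you need is for the downloads to overlap, one alternative (not what the answer below proposes) is to keep boto3's blocking `download_file` and run each call in a worker thread with `asyncio.to_thread` (Python 3.9+). This is a minimal sketch: `download_all` is a hypothetical helper, and `download` stands for any blocking callable that takes `(key, path)`.

```python
import asyncio
import os
from typing import Callable, Iterable


async def download_all(keys: Iterable[str], destination_dir: str,
                       download: Callable[[str, str], None]) -> list[str]:
    """Run a blocking download(key, path) for every key concurrently in threads."""

    async def one(key: str) -> str:
        destination = os.path.join(destination_dir, key)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        # asyncio.to_thread runs the blocking call in the default thread pool
        await asyncio.to_thread(download, key, destination)
        return destination

    # Skip "folder" placeholder keys, then wait for all downloads to finish
    return await asyncio.gather(*(one(k) for k in keys if not k.endswith('/')))
```

With boto3 you might pass `lambda key, path: s3.download_file(bucket_name, key, path)` where `s3 = boto3.client('s3')`. boto3 documents clients as generally thread-safe, whereas resource objects such as `Bucket` should not be shared across threads, which is why the sketch goes through a client.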
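You can use the S3 client's `generate_presigned_url` method to get a URL signed with your AWS credentials (see the docs: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html), and then download the files by sending requests through an asynchronous HTTP client (aiohttp, for example: https://docs.aiohttp.org/en/stable/client_reference.html).

aiohttp applies URL normalization, which can cause problems if a key contains spaces or non-ASCII characters. Passing `URL(..., encoded=True)` solves this.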
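Why `encoded=True` matters: the presigned URL is already percent-encoded, and its signature covers that encoded path, so the client has to send it byte-for-byte. The snippet below uses only the standard library's `urllib.parse` (not aiohttp or yarl themselves) to illustrate how processing an already-encoded path a second time produces a different URL; the key is a made-up example.

```python
from urllib.parse import quote, unquote

key = 'reports/2024 Q1/résumé.pdf'

# Percent-encode once, as the signer does for the request path ('/' is kept)
encoded_once = quote(key)
# Encoding an already-encoded path again turns every '%' into '%25'
encoded_twice = quote(encoded_once)

print(encoded_once)   # reports/2024%20Q1/r%C3%A9sum%C3%A9.pdf
print(encoded_twice)  # reports/2024%2520Q1/r%25C3%25A9sum%25C3%25A9.pdf
print(unquote(encoded_once) == key)  # True
```

A URL altered like this no longer matches what was signed, and S3 would typically reject the request.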
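```python
import asyncio
import os

import boto3
from aiohttp import ClientSession
from yarl import URL

bucket = 'some-bucket-name'
local_dir = 'some-local-folder-name'

s3_client = boto3.client('s3')
# A single list_objects call returns at most 1000 keys, so page through them
paginator = s3_client.get_paginator('list_objects_v2')
s3_objs = [obj
           for page in paginator.paginate(Bucket=bucket)
           for obj in page.get('Contents', [])]


async def download_s3_obj(key: str, session: ClientSession):
    # The presigned URL carries the signature, so aiohttp needs no AWS setup
    request_url = s3_client.generate_presigned_url(
        'get_object', Params={'Bucket': bucket, 'Key': key})
    # encoded=True keeps yarl from re-normalizing the already-encoded URL
    async with session.get(URL(request_url, encoded=True)) as response:
        response.raise_for_status()
        # Only the last path segment is kept, so equal file names collide
        file_path = os.path.join(local_dir, key.split('/')[-1])
        with open(file_path, 'wb') as file:
            file.write(await response.read())


async def main():
    os.makedirs(local_dir, exist_ok=True)
    # async with closes the session even if a download fails
    async with ClientSession() as session:
        await asyncio.gather(*(download_s3_obj(obj['Key'], session)
                               for obj in s3_objs
                               if not obj['Key'].endswith('/')))


asyncio.run(main())
```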
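`asyncio.gather` starts every download at once, and each one buffers the whole object in memory via `response.read()`. For large buckets, a common way to cap how many run at the same time is an `asyncio.Semaphore`. This is a minimal sketch; `gather_limited` is a hypothetical helper, not part of asyncio or aiohttp (aiohttp's own `TCPConnector(limit=...)` separately caps open connections).

```python
import asyncio
from typing import Awaitable, Iterable, TypeVar

T = TypeVar('T')


async def gather_limited(coros: Iterable[Awaitable[T]], limit: int) -> list[T]:
    """Await all coroutines, with at most `limit` of them running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro: Awaitable[T]) -> T:
        # Only `limit` coroutines can hold the semaphore at the same time
        async with semaphore:
            return await coro

    # Results come back in the same order as the input coroutines
    return await asyncio.gather(*(run(c) for c in coros))
```

In the answer's `main`, the `gather` call could then become, for example, `await gather_limited((download_s3_obj(obj['Key'], session) for obj in s3_objs if not obj['Key'].endswith('/')), limit=20)`, where 20 is an arbitrary illustrative value.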