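I have a requirement where a zip file arrives in an S3 bucket, and I need to write a Lambda in Python that reads the zip file, performs some validation, and unzips it into another S3 bucket.
The zip file contains the following:
a.csv b.csv c.csv trigger_file.txt
trigger_file.txt -- contains the names of the files in the zip and their record counts (e.g. a.csv:120, b.csv:10, c.csv:50)
So, using the Lambda, I need to read the trigger file and check whether the number of files in the zip matches the number of files listed in the trigger file; if the check passes, unzip to the S3 bucket.
I have prepared the following code: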
def write_to_s3(config_dict):
    inp_bucket = config_dict["inp_bucket"]
    inp_key = config_dict["inp_key"]
    out_bucket = config_dict["out_bucket"]
    des_key = config_dict["des_key"]
    processed_key = config_dict["processed_key"]
    obj = S3_CLIENT.get_object(Bucket=inp_bucket, Key=inp_key)
    putObjects = []
    with io.BytesIO(obj["Body"].read()) as tf:
        # rewind the file
        tf.seek(0)
        # Read the file as a zipfile perform transformations and process the members
        with zipfile.ZipFile(tf, mode='r') as zipf:
            for file in zipf.infolist():
                fileName = file.filename
                print("file name before while loop :", fileName)
                try:
                    found = False
                    while not found:
                        if fileName == "Trigger_file.txt":
                            with zipf.open(fileName, 'r') as thefile:
                                my_list = [i.decode('utf8').split(' ') for i in thefile]
                                my_list = str(my_list)[1:-1]
                                print("my_list :", my_list)
                                print("fileName :", fileName)
                                found = True
                                break
                                thefile.close()
                        else:
                            print("Trigger file not found ,try again")
                except Exception as exp_handler:
                    raise exp_handler
                if 'csv' in fileName:
                    try:
                        if fileName in my_list:
                            print("Validation Success , all files in Trigger file are present procced for extraction")
                        else:
                            print("Validation Failed")
                    except Exception as exp_handler:
                        raise exp_handler

# *****FUNCTION TO UNZIP ********
def lambda_handler(event, context):
    try:
        inp_bucket = event['Records'][0]['s3']['bucket']['name']
        inp_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        config_dict = build_conf_obj(os.environ['config_bucket'], os.environ['config_file'], os.environ['param_name'])
        write_to_s3(config_dict)
    except Exception as exp_handler:
        print("ERROR")
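Everything else works fine; the only problem I am facing is the validation part. I think the while loop is wrong, because it goes into an infinite loop.
Expected:
Search for trigger_file.txt in the zip; if it is found, break out of the loop, run the validation, and unzip to the S3 folder. If it is not found, keep searching until the end of the zip's file list.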
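One possible reading of the timeout: inside `while not found`, `fileName` never changes, so for any member other than the trigger file the `else` branch prints forever. The code also compares against "Trigger_file.txt" while the archive lists trigger_file.txt. A minimal sketch of the expected behavior, with no inner loop, might look like the following. The helper names `parse_trigger` and `validate_zip` are hypothetical, and the sketch assumes that trigger entries are separated by commas and/or whitespace and that all members sit at the root of the archive.

```python
import io
import re
import zipfile

TRIGGER_NAME = "trigger_file.txt"  # assumed name; matched case-insensitively


def parse_trigger(text):
    """Parse 'a.csv:120, b.csv:10' style text into {'a.csv': 120, 'b.csv': 10}."""
    expected = {}
    # Assumption: entries are separated by commas and/or whitespace (incl. newlines).
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        name, _, count = token.rpartition(":")
        if not name:
            raise ValueError(f"malformed trigger entry: {token!r}")
        expected[name] = int(count)
    return expected


def validate_zip(zipf):
    """Return (ok, expected, actual_csvs) for an open zipfile.ZipFile."""
    names = zipf.namelist()
    # Look the trigger file up once instead of looping until it appears.
    trigger = next((n for n in names if n.lower() == TRIGGER_NAME), None)
    if trigger is None:
        raise FileNotFoundError(f"{TRIGGER_NAME} not found in archive")
    expected = parse_trigger(zipf.read(trigger).decode("utf-8"))
    actual = {n for n in names if n.lower().endswith(".csv")}
    # Valid only when the CSV members match the trigger listing exactly.
    return actual == set(expected), expected, actual


def build_demo_archive():
    """Build a small in-memory zip so the sketch runs without S3."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.csv", "x\n")
        zf.writestr("b.csv", "y\n")
        zf.writestr("trigger_file.txt", "a.csv:1, b.csv:1")
    buf.seek(0)
    return buf


with zipfile.ZipFile(build_demo_archive()) as zf:
    ok, expected, actual = validate_zip(zf)
    print(ok, expected, sorted(actual))
```

Comparing the two name sets checks both the number of files and their names in one step; the parsed counts in `expected` could additionally be compared with the row count of each CSV if the record counts need validating too.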
Error output (timeout):
Response:
{
"errorMessage": "2020-06-16T20:09:06.168Z 39253b98-db87-4e65-b288-b585d268ac5f Task timed out after 60.06 seconds"
}
Request ID:
"39253b98-db87-4e65-b288-b585d268ac5f"
Function Logs:
again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,trEND RequestId: 39253b98-db87-4e65-b288-b585d268ac5f
REPORT RequestId: 39253b98-db87-4e65-b288-b585d268ac5f Duration: 60060.06 ms Billed Duration: 60000 ms Memory Size: 3008 MB Max Memory Used: 83 MB Init Duration: 389.65 ms
2020-06-16T20:09:06.168Z 39253
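For the extraction step described under "Expected" (unzipping into another S3 bucket once validation passes), a rough sketch could upload each member with the boto3 S3 client's `put_object` call. The function `extract_to_bucket`, its `prefix` parameter, and the `FakeS3` stand-in are hypothetical names introduced only so the example runs without AWS access; in the Lambda, the client would presumably be the existing `S3_CLIENT`. Each member is read fully into memory, which is an assumption that only suits small files.

```python
import io
import zipfile


def extract_to_bucket(zipf, s3_client, out_bucket, prefix=""):
    """Upload every regular member of an open ZipFile; return the uploaded keys."""
    keys = []
    for info in zipf.infolist():
        if info.is_dir():
            continue  # directory entries carry no data
        key = f"{prefix}{info.filename}"
        # put_object accepts bytes for Body; the member is read into memory first.
        s3_client.put_object(Bucket=out_bucket, Key=key, Body=zipf.read(info))
        keys.append(key)
    return keys


class FakeS3:
    """Hypothetical stand-in for boto3.client('s3'), recording uploads in a dict."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


def build_sample_zip():
    """Build a small in-memory archive for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.csv", "x\n")
        zf.writestr("b.csv", "y\n")
    buf.seek(0)
    return buf


fake = FakeS3()
with zipfile.ZipFile(build_sample_zip()) as zf:
    uploaded = extract_to_bucket(zf, fake, "out-bucket", prefix="unzipped/")
print(uploaded)
```

Passing the client in as an argument keeps the function testable without AWS credentials; the same call would work with a real boto3 client, since `put_object` takes the `Bucket`, `Key`, and `Body` keyword arguments used here.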