To try to get Airflow logs written to a localstack S3 bucket, for local and Kubernetes development environments, I followed the Airflow documentation for writing logs to S3: https://airflow.apache.org/docs/1.10.1/howto/write-logs.html. For some background, localstack (https://github.com/localstack/localstack) is a local AWS cloud stack that runs AWS services such as S3 locally.
I added the following environment variables to my Airflow containers, similar to this other Stack Overflow answer (https://stackoverflow.com/questions/44780736/setting-up-s3-for-logs-in-airflow#answer-48194903), to try to log to my local S3 bucket. This is what I added to docker-compose.yaml, for all of the Airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
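My understanding (an assumption based on Airflow's documented `AIRFLOW__{SECTION}__{KEY}` convention; the helper below is mine, not Airflow's) is that each of these variables overrides the matching airflow.cfg entry, roughly like this:

```python
# Sketch of how an AIRFLOW__SECTION__KEY variable maps onto an airflow.cfg
# entry. Illustrative only; Airflow does this internally.
def env_var_to_config(name):
    """Split an AIRFLOW__SECTION__KEY variable name into (section, key)."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError("not an Airflow config override: %s" % name)
    section, _, key = name[len(prefix):].partition("__")
    return section.lower(), key.lower()

print(env_var_to_config("AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER"))
# → ('core', 'remote_base_log_folder')
```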
I also added my localstack S3 creds to airflow.cfg:
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
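A programmatic alternative I have been considering (an assumption based on Airflow's `AIRFLOW_CONN_<CONN_ID>` environment-variable mechanism; the `s3_conn_uri` helper is hypothetical, not an Airflow API) is to encode the connection as a URI instead of an airflow.cfg section, with the endpoint in the query string so it ends up in the connection's "extra" fields:

```python
# Hypothetical helper for building a connection URI that could be exported
# as e.g. AIRFLOW_CONN_MYS3CONN. The custom endpoint goes into the query
# string rather than the URI host, so Airflow should store it as an extra.
from urllib.parse import quote, urlencode

def s3_conn_uri(access_key, secret_key, endpoint):
    return "s3://%s:%s@?%s" % (
        quote(access_key, safe=""),
        quote(secret_key, safe=""),
        urlencode({"host": endpoint}),
    )

print(s3_conn_uri("foo", "bar", "http://localstack:4572"))
# → s3://foo:bar@?host=http%3A%2F%2Flocalstack%3A4572
```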
I also installed apache-airflow[hooks] and apache-airflow[s3], though it's not clear which one is really needed based on the docs (https://airflow.apache.org/docs/1.10.1/howto/write-logs.html).
I've followed the steps in a previous Stack Overflow post (https://stackoverflow.com/questions/48817258/setting-up-s3-logging-in-airflow) to try to verify that the S3Hook can write to my localstack S3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test', 'test', bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding the credentials in the Airflow console under /admin/connection/edit as depicted:
a new exception is returned: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue (https://github.com/localstack/localstack/issues/1568) and it may have been related to networking.
In any case, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering the AWS credentials explicitly with boto), but it needs to work as part of Airflow.
Is there a proper way to set up the host/port/credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the Airflow S3 hook source code (https://github.com/apache/airflow/blob/58c3542ed25061320ce61dbe0adf451a44c738dd/airflow/providers/amazon/aws/hooks/s3.py), it seems a custom S3 URL may not yet be supported by Airflow. However, based on the Airflow aws_hook source code (https://github.com/apache/airflow/blob/58c3542ed25061320ce61dbe0adf451a44c738dd/airflow/providers/amazon/aws/hooks/aws_hook.py) (the parent class), it seems it should be possible to set an endpoint_url, including the port, and that it should be read from airflow.cfg.
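The relevant part of aws_hook, as I read it, is that the connection's extra JSON is parsed and a "host" key, if present, is handed to boto as the endpoint_url. A minimal sketch of that behavior (illustrative names, not the hook's actual code):

```python
import json

def resolve_endpoint(extra_json):
    """Mimic how the hook appears to pick a custom endpoint from extras."""
    extra = json.loads(extra_json or "{}")
    # No "host" key → None, so boto falls back to the real AWS endpoint.
    return extra.get("host")

print(resolve_endpoint('{"host": "http://localstack:4572"}'))
# → http://localstack:4572
print(resolve_endpoint(""))
# → None
```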
I can inspect and write to the S3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs
from the Airflow container returns the contents of the bucket, while aws --endpoint-url=http://localhost:4572 s3 ls
returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to a localstack S3 bucket from Airflow running in Docker, with an automated setup, and is this even supported yet?