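I'm building some data pipelines with Kedro, and I have several file formats, including custom ones. Following the documentation, I created a globals.yml under config/base, where I defined some global variables to use in catalog.yml. Unfortunately, I can't get it to work.

globals.yml looks like this: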
paths:
  base_path: "s3://my_project"
datasets:
  pdf: "base.PDFDataSet"
  png: "pillow.ImageDataSet"
  csv: "pandas.CSVDataSet"
  excel: "pandas.ExcelDataSet"
data_folders:
  raw: "01_raw"
  intermediate: "02_intermediate"
  primary: "03_primary"
  feature: "04_feature"
  model_input: "05_model_input"
  models: "06_models"
  model_output: "07_model_output"
  reporting: "08_reporting"

settings.py looks like this:
from kedro.config import TemplatedConfigLoader
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_pattern": "*globals.yml",
}
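
As a quick sanity check on the pattern itself, here is a minimal sketch that tests "*globals.yml" against the file names involved. It uses Python's standard fnmatch module as a stand-in; that Kedro matches a bare file name the same way is my assumption, not something I have confirmed.

```python
from fnmatch import fnmatch

# Same value as "globals_pattern" in CONFIG_LOADER_ARGS above.
GLOBALS_PATTERN = "*globals.yml"

# fnmatch is only a stand-in here; Kedro's own file discovery may differ.
matches_globals = fnmatch("globals.yml", GLOBALS_PATTERN)
matches_catalog = fnmatch("catalog.yml", GLOBALS_PATTERN)

print(matches_globals)  # the globals file should match the pattern
print(matches_catalog)  # the catalog file should not
```

So the pattern at least matches the file name; this says nothing about whether the file is found in the directory Kedro actually searches.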

catalog.yml looks like this:
_label_images: &label_images
  type: PartitionedDataSet
  path: "${paths.base_path}/data/${data_folders.raw}/label_images"
  dataset: ${datasets.png}
label_images_png:
  <<: *label_images
  filename_suffix: .png
label_images_jpg:
  <<: *label_images
  filename_suffix: .jpg
label_images_jpeg:
  <<: *label_images
  filename_suffix: .jpeg
label_images_pdf:
  <<: *label_images
  dataset: base.PDFDataSet
  filename_suffix: .pdf
my_project_label_extracts:
  type: PartitionedDataSet
  path: s3://my_project/data/01_raw/label_extracts
  dataset: pandas.ExcelDataSet

I believe I should be able to reference the variables inside catalog.yml with ${...}. However, I get this error:
Invalid bucket name "${bucket}": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
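
To make the expected behaviour concrete, here is a minimal pure-Python sketch of the substitution I expect to happen. It is only an illustration and not Kedro's actual implementation; the helper names `lookup` and `resolve` are hypothetical, and the dictionary just copies the relevant keys from globals.yml.

```python
import re

# Values copied from globals.yml (only the keys used by the catalog entry).
GLOBALS = {
    "paths": {"base_path": "s3://my_project"},
    "datasets": {"png": "pillow.ImageDataSet"},
    "data_folders": {"raw": "01_raw"},
}

# Matches placeholders of the form ${some.dotted.key}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(dotted_key, values):
    """Walk a nested dict following a dotted key such as 'paths.base_path'."""
    node = values
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the key is not defined
    return node


def resolve(template, values):
    """Replace every ${dotted.key} in the template with its value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(m.group(1), values)), template)


expected_path = resolve(
    "${paths.base_path}/data/${data_folders.raw}/label_images", GLOBALS
)
expected_dataset = resolve("${datasets.png}", GLOBALS)

print(expected_path)     # s3://my_project/data/01_raw/label_images
print(expected_dataset)  # pillow.ImageDataSet
```

In other words, I expect the bucket in the resolved path to be my_project, not a literal, unresolved placeholder.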