In case anyone else is struggling with the same problem, this is what we ended up doing.
TL;DR:
- Create a unit_test_runner.py that can install a wheel file and run the tests packaged inside it. The key is to install it "notebook-scoped".
- Deploy/copy unit_test_runner.py to Databricks DBFS and create a job pointing to it. The job parameters are the wheel files to run pytest against.
- Build a wheel of your code, copy it to Databricks DBFS, and run the unit-test-runner job with the wheel file's location as the parameter.
Project structure:
root
├── dist
│   └── my_project-0.1.0-py3-none-any.whl
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── module1.py
├── module2.py
├── housekeeping.py
├── common
│   └── aws.py
├── tests
│   ├── conftest.py
│   ├── test_module1.py
│   ├── test_module2.py
│   └── common
│       └── test_aws.py
└── unit_test_runner.py
unit_test_runner.py:
import importlib.util
import logging
import os
import shutil
import subprocess
import sys
from enum import IntEnum

import pytest


def main(args: list) -> int:
    coverage_opts = []
    if args[0] == '--cov':
        coverage_opts = ['--cov']
        wheels_to_test = args[1:]
    else:
        wheels_to_test = args
    logging.info('coverage_opts: %s, wheels_to_test: %s', coverage_opts, wheels_to_test)
    for wh_file in wheels_to_test:
        logging.info('pip install %s', wh_file)
        # pip.main() was removed in pip 10; invoke pip through its CLI instead.
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', wh_file])
        # we assume a wheel name like <pkg name>-<version>-...
        # E.g. my_module-0.1.0-py3-none-any.whl
        pkg_name = os.path.basename(wh_file).split('-')[0]
        # locate the installed package without importing it,
        # to avoid any issues with coverage data.
        pkg_root = os.path.dirname(importlib.util.find_spec(pkg_name).origin)
        os.chdir(pkg_root)
        pytest_opts = [f'--rootdir={pkg_root}']
        pytest_opts.extend(coverage_opts)
        logging.info('pytest_opts: %s', pytest_opts)
        rc = pytest.main(pytest_opts)
        logging.info('pytest-status: %s, wheel: %s', rc, wh_file)
        generate_coverage_data(pkg_name, pkg_root, wh_file)
    return rc.value if isinstance(rc, IntEnum) else rc


def generate_coverage_data(pkg_name, pkg_root, wh_file):
    if os.path.exists(f'{pkg_root}/.coverage'):
        shutil.rmtree(f'{pkg_root}/htmlcov', ignore_errors=True)
        output_tar = f'{os.path.dirname(wh_file)}/{pkg_name}-coverage.tar.gz'
        rc = os.system(f'coverage html --data-file={pkg_root}/.coverage && tar -cvzf {output_tar} htmlcov')
        logging.info('rc: %s, coverage data available at: %s', rc, output_tar)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # silence annoying py4j logging
    logging.getLogger("py4j").setLevel(logging.ERROR)
    logging.info('sys.argv[1:]: %s', sys.argv[1:])
    rc = main(sys.argv[1:])
    if rc != 0:
        raise Exception(f'Unit test execution failed. rc: {rc}, sys.argv[1:]: {sys.argv[1:]}')
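The runner leans on the standard wheel naming convention (PEP 427: <distribution>-<version>-<tags>.whl) to recover the package name from the file path. A minimal sketch of that assumption, using a made-up path:

```python
import os

# Hypothetical wheel path; the distribution name is everything
# before the first '-' in the filename.
wh_file = '/dbfs/user/someone/wheels/my_project-0.1.0-py3-none-any.whl'
pkg_name = os.path.basename(wh_file).split('-')[0]
print(pkg_name)  # my_project
```

Note that wheel filenames escape hyphens in distribution names to underscores (my-project becomes my_project-0.1.0-...whl), so split('-')[0] yields the importable name in that case too.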
- Install and configure databricks-cli. See the instructions at https://docs.databricks.com/dev-tools/cli/index.html.
WORKSPACE_ROOT='/home/kash/workspaces'
USER_NAME='[email protected]'
cd $WORKSPACE_ROOT/my_project
echo 'copying runner..' && \
databricks fs cp --overwrite unit_test_runner.py dbfs:/user/$USER_NAME/
- Go to the Databricks GUI and create a job (https://docs.databricks.com/workflows/jobs/jobs.html#create-a-job) pointing to dbfs:/user/$USER_NAME/unit_test_runner.py. This can also be done using the CLI.
- Type: Python script
- Source: DBFS/S3
- Path:
dbfs:/user/$USER_NAME/unit_test_runner.py
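If you prefer the CLI to the GUI, the same job can be sketched as a Jobs API 2.0 payload. The cluster ID and user name below are placeholders, so substitute your own:

```shell
cat > job.json <<'EOF'
{
  "name": "unit-test-runner",
  "existing_cluster_id": "<your-cluster-id>",
  "spark_python_task": {
    "python_file": "dbfs:/user/<user-name>/unit_test_runner.py"
  }
}
EOF
databricks jobs create --json-file job.json
```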
- Run databricks jobs list to find the job ID, e.g. 123456789.
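The legacy databricks-cli prints one "<job-id>  <job-name>" line per job, so the ID can be picked out with awk instead of by eye. A sketch against a canned sample of that output (assuming the job was named unit-test-runner):

```shell
# Canned sample of `databricks jobs list` output (legacy CLI format).
jobs_output='123456789  unit-test-runner
987654321  nightly-etl'

# Pick the ID of the job whose name matches.
JOB_ID=$(printf '%s\n' "$jobs_output" | awk '$2 == "unit-test-runner" {print $1}')
echo "$JOB_ID"  # 123456789
```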
cd $WORKSPACE_ROOT/my_project
poetry build -f wheel # could be replaced with any builder that creates a wheel file
whl_file=$(ls -1tr dist/my_project*-py3-none-any.whl | tail -1 | xargs basename)
echo 'copying wheel...' && databricks fs cp --overwrite dist/$whl_file dbfs:/user/$USER_NAME/wheels
echo 'launching job..' && \
databricks jobs run-now --job-id 123456789 --python-params "[\"/dbfs/user/$USER_NAME/wheels/$whl_file\"]"
# OR with coverage
echo 'launching job with coverage..' && \
databricks jobs run-now --job-id 123456789 --python-params "[\"--cov\", \"/dbfs/user/$USER_NAME/wheels/$whl_file\"]"
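run-now replies with a small JSON document containing the run ID, which can be fed to databricks runs get to poll the run's state. A sketch using a canned response, since we can't call the real CLI here (the payload shape is an assumption based on the Jobs API run-now response):

```shell
# Canned run-now response; the real one comes from the databricks CLI.
response='{"run_id": 42}'

# Extract run_id with python's json module (avoids a jq dependency).
RUN_ID=$(printf '%s' "$response" | python3 -c 'import json, sys; print(json.load(sys.stdin)["run_id"])')
echo "$RUN_ID"  # 42

# Then poll the run's state:
# databricks runs get --run-id $RUN_ID
```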
If you ran with --cov, then fetch and open the coverage report:
rm -rf htmlcov my_project-coverage.tar.gz
databricks fs cp dbfs:/user/$USER_NAME/wheels/my_project-coverage.tar.gz .
tar -xvzf my_project-coverage.tar.gz
firefox htmlcov/index.html