I am trying to package my PySpark code with the automation below, so that I can later run it with spark-submit:
https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7
https://github.com/alekseyig/spark-submit-deps
Since my pip version is presumably newer than the author's, I needed to make the following changes to setup.py:
from pip.commands import WheelCommand
=> from pip._internal.commands.wheel import WheelCommand
from pip.req import parse_requirements
=> from pip._internal.req import parse_requirements
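As a side note, parse_requirements from pip._internal also changed its signature at some point and now requires a session argument. A minimal sketch of calling it across pip versions (the PipSession import paths below are an assumption; the module has moved between releases):

try:
    from pip._internal.network.session import PipSession  # pip >= 19.3
except ImportError:
    from pip._internal.download import PipSession  # roughly pip 10.x - 19.2
from pip._internal.req import parse_requirements

# Older pip yields InstallRequirement objects; pip >= 20 yields ParsedRequirement.
reqs = list(parse_requirements('requirements.txt', session=PipSession()))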
Unfortunately, after these changes, running python setup.py bdist_spark fails with the following error:
Traceback (most recent call last):
  File "setup.py", line 116, in <module>
    "bdist_spark": BdistSpark
  File "<conda_env>\lib\site-packages\setuptools\__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "<conda_env>\lib\distutils\core.py", line 151, in setup
    dist.run_commands()
  File "<conda_env>\lib\distutils\dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "<conda_env>\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 43, in run
    wheel_command = WheelCommand(isolated=False)
  File "<conda_env>\lib\site-packages\pip\_internal\commands\wheel.py", line 52, in __init__
    super(WheelCommand, self).__init__(*args, **kw)
TypeError: __init__() takes at least 3 arguments (2 given)
I have tried to fix it, without success, and I could not find anything that helped me on Stack Overflow or in the pip docs/code.
Could you take a look?
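For reference, the traceback shows that in newer pip releases Command.__init__ takes name and summary as required positional arguments. Assuming WheelCommand is still exposed at that private path (pip._internal is not a stable API), a minimal sketch of the direct fix:

from pip._internal.commands.wheel import WheelCommand

# Newer pip requires a name and a summary for every Command subclass.
wheel_command = WheelCommand(name='wheel', summary='Build wheels', isolated=False)

A more future-proof alternative is to avoid pip internals entirely and call the supported command-line interface instead:

import subprocess
import sys

# Build wheels for everything in requirements.txt into 'dist'
# via the stable `pip wheel` CLI rather than pip's private classes.
subprocess.check_call([
    sys.executable, '-m', 'pip', 'wheel',
    '-r', 'requirements.txt', '-w', 'dist',
])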
Posted on 2019-12-19 19:18:07
I was in the same situation (Python 3.7 in my case), but I made progress and was able to spark-submit with two zip files (one for the dependencies and one for the actual code). Not sure whether this is the right way. The build here is done with uranium. Below is the code showing how I get all the dependency libraries into the dist folder; afterwards it is zipped in a gradle task.
import logging

from uranium import current_build

logger = logging.getLogger(__name__)

# Extra pip index options, e.g. ['--index-url', '<private-index-url>']; empty by default.
index_url_paras = []


@current_build.task
def package_for_spark(build):
    # Download all dependencies into the local 'dist' folder so they can be zipped later.
    download_dependency_cmd = ['pip3', 'install',
                               '-r', '.requirements.txt', '-t', 'dist']
    download_dependency_cmd.extend(index_url_paras)
    logger.info('download cmd: {}'.format(download_dependency_cmd))
    build.executables.run(download_dependency_cmd)
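To round out the two-zip approach, here is a hedged sketch of the steps the answer delegates to a gradle task, with illustrative file and folder names (src and main.py are assumptions). First zip the downloaded dependencies and the application code:

import shutil

# Package the pip-installed dependencies (contents of dist/ end up at the
# zip root, as --py-files expects) and the application sources.
shutil.make_archive('deps', 'zip', 'dist')  # -> deps.zip
shutil.make_archive('app', 'zip', 'src')    # -> app.zip

then hand both archives to spark-submit:

spark-submit --py-files deps.zip,app.zip main.py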
https://stackoverflow.com/questions/59388339