I am running Jupyter (v4.2.1) with the Apache Toree - PySpark kernel. When I try to call plotly's init_notebook_mode function, I run into the following error:
import numpy as np
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode()
Error:
Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
File "/tmp/kernel-PySpark-6415c581-01c4-4c90-b4d9-81773c2bc03f/pyspark_runner.py", line 134, in <module>
eval(compiled_code)
File "<string>", line 7, in <module>
File "/usr/local/lib/python3.4/dist-packages/plotly/offline/offline.py", line 151, in init_notebook_mode
display(HTML(script_inject))
File "/usr/local/lib/python3.4/dist-packages/IPython/core/display.py", line 158, in display
format = InteractiveShell.instance().display_formatter.format
File "/usr/local/lib/python3.4/dist-packages/traitlets/config/configurable.py", line 412, in instance
inst = cls(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 499, in __init__
self.init_io()
File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 658, in init_io
io.stdout = io.IOStream(sys.stdout)
File "/usr/local/lib/python3.4/dist-packages/IPython/utils/io.py", line 34, in __init__
raise ValueError("fallback required, but not specified")
ValueError: fallback required, but not specified
StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
scala.Option.foreach(Option.scala:236)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:139)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
py4j.Gateway.invoke(Gateway.java:259)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:209)
java.lang.Thread.run(Thread.java:745)
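I could not find anything about this online. When I dug into the failing code in IPython's utils, io.py, I found that the stream passed in must have both a write and a flush attribute. For some reason, though, the stream passed in this case, sys.stdout, has only the "write" attribute and no "flush" attribute.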
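To make the check described above concrete, here is a simplified sketch of it. This is not IPython's actual source: WriteOnlyStream and pick_stream are hypothetical names, and the assumption is that the sys.stdout installed under Toree behaves like an object with write() but no flush().

```python
import io


class WriteOnlyStream:
    """Hypothetical stand-in for a stdout replacement that lacks flush()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        # Collect the text instead of printing it.
        self.chunks.append(text)


def pick_stream(stream, fallback=None):
    """Simplified form of the check: the stream needs both write and flush.

    If either attribute is missing, use the fallback; with no fallback,
    raise the same ValueError message seen in the traceback.
    """
    if hasattr(stream, "write") and hasattr(stream, "flush"):
        return stream
    if fallback is not None:
        return fallback
    raise ValueError("fallback required, but not specified")


if __name__ == "__main__":
    try:
        pick_stream(WriteOnlyStream())
    except ValueError as exc:
        print("reproduced:", exc)
```

Passing a stream that has both attributes (for example an io.StringIO), or supplying a fallback, avoids the exception in this sketch.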
Posted on 2016-12-15 09:52:36
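I believe this happens because plotly's notebook mode assumes it is running inside an IPython Jupyter kernel, which handles communication with the notebook; you can see in the stack trace that it tries to call into the IPython package.
Toree, however, is a different Jupyter kernel with its own protocol handling for talking to the notebook server. Even when you use Toree to run a PySpark interpreter, you get a "plain" PySpark (just as when you start it from a shell), and Toree drives that interpreter's input and output.
So none of the IPython machinery is set up, and calling init_notebook_mode() in that environment fails, just as it would in a PySpark session started directly from a shell, which knows nothing about notebooks.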
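One way to see this from inside a session is to ask IPython whether a shell is active before calling plotly. The sketch below assumes that IPython's get_ipython() returns None when no InteractiveShell has been initialized, which should be the case under Toree; running_under_ipython and safe_init_notebook_mode are hypothetical helper names.

```python
def running_under_ipython():
    """Return True only if an IPython InteractiveShell is already active."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed at all.
        return False
    # get_ipython() returns None when no shell has been initialized.
    return get_ipython() is not None


def safe_init_notebook_mode():
    """Call plotly's init_notebook_mode() only when IPython drives the session."""
    if not running_under_ipython():
        return False
    # Imported lazily so the guard itself does not require plotly.
    from plotly.offline import init_notebook_mode
    init_notebook_mode()
    return True
```

This guard only avoids the exception. It does not make plots appear under Toree.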
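As far as I know, there is currently no way to get plot output from a PySpark session run through Toree; we ran into the same problem recently. Instead of running Python through Toree, you need to run an IPython kernel, import the PySpark libraries, and connect to your Spark cluster. For a Dockerized example of this, see https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook.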
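A minimal sketch of that setup, to be run in a notebook backed by the regular IPython kernel, might look like the following. It assumes Spark 2.x or later (for SparkSession) with pyspark importable from the kernel. The master URL, the app name, and the helper names make_spark_session and plot_ids are placeholders, not taken from the linked repository.

```python
def make_spark_session(master_url="local[*]", app_name="plotly-from-ipython"):
    """Create (or reuse) a SparkSession from inside an IPython kernel.

    master_url is a placeholder; point it at your own cluster.
    """
    # Imported lazily so this cell still loads where pyspark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master_url)
        .appName(app_name)
        .getOrCreate()
    )


def plot_ids(spark, n=10):
    """Pull a small Spark result into pandas and plot it with plotly offline."""
    import plotly.graph_objs as go
    from plotly.offline import init_notebook_mode, iplot

    # Works here because the IPython kernel provides the display machinery.
    init_notebook_mode()
    pdf = spark.range(n).toPandas()  # single column named "id"
    iplot([go.Scatter(x=pdf["id"], y=pdf["id"], mode="markers")])
```

In a notebook cell you would then call `spark = make_spark_session()` followed by `plot_ids(spark)`.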
https://stackoverflow.com/questions/41144467