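Byzer-yaml-visualization is a visualization plugin for Byzer. With it, users describe charts in a YAML configuration file.
Install it with the following command (network access is required):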
!plugin app add - "byzer-yaml-visualization-3.0";
Uninstall:
!plugin app remove "byzer-yaml-visualization-3.0";
Uninstalling requires restarting the engine.
load delta.`demo.gapminder` as gapminder;
select * from gapminder where continent='Oceania' as gapminder2;
!visualize gapminder2 '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
fig:
    line:
        x: year
        y: lifeExp
        color: country
''';
load delta.`demo.gapminder` as gapminder;
select * from gapminder where continent='Oceania' as gapminder2;
!visualize gapminder2 '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
fig:
    bar:
        x: year
        y: lifeExp
        color: country
''';
load delta.`demo.iris` as iris;
!visualize iris '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
control:
    ignoreSort: false
fig:
    scatter:
        title: "Iris scatter plot"
        x: sepal_width
        y: sepal_length
        size: petal_length
        color: species
        hover_data:
            - petal_width
        labels:
            sepal_width: "Sepal width"
            sepal_length: "Sepal length"
''';
load delta.`demo.gapminder` as gapminder;
select * from gapminder where continent='Europe'
and year="2007"
as gapminder2;
select pop, if(pop < 2.e6,"Other countries",country) as country from gapminder2
as gapminder3;
!visualize gapminder3 '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
fig:
    pie:
        title: "Population distribution in Europe"
        values: pop
        names: country
''';
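The `if(pop < 2.e6, "Other countries", country)` expression in the query above relabels every country with fewer than two million people as "Other countries", presumably so that the pie is not crowded with tiny slices. The following Python sketch mirrors that relabelling. The rows are made-up placeholders, not values from `demo.gapminder`, and the final summing step assumes that the chart adds up rows that share a name.

```python
# Made-up rows for illustration only; these are NOT values from demo.gapminder.
rows = [
    {"country": "A", "pop": 5_000_000.0},
    {"country": "B", "pop": 1_500_000.0},
    {"country": "C", "pop": 300_000.0},
]

THRESHOLD = 2.0e6  # same cut-off as `pop < 2.e6` in the SQL


def relabel(row):
    """Mirror of: if(pop < 2.e6, "Other countries", country) as country."""
    name = "Other countries" if row["pop"] < THRESHOLD else row["country"]
    return {"pop": row["pop"], "country": name}


relabelled = [relabel(row) for row in rows]

# Assumption: the pie chart adds up the values of rows that share a name.
totals = {}
for row in relabelled:
    totals[row["country"]] = totals.get(row["country"], 0.0) + row["pop"]

print(totals)
```

With these placeholder rows, B and C end up in one shared slice while A keeps its own.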
load delta.`demo.gapminder` as gapminder;
select * from gapminder where year="2007"
as gapminder2;
!visualize gapminder2 '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
fig:
    scatter:
        x: gdpPercap
        y: lifeExp
        size: pop
        color: continent
        hover_name: country
        log_x: True
        size_max: 60
''';
load Rest.`https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json`
as counties1;
select "counties" as key, cast(content as string) as value from counties1 as counties;
load Rest.`https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv`
as fips1;
select explode(split(cast(content as string),"\n")) as content from fips1 as fips2;
select split(content,",")[0] as fips,cast(split(content,",")[1] as double) as unemp
from fips2 where content != "fips,unemp"
as fips;
!visualize fips '''
confFrom: counties
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
control:
    ignoreSort: True
fig:
    choropleth_mapbox:
        geojson:
            vv_type: jsonObj
            vv_value: counties
        locations: fips
        color: unemp
        color_continuous_scale: "Viridis"
        range_color:
            vv_type: code
            vv_value: "(0, 12)"
        mapbox_style: "carto-positron"
        zoom: 3
        center:
            lat: 37.0902
            lon: -95.7129
        opacity: 0.5
        labels:
            unemp: "Unemployment rate"
''';
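This visualization is slightly more complex because it involves variable references. Inside the YAML file, we can reference data from a table. A table can be referenced only if it meets two conditions:
1. The table is specified through confFrom.
2. The table contains only the fields key and value, and both are strings. In this example, the counties table meets this requirement.
You can then reference, inside the YAML file, the value field of the row in the counties table whose key field is counties, as follows:
geojson:
    vv_type: ref
    vv_value: counties
The value of geojson is then looked up in the counties table, in the row whose key is counties. In this case the value is a string.
In this example, vv_type is the special type jsonObj:
geojson:
    vv_type: jsonObj
    vv_value: counties
The value of geojson is then a JSON object.
Also worth mentioning is that a range expression is introduced here:
range_color:
    vv_type: code
    vv_value: "(0, 12)"
In this YAML snippet, the value of range_color is actually translated into a range.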
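The three `vv_type` values can be pictured with a small Python sketch. This is not the plugin's code: `resolve_value` and `conf_table` are hypothetical names, the key/value table is modelled as a plain dict, the GeoJSON string is a tiny placeholder, and `ast.literal_eval` is used here as a safe stand-in for however the plugin actually evaluates `code` values.

```python
import ast
import json


def resolve_value(node, conf_table):
    """Resolve one YAML value; plain values pass through unchanged.

    Hypothetical helper: `conf_table` stands in for the key/value table
    named by confFrom, modelled here as a dict of strings.
    """
    if not (isinstance(node, dict) and "vv_type" in node):
        return node
    vv_type, vv_value = node["vv_type"], node["vv_value"]
    if vv_type == "ref":
        # Look up the row whose key matches and return its value as a string.
        return conf_table[vv_value]
    if vv_type == "jsonObj":
        # Same lookup, but the string is parsed into a JSON object.
        return json.loads(conf_table[vv_value])
    if vv_type == "code":
        # Assumption: evaluate a Python literal such as "(0, 12)".
        return ast.literal_eval(vv_value)
    raise ValueError(f"unsupported vv_type: {vv_type}")


# Placeholder content; the real table holds the full counties GeoJSON.
conf_table = {"counties": '{"type": "FeatureCollection", "features": []}'}

as_text = resolve_value({"vv_type": "ref", "vv_value": "counties"}, conf_table)
as_object = resolve_value({"vv_type": "jsonObj", "vv_value": "counties"}, conf_table)
color_range = resolve_value({"vv_type": "code", "vv_value": "(0, 12)"}, conf_table)

print(type(as_text).__name__, type(as_object).__name__, color_range)
```

Under these assumptions, `ref` yields a string, `jsonObj` yields a parsed object, and `code` yields the tuple `(0, 12)`.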
(To be completed.)
load excel.`./example-data/excel/user-behavior.xlsx`
where header="true" as user_behavior;
select cast(datatime as date) as day,
sum(case when behavior_type = 'pv' then 1 else 0 end) as pv,
count(distinct user_id) as uv
from user_behavior
group by cast(datatime as date)
order by day as day_pv_uv;
!visualize day_pv_uv '''
fig:
    bar:
        title: "Daily PV/UV bar chart"
        x: day
        y:
            - pv
            - uv
        labels:
            day: "Date"
''';
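The YAML has three top-level elements:
1. `runtime`: configures the runtime. The YAML file is converted into Python code and executed, so `runtime` really configures the Python environment.
2. `control`: controls how the chart is generated, for example whether to produce HTML or an image, and whether the data needs to be sorted once more.
3. `fig`: describes which kind of chart to generate and how that chart is configured.

`runtime` has only one level of child elements. Common options are:
1. `env`: the Python environment to use.
2. `cache`: whether to cache the chart result. Set it to true if you need to reference this chart result from another cell; otherwise, setting it to false is fine.
3. `output`: turns the chart into a table reference so that later SQL can use it. It can be left unset.
4. `runIn`: the type of node to run on, driver or executor. driver is recommended.

Options under `control`:
1. `ignoreSort`: defaults to true. It controls whether the system sorts the data by the X-axis field.
2. `format`: defaults to html. Set it to `image` to generate an image.

Options under `fig`:
1. `fig.xxx`: `xxx` is the chart type, for example `line` or `bar`.
2. `fig.xxx.title`: the chart title.
3. `fig.xxx.x`: the X axis. Accepts a string or an array.
4. `fig.xxx.y`: the Y axis. Accepts a string or an array.
5. `fig.xxx.labels`: renames some of the names shown in the chart. Its value is a dictionary.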
A fairly complete configuration looks like this:
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
    output: jack
control:
    ignoreSort: false
    format: image
fig:
    bar:
        title: "Daily PV/UV bar chart"
        x: day
        y:
            - pv
            - uv
        labels:
            day: "Date"
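Since the YAML is converted into Python code, it can help to picture how the `fig` element might map onto a function call. The sketch below is an illustration only, not the plugin's actual code generation. It uses a trimmed copy of the configuration above, written as a dict so that no YAML library is needed. The helper names, the `px` module alias and the `df` variable are all assumptions; the chart names used in this document happen to match Plotly Express function names, but the source does not confirm which library is called.

```python
# Trimmed, already-parsed copy of the configuration above.
config = {
    "runtime": {"cache": False, "output": "jack"},
    "control": {"ignoreSort": False, "format": "image"},
    "fig": {"bar": {"x": "day", "y": ["pv", "uv"]}},
}


def split_fig(config):
    """Return (chart_type, keyword_arguments) from the fig element."""
    fig = config["fig"]
    if len(fig) != 1:
        raise ValueError("expected exactly one chart type under fig")
    chart_type, kwargs = next(iter(fig.items()))
    return chart_type, dict(kwargs)


def render_call(chart_type, kwargs, module="px", frame="df"):
    """Render a call as source text; `px` and `df` are hypothetical names."""
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"{module}.{chart_type}({frame}, {args})"


chart_type, kwargs = split_fig(config)
call = render_call(chart_type, kwargs)
print(call)
```

Read this way, the key under `fig` selects the chart function, and every key below it becomes a keyword argument.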
Below is a sample script:
load excel.`./example-data/excel/user-behavior.xlsx`
where header="true" as user_behavior;
select cast(datatime as date) as day,
sum(case when behavior_type = 'pv' then 1 else 0 end) as pv,
count(distinct user_id) as uv
from user_behavior
group by cast(datatime as date)
order by day as day_pv_uv;
!visualize day_pv_uv '''
runtime:
    env: source /opt/miniconda3/bin/activate ray-1.12.0
    cache: false
    output: jack
control:
    ignoreSort: false
    format: image
fig:
    bar:
        title: "Daily PV/UV bar chart"
        x: day
        y:
            - pv
            - uv
        labels:
            day: "Date"
''';
select unbase64(content) as content, "wow.png" as fileName from jack as imageTable;
save overwrite imageTable as image.`/tmp/images`
where imageColumn="content"
and fileName="fileName";
-- !fs -ls /tmp/images;
save overwrite command as Rest.`YOUR_UPLOAD_URL`
where `config.method`="post"
and `header.content-type`="multipart/form-data"
and `form.file-path`="/tmp/images/wow.png"
and `form.file-name`="wow.png";
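Note that the image is base64-encoded by default, so it has to be decoded first. Once decoded, you can save the image to the file system with the image data source and then upload it to a service through the Rest data source.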
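For readers who want to see what the decoding step amounts to outside Byzer, here is a minimal Python sketch of the same round trip. The bytes are a placeholder (a PNG signature followed by filler, not a real chart), and the temporary directory stands in for `/tmp/images`.

```python
import base64
import tempfile
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature. The bytes after it
# are filler for illustration, not a real image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
raw_bytes = PNG_SIGNATURE + b"placeholder"

# What a base64-encoded `content` column would hold for these bytes.
encoded = base64.b64encode(raw_bytes).decode("ascii")

# Python counterpart of unbase64(content) in the SQL above.
decoded = base64.b64decode(encoded)

# Write the decoded bytes to a file, as the image data source does with wow.png.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "wow.png"
    target.write_bytes(decoded)
    round_trip_ok = target.read_bytes() == raw_bytes

print(round_trip_ok)
```

Saving the encoded text directly would produce a file that image viewers cannot open, which is why the decode comes before the save.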