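I then took a look inside and found a whole new world of ggplot:
The page lists more than 140 ggplot2 visualization extension packages, so plotting should be much less of a worry from now on! Many of them are packages you will already know.
Let's look at what some of these packages can do. Two of them caught my eye, gggenes and gggenomes, both of which can visualize things like gene structure along a reference genome. Here is a quick tour of the first one, gggenes:
Package website: https://wilkox.org/gggenes/
First, install it: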
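# Set the Bioconductor and CRAN mirrors
options(BioC_mirror="https://mirrors.westlake.edu.cn/bioconductor")
options("repos"=c(CRAN="https://mirrors.westlake.edu.cn/CRAN/"))
# Install the release version from CRAN
install.packages("gggenes")
# Or install the latest development version from GitHub (requires devtools)
devtools::install_github("wilkox/gggenes")
# Check that the installation worked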
library(gggenes)
library(ggplot2)
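Drawing gene arrows with geom_gene_arrow()
geom_gene_arrow() is a ggplot2 geom that represents genes as arrows. The start and end positions of each gene are mapped to the xmin and xmax aesthetics, which also determine the direction in which the arrow points. The y axis corresponds to the molecule the gene sits on (for example, a chromosome or genome).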
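To make the xmin/xmax rule concrete, here is a minimal sketch on made-up data. The data frame toy_genes and its gene names are hypothetical and not part of gggenes. The second gene has xmin greater than xmax, so its arrow should point to the left.

```r
library(ggplot2)
library(gggenes)

# Hypothetical toy data (not from the package): two genes on one molecule.
toy_genes <- data.frame(
  molecule = "ToyGenome",
  gene     = c("geneFwd", "geneRev"),
  xmin     = c(100, 900),  # geneRev has xmin > xmax, so it is drawn pointing left
  xmax     = c(400, 600)
)

# The arrow runs from xmin to xmax; y places the gene on its molecule.
p_toy <- ggplot(toy_genes, aes(xmin = xmin, xmax = xmax, y = molecule, fill = gene)) +
  geom_gene_arrow()
```

Printing p_toy (or saving it with ggsave()) should show one arrow pointing right and one pointing left on the same line.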
# Look at the example data bundled with the package
head(example_genes)
# molecule gene start end strand orientation
# 1 Genome1 genA 15389 17299 reverse 1
# 2 Genome1 genB 17301 18161 forward 0
# 3 Genome1 genC 18176 18640 reverse 1
# 4 Genome1 genD 18641 18985 forward 0
# 5 Genome1 genE 18999 20078 reverse 1
# 6 Genome1 genF 20086 20451 forward 1
table(example_genes$molecule)
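Column 1 is the molecule (chromosome or genome) the gene is on;
Column 2 is the gene symbol;
Columns 3 and 4 are the gene's start and end coordinates;
Column 5 is the strand (forward or reverse);
Column 6 is the gene orientation as a 0/1 flag, stored alongside the strand in column 5 (it is mapped to the forward aesthetic further down).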
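If you want to plot your own annotation, a table in the same layout is enough. The sketch below builds one by hand; my_genes and every value in it are invented for illustration. Only molecule, gene, start and end are used by the basic plot that follows, so those are the columns worth checking first.

```r
# Hypothetical annotation table laid out like example_genes (values are made up).
my_genes <- data.frame(
  molecule = c("GenomeX", "GenomeX", "GenomeY"),
  gene     = c("genA", "genB", "genA"),
  start    = c(1200, 2500, 300),
  end      = c(2400, 3100, 1500),
  strand   = c("forward", "reverse", "forward")
)

# Basic sanity checks before plotting: required columns exist and
# the coordinates are numeric.
required <- c("molecule", "gene", "start", "end")
stopifnot(all(required %in% names(my_genes)))
stopifnot(is.numeric(my_genes$start), is.numeric(my_genes$end))

# Number of genes per molecule, as with table(example_genes$molecule).
genes_per_molecule <- table(my_genes$molecule)
```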
Start with a basic plot:
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
  geom_gene_arrow() +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3")
ggsave(filename = "plot1.png", width = 12, height = 8, plot = p)
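Different genes on different molecules. Note that this is only example data, and the molecules in it are labeled as genomes (Genome1 and so on); the same gene would not normally be found on several different chromosomes.
Tidying up with theme_genes()
Use the ggplot2 theme theme_genes() to tidy up the plot: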
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
  geom_gene_arrow() +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()
ggsave(filename = "plot2.png", width = 12, height = 8, plot = p)
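This looks cleaner than the default.
Aligning genes across facets with make_alignment_dummies()
We often want a particular gene to line up vertically across the facets.
Here genE is used as the reference, and the other genes are positioned relative to it.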
dummies <- make_alignment_dummies(
  example_genes,
  aes(xmin = start, xmax = end, y = molecule, id = gene),
  on = "genE"
)
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
  geom_gene_arrow() +
  geom_blank(data = dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()
ggsave(filename = "plot3.png", width = 12, height = 8, plot = p)
You can see that the gene positions in the different facets have shifted relative to one another.
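One way to picture why the dummies work: with free x scales, each facet is stretched to the range of its own data, so adding invisible data through geom_blank() can pad every facet until the reference gene sits at the same relative position. The base R sketch below is my own illustration of that idea on made-up numbers, not the package's actual implementation; toy, ref, pad_min and pad_max are all hypothetical names.

```r
# Hypothetical toy data: genE is the reference gene in both molecules.
toy <- data.frame(
  molecule = c("G1", "G1", "G2", "G2"),
  gene     = c("genA", "genE", "genE", "genB"),
  start    = c(100, 300, 1000, 1400),
  end      = c(250, 500, 1200, 1600)
)

# Current x range of each facet (one facet per molecule).
range_min <- tapply(pmin(toy$start, toy$end), toy$molecule, min)
range_max <- tapply(pmax(toy$start, toy$end), toy$molecule, max)

# Distance from the reference gene's start to each edge of its facet.
ref <- toy[toy$gene == "genE", ]
ref$left  <- ref$start - as.numeric(range_min[ref$molecule])
ref$right <- as.numeric(range_max[ref$molecule]) - ref$start

# Pad every facet to the largest left and right distance, so the reference
# gene starts at the same relative x position in all facets.
ref$pad_min <- ref$start - max(ref$left)
ref$pad_max <- ref$start + max(ref$right)
ref[, c("molecule", "pad_min", "pad_max")]
```

In this toy case both facets end up 800 units wide with genE starting 200 units from the left edge, which is the kind of padding an invisible geom_blank() layer can contribute.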
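Labelling genes with geom_gene_label()
geom_gene_label() uses the ggfittext package to fit the label text inside the gene arrows. For more detail on how it resizes and reflows text to make it fit, see the ggfittext documentation.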
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene, label = gene)) +
  geom_gene_arrow(arrowhead_height = unit(3, "mm"), arrowhead_width = unit(1, "mm")) +
  geom_gene_label(align = "left") +
  geom_blank(data = dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()
ggsave(filename = "plot4.png", width = 12, height = 8, plot = p)
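The result is saved as plot4.png.
Changing gene direction with forward
By default, the direction of a gene is determined by xmin and xmax, but it can also be changed with the optional forward aesthetic. This is useful when gene coordinates and orientation are encoded as separate variables.
If forward is TRUE (the default), the gene is drawn in the implied direction, that is, pointing from xmin to xmax.
If forward is FALSE, the gene is drawn in the opposite of the implied direction: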
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene, forward = orientation)) +
  geom_gene_arrow() +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()
ggsave(filename = "plot5.png", width = 12, height = 8, plot = p)
The result is saved as plot5.png.
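A common case for forward is an annotation where start is always smaller than end and the direction lives in a separate strand column. The sketch below, on made-up data, derives a logical column from strand and maps it to forward. The names toy_genes and is_forward are hypothetical, and the "forward"/"reverse" coding is an assumption borrowed from the strand column of example_genes.

```r
library(ggplot2)
library(gggenes)

# Hypothetical toy data: coordinates always have start < end,
# and the direction is stored separately in `strand`.
toy_genes <- data.frame(
  molecule = "ToyGenome",
  gene     = c("geneA", "geneB", "geneC"),
  start    = c(100, 500, 900),
  end      = c(400, 800, 1200),
  strand   = c("forward", "reverse", "forward")
)

# TRUE keeps the direction implied by xmin -> xmax; FALSE flips it.
toy_genes$is_forward <- toy_genes$strand == "forward"

p_strand <- ggplot(
  toy_genes,
  aes(xmin = start, xmax = end, y = molecule, fill = gene, forward = is_forward)
) +
  geom_gene_arrow() +
  theme_genes()
```

With this mapping, geneB should be the only arrow pointing left.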
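Drawing subgene segments with geom_subgene_arrow()
We can use geom_subgene_arrow() to highlight subsegments of genes, such as protein domains or local alignments.
This requires the xsubmin and xsubmax aesthetics to define the bounds of each subsegment, as in the example data: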
head(example_subgenes)
# molecule gene start end strand subgene from to orientation
# 1 Genome5 genA 405113 407035 forward genA-1 405774 406538 0
# 2 Genome5 genB 407035 407916 forward genB-1 407458 407897 0
# 3 Genome5 genC 407927 408394 forward genC-1 407942 408158 0
# 4 Genome5 genC 407927 408394 forward genC-2 408186 408209 0
# 5 Genome5 genC 407927 408394 forward genC-3 408233 408257 0
# 6 Genome5 genF 409836 410315 forward genF-1 409938 410016 0
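Before plotting your own subgene table, it can help to check that every subsegment lies inside its parent gene. My understanding is that gggenes warns about and skips subgenes that extend past the gene boundaries, but treat that as an assumption to verify against the package documentation. The base R sketch below uses made-up data; toy_subgenes and the inside column are hypothetical.

```r
# Hypothetical subgene table using the same column names as example_subgenes.
toy_subgenes <- data.frame(
  gene    = c("genA", "genA", "genB"),
  start   = c(100, 100, 500),
  end     = c(400, 400, 800),
  subgene = c("genA-1", "genA-2", "genB-1"),
  from    = c(150, 350, 480),  # genB-1 starts before its gene does
  to      = c(250, 390, 600)
)

# A subsegment is "inside" when both of its ends fall within the gene,
# regardless of which coordinate is the larger one.
toy_subgenes$inside <- with(
  toy_subgenes,
  pmin(from, to) >= pmin(start, end) & pmax(from, to) <= pmax(start, end)
)

# Rows that would need fixing before plotting.
toy_subgenes[!toy_subgenes$inside, c("gene", "subgene", "from", "to")]
```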
Plot it:
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule)) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  geom_gene_arrow(fill = "white") +
  geom_subgene_arrow(
    data = example_subgenes,
    aes(xmin = start, xmax = end, y = molecule, fill = gene,
        xsubmin = from, xsubmax = to),
    color = "black", alpha = .7
  ) +
  theme_genes()
ggsave(filename = "plot6.png", width = 12, height = 8, plot = p)
The result is saved as plot6.png.
Now add gene and subgene labels, using geom_subgene_label() for the subgenes:
data <- subset(example_genes, molecule == "Genome4" & gene == "genA")
data
data_subgene <- subset(example_subgenes, molecule == "Genome4" & gene == "genA")
data_subgene
p <- ggplot(data, aes(xmin = start, xmax = end, y = strand) ) +
  geom_gene_arrow() +
  geom_gene_label(aes(label = gene)) +
  geom_subgene_arrow(data = data_subgene, aes(xsubmin = from, xsubmax = to, fill = subgene)) +
  geom_subgene_label(data = data_subgene, aes(xsubmin = from, xsubmax = to, label = subgene), min.size = 0)
ggsave(filename = "plot7.png", width = 6, height = 2, plot = p)
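The result is saved as plot7.png.
Drawing point features with geom_feature()
We can also draw point genetic features, such as restriction sites or transcription start sites.
The required data looks like this: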
head(example_features)
# molecule name type position forward
# 1 Genome1 tss9 tss 22988 NA
# 2 Genome1 rs4 restriction site 18641 NA
# 3 Genome1 ori5 ori 18174 NA
# 4 Genome2 rs0 restriction site 12256 NA
# 5 Genome2 rs1 restriction site 14076 NA
# 6 Genome2 ori1 ori 13355 FALSE
Plot it:
p <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene, label = gene)) +
  geom_feature(
    data = example_features,
    aes(x = position, y = molecule, forward = forward)
  ) +
  geom_feature_label(
    data = example_features,
    aes(x = position, y = molecule, label = name, forward = forward)
  ) +
  geom_gene_arrow() +
  geom_gene_label() +
  geom_blank(data = example_dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()
ggsave(filename = "plot8.png", width = 12, height = 8, plot = p)
The result is saved as plot8.png.
There are many more interesting packages in the list, so go and explore!