以前我们做了一个Seurat包官方的可视化函数投票:可视化单细胞亚群的标记基因的5个方法,下面的5个基础函数相信大家都是已经烂熟于心了:
但是很多小伙伴都会表示官方的出图有点简陋,所以迫切需要升级版可视化策略,但是Seurat包官方并没有在这方面努力,反而是神通广大的各种网友制作了一些打包好的方案,比如前面我们介绍过的SCP:
这个scillus具有两个方面的功能:数据预处理以及可视化。
官网链接:
安装自github上:网络访问不友好
devtools::install_github("xmc811/Scillus", ref = "development")
library(Scillus)
或者下载到本地安装:
# 激活R环境
conda activate R4.4
# xshell终端命令安装
# /usr/local/software/miniconda3/envs/R4.4/lib/R/library/ 这个路径换成自己的R包库
R CMD INSTALL -l /usr/local/software/miniconda3/envs/R4.4/lib/R/library/ /nas2/zhangj/tools/single_cell/Scillus/Scillus-master.tar.gz
此教程使用的数据来自文献:https://www.ncbi.nlm.nih.gov/pubmed/31010835,可以从这里下载:https://github.com/xmc811/Scillus/blob/development/test/GSE128531.tar.gz
下载后进行解压:
这里的数据为示例数据,只有数据的一小部分:为了减少计算时间,数据集仅包含每个样本6个样本和300个细胞
# 解压:
tar -zxvf GSE128531.tar.gz
Scillus-master/test/GSE128531_RAW/
├── CTCL-5.csv.gz
├── CTCL-6.csv.gz
├── CTCL-8.csv.gz
├── HC-1.csv.gz
├── HC-2.csv.gz
└── HC-3.csv.gz
原始数据的下载连接:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128531
Scillus包做单细胞数据处理需要提供一个样本信息文件:metadata,是一个数据框,至少需要两列,一列sample,另一列file或者folder,这取决于输入数据的格式,构造方式如下:
# 注意 此包目前仅支持Seurat v4版本的绘图,v5会报错
library(Seurat, lib.loc = "/nas2/zhangj/biosoft/miniconda3/envs/R4/lib/R/library")
library(Scillus)
library(tidyverse)
library(magrittr)
packageVersion("Seurat")
# [1] ‘4.4.0’
a <- list.files("/nas2/zhangj/tools/single_cell/Scillus/Scillus-master/test/GSE128531_RAW/", full.names = TRUE)
a
m <- tibble(file = a,
sample = stringr::str_remove(basename(a), ".csv.gz"),
group = rep(c("CTCL", "Normal"), each = 3))
m
构造如下:
其他的信息如group, sex, age, treatment也可以添加到这个数据框中,这样在构建seurat对象的时候就可以添加到里面。
此外,对于10x Genomics cellranger的输出结果,构建方式如下:
pal <- tibble(var = c("sample", "group","seurat_clusters"),
pal = c("Set2","Set1","Paired"))
pal
使用load_scfile()函数加载刚刚设置的m,读进来之后为一个list对象:list中每一个样本都为seurat对象。
并且自动使用PercentageFeatureSet()函数计算每个细胞中线粒体基因表达比例。
scRNA <- load_scfile(m)
scRNA
[[1]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
[[2]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
[[3]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
[[4]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
[[5]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
[[6]]
An object of class Seurat
33694 features across 300 samples within 1 assay
Active assay: RNA (33694 features, 0 variable features)
1 layer present: counts
质量控制(QC)措施可以通过plot_qc()
函数进行绘制。生成的图表可以进一步使用ggplot语法进行定制(例如轴标题、主题等)。
percent.mt:
p <- plot_qc(scRNA, metrics = "percent.mt")
p
结果如下:
nFeature_RNA:
p <- plot_qc(scRNA, metrics = "nFeature_RNA")
p
结果如下:
nCount_RNA:
p <- plot_qc(scRNA, metrics = "nCount_RNA")
p
结果如下:
,
group_by, and
pal_setupplot_type
的默认值是“combined”,意味着可以同时绘制箱线图和小提琴图。如果只偏好这两种图表中的某一种,可以将其设置为“box”或“violin”。
只绘制箱线图:
p <- plot_qc(scRNA, metrics = "percent.mt", plot_type = "box")
p
结果如下:
"density"用于绘制密度图。请注意,可以添加额外的ggplot语法(此处为对数变换)。
p <- plot_qc(scRNA, metrics = "nCount_RNA", plot_type = "density") + scale_x_log10()
p
结果如下:
group_by
的默认值是“sample”,对应于metadata m中的样本列,也可以根据这些额外的因素绘制QC措施,比如metadata中的“group”列。
p <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group")
p
结果如下:
参数pal_setup
支持三种类型的输入:RColorBrewer调色板名称、调色板设置数据框(参见初始设置,最后一个部分),以及手动指定的颜色向量。默认值是调色板“Set2”。
# Mitochondrial read percentage in each group (RColorBrewer palette name as palette input)
p1 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = "Accent")
# Mitochondrial read percentage in each group (Configured dataframe as palette input)
p2 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = pal)
# Mitochondrial read percentage in each group (Manually specified colors as palette input)
p3 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = c("purple","yellow"))
p1/p2/p3
结果如下:
函数filter_scdata()
用于Seurat对象的子集筛选。subset
参数的语法与Seurat对象的subset()
函数相同。将自动绘制一个条形图,以显示筛选前后的细胞数量。
scRNA_f <- filter_scdata(scRNA, subset = nFeature_RNA > 500 & percent.mt < 10)
scRNA_f
scRNA_f还是为一个list对象。
自动绘制的条形图:
接着就可以走Seurat的标准流程:标准化,高变基因,CellCycleScoring打分:
scRNA_f %<>%
purrr::map(.f = NormalizeData) %>%
purrr::map(.f = FindVariableFeatures) %>%
purrr::map(.f = CellCycleScoring,
s.features = cc.genes$s.genes,
g2m.features = cc.genes$g2m.genes)
合并并进行整合:
scRNA_int <- IntegrateData(anchorset = FindIntegrationAnchors(object.list = scRNA_f, dims = 1:30, k.filter = 50), dims = 1:30)
# 降维分群:
scRNA_int %<>%
ScaleData(vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))
scRNA_int %<>%
RunPCA(npcs = 50, verbose = TRUE)
scRNA_int %<>%
RunUMAP(reduction = "pca", dims = 1:20, n.neighbors = 30) %>%
FindNeighbors(reduction = "pca", dims = 1:20) %>%
FindClusters(resolution = 0.3)
通过refactor_seurat()
对Seurat对象的metadata数据进行因子化是一个可选步骤,主要用于改善绘图效果。该函数接受数据m
作为参数,并使Seurat对象的metadata数据具有与m
中相同的因子水平。如果没有提供metadata
参数,Seurat对象元数据中的所有字符向量都将被因子化。
m %<>%
mutate(group = factor(group, levels = c("Normal", "CTCL")))
scRNA_int %<>%
refactor_seurat(metadata = m)
# 注意 此包目前仅支持Seurat v4版本的绘图,v5会报错
# 注意2:单独提供Seurat也可以绘图!!
同前面一样,可以设置三个参数color_by,
split_by, and
pal_setup。
p1 <- plot_scdata(scRNA_int, pal_setup = pal)
p2 <- plot_scdata(scRNA_int, pal_setup = "Dark2")
p3 <- plot_scdata(scRNA_int, color_by = "group", pal_setup = pal)
p1|p2|p3
结果如下:
还可以按照某一属性分割:
p <- plot_scdata(scRNA_int, split_by = "sample", pal_setup = pal)
p
结果如下:
指定颜色:
p <- plot_scdata(scRNA_int, color_by = "sample", pal_setup = c("red","orange","yellow","green","blue","purple"))
p
结果如下:
聚类计数和比例统计可以通过plot_stat()
函数绘制,必须提供plot_type
参数,其值为以下三种之一:“group_count”、“prop_fill”和“prop_multi”。它们的图表如下所示:
p1 <- plot_stat(scRNA_int, plot_type = "group_count")
p2 <- plot_stat(scRNA_int, "group_count", group_by = "seurat_clusters", pal_setup = pal)
p3 <- plot_stat(scRNA_int, plot_type = "prop_fill",
pal_setup = c("grey90","grey80","grey70","grey60","grey50","grey40","grey30","grey20"))
p1/p2/p3
结果如下:
image-20241205000439882
分面:
p <- plot_stat(scRNA_int, plot_type = "prop_multi", pal_setup = "Set3")
p
结果如下:
image-20241205000552306
绘制热图需要通过Seurat找到聚类标记物。
markers <- FindAllMarkers(scRNA_int, logfc.threshold = 0.1, min.pct = 0, only.pos = T)
然后,每个聚类中的顶级基因通过plot_heatmap()
绘制。每个聚类中绘制的基因数量n
的默认值是8。在热图中,每一行代表一个基因,每一列代表一个细胞。细胞可以通过sort_var
进行排序,默认设置为c("seurat_clusters")
p <- plot_heatmap(dataset = scRNA_int,
markers = markers,
n = 6,
sort_var = c("seurat_clusters","sample"),
anno_var = c("seurat_clusters","sample","percent.mt","S.Score","G2M.Score"),
anno_colors = list("Set2", # RColorBrewer palette
c("red","orange","yellow","purple","blue","green"), # color vector
"Reds",
c("blue","white","red"), # Three-color gradient
"Greens"))
p
结果如下:
颜色调整:
p <- plot_heatmap(dataset = scRNA_int,
n = 6,
markers = markers,
sort_var = c("seurat_clusters","sample"),
anno_var = c("seurat_clusters","sample","percent.mt"),
anno_colors = list("Set2",
c("red","orange","yellow","purple","blue","green"),
"Reds"),
hm_limit = c(-1,0,1),
hm_colors = c("purple","black","yellow"))
结果如下:
可以看到,它集成了大量的开箱即用的可视化函数,并不需要自己去调整很多细节,比如《单细胞天地》公众号的这个可视化专辑:
居然还可以做富集分析,用到这里感觉越用越顺手:对 cluster1的特征基因做功能富集分析
p <- plot_cluster_go(markers, cluster_name = "1", org = "human", ont = "CC")
p
结果如下:
所有cluster特征基因富集:
p <- plot_all_cluster_go(markers, org = "human", ont = "CC")
p
结果如下:
制定某基因的各种图如小提琴,Umap散点图,等等:
p <- plot_measure(dataset = scRNA_int,
measures = c("KRT14","percent.mt"),
group_by = "seurat_clusters",
pal_setup = pal)
p
p <- plot_measure_dim(dataset = scRNA_int,
measures = c("nFeature_RNA","nCount_RNA","percent.mt","KRT14"))
p
结果如下:
居然还可以做GSEA,来看看:
# 7亚群细胞数比较少,只在一个分组中存在,这里只用了0:6亚群
de <- find_diff_genes(dataset = scRNA_int,
clusters = as.character(0:6),
comparison = c("group", "CTCL", "Normal"),
logfc.threshold = 0, # threshold of 0 is used for GSEA
min.cells.group = 1) # To include clusters with only 1 cell
gsea_res <- test_GSEA(de, pathway = pathways.hallmark)
head(gsea_res)
p <- plot_GSEA(gsea_res, p_cutoff = 0.1, colors = c("#0570b0", "grey", "#d7301f"))
p
image-20241205002717337
绘图结果如下:
此包应该就是我们常用的各种需求 作者将其打包变成了一个超级好的包!
我现在将其列入年度爱用包榜单!!!
强烈推荐给大家!
最后,如果不想用别人套好的壳子,想学习对应代码,可以去github上看看源码!