终于有人对Seurat包丑到哭的可视化出手了：年度爱用包！

生信技能树

发布于 2024-12-09 14:24:04

64400

代码可运行

文章被收录于专栏：生信技能树生信技能树

运行总次数：0

代码可运行

以前我们做了一个Seurat包官方的可视化函数投票：可视化单细胞亚群的标记基因的5个方法，下面的5个基础函数相信大家都是已经烂熟于心了：

VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
FeaturePlot(pbmc, features = c("MS4A1", "CD79A"))
RidgePlot(pbmc, features = c("MS4A1", "CD79A"), ncol = 1)
DotPlot(pbmc, features = unique(features)) + RotatedAxis()
DoHeatmap(subset(pbmc, downsample = 100), features = features, size = 3)

但是很多小伙伴都会表示官方的出图有点简陋，所以迫切需要升级版可视化策略，但是Seurat包官方并没有在这方面努力，反而是神通广大的各种网友制作了一些打包好的方案，比如前面我们介绍过的SCP：

接下来分享另外一个本身珍藏已久的年度爱用包（scillus）给大家，此包的官方完整名字是：Seurat wrapper for enhanced processing and visualization of scRNA-seq data

这个scillus具有两个方面的功能：数据预处理以及可视化。

官网链接：

scillus参考教程：https://scillus.netlify.app/
Scillus包地址：https://github.com/xmc811/Scillus

1、安装

安装自github上：网络访问不友好

devtools::install_github("xmc811/Scillus", ref = "development")
library(Scillus)

或者下载到本地安装：

# 激活R环境
conda activate R4.4

# xshell终端命令安装
# /usr/local/software/miniconda3/envs/R4.4/lib/R/library/ 这个路径换成自己的R包库
R CMD INSTALL -l /usr/local/software/miniconda3/envs/R4.4/lib/R/library/ /nas2/zhangj/tools/single_cell/Scillus/Scillus-master.tar.gz

2、功能一：数据预处理

此教程使用的数据来自文献：https://www.ncbi.nlm.nih.gov/pubmed/31010835，可以从这里下载：https://github.com/xmc811/Scillus/blob/development/test/GSE128531.tar.gz

下载后进行解压：

这里的数据为示例数据，只有数据的一小部分：为了减少计算时间，数据集仅包含每个样本6个样本和300个细胞

# 解压：
tar -zxvf GSE128531.tar.gz

Scillus-master/test/GSE128531_RAW/
├── CTCL-5.csv.gz
├── CTCL-6.csv.gz
├── CTCL-8.csv.gz
├── HC-1.csv.gz
├── HC-2.csv.gz
└── HC-3.csv.gz

原始数据的下载连接：https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128531

2.1 Metadata数据准备：

Scillus包做单细胞数据处理需要提供一个样本信息文件：metadata，是一个数据框，至少需要两列，一列sample，另一列file或者folder，这取决于输入数据的格式，构造方式如下：

# 注意 此包目前仅支持Seurat v4版本的绘图，v5会报错
library(Seurat, lib.loc = "/nas2/zhangj/biosoft/miniconda3/envs/R4/lib/R/library")
library(Scillus)
library(tidyverse)
library(magrittr)
packageVersion("Seurat")
# [1] ‘4.4.0’

a <- list.files("/nas2/zhangj/tools/single_cell/Scillus/Scillus-master/test/GSE128531_RAW/", full.names = TRUE)
a
m <- tibble(file = a, 
            sample = stringr::str_remove(basename(a), ".csv.gz"),
            group = rep(c("CTCL", "Normal"), each = 3))

m

构造如下：

其他的信息如group, sex, age, treatment也可以添加到这个数据框中，这样在构建seurat对象的时候就可以添加到里面。

此外，对于10x Genomics cellranger的输出结果，构建方式如下：

2.2 绘图色板设置：

pal <- tibble(var = c("sample", "group","seurat_clusters"),
              pal = c("Set2","Set1","Paired"))
              
pal

2.3 数据加载

使用load_scfile()函数加载刚刚设置的m，读进来之后为一个list对象：list中每一个样本都为seurat对象。

并且自动使用PercentageFeatureSet()函数计算每个细胞中线粒体基因表达比例。

scRNA <- load_scfile(m)

scRNA

[[1]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[2]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[3]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[4]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[5]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[6]]
An object of class Seurat 
33694 features across 300 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

2.4 QC结果可视化

质量控制（QC）措施可以通过plot_qc()函数进行绘制。生成的图表可以进一步使用ggplot语法进行定制（例如轴标题、主题等）。

percent.mt：

p <- plot_qc(scRNA, metrics = "percent.mt")
p

结果如下：

nFeature_RNA：

p <- plot_qc(scRNA, metrics = "nFeature_RNA")
p

结果如下：

nCount_RNA：

p <- plot_qc(scRNA, metrics = "nCount_RNA")
p

结果如下：

plot_qc函数还有一些其他参数可以调整如：plot_type`,` group_by`, and` pal_setup

plot_type的默认值是“combined”，意味着可以同时绘制箱线图和小提琴图。如果只偏好这两种图表中的某一种，可以将其设置为“box”或“violin”。

只绘制箱线图：

p <- plot_qc(scRNA, metrics = "percent.mt", plot_type = "box")
p

结果如下：

"density"用于绘制密度图。请注意，可以添加额外的ggplot语法（此处为对数变换）。

p <- plot_qc(scRNA, metrics = "nCount_RNA", plot_type = "density") + scale_x_log10()
p

结果如下：

group_by的默认值是“sample”，对应于metadata m中的样本列，也可以根据这些额外的因素绘制QC措施，比如metadata中的“group”列。

p <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group")
p

结果如下：

更改颜色设置

参数pal_setup支持三种类型的输入：RColorBrewer调色板名称、调色板设置数据框（参见初始设置，最后一个部分），以及手动指定的颜色向量。默认值是调色板“Set2”。

# Mitochondrial read percentage in each group (RColorBrewer palette name as palette input)
p1 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = "Accent") 

# Mitochondrial read percentage in each group (Configured dataframe as palette input)
p2 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = pal)

# Mitochondrial read percentage in each group (Manually specified colors as palette input)
p3 <- plot_qc(scRNA, metrics = "percent.mt", group_by = "group", pal_setup = c("purple","yellow"))

p1/p2/p3

结果如下：

2.5 数据过滤与整合

函数filter_scdata()用于Seurat对象的子集筛选。subset参数的语法与Seurat对象的subset()函数相同。将自动绘制一个条形图，以显示筛选前后的细胞数量。

scRNA_f <- filter_scdata(scRNA, subset = nFeature_RNA > 500 & percent.mt < 10)
scRNA_f

scRNA_f还是为一个list对象。

自动绘制的条形图：

接着就可以走Seurat的标准流程：标准化，高变基因，CellCycleScoring打分：

scRNA_f %<>% 
        purrr::map(.f = NormalizeData) %>%
        purrr::map(.f = FindVariableFeatures) %>%
        purrr::map(.f = CellCycleScoring, 
                   s.features = cc.genes$s.genes, 
                   g2m.features = cc.genes$g2m.genes)

合并并进行整合：

scRNA_int <- IntegrateData(anchorset = FindIntegrationAnchors(object.list = scRNA_f, dims = 1:30, k.filter = 50), dims = 1:30)

# 降维分群：
scRNA_int %<>%
        ScaleData(vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))

scRNA_int %<>%
        RunPCA(npcs = 50, verbose = TRUE)

scRNA_int %<>%
        RunUMAP(reduction = "pca", dims = 1:20, n.neighbors = 30) %>%
        FindNeighbors(reduction = "pca", dims = 1:20) %>%
        FindClusters(resolution = 0.3)

可选步骤：

通过refactor_seurat()对Seurat对象的metadata数据进行因子化是一个可选步骤，主要用于改善绘图效果。该函数接受数据m作为参数，并使Seurat对象的metadata数据具有与m中相同的因子水平。如果没有提供metadata参数，Seurat对象元数据中的所有字符向量都将被因子化。

m %<>%
        mutate(group = factor(group, levels = c("Normal", "CTCL")))

scRNA_int %<>%
        refactor_seurat(metadata = m)

3、功能二：单细胞结果可视化

# 注意此包目前仅支持Seurat v4版本的绘图，v5会报错

# 注意2：单独提供Seurat也可以绘图！！

3.1 可视化1：umap散点图：

同前面一样，可以设置三个参数color_by, split_by, and pal_setup。

p1 <- plot_scdata(scRNA_int, pal_setup = pal)
p2 <- plot_scdata(scRNA_int, pal_setup = "Dark2")
p3 <- plot_scdata(scRNA_int, color_by = "group", pal_setup = pal)
p1|p2|p3

结果如下：

还可以按照某一属性分割：

p <- plot_scdata(scRNA_int, split_by = "sample", pal_setup = pal)
p

结果如下：

指定颜色：

p <- plot_scdata(scRNA_int, color_by = "sample", pal_setup = c("red","orange","yellow","green","blue","purple"))
p

结果如下：

3.2 统计绘图

聚类计数和比例统计可以通过plot_stat()函数绘制，必须提供plot_type参数，其值为以下三种之一：“group_count”、“prop_fill”和“prop_multi”。它们的图表如下所示：

p1 <- plot_stat(scRNA_int, plot_type = "group_count")
p2 <- plot_stat(scRNA_int, "group_count", group_by = "seurat_clusters", pal_setup = pal)
p3 <- plot_stat(scRNA_int, plot_type = "prop_fill", 
                pal_setup = c("grey90","grey80","grey70","grey60","grey50","grey40","grey30","grey20"))

p1/p2/p3

结果如下：

image-20241205000439882

分面：

p <- plot_stat(scRNA_int, plot_type = "prop_multi", pal_setup = "Set3")
p

结果如下：

image-20241205000552306

3.3 热图绘制

绘制热图需要通过Seurat找到聚类标记物。

markers <- FindAllMarkers(scRNA_int, logfc.threshold = 0.1, min.pct = 0, only.pos = T)

然后，每个聚类中的顶级基因通过plot_heatmap()绘制。每个聚类中绘制的基因数量n的默认值是8。在热图中，每一行代表一个基因，每一列代表一个细胞。细胞可以通过sort_var进行排序，默认设置为c("seurat_clusters")

p <- plot_heatmap(dataset = scRNA_int, 
                  markers = markers,
                  n = 6,
                  sort_var = c("seurat_clusters","sample"),
                  anno_var = c("seurat_clusters","sample","percent.mt","S.Score","G2M.Score"),
                  anno_colors = list("Set2",                                             # RColorBrewer palette
                                     c("red","orange","yellow","purple","blue","green"), # color vector
                                     "Reds",
                                     c("blue","white","red"),                            # Three-color gradient
                                     "Greens"))
p

结果如下：

颜色调整：

p <- plot_heatmap(dataset = scRNA_int,
             n = 6,
             markers = markers,
             sort_var = c("seurat_clusters","sample"),
             anno_var = c("seurat_clusters","sample","percent.mt"),
             anno_colors = list("Set2",
                                c("red","orange","yellow","purple","blue","green"),
                                "Reds"),
             hm_limit = c(-1,0,1),
             hm_colors = c("purple","black","yellow"))

结果如下：

可以看到，它集成了大量的开箱即用的可视化函数，并不需要自己去调整很多细节，比如《单细胞天地》公众号的这个可视化专辑：

3.4 GO富集分析并绘图

居然还可以做富集分析，用到这里感觉越用越顺手：对 cluster1的特征基因做功能富集分析

p <- plot_cluster_go(markers, cluster_name = "1", org = "human", ont = "CC")
p

结果如下：

所有cluster特征基因富集：

p <- plot_all_cluster_go(markers, org = "human", ont = "CC")
p

结果如下：

3.5 marker基因绘制

制定某基因的各种图如小提琴，Umap散点图，等等：

p <- plot_measure(dataset = scRNA_int, 
                  measures = c("KRT14","percent.mt"), 
                  group_by = "seurat_clusters", 
                  pal_setup = pal)
p

p <- plot_measure_dim(dataset = scRNA_int, 
                      measures = c("nFeature_RNA","nCount_RNA","percent.mt","KRT14"))
p

结果如下：

3.6 GSEA Analysis

居然还可以做GSEA，来看看：

# 7亚群细胞数比较少，只在一个分组中存在，这里只用了0:6亚群
de <- find_diff_genes(dataset = scRNA_int, 
                      clusters = as.character(0:6),
                      comparison = c("group", "CTCL", "Normal"),
                      logfc.threshold = 0,   # threshold of 0 is used for GSEA
                      min.cells.group = 1)   # To include clusters with only 1 cell

gsea_res <- test_GSEA(de,  pathway = pathways.hallmark)
head(gsea_res)


p <- plot_GSEA(gsea_res, p_cutoff = 0.1, colors = c("#0570b0", "grey", "#d7301f"))
p

image-20241205002717337

绘图结果如下：

此包应该就是我们常用的各种需求作者将其打包变成了一个超级好的包！

我现在将其列入年度爱用包榜单！！！

强烈推荐给大家！

最后，如果不想用别人套好的壳子，想学习对应代码，可以去github上看看源码！

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2024-12-06，如有侵权请联系 cloudcommunity@tencent.com 删除

可视化

本文分享自生信技能树微信公众号，前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

登录后参与评论

0 条评论

热度