这几个月都是忙于临床工作,完全没有时间看代码相关的东西。🥲
发现自己真的是已经跟不上时代的变化了,大家现在做的分析都好高级啊。😅
利用碎片时间学习这些东西可能还是不合适的,要有大块的时间进行系统学习。📖
今天是R
版本的velocyto
,不得不说比Python
版本的难搞多了。😩
rm(list = ls())
#devtools::install_github("velocyto-team/velocyto.R")
library(velocyto.R)
这里用到的是一个SMART-seq2
数据集!~ 😘
ldat <- readRDS("./ldat.rds")
str(ldat)
增加可读性!~🙊
ldat <- lapply(ldat,function(x) {
colnames(x) <- gsub("_unique.bam","",gsub(".*:","",colnames(x)))
x
})
这里我们把实现准备好的细胞注释和降维数据读入。🐶
cell.colors <- readRDS("./cell.colors.rds")
emb <- readRDS("./embedding.rds")
剪接基因表达量分布!~😂
hist(log10(rowSums(ldat$spliced)+1), col='#7BAFDE',xlab='log10[ number of reads + 1]',main='number of reads per gene')
开始过滤基因!~ 😏
# exonic read (spliced) expression matrix
emat <- ldat$spliced;
# intronic read (unspliced) expression matrix
nmat <- ldat$unspliced
# spanning read (intron+exon) expression matrix
smat <- ldat$spanning;
# filter expression matrices based on some minimum max-cluster averages
emat <- filter.genes.by.cluster.expression(emat,cell.colors,min.max.cluster.average = 5)
nmat <- filter.genes.by.cluster.expression(nmat,cell.colors,min.max.cluster.average = 1)
smat <- filter.genes.by.cluster.expression(smat,cell.colors,min.max.cluster.average = 0.5)
看看还剩多少!~🧐
# look at the resulting gene set
length(intersect(rownames(emat),rownames(nmat)))
# and if we use spanning reads (smat)
length(intersect(intersect(rownames(emat),rownames(nmat)),rownames(smat)))
这里我们需要设置一下最小/最大分位数拟合,这里设置0.05
。😜
fit.quantile <- 0.05
rvel.qf <- gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells = 5,fit.quantile = fit.quantile)
选取前5个主成分进行可视化!~🙊
pca.velocity.plot(rvel.qf,nPcs=5,plot.cols=2,cell.colors=ac(cell.colors,alpha=0.7),cex=1.2,pcount=0.1,pc.multipliers=c(1,-1,-1,-1,-1))
可视化单个基因。🧬
# define custom pallet for expression magnitude
gene.relative.velocity.estimates(emat,nmat, kCells = 5,fit.quantile = fit.quantile,old.fit=rvel.qf,show.gene='Chga',cell.emb=emb,cell.colors=cell.colors)
我们还可以用smat
来拟合基因偏移,这样更准确,但是基因要少多了。🙂
rvel <- gene.relative.velocity.estimates(emat,nmat,smat=smat, kCells = 5, fit.quantile=fit.quantile, diagonal.quantiles = TRUE)
可视化一下!~😘
pca.velocity.plot(rvel,nPcs=5,plot.cols=2,cell.colors=ac(cell.colors,alpha=0.7),cex=1.2,pcount=0.1,pc.multipliers=c(1,-1,1,1,1))
在这里,我们使用gamma拟合
来计算·velocity·,无需细胞kNN
。🦀
rvel1 <- gene.relative.velocity.estimates(emat,nmat,deltaT=1,deltaT2 = 1,kCells = 1, fit.quantile=fit.quantile)
可视化!~
pca.velocity.plot(rvel1,nPcs=5,plot.cols=2,cell.colors=ac(cell.colors,alpha=0.7),cex=1.2,pcount=0.1,pc.multipliers=c(1,-1,1,1,1))
这里我们使用t-SNE embedding
。
vel <- rvel; arrow.scale=3; cell.alpha=0.4; cell.cex=1; fig.height=4; fig.width=4.5;
show.velocity.on.embedding.cor(emb,vel,n=100,scale='sqrt',cell.colors=ac(cell.colors,alpha=cell.alpha),cex=cell.cex,arrow.scale=arrow.scale,arrow.lwd=1)
也可以使用相同的函数计算velocity
。😘
show.velocity.on.embedding.cor(emb,vel,n=100,scale='sqrt',cell.colors=ac(cell.colors,alpha=cell.alpha),cex=cell.cex,arrow.scale=arrow.scale,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=20,arrow.lwd=2)
把示例数据加载进来。😏
ip.mm10 <- readRDS("./ip.mm10.rds")
gene.info <- readRDS("./gene.info.rds")
Genome-wide model fit: 全基因组模型拟合。🧐
# start with unfiltered matrices, as we can use more genes in these types of estimates
emat <- ldat$spliced; nmat <- ldat$unspliced; smat <- ldat$spanning
emat <- filter.genes.by.cluster.expression(emat,cell.colors,min.max.cluster.average = 7)
gvel <- global.velcoity.estimates(emat, nmat, rvel, base.df=gene.info$gene.df, smat=smat, deltaT=1, kCells=5, kGenes = 15, kGenes.trim = 5, min.gene.cells = 0, min.gene.conuts = 500)
pca.velocity.plot(gvel,nPcs=5,plot.cols=2,cell.colors=ac(cell.colors,alpha=0.7),cex=1.2,pcount=0.1,pc.multipliers=c(1,-1,-1,1,1))
1️⃣ gene-relative esitmate
。🙊
par(mfrow=c(1,2), mar = c(2.5,2.5,2.5,1.5), mgp = c(2,0.65,0), cex = 0.85);
arrow.scale=3; cell.alpha=0.4; cell.cex=1; fig.height=4; fig.width=4.5;
#pdf(file='tsne.rvel_gvel.plots.pdf',height=6,width=12)
show.velocity.on.embedding.cor(emb,rvel,n=100,scale='sqrt',cell.colors=ac(cell.colors,alpha=cell.alpha),cex=cell.cex,arrow.scale=arrow.scale,arrow.lwd=1,main='gene-relative esitmate',do.par=F)
1️⃣ gene-structure estimate
。😘
show.velocity.on.embedding.cor(emb,gvel,n=100,scale='sqrt',cell.colors=ac(cell.colors,alpha=cell.alpha),cex=cell.cex,arrow.scale=arrow.scale,arrow.lwd=1,main='gene-structure estimate',do.par=F)
或者映射到另一个相同的tSNE空间,这样也是可以的。💪
2️⃣ gene-relative estimate
。
#pdf(file='tsne.shift.plots.pdf',height=6,width=12)
par(mfrow=c(1,2), mar = c(2.5,2.5,2.5,1.5), mgp = c(2,0.65,0), cex = 0.85);
x <- tSNE.velocity.plot(rvel,nPcs=15,cell.colors=cell.colors,cex=0.9,perplexity=200,norm.nPcs=NA,pcount=0.1,scale='log',do.par=F,main='gene-relative estimate')
2️⃣ gene-structure estimate
😂
x <- tSNE.velocity.plot(gvel,nPcs=15,cell.colors=cell.colors,cex=0.9,perplexity=200,norm.nPcs=NA,pcount=0.1,scale='log',do.par=F,main='gene-structure estimate')
x <- show.velocity.on.embedding.eu(emb,rvel,n=40,scale='sqrt',cell.colors=ac(cell.colors,alpha=cell.alpha),cex=cell.cex,nPcs=30,sigma=2.5,show.trajectories=TRUE,diffusion.steps=400,n.trajectory.clusters=15,ntop.trajectories=1,embedding.knn=T,control.for.neighborhood.density=TRUE,n.cores=6)