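Starting this year, 生信技能树 will put a strong focus on the Python bioinformatics ecosystem and publish many Python-based analysis tutorials. Stay tuned for the new series 《python生信笔记2025》.
In the previous post we used Python to read different single-cell data formats (single-sample and multi-sample). Today we look at reading spatial transcriptomics data with Python.
Data used in this tutorial: the official 10x Mouse Brain (Coronal) Visium dataset.
Dataset page: https://www.10xgenomics.com/datasets/mouse-brain-section-coronal-1-standard-1-0-0
Download: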
# Output Files
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_molecule_info.h5
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_analysis.tar.gz
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_metrics_summary.csv
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_web_summary.html
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_cloupe.cloupe
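Arrange the files into the following layout (unpack the filtered_feature_bc_matrix and spatial tarballs, and rename the .h5 file to drop the V1_Adult_Mouse_Brain_ prefix):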
mouse-brain-section-coronal-1-standard-1-1-0/
├── filtered_feature_bc_matrix
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
├── filtered_feature_bc_matrix.h5
├── spatial
│ ├── aligned_fiducials.jpg
│ ├── detected_tissue_image.jpg
│ ├── scalefactors_json.json
│ ├── tissue_hires_image.png
│ ├── tissue_lowres_image.png
│ └── tissue_positions_list.csv
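The layout above can also be produced with a short script instead of by hand. The following is a minimal sketch, assuming the two tarballs unpack into top-level `filtered_feature_bc_matrix/` and `spatial/` folders and that the `.h5` file only needs its `V1_Adult_Mouse_Brain_` prefix removed. The function name `organize_visium_downloads` is made up for illustration.

```python
import shutil
import tarfile
from pathlib import Path


def organize_visium_downloads(download_dir, target_dir, prefix="V1_Adult_Mouse_Brain_"):
    """Arrange downloaded 10x files into the folder layout shown above.

    Assumption: each tarball unpacks into a top-level folder named after
    its content (filtered_feature_bc_matrix/ or spatial/).
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Unpack the matrix and image tarballs into the target folder.
    for name in ("filtered_feature_bc_matrix", "spatial"):
        archive = download_dir / f"{prefix}{name}.tar.gz"
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)

    # Copy the HDF5 matrix under the prefix-free name.
    h5_name = "filtered_feature_bc_matrix.h5"
    shutil.copy2(download_dir / f"{prefix}{h5_name}", target_dir / h5_name)
    return target_dir
```

If the files were downloaded into the current directory, calling `organize_visium_downloads(".", "mouse-brain-section-coronal-1-standard-1-1-0")` should give the tree shown above. The other downloads (molecule info, analysis, summaries, cloupe) are not needed for it.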
Reference: https://stlearn.readthedocs.io/en/latest/tutorials/stSME_clustering.html
Environment setup (stLearn is particularly hard to install):
conda create -n stlearn python=3.8 -y
conda activate stlearn
pip install stlearn==0.4.0 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install jupyterlab -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install --upgrade scanpy
Read the data with the st.Read10X function:
import numpy as np
import pandas as pd
import stlearn as st
from pathlib import Path
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
# Read the data
data = st.Read10X("mouse-brain-section-coronal-1-standard-1-1-0/")
data
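For orientation, the `spatial` folder is what links each spot barcode to a position on the tissue image. The sketch below parses those files with the standard library only. It assumes the Space Ranger 1.x layout used by this dataset: `tissue_positions_list.csv` has no header row and its columns are barcode, in_tissue, array_row, array_col, pxl_row_in_fullres and pxl_col_in_fullres, and `scalefactors_json.json` contains a `tissue_hires_scalef` entry. The function name `load_spot_positions` is hypothetical and is not part of stLearn or scanpy.

```python
import csv
import json
from pathlib import Path


def load_spot_positions(spatial_dir):
    """Return {barcode: (x, y)} in hires-image pixels for in-tissue spots."""
    spatial_dir = Path(spatial_dir)
    with open(spatial_dir / "scalefactors_json.json") as handle:
        hires_scale = json.load(handle)["tissue_hires_scalef"]

    positions = {}
    with open(spatial_dir / "tissue_positions_list.csv", newline="") as handle:
        for barcode, in_tissue, _row, _col, pxl_row, pxl_col in csv.reader(handle):
            if in_tissue != "1":
                continue  # skip spots that are not under tissue
            # Full-resolution pixel coordinates scaled down to the hires image;
            # x comes from the pixel column, y from the pixel row.
            positions[barcode] = (float(pxl_col) * hires_scale, float(pxl_row) * hires_scale)
    return positions
```

Readers such as `st.Read10X` do this bookkeeping for you; the sketch is only meant to show what information the files carry.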
Once loaded, the data is an AnnData object. Next, use stLearn to normalize it, reduce its dimensionality and cluster the spots:
# pre-processing for gene count table
st.pp.filter_genes(data,min_cells=1)
st.pp.normalize_total(data)
st.pp.log1p(data)
# pre-processing for spot image
st.pp.tiling(data, "./tiles")
# this step uses deep learning model to extract high-level features from tile images
# may need few minutes to be completed
st.pp.extract_feature(data)
# run PCA for gene expression data
st.em.run_pca(data,n_comps=50)
data_SME = data.copy()
# apply stSME to normalise log transformed data
st.spatial.SME.SME_normalize(data_SME, use_data="raw")
data_SME.X = data_SME.obsm['raw_SME_normalized']
st.pp.scale(data_SME)
st.em.run_pca(data_SME,n_comps=50)
K-means clustering result:
# K-means clustering on stSME normalised PCA
st.tl.clustering.kmeans(data_SME,n_clusters=19, use_data="X_pca", key_added="X_pca_kmeans")
st.pl.cluster_plot(data_SME, use_label="X_pca_kmeans")
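K-means itself is simple to state: assign every spot to its nearest centroid, recompute each centroid as the mean of its members, and repeat. The toy version below runs on 2-D points in plain Python to make that loop concrete. It is not how stLearn implements `st.tl.clustering.kmeans`; the function name `toy_kmeans`, the fixed starting centroids and the sample points are all made up for illustration.

```python
def toy_kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm on a list of (x, y) points with given start centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Two well-separated groups of three points each (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = toy_kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In the tutorial the same idea is applied to the 50 PCA components of each spot rather than to 2-D toy points, with `n_clusters=19`.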
Louvain clustering result:
# louvain clustering on stSME normalised data
st.pp.neighbors(data_SME,n_neighbors=17,use_rep='X_pca')
st.tl.clustering.louvain(data_SME, resolution=1.19)
st.pl.cluster_plot(data_SME,use_label="louvain")
Reference: https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering.html
Read the data with sc.read_visium. Once loaded, preprocessing is the same as for single-cell data:
import scanpy as sc
adata = sc.read_visium(path="../stLearn/mouse-brain-section-coronal-1-standard-1-1-0/")
adata
Then run some simple preprocessing, followed by dimensionality reduction and clustering:
# mitochondrial genes: "MT-" for human, "mt-" for mouse (this dataset is mouse)
adata.var["mt"] = adata.var_names.str.startswith("mt-")
# ribosomal genes (mouse symbols: Rps*, Rpl*)
adata.var["ribo"] = adata.var_names.str.startswith(("Rps", "Rpl"))
# hemoglobin genes (mouse symbols: Hb*, excluding Hbp*)
adata.var["hb"] = adata.var_names.str.contains("^Hb[^(p)]")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt", "ribo", "hb"], inplace=True, log1p=True)
# Saving count data
adata.layers["counts"] = adata.X.copy()
# Normalizing to median total counts
sc.pp.normalize_total(adata)
# Logarithmize the data
sc.pp.log1p(adata)
# Identify highly variable genes
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pl.highly_variable_genes(adata)
# pca
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
# Using the igraph implementation and a fixed number of iterations can be significantly faster, especially for larger datasets
sc.tl.leiden(adata, flavor="igraph", n_iterations=2)
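The two normalization calls above (`sc.pp.normalize_total` followed by `sc.pp.log1p`) can be illustrated on a toy count matrix. This is a plain-Python sketch of the idea, not scanpy's implementation: each spot is rescaled so that its total equals the median total across spots, then `log(1 + x)` is applied. The function name `normalize_and_log1p` and the toy counts are made up for illustration.

```python
import math
from statistics import median


def normalize_and_log1p(counts):
    """Scale each row to the median row total, then apply log(1 + x)."""
    totals = [sum(row) for row in counts]
    # Rows with no counts are left out of the median and left unscaled.
    target = median(t for t in totals if t > 0)
    result = []
    for row, total in zip(counts, totals):
        factor = target / total if total > 0 else 1.0
        result.append([math.log1p(value * factor) for value in row])
    return result


toy_counts = [[1, 3], [10, 30], [4, 4]]  # three spots, two genes
normalized = normalize_and_log1p(toy_counts)
```

Here the spot totals are 4, 40 and 8, so the median target is 8. After scaling, every spot sums to 8 before the log transform, which removes differences in sequencing depth between spots.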
Visualize the result:
sc.pl.umap(adata, color=["leiden"])
Spatial cluster plot:
sc.pl.spatial(adata, img_key = "hires", color="leiden", size=1.2)
Note that sc.read_visium has been deprecated since scanpy 1.11.0.
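To find out whether an environment is affected by this deprecation, the installed version string can be compared against 1.11.0. The helper below is a small sketch with a hypothetical name, `read_visium_is_deprecated`; it looks only at the leading digits of the first three version components, so a pre-release such as `1.11.0rc1` is treated like `1.11.0`.

```python
def read_visium_is_deprecated(version):
    """Return True if a scanpy version string is 1.11.0 or newer."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "1.11" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (1, 11, 0)
```

In a session it would be called as `read_visium_is_deprecated(sc.__version__)`. As far as I know, scanpy's deprecation notice points to `squidpy.read.visium` as the replacement, but check the documentation of your installed version before switching.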
The next post will cover reading 10x Visium HD spatial transcriptomics data with SpatialData.