当前位置：首页 > news >正文

cellphoneDB v5更新与Python环境可视化

news 2025/7/12 8:41:01

导入环境

输入准备

矩阵与注释

DEG信息

空间微环境（可选）

调控因子信息（可选）

正式运行

读取结果可视化

热图

气泡图

之前发布过一篇使用cellphonedb进行细胞间通讯分析的文章

cellphoneDB进行CCI以及可视化-CSDN博客

后面发现这个方法的最新版本添加了很多新的输入文件，并且可视化也可以不用转到R环境，对应的python包就可以实现所有必要的可视化，于是更新记录一下

最新的安装包和说明详见githubhttps://github.com/ventolab/CellphoneDB

导入环境

import numpy as np
import pandas as pd
import scanpy as sc
import os
import sys
from scipy import sparse
from cellphonedb.src.core.methods import cpdb_degs_analysis_methodsc.settings.verbosity = 1 
sys.executableimport anndata as ad
import ktplotspy as kpy
import matplotlib.pyplot as plt
%matplotlib inline

注意这里安装完记得检查一下cellphonedb版本是否>5.0.0

输入准备

# Our base directory for the analysis
os.chdir('/data/work/')# Path to our input files
cpdb_file_path = 'input/cellphonedb.zip'
meta_file_path = 'input/meta.tsv'
counts_file_path = 'input/counts_normloqTransformed.h5ad'
microenvs_file_path = 'input/microenvironments.tsv'
degs_file_path = 'input/DEGs_upregulated_genes.tsv'
active_tf_path = 'input/active_TFs.tsv'
out_path = 'results/'

矩阵与注释

首先还是正常先导入矩阵和注释信息

# 计数矩阵
adata = sc.read('/data/work/your_sc_or_sp.h5ad')
adata.write(counts_file_path) # 生成Metadata文件（假设注释列为annotation）
meta_df = adata.obs.reset_index()[['index', 'annotation']]
meta_df.columns = ['Cell', 'annotation']
meta_df.to_csv(meta_file_path, sep='\t', index=False)

接下来是相比旧版本多出的三个

DEG信息

degs_file_path 添加数据高表达基因的信息，其实只需要注释分群信息和对应的基因两列即可

这里我的DEG表格来自之前Seurat中的FindAllMarkers，并进行了简单的过滤

library(Seurat)
library(harmony)
library(ggplot2)data <- readRDS("/data/work/your_sc_sp.rds")
Idents(data) <- "annotation"
# 可以自定义过滤阈值
markers <- FindAllMarkers(data, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
write.csv(markers, file = "/data/work/annotation_deg.csv", row.names = FALSE)

导入结果表格如下

# 读取DEG文件
input_path = "/data/work/annotation_deg.csv"
df = pd.read_csv(input_path)
df_sub = df[['cluster', 'gene']]
df_sub.to_csv(degs_file_path, sep='\t', index=False, header=True)

空间微环境（可选）

microenvs_file_path 用来限制空间转录组数据的细胞微环境，最后的通路会被限制在单个微环境中，从而过滤一些较远的不真实信号，

这一步可以自行选择，可以不添加这个输入也可以设置成同一个微环境作为输入

## 假设我这里创建neuron与noneuron两种环境
cell_types = adata.obs['annotation'].unique().tolist()# 创建微环境映射
microenv_mapping = []
for cell_type in cell_types:# 判断是否为neuron微环境,EX开头和In的亚群if cell_type.startswith('EX_') or cell_type == 'In':microenv = 'neuron'else:microenv = 'noneuron'microenv_mapping.append({'annotation': cell_type, 'microenvironment': microenv})microenv_df = pd.DataFrame(microenv_mapping)
microenv_df.to_csv(microenvs_file_path, sep='\t', index=False)

调控因子信息（可选）

active_tf_path 纳入转录调控因子分析的信息，比如我们可以把scenic/pyscenic的结果进行适当修改作为这里转录调控信息的输入，这一步同样也是可选的

import networkx as nx
import matplotlib.pyplot as plt
from scipy.stats import wilcoxon
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import fdrcorrection# auc_matrix是每个细胞对应每个TF的分值矩阵
auc_df = pd.read_csv("/data/work/pyscenic_out/auc_matrix.csv")
auc_df.columns = auc_df.columns.str.replace(r'\(\+\)$', '', regex=True)
auc_df.set_index('Cell', inplace=True)# 对齐细胞名称
common_cells = adata.obs_names[adata.obs_names.isin(auc_df.index)]
adata = adata[common_cells]
auc_df = auc_df.loc[common_cells]# 提取TF名称（从列名中排除'Cell'）
tfs = auc_df.columns.tolist()  # 直接获取所有TF列名# 按细胞类型检测差异活性
active_tfs_list = []
for cluster in adata.obs['annotation'].unique():cluster_mask = (adata.obs['annotation'] == cluster)cluster_auc = auc_df.loc[cluster_mask]other_auc = auc_df.loc[~cluster_mask]for tf in tfs:# 跳过全零的TF以避免检验失败if cluster_auc[tf].max() == 0 and other_auc[tf].max() == 0:continue           try:# Mann-Whitney U检验（单侧：当前类型活性更高）stat, pval = mannwhitneyu(cluster_auc[tf], other_auc[tf], alternative='greater')mean_activity = cluster_auc[tf].mean()active_tfs_list.append({'annotation': cluster, 'TF': tf,'pval': pval,'mean_activity': mean_activity})except ValueError:  # 处理样本量不足continue# 多重检验校正
tf_activity_df = pd.DataFrame(active_tfs_list)
if not tf_activity_df.empty:_, qvals = fdrcorrection(tf_activity_df['pval'])tf_activity_df['qval'] = qvals
else:tf_activity_df['qval'] = np.nan  # 避免空数据时报错# 筛选显著活跃的TF（q<0.05 & 平均活性>0.1）
sig_tf_df = tf_activity_df[(tf_activity_df['qval'] < 0.05) & (tf_activity_df['mean_activity'] > 0.1)
]# 保存为CellPhoneDB格式
sig_tf_df[['annotation', 'TF']].to_csv(active_tf_path, sep='\t', index=False, header=True)

这里的调控因子矩阵打分来自pyscenic流程，有空也梳理一下这个软件，具体显著活跃TF的筛选过程仅供参考，最终得到每个分群注释对应的活跃TF两列表格即可。

总之，通过对上述新输入信息的加权计算，最终我们得到的通路信息显然更加真实可信

正式运行

准备好所有的输入文件后，就可以进行正式计算了

cpdb_results = cpdb_degs_analysis_method.call(cpdb_file_path = cpdb_file_path,                            # mandatory: CellphoneDB database zip file.meta_file_path = meta_file_path,                            # mandatory: tsv file defining barcodes to cell label.counts_file_path = counts_file_path,                        # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData objectdegs_file_path = degs_file_path,                            # mandatory: tsv file with DEG to account.counts_data = 'hgnc_symbol',                                # defines the gene annotation in counts matrix.
#    active_tfs_file_path = active_tf_path,                      # optional: defines cell types and their active TFs.
#    microenvs_file_path = microenvs_file_path,                  # optional (default: None): defines cells per microenvironment.score_interactions = True,                                  # optional: whether to score interactions or not. threshold = 0.1,                                            # defines the min % of cells expressing a gene for this to be employed in the analysis.result_precision = 3,                                       # Sets the rounding for the mean values in significan_means.separator = '|',                                            # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".debug = False,                                              # Saves all intermediate tables emplyed during the analysis in pkl format.output_path = out_path,                                     # Path to save resultsoutput_suffix = None,                                       # Replaces the timestamp in the output files by a user defined string in the  (default: None)threads = 25)

运行日志以及输出的文件如下

（这个方法运行速度要快一点，毕竟取了DEG）

读取结果可视化

首先读取我们上一步结果路径里的文件，注意DEG方法的pvals指的是relevant_interactions这个文件，里面是布尔值，旧方法里指的就是pvalues文件

# .h5ad file used for performing CellPhoneDB
adata = ad.read_h5ad("/data/work/input/counts_normloqTransformed.h5ad")
# output from CellPhoneDB
means = pd.read_csv("/data/work/results/degs_analysis_means_07_10_2025_103746.txt", sep="\t")
significant_means = pd.read_csv("data/work/results/degs_analysis_significant_means_07_10_2025_103746.txt", sep="\t")
decon = pd.read_csv("data/work/results/degs_analysis_deconvoluted_07_10_2025_103746.txt", sep="\t")
pvals = pd.read_csv("data/work/results/degs_analysis_relevant_interactions_07_10_2025_103746.txt", sep="\t")

热图

symmetrical参数可以选择是否强制对称，也可以挑选感兴趣的注释分群显示

kpy.plot_cpdb_heatmap(pvals=pvals, figsize=(5, 5), title="Sum of significant interactions")
kpy.plot_cpdb_heatmap(pvals=pvals, figsize=(5, 5), title="Sum of significant interactions", symmetrical=False)
kpy.plot_cpdb_heatmap(pvals=pvals, cell_types=["EX_BDNF","EX_NTNG1","Oligo","Astro","In","Endo","Opc"], figsize=(4, 4), title="Sum of significant interactions"
)

气泡图

将结果与Anndata矩阵文件结合，可以绘制表达量气泡图

p1 = kpy.plot_cpdb(adata = adata,cell_type1 = "EX_BDNF",#感兴趣分群，全部则使用"."cell_type2 = ".", means = means,pvals = pvals,celltype_key = "annotation",#分群标签genes = ["ADCYAP1"],#感兴趣基因figsize = (25,5),title = "Interactions involving ADCYAP1 ligand",max_size = 5,highlight_size = 0.75,degs_analysis = True,standard_scale = False
)
p1.save('dotplot_EX_BDNF_ADCYAP1.pdf')