
R | Drawing a heatmap with ComplexHeatmap, splitting it into blocks, and adding black borders to the diagonal blocks

news 2025/6/30 17:18:46

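Goal: draw a heatmap, split it into blocks, and add a black border to the blocks on the diagonal.
If you are short on time, read sections 0 and 4 directly.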

0. Preparing the data

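# Install and load the required packages
# ComplexHeatmap is distributed through Bioconductor, not CRAN:
#install.packages("BiocManager"); BiocManager::install("ComplexHeatmap")  # if not yet installed
library(ComplexHeatmap)

# Use the iris dataset
#data(iris)
data0 <- iris
rownames(data0) = paste0(iris$Species, 1:nrow(data0))
# data0 <- mtcars  # clusters poorly

# Randomly sample 30 rows and keep only the numeric columns (drop the Species column)
set.seed(42)
dat = data0[sample(nrow(data0), 30), 1:4]
#dat = data0

# Compute the cosine distance
#install.packages("proxy")  # if not yet installed
library("proxy")
distance_matrix <- as.matrix(dist(dat, method = "cosine"))
# If you would rather not install a new package, the default Euclidean distance also works:
#distance_matrix <- as.matrix(dist(dat, method = "euclidean"))

# Plot the similarity instead of the distance: simi = 1 - dist
similarity = 1 - distance_matrix
dim(similarity)
[1] 30 30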
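As a cross-check on the step above, cosine similarity can also be computed by hand in base R, without the proxy package. This is a minimal sketch; it assumes that proxy's "cosine" method returns 1 minus the cosine similarity, so that `1 - distance_matrix` is the cosine similarity itself. The names `norms` and `similarity_manual` are introduced here for illustration only.

```r
# Same 30-row sample of the numeric iris columns as in the main text
set.seed(42)
dat <- as.matrix(iris[sample(nrow(iris), 30), 1:4])

# Cosine similarity: dot product of two rows divided by the product of their norms
norms <- sqrt(rowSums(dat^2))
similarity_manual <- (dat %*% t(dat)) / (norms %o% norms)

dim(similarity_manual)    # a 30 x 30 matrix
range(similarity_manual)  # all measurements are positive, so values lie in (0, 1]
```

If the assumption about proxy holds, `similarity_manual` should agree with `similarity` up to floating-point error.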

1. Heatmap: black border around every block

library(circlize)
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
col_fun(seq(-3, 3))

# Split rows and columns into 3 k-means groups each
Heatmap(similarity, name = "mat", #col = col_fun,
    row_km = 3, column_km = 3)

# Draw a black border around every block
# When the heatmap is split, layer_fun is applied in every slice.
Heatmap(similarity, name = "mat", #col = col_fun,
    row_km = 3, column_km = 3,
    layer_fun = function(j, i, x, y, width, height, fill) {
        # every block gets a black border
        v = pindex(similarity, i, j)
        #grid.text(sprintf("%.1f", v), x, y, gp = gpar(fontsize = 10))
        str(v)  # debug output: the values in the current slice
        grid.rect(gp = gpar(lwd = 2, fill = "transparent"))
        if(sum(v > 0)/length(v) > 0.75) {}  # empty branch, nothing extra is drawn
    })
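The `pindex(similarity, i, j)` call inside `layer_fun` picks matrix values by pairs of indices: the k-th value comes from row `i[k]` and column `j[k]`. As a rough illustration, assuming `pindex` behaves like base R indexing with a two-column index matrix, the same idea looks like this on a toy matrix (`m`, `i`, `j` and `v` here are made up for the example):

```r
# Toy 3 x 4 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 3)

# Paired indices: (1, 2), (2, 3), (3, 4)
i <- c(1, 2, 3)
j <- c(2, 3, 4)

# Base R pairwise indexing: one value per (row, column) pair
v <- m[cbind(i, j)]
v  # 4 8 12
```

Note that `m[i, j]` would instead return the full 3 x 3 submatrix, which is not what `layer_fun` needs.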


2. Adding a black border to the diagonal blocks

Heatmap(similarity, name = "mat",
    #col = c("white", "yellow", "red3"),
    #col = col_fun,
    col = colorRamp2(c(0.5, 0.75, 1), c("white", "yellow", "red3")),
    row_km = 3, column_km = 3,
    layer_fun = function(j, i, x, y, width, height, fill, slice_r, slice_c) {
        v = pindex(similarity, i, j)
        #grid.text(sprintf("%.1f", v), x, y, gp = gpar(fontsize = 10))
        # slice_r and slice_c are the row and column slice indices;
        # draw the border only where they are equal, i.e. on the diagonal
        if(slice_r == slice_c) {
            grid.rect(gp = gpar(lwd = 4, fill = "transparent", col = "black"))
        }
    })


3. Adding column annotations

ComplexHeatmap also provides a function with the same name as the pheatmap package, pheatmap():


# One annotation row per sampled flower, in the same order as the columns of similarity
annotation_col = data.frame(
    type = data0$Species,
    row.names = rownames(data0)
)[rownames(dat), , drop = FALSE]

# set colors
ann_colors = list(
    #type = c('setosa'="#ed553b", 'versicolor'="#99b433", 'virginica'="orange")
    type = c('setosa'="violetred1", 'versicolor'="turquoise2", 'virginica'="blueviolet")
)
# "#ed553b", "#99b433"
#violetred1,turquoise2,

pheatmap(similarity,
    name = "Cosine\nsimilarity",
    main = "xx", border_color = NA,
    clustering_method = "ward.D2",
    annotation_col = annotation_col,  # set annotation for columns
    annotation_colors = ann_colors,   # set colors
    #color = c("white", "yellow", "red3"),
    #color = col_fun,
    color = colorRamp2(c(0.8, 0.9, 1), c("white", "yellow", "red3")),
    row_km = 3, column_km = 3,
    layer_fun = function(j, i, x, y, width, height, fill, slice_r, slice_c) {
        v = pindex(similarity, i, j)
        #grid.text(sprintf("%.1f", v), x, y, gp = gpar(fontsize = 10))
        if(slice_r == slice_c) {
            grid.rect(gp = gpar(lwd = 4, fill = "transparent", col = "black"))
        }
    })


Bug:

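There is a problem: the plot differs from one run to the next. It is not just the order of the groups that changes; the groups themselves are different. After some searching, it turns out the result depends on the random seed?! Fixed data, fixed parameters, so why should the clustering depend on random numbers? I don't get it! Does unsupervised clustering really need a human to judge whether it is right?

For example, running the last clustering call above with different random seeds gives different results: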

# set.seed(45)  # uncomment and vary the seed: surprisingly, it changes which block each sample lands in
pheatmap(similarity,
    name = "Cosine\nsimilarity",
    main = "xx", border_color = NA,
    clustering_method = "ward.D2",
    annotation_col = annotation_col,  # set annotation for columns
    annotation_colors = ann_colors,   # set colors
    #color = c("white", "yellow", "red3"),
    #color = col_fun,
    color = colorRamp2(c(0.8, 0.9, 1), c("white", "yellow", "red3")),
    row_km = 3, column_km = 3,
    layer_fun = function(j, i, x, y, width, height, fill, slice_r, slice_c) {
        v = pindex(similarity, i, j)
        #grid.text(sprintf("%.1f", v), x, y, gp = gpar(fontsize = 10))
        if(slice_r == slice_c) {
            grid.rect(gp = gpar(lwd = 4, fill = "transparent", col = "black"))
        }
    })


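Cause: row_km and column_km split the heatmap with k-means clustering, and k-means really does pick its initial centers at random. Without k-means, the result is not affected by the random seed.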
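The seed dependence can be seen with base R's `kmeans()` alone, independent of any plotting. The sketch below runs it on the numeric iris columns (the variable names `x`, `a`, `b` and `c2` are made up for the example): the same seed always reproduces the same assignment, while a different seed may or may not.

```r
x <- as.matrix(iris[, 1:4])

# Same seed twice: the initial centers are identical, so the result is too
set.seed(1); a <- kmeans(x, centers = 3)$cluster
set.seed(1); b <- kmeans(x, centers = 3)$cluster
identical(a, b)  # TRUE

# A different seed can change the cluster labels and even the membership;
# the cross-table shows how the two assignments relate
set.seed(2); c2 <- kmeans(x, centers = 3)$cluster
table(a, c2)
```

So if k-means splitting is kept, calling `set.seed()` immediately before drawing the heatmap should make the figure reproducible. `Heatmap()` also documents `row_km_repeats` and `column_km_repeats` arguments for running k-means several times and taking a consensus, which may reduce the variation; I have not tested that here.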

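4. Hierarchical clustering, then cutting the result into groups

Before: row_km = 3, column_km = 3,  # k-means picks its initial centers from the seed, so the result varies with the random numbers
Now: cutree_rows = 3, cutree_cols = 3,  # hierarchical clustering is stable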

pheatmap(similarity,
    name = "Cosine\nsimilarity",
    main = "Hierarchical cluster", border_color = NA,
    clustering_method = "ward.D2",
    annotation_col = annotation_col,  # set annotation for columns
    annotation_colors = ann_colors,   # set colors
    #color = c("white", "yellow", "red3"),
    #color = col_fun,
    color = colorRamp2(c(0.8, 0.9, 1), c("white", "yellow", "red3")),
    #row_km = 3, column_km = 3,  # k-means picks its initial centers from the seed
    cutree_rows = 3, cutree_cols = 3,  # hierarchical clustering is stable
    layer_fun = function(j, i, x, y, width, height, fill, slice_r, slice_c) {
        v = pindex(similarity, i, j)
        #grid.text(sprintf("%.1f", v), x, y, gp = gpar(fontsize = 10))
        if(slice_r == slice_c) {
            grid.rect(gp = gpar(lwd = 4, fill = "transparent", col = "black"))
        }
    })
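The stability claim can be checked without drawing anything. The following base R sketch (variable names `x`, `d`, `g1` and `g2` are made up for the example) builds a Ward dendrogram with `hclust()` and cuts it into 3 groups with `cutree()`, under two very different seeds:

```r
x <- as.matrix(iris[, 1:4])
d <- dist(x)  # Euclidean distance between rows

# hclust() involves no random initialization, so the seed is irrelevant
set.seed(1);   g1 <- cutree(hclust(d, method = "ward.D2"), k = 3)
set.seed(999); g2 <- cutree(hclust(d, method = "ward.D2"), k = 3)

identical(g1, g2)  # TRUE
table(g1)          # sizes of the three groups
```

This is only an illustration of why the cutree-based split is repeatable; it does not reproduce the exact distance and clustering settings used inside pheatmap().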
