Learning R from Scratch, Day 53: The AFT Model
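When analyzing medical data, Cox regression can be used to assess how almost any factor affects the hazard, but sometimes a factor acts very directly. Take the survival of cancer patients: a treatment such as chemotherapy might halve the tumor's growth rate, which amounts to stretching the patient's survival time. Using Cox regression to analyze the hazard per unit time would then be putting the cart before the horse. An AFT (accelerated failure time) model estimates the effect on survival time directly, and in this setting the ultimate goal of the analysis is precisely to learn how large the factor's effect is.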
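To make the contrast concrete, the two models can be written side by side. The notation below is the usual textbook form and is an assumption of this sketch, not something taken from the post: $x$ is the covariate vector, $h_0$ the baseline hazard, $\varepsilon$ an error term whose distribution (extreme value, normal, ...) determines the family, and $\sigma$ the scale parameter.

```latex
% Cox model: covariates multiply the hazard
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\gamma\right)

% AFT model: covariates act on the log of the survival time itself
\log T = \beta_0 + x^{\top}\beta + \sigma\,\varepsilon

% Equivalent statement on the survival function: time is rescaled
S(t \mid x) = S_0\!\left(t\,\exp\!\left(-x^{\top}\beta\right)\right)

% Interpretation: a one-unit increase in x_j multiplies survival time by
\exp(\beta_j)
```

Under this form, "the treatment doubles survival time" corresponds to $\exp(\beta_j) = 2$, which is why the AFT coefficients answer the question posed above more directly than a hazard ratio does.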
Here is an example:
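# Load the required packages
library(survival)
library(ggplot2)

# 1. Generate a simulated dataset
set.seed(123)
n <- 200  # sample size

# Generate covariates
age <- rnorm(n, mean = 50, sd = 10)
treatment <- sample(0:1, n, replace = TRUE)
severity <- rnorm(n, mean = 5, sd = 2)

# Generate survival times: log-linear in the covariates, with a Weibull-distributed error term.
# Note the negative treatment coefficient (-0.8): in this simulation, treatment shortens survival time.
true_time <- exp(2 + 0.05 * age - 0.8 * treatment - 0.3 * severity +
                   rweibull(n, shape = 1.5, scale = 1))

# Generate censoring times (random censoring)
censor_time <- runif(n, min = 0, max = max(true_time) * 1.5)

# Build the observed time and the event indicator (1 = event, 0 = censored)
time <- pmin(true_time, censor_time)
status <- as.numeric(true_time <= censor_time)

# Create the data frame
surv_data <- data.frame(
  time = time,
  status = status,
  age = age,
  treatment = factor(treatment, labels = c("Control", "Treatment")),
  severity = severity
)

# Look at the first few rows
head(surv_data)

# 2. Fit an AFT model (Weibull distribution)
aft_weibull <- survreg(Surv(time, status) ~ age + treatment + severity,
                       data = surv_data, dist = "weibull")
summary(aft_weibull)

# 3. Fit an AFT model (log-normal distribution)
aft_lognormal <- survreg(Surv(time, status) ~ age + treatment + severity,
                         data = surv_data, dist = "lognormal")
summary(aft_lognormal)

# 4. Compare the models
AIC(aft_weibull, aft_lognormal)

# 5. Interpret the results (using the Weibull model)
# A positive coefficient lengthens survival time; a negative one shortens it.
# For example, the coefficient of treatmentTreatment is about -0.92, and exp(-0.92) is roughly 0.40,
# so the treatment group's survival time is about 0.40 times the control group's.
# That is shorter, as expected from the -0.8 used in the simulation.

# 6. Predict survival time
new_data <- data.frame(
  age = c(45, 45),
  treatment = factor(c("Control", "Treatment")),
  severity = c(4, 4)
)
pred_time <- predict(aft_weibull, newdata = new_data, type = "response")
pred_time

# 7. Visualize predicted survival time
# Build the prediction grid
pred_grid <- expand.grid(
  age = seq(30, 70, by = 10),
  treatment = levels(surv_data$treatment),
  severity = mean(surv_data$severity)
)

# Predict survival time on the grid
pred_grid$pred_time <- predict(aft_weibull, newdata = pred_grid, type = "response")

# Plot predicted survival time against age
ggplot(pred_grid, aes(x = age, y = pred_time, color = treatment)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(title = "AFT model: predicted survival time",
       x = "Age",
       y = "Predicted survival time",
       color = "Treatment group") +
  theme_minimal()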
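The plot in step 7 shows predicted survival time against age. If an actual survival curve for a given patient profile is wanted, one option is to ask `predict()` for a range of quantiles and plot them against one minus the quantile level. The sketch below re-creates the simulated data and Weibull fit so that it runs on its own; the two patient profiles (`profiles`) and the quantile grid (`p_event`) are choices made for illustration.

```r
library(survival)

# Re-create the simulated data and the Weibull fit from the example
set.seed(123)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
treatment <- sample(0:1, n, replace = TRUE)
severity <- rnorm(n, mean = 5, sd = 2)
true_time <- exp(2 + 0.05 * age - 0.8 * treatment - 0.3 * severity +
                   rweibull(n, shape = 1.5, scale = 1))
censor_time <- runif(n, min = 0, max = max(true_time) * 1.5)
surv_data <- data.frame(
  time = pmin(true_time, censor_time),
  status = as.numeric(true_time <= censor_time),
  age = age,
  treatment = factor(treatment, labels = c("Control", "Treatment")),
  severity = severity
)
aft_weibull <- survreg(Surv(time, status) ~ age + treatment + severity,
                       data = surv_data, dist = "weibull")

# Two illustrative profiles that differ only in treatment
profiles <- data.frame(
  age = c(45, 45),
  treatment = factor(c("Control", "Treatment")),
  severity = c(4, 4)
)

# p_event is the probability that the event has happened by time t, so S(t) = 1 - p_event
p_event <- seq(0.01, 0.99, by = 0.01)
q_time <- predict(aft_weibull, newdata = profiles, type = "quantile", p = p_event)

# One row per profile, one column per quantile level
plot(q_time[1, ], 1 - p_event, type = "l", xlim = c(0, max(q_time)),
     xlab = "Time", ylab = "Survival probability")
lines(q_time[2, ], 1 - p_event, lty = 2)
legend("topright", legend = levels(profiles$treatment), lty = 1:2)
```

Because both profiles share the same scale parameter, the two curves differ only by a constant stretch along the time axis, which is the "accelerated" part of the model's name.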
Output:

Call:
survreg(formula = Surv(time, status) ~ age + treatment + severity,
    data = surv_data, dist = "weibull")
                      Value Std. Error      z       p
(Intercept)         4.28068    0.28790  14.87 < 2e-16
age                 0.03353    0.00491   6.83 8.7e-12
treatmentTreatment -0.92151    0.09536  -9.66 < 2e-16
severity           -0.34059    0.02463 -13.83 < 2e-16
Log(scale)         -0.45295    0.04899  -9.25 < 2e-16

Scale= 0.636

Weibull distribution
Loglik(model)= -828.1   Loglik(intercept only)= -936.9
        Chisq= 217.62 on 3 degrees of freedom, p= 6.6e-47
Number of Newton-Raphson Iterations: 6
n= 200

Call:
survreg(formula = Surv(time, status) ~ age + treatment + severity,
    data = surv_data, dist = "lognormal")
                      Value Std. Error     z      p
(Intercept)         3.33903    0.24058  13.9 <2e-16
age                 0.04375    0.00417  10.5 <2e-16
treatmentTreatment -0.79381    0.07826 -10.1 <2e-16
severity           -0.32634    0.01969 -16.6 <2e-16
Log(scale)         -0.61917    0.05181 -11.9 <2e-16

Scale= 0.538

Log Normal distribution
Loglik(model)= -788   Loglik(intercept only)= -913.5
        Chisq= 251.07 on 3 degrees of freedom, p= 3.8e-54
Number of Newton-Raphson Iterations: 6
n= 200

              df      AIC
aft_weibull    5 1666.161
aft_lognormal  5 1585.913

       1        2
83.67537 33.29607
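The coefficients in the Weibull summary are on the log-time scale, so exponentiating them gives time ratios. The short sketch below does that by hand, with the estimates, standard errors and scale copied from the printed output rather than taken from a fitted object. The normal-approximation 95% intervals and the variable names are assumptions of this sketch. The last step uses the Weibull-specific relation between AFT and proportional-hazards coefficients, which does not carry over to the log-normal fit.

```r
# Values copied from the Weibull summary printed in the output
est <- c(age = 0.03353, treatmentTreatment = -0.92151, severity = -0.34059)
se  <- c(age = 0.00491, treatmentTreatment = 0.09536, severity = 0.02463)
weibull_scale <- 0.636

# Time ratios with Wald-type 95% confidence intervals
z <- qnorm(0.975)
time_ratio <- data.frame(
  ratio = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
print(round(time_ratio, 3))

# Weibull only: an AFT coefficient beta corresponds to a log hazard ratio of -beta / scale
hazard_ratio <- exp(-est / weibull_scale)
print(round(hazard_ratio, 3))

# Cross-check: the two predicted times in the output differ only in treatment,
# so their ratio should match the treatment time ratio
pred_ratio <- 33.29607 / 83.67537
print(pred_ratio)
```

A time ratio below 1 (as for treatment here) and a hazard ratio above 1 describe the same effect from the two viewpoints: shorter survival time, higher hazard.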
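Two different distributions are used to fit the model here. The Weibull hazard is monotone, whereas the log-normal hazard starts low, rises to a peak and then declines, so the log-normal handles a low early hazard better. When the immune system is still holding the virus in check at the start of the illness, for instance, the log-normal distribution can predict the risk well, which makes it suitable for patients who seek care as soon as the disease is discovered. In this example the log-normal fit also has the lower AIC (1585.9 versus 1666.2).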
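The difference in hazard shape can be checked numerically from the definition h(t) = f(t) / S(t). In the sketch below, the Weibull shape is taken as 1 / 0.636 and the log-normal `sdlog` as 0.538, the two scale values printed in the output. The location parameters (`scale = 1`, `meanlog = 0`) and the time grid are arbitrary choices for illustration, so the curves show the shapes only, not the fitted hazards for any particular patient.

```r
# Time grid (arbitrary units, chosen for illustration)
t_grid <- seq(0.05, 5, by = 0.05)

# Weibull: survreg's scale is the reciprocal of the rweibull/dweibull shape
wb_shape <- 1 / 0.636
haz_weibull <- dweibull(t_grid, shape = wb_shape, scale = 1) /
  pweibull(t_grid, shape = wb_shape, scale = 1, lower.tail = FALSE)

# Log-normal: survreg's scale corresponds to sdlog
haz_lognormal <- dlnorm(t_grid, meanlog = 0, sdlog = 0.538) /
  plnorm(t_grid, meanlog = 0, sdlog = 0.538, lower.tail = FALSE)

# Plot both hazards on the same axes
plot(t_grid, haz_weibull, type = "l", xlab = "Time", ylab = "Hazard",
     ylim = range(c(haz_weibull, haz_lognormal)))
lines(t_grid, haz_lognormal, lty = 2)
legend("topleft", legend = c("Weibull", "Log-normal"), lty = 1:2)

# Where the log-normal hazard peaks on this grid
t_grid[which.max(haz_lognormal)]
```

With these parameters the Weibull hazard keeps increasing over the whole grid, while the log-normal hazard rises from near zero, peaks and then falls off.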