从0开始学习R语言--Day36--空间杜宾模型
观察离散变量时,用马尔科夫链可以有效突出其峰值以及细节变化,但对于连续变量,我们更需要考虑的是变量间的关系,而不是其自身状态变化。且如果数据中没有时序条件,是无法应用马尔科夫链的。
相比之下,杜宾模型不仅能够解释变量间的直接关系,更能够发现间接关系,从而抓住容易忽视的细节。分析区域的经济变化时,比如,也许你只能知道周围区域的经济都呈上升态势,以为自身区域主要是受他们影响,但有可能忽略当自己区域的经济政策变化时,对周围区域的影响,从而错误预估周围区域的经济态势。
以下是一个例子:
# 加载必要的包
library(spdep)
library(spatialreg)
library(sf)# 设置随机种子保证结果可重复
set.seed(123)# 正确创建空间点数据 - 25个空间单元(5x5网格)
n <- 25
coords <- expand.grid(x = 1:5, y = 1:5) # 创建坐标点# 方法1:正确创建sf点对象(推荐)
grid <- st_as_sf(coords, coords = c("x", "y"), remove = FALSE)# 检查几何列
st_geometry(grid) # 应该显示"POINT"几何类型# 创建空间权重矩阵(基于Delaunay三角邻接)
nb <- tri2nb(st_coordinates(grid))
W <- nb2listw(nb, style = "W")# 生成解释变量和误差项
X1 <- rnorm(n) # 第一个解释变量
X2 <- rnorm(n) # 第二个解释变量
epsilon <- rnorm(n, sd = 0.5) # 误差项# 设置真实参数值
rho <- 0.4 # 空间自相关系数
beta1 <- 0.8 # X1的系数
beta2 <- -0.5 # X2的系数
theta1 <- 0.3 # X1的空间滞后系数
theta2 <- 0.2 # X2的空间滞后系数# 生成因变量Y (使用空间杜宾模型生成过程)
I_n <- diag(n)
A <- solve(I_n - rho * listw2mat(W))
Y <- A %*% (beta1 * X1 + beta2 * X2 + theta1 * lag.listw(W, X1) + theta2 * lag.listw(W, X2) + epsilon)# 合并为数据框(正确方式)
sp_data <- st_sf(data.frame(id = 1:n,Y = as.numeric(Y),X1 = X1,X2 = X2),geometry = st_geometry(grid)
)# 查看前几行数据(确认包含几何列)
head(sp_data)# 估计空间杜宾模型
sdm_model <- lagsarlm(Y ~ X1 + X2, data = sp_data, listw = W,type = "mixed") # "mixed"表示SDM模型# 查看模型摘要
summary(sdm_model)# 计算直接和间接效应
impacts <- impacts(sdm_model, listw = W, R = 200) # R为bootstrap次数
summary(impacts, zstats = TRUE)
输出:
Geometry set for 25 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 1 ymin: 1 xmax: 5 ymax: 5
CRS: NA
First 5 geometries:
POINT (1 1)
POINT (2 1)
POINT (3 1)
POINT (4 1)
POINT (5 1) Impact measures (mixed, exact):Direct Indirect Total
X1 0.9500444 0.8176235 1.7676679
X2 -0.3992945 0.2101624 -0.1891321
========================================================
Simulation results ( variance matrix):
Direct:Iterations = 1:200
Thinning interval = 1
Number of chains = 1
Sample size per chain = 200 1. Empirical mean and standard deviation for each variable,plus standard error of the mean:Mean SD Naive SE Time-series SE
X1 0.9467 0.1002 0.007085 0.006366
X2 -0.4029 0.1031 0.007289 0.0072892. Quantiles for each variable:2.5% 25% 50% 75% 97.5%
X1 0.7496 0.8815 0.9472 1.0229 1.1245
X2 -0.5998 -0.4747 -0.4061 -0.3354 -0.1939========================================================
Indirect:Iterations = 1:200
Thinning interval = 1
Number of chains = 1
Sample size per chain = 200 1. Empirical mean and standard deviation for each variable,plus standard error of the mean:Mean SD Naive SE Time-series SE
X1 0.8397 0.2015 0.01425 0.01425
X2 0.2149 0.1931 0.01365 0.013652. Quantiles for each variable:2.5% 25% 50% 75% 97.5%
X1 0.5173 0.70731 0.8237 0.9362 1.2544
X2 -0.1290 0.09208 0.1951 0.3412 0.6312========================================================
Total:Iterations = 1:200
Thinning interval = 1
Number of chains = 1
Sample size per chain = 200 1. Empirical mean and standard deviation for each variable,plus standard error of the mean:Mean SD Naive SE Time-series SE
X1 1.786 0.2065 0.01460 0.01460
X2 -0.188 0.2293 0.01621 0.016212. Quantiles for each variable:2.5% 25% 50% 75% 97.5%
X1 1.4662 1.6390 1.7661 1.90331 2.1878
X2 -0.5926 -0.3328 -0.2127 -0.04727 0.2966========================================================
Simulated standard errorsDirect Indirect Total
X1 0.1001948 0.2014628 0.2064584
X2 0.1030838 0.1931103 0.2292857Simulated z-values:Direct Indirect Total
X1 9.448649 4.167854 8.652459
X2 -3.908593 1.112642 -0.820155Simulated p-values:Direct Indirect Total
X1 < 2.22e-16 3.0748e-05 < 2e-16
X2 9.2835e-05 0.26586 0.41213
从输出中可以看到,X1,X2的p值小于0.001,说明其结果显著性较高,在直接效应上呈线性关系,而X1的间接效应的p也是小于0.001的,说明X1的空间溢出对Y有增益(值为正即为增益),而X2的大于0.001,所以不纳入考虑,整体来看,X1对于Y不管是直接还是间接影响都较大,,而X2的整体p大于0.001,说明其总效应不显著。