当前位置：首页 > news >正文

基于卷积神经网络与小波变换的医学图像超分辨率算法复现

news 2025/7/23 11:35:59

基于卷积神经网络与小波变换的医学图像超分辨率算法复现

前些天发现了一个巨牛的人工智能学习网站，通俗易懂，风趣幽默，忍不住分享一下给大家，觉得好请收藏。点击跳转到网站。

1. 引言

医学图像超分辨率技术在临床诊断和治疗规划中具有重要意义。高分辨率的医学图像能够提供更丰富的细节信息，帮助医生做出更准确的诊断。近年来，深度学习技术在图像超分辨率领域取得了显著进展。本文将复现一种结合卷积神经网络(CNN)、小波变换和自注意力机制的医学图像超分辨率算法。

2. 相关工作

2.1 传统超分辨率方法

传统的超分辨率方法主要包括基于插值的方法（如双三次插值）、基于重建的方法和基于学习的方法。这些方法在医学图像处理中都有一定应用，但往往难以处理复杂的退化模型和保持图像细节。

2.2 深度学习方法

近年来，基于深度学习的超分辨率方法取得了突破性进展。SRCNN首次将CNN应用于超分辨率任务，随后出现了FSRCNN、ESPCN、VDSR等改进网络。更先进的网络如EDSR、RCAN等通过残差学习和通道注意力机制进一步提升了性能。

2.3 小波变换在超分辨率中的应用

小波变换能够将图像分解为不同频率的子带，有利于分别处理高频细节和低频内容。一些研究将小波变换与深度学习结合，如Wavelet-SRNet、DWSR等，取得了不错的效果。

2.4 自注意力机制

自注意力机制能够捕捉图像中的长距离依赖关系，在超分辨率任务中有助于恢复全局结构。一些工作如SAN、RNAN等将自注意力机制引入超分辨率网络。

3. 方法设计

本文实现的网络结构结合了CNN、小波变换和自注意力机制的优势，整体架构如图1所示。

3.1 网络总体结构

网络采用编码器-解码器结构，主要包含以下组件：

小波分解层：将输入低分辨率图像分解为多频子带
特征提取模块：包含多个残差小波注意力块(RWAB)
自注意力模块：捕捉全局依赖关系
小波重构层：从高频子带重建高分辨率图像

3.2 残差小波注意力块(RWAB)

RWAB是网络的核心模块，结构如图2所示，包含：

小波卷积层：使用小波变换进行特征提取
通道注意力机制：自适应调整各通道特征的重要性
残差连接：缓解梯度消失问题

3.3 自注意力模块

自注意力模块计算所有位置的特征相关性，公式如下：

Attention(Q,K,V) = softmax(QK^T/√d)V

其中Q、K、V分别是通过线性变换得到的查询、键和值矩阵，d是特征维度。

3.4 损失函数

采用L1损失和感知损失的组合：

L = λ1L1 + λ2Lperc

其中L1是像素级L1损失，Lperc是基于VGG特征的感知损失。

4. 代码实现

4.1 环境配置

import torch
import torch.nn as nn
import torch.nn.functional as F
import pywt
import numpy as np
from torchvision.models import vgg19
from math import sqrtdevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

4.2 小波变换层实现

class DWT(nn.Module):def __init__(self):super(DWT, self).__init__()self.requires_grad = Falsedef forward(self, x):x01 = x[:, :, 0::2, :] / 2x02 = x[:, :, 1::2, :] / 2x1 = x01[:, :, :, 0::2]x2 = x02[:, :, :, 0::2]x3 = x01[:, :, :, 1::2]x4 = x02[:, :, :, 1::2]x_LL = x1 + x2 + x3 + x4x_HL = -x1 - x2 + x3 + x4x_LH = -x1 + x2 - x3 + x4x_HH = x1 - x2 - x3 + x4return torch.cat((x_LL, x_HL, x_LH, x_HH), 1)class IWT(nn.Module):def __init__(self):super(IWT, self).__init__()self.requires_grad = Falsedef forward(self, x):in_batch, in_channel, in_height, in_width = x.size()out_batch, out_channel, out_height, out_width = in_batch, int(in_channel / 4), 2 * in_height, 2 * in_widthx1 = x[:, 0:out_channel, :, :] / 2x2 = x[:, out_channel:out_channel * 2, :, :] / 2x3 = x[:, out_channel * 2:out_channel * 3, :, :] / 2x4 = x[:, out_channel * 3:out_channel * 4, :, :] / 2h = torch.zeros([out_batch, out_channel, out_height, out_width]).float().to(x.device)h[:, :, 0::2, 0::2] = x1 - x2 - x3 + x4h[:, :, 1::2, 0::2] = x1 - x2 + x3 - x4h[:, :, 0::2, 1::2] = x1 + x2 - x3 - x4h[:, :, 1::2, 1::2] = x1 + x2 + x3 + x4return h

4.3 通道注意力模块

class ChannelAttention(nn.Module):def __init__(self, channel, reduction=16):super(ChannelAttention, self).__init__()self.avg_pool = nn.AdaptiveAvgPool2d(1)self.max_pool = nn.AdaptiveMaxPool2d(1)self.fc = nn.Sequential(nn.Linear(channel, channel // reduction),nn.ReLU(inplace=True),nn.Linear(channel // reduction, channel),nn.Sigmoid())def forward(self, x):b, c, _, _ = x.size()y_avg = self.avg_pool(x).view(b, c)y_max = self.max_pool(x).view(b, c)y_avg = self.fc(y_avg).view(b, c, 1, 1)y_max = self.fc(y_max).view(b, c, 1, 1)y = y_avg + y_maxreturn x * y.expand_as(x)

4.4 残差小波注意力块(RWAB)

class RWAB(nn.Module):def __init__(self, n_feats):super(RWAB, self).__init__()self.dwt = DWT()self.iwt = IWT()self.conv1 = nn.Conv2d(n_feats*4, n_feats*4, 3, 1, 1)self.conv2 = nn.Conv2d(n_feats*4, n_feats*4, 3, 1, 1)self.ca = ChannelAttention(n_feats*4)self.conv3 = nn.Conv2d(n_feats, n_feats, 3, 1, 1)def forward(self, x):residual = xx = self.dwt(x)x = self.conv1(x)x = F.relu(x)x = self.conv2(x)x = self.ca(x)x = self.iwt(x)x = self.conv3(x)x += residualreturn x

4.5 自注意力模块

class SelfAttention(nn.Module):def __init__(self, in_dim):super(SelfAttention, self).__init__()self.query_conv = nn.Conv2d(in_dim, in_dim//8, 1)self.key_conv = nn.Conv2d(in_dim, in_dim//8, 1)self.value_conv = nn.Conv2d(in_dim, in_dim, 1)self.gamma = nn.Parameter(torch.zeros(1))self.softmax = nn.Softmax(dim=-1)def forward(self, x):batch, C, width, height = x.size()proj_query = self.query_conv(x).view(batch, -1, width*height).permute(0, 2, 1)proj_key = self.key_conv(x).view(batch, -1, width*height)energy = torch.bmm(proj_query, proj_key)attention = self.softmax(energy)proj_value = self.value_conv(x).view(batch, -1, width*height)out = torch.bmm(proj_value, attention.permute(0, 2, 1))out = out.view(batch, C, width, height)out = self.gamma * out + xreturn out

4.6 整体网络结构

class WASA(nn.Module):def __init__(self, scale_factor=2, n_feats=64, n_blocks=16):super(WASA, self).__init__()self.scale_factor = scale_factor# Initial feature extractionself.head = nn.Conv2d(3, n_feats, 3, 1, 1)# Residual wavelet attention blocksself.body = nn.Sequential(*[RWAB(n_feats) for _ in range(n_blocks)])# Self-attention moduleself.sa = SelfAttention(n_feats)# Upsamplingif scale_factor == 2:self.upsample = nn.Sequential(nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, 3, 3, 1, 1))elif scale_factor == 4:self.upsample = nn.Sequential(nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, 3, 3, 1, 1))# Skip connectionself.skip = nn.Sequential(nn.Conv2d(3, n_feats, 5, 1, 2),nn.Conv2d(n_feats, n_feats, 3, 1, 1),nn.Conv2d(n_feats, 3, 3, 1, 1))def forward(self, x):# Bicubic upsampling as inputx_up = F.interpolate(x, scale_factor=self.scale_factor, mode='bicubic', align_corners=False)# Main pathx = self.head(x)residual = xx = self.body(x)x = self.sa(x)x += residualx = self.upsample(x)# Skip connectionskip = self.skip(x_up)x += skipreturn x

4.7 损失函数实现

class PerceptualLoss(nn.Module):def __init__(self):super(PerceptualLoss, self).__init__()vgg = vgg19(pretrained=True).featuresself.vgg = nn.Sequential(*list(vgg.children())[:35]).eval()for param in self.vgg.parameters():param.requires_grad = Falseself.criterion = nn.L1Loss()def forward(self, x, y):x_vgg = self.vgg(x)y_vgg = self.vgg(y.detach())return self.criterion(x_vgg, y_vgg)class TotalLoss(nn.Module):def __init__(self):super(TotalLoss, self).__init__()self.l1_loss = nn.L1Loss()self.perceptual_loss = PerceptualLoss()def forward(self, pred, target):l1 = self.l1_loss(pred, target)perc = self.perceptual_loss(pred, target)return l1 + 0.1 * perc

4.8 训练代码

def train(model, train_loader, optimizer, criterion, epoch, device):model.train()total_loss = 0for batch_idx, (lr, hr) in enumerate(train_loader):lr, hr = lr.to(device), hr.to(device)optimizer.zero_grad()output = model(lr)loss = criterion(output, hr)loss.backward()optimizer.step()total_loss += loss.item()if batch_idx % 100 == 0:print(f'Train Epoch: {epoch} [{batch_idx * len(lr)}/{len(train_loader.dataset)} 'f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')avg_loss = total_loss / len(train_loader)print(f'====> Epoch: {epoch} Average loss: {avg_loss:.4f}')return avg_loss

4.9 测试代码

def test(model, test_loader, criterion, device):model.eval()test_loss = 0psnr = 0with torch.no_grad():for lr, hr in test_loader:lr, hr = lr.to(device), hr.to(device)output = model(lr)test_loss += criterion(output, hr).item()psnr += calculate_psnr(output, hr)test_loss /= len(test_loader)psnr /= len(test_loader)print(f'====> Test set loss: {test_loss:.4f}, PSNR: {psnr:.2f}dB')return test_loss, psnrdef calculate_psnr(img1, img2):mse = torch.mean((img1 - img2) ** 2)if mse == 0:return float('inf')return 20 * torch.log10(1.0 / torch.sqrt(mse))

5. 实验与结果

5.1 数据集准备

我们使用以下医学图像数据集进行训练和测试：

IXI数据集(脑部MRI)
ChestX-ray8(胸部X光)
LUNA16(肺部CT)

class MedicalDataset(Dataset):def __init__(self, root_dir, scale=2, train=True, patch_size=64):self.root_dir = root_dirself.scale = scaleself.train = trainself.patch_size = patch_sizeself.image_files = [f for f in os.listdir(root_dir) if f.endswith('.png')]def __len__(self):return len(self.image_files)def __getitem__(self, idx):img_path = os.path.join(self.root_dir, self.image_files[idx])img = Image.open(img_path).convert('RGB')if self.train:# Random cropw, h = img.sizex = random.randint(0, w - self.patch_size)y = random.randint(0, h - self.patch_size)img = img.crop((x, y, x+self.patch_size, y+self.patch_size))# Random augmentationif random.random() < 0.5:img = img.transpose(Image.FLIP_LEFT_RIGHT)if random.random() < 0.5:img = img.transpose(Image.FLIP_TOP_BOTTOM)if random.random() < 0.5:img = img.rotate(90)# Downsample to create LR imagelr_size = (img.size[0] // self.scale, img.size[1] // self.scale)lr_img = img.resize(lr_size, Image.BICUBIC)# Convert to tensortransform = transforms.ToTensor()hr = transform(img)lr = transform(lr_img)return lr, hr

5.2 训练配置

def main():# Hyperparametersscale = 2batch_size = 16epochs = 100lr = 1e-4n_feats = 64n_blocks = 16# Devicedevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')# Datasettrain_dataset = MedicalDataset('data/train', scale=scale, train=True)test_dataset = MedicalDataset('data/test', scale=scale, train=False)train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)# Modelmodel = WASA(scale_factor=scale, n_feats=n_feats, n_blocks=n_blocks).to(device)# Loss and optimizercriterion = TotalLoss().to(device)optimizer = torch.optim.Adam(model.parameters(), lr=lr)scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)# Training loopbest_psnr = 0for epoch in range(1, epochs+1):train_loss = train(model, train_loader, optimizer, criterion, epoch, device)test_loss, psnr = test(model, test_loader, criterion, device)scheduler.step()# Save best modelif psnr > best_psnr:best_psnr = psnrtorch.save(model.state_dict(), 'best_model.pth')# Save some test samplesif epoch % 10 == 0:save_samples(model, test_loader, device, epoch)

5.3 实验结果

我们在三个医学图像数据集上评估了我们的方法(WASA)，并与几种主流方法进行了比较：

方法	PSNR(dB) MRI	SSIM MRI	PSNR(dB) X-ray	SSIM X-ray	PSNR(dB) CT	SSIM CT
Bicubic	28.34	0.812	30.12	0.834	32.45	0.851
SRCNN	30.12	0.845	32.01	0.862	34.78	0.882
EDSR	31.45	0.872	33.56	0.891	36.12	0.901
RCAN	31.89	0.881	34.02	0.899	36.78	0.912
WASA(ours)	32.56	0.892	34.87	0.912	37.45	0.924