
Semantic Segmentation in PyTorch from Scratch (Part 2): Label Conversion and Data Loading

news 2025/9/14 16:18:31

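Data Conversion and Loading

Project List
Preface
Label Conversion
- Mapping RGB Labels to Class Labels
- Converting RGB Label Data to Class Labels
Data Loading
- Random Cropping
- Data Loading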

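Project List

Semantic Segmentation Project (1): Dataset Overview and Preprocessing

Semantic Segmentation Project (2): Label Conversion and Data Loading

Semantic Segmentation Project (3): Semantic Segmentation Models (U-Net and DeepLabv3+)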

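Preface

The previous article gave an overview of the dataset and its preprocessing. Before training, the labels need processing as well: they are stored in RGB format, so we have to convert them into ordinary class-index labels. In addition, because semantic segmentation classifies every pixel, a large dataset can easily cause an out-of-memory (OOM) error, so we usually need to write our own data loading class to handle large amounts of data.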

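Label Conversion

Mapping RGB Labels to Class Labels

Each pixel of an RGB image has three channels, and each channel takes an integer value in $0 - 255$, i.e. $(0 - 255, 0 - 255, 0 - 255)$. This suggests a simple approach: allocate a vector of length $256^3$, which has one slot for every possible pixel value. In the previous article we defined VOC_COLORMAP and VOC_CLASSES, which list the classes as pixel colors and as text names, respectively.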

VOC_COLORMAP = [[226, 169, 41], [132, 41, 246], [110, 193, 228], [60, 16, 152], [254, 221, 58], [155, 155, 155]]
VOC_CLASSES = ['Water', 'Land (unpaved area)', 'Road', 'Building', 'Vegetation', 'Unlabeled']

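Next we build a voc_colormap2label function. It iterates over VOC_COLORMAP with enumerate to get each class index together with its color, and writes that index into colormap2label at the position computed from the color.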

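import torch

def voc_colormap2label():
    # Lookup table with one entry for every possible RGB value.
    colormap2label = torch.zeros(256 ** 3, dtype=torch.long)
    for i, colormap in enumerate(VOC_COLORMAP):
        # Pack (R, G, B) into a single integer index.
        colormap2label[(colormap[0] * 256 + colormap[1]) * 256 + colormap[2]] = i
    return colormap2label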
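The index arithmetic used above can be checked in isolation. The following is a minimal sketch in plain Python (no PyTorch), assuming the colors from VOC_COLORMAP; `pack_rgb` and `unpack_rgb` are hypothetical helper names introduced only for this illustration.

```python
# Colors copied from VOC_COLORMAP in the article.
VOC_COLORMAP = [[226, 169, 41], [132, 41, 246], [110, 193, 228],
                [60, 16, 152], [254, 221, 58], [155, 155, 155]]


def pack_rgb(r, g, b):
    """Pack an (R, G, B) triple into one integer in [0, 256**3 - 1]."""
    return (r * 256 + g) * 256 + b


def unpack_rgb(index):
    """Inverse of pack_rgb (hypothetical helper, for illustration only)."""
    rg, b = divmod(index, 256)
    r, g = divmod(rg, 256)
    return r, g, b


packed = [pack_rgb(*color) for color in VOC_COLORMAP]

# Every class color lands on a distinct slot of the lookup table.
assert len(set(packed)) == len(VOC_COLORMAP)
# The largest possible index still fits in a table of length 256**3.
assert pack_rgb(255, 255, 255) == 256 ** 3 - 1
```

The base has to be 256 because each channel has 256 possible values. With a base of 255, for example, the colors (0, 0, 255) and (0, 1, 0) would both map to index 255.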

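Converting RGB Label Data to Class Labels

The function above gives us the mapping from RGB labels to class labels. We now write a second function that takes the RGB label data colormap and the RGB-to-class mapping colormap2label, and returns the class labels.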

def voc_label_indices(colormap, colormap2label):
    # (C, H, W) tensor -> (H, W, C) array; cast to int32 so the index arithmetic cannot overflow uint8.
    colormap = colormap.permute(1, 2, 0).numpy().astype('int32')
    idx = ((colormap[:, :, 0] * 256 + colormap[:, :, 1]) * 256 + colormap[:, :, 2])
    return colormap2label[idx]
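To make the per-pixel lookup concrete, here is a small sketch in plain Python on a 2 x 2 label image. It is not the article's tensor implementation: a dictionary stands in for the $256^3$-element table, and `pack_rgb`, `color_to_class` and `label_indices` are hypothetical names used only for illustration.

```python
# Colors and names copied from the article.
VOC_COLORMAP = [[226, 169, 41], [132, 41, 246], [110, 193, 228],
                [60, 16, 152], [254, 221, 58], [155, 155, 155]]
VOC_CLASSES = ['Water', 'Land (unpaved area)', 'Road', 'Building',
               'Vegetation', 'Unlabeled']


def pack_rgb(r, g, b):
    """Pack an (R, G, B) triple into a single integer index."""
    return (r * 256 + g) * 256 + b


# Dictionary stand-in for the lookup tensor built by voc_colormap2label.
color_to_class = {pack_rgb(*c): i for i, c in enumerate(VOC_COLORMAP)}


def label_indices(rgb_rows):
    """Convert an H x W grid of (R, G, B) tuples into a grid of class indices."""
    # .get(..., 0) mirrors the zero-initialised table: unlisted colors fall back to 0.
    return [[color_to_class.get(pack_rgb(*px), 0) for px in row]
            for row in rgb_rows]


tiny_label = [
    [(226, 169, 41), (110, 193, 228)],   # Water, Road
    [(155, 155, 155), (0, 0, 0)],        # Unlabeled, a color not in the colormap
]
classes = label_indices(tiny_label)
names = [[VOC_CLASSES[i] for i in row] for row in classes]
```

One consequence worth keeping in mind: because the table is initialised with zeros, any color that is not in VOC_COLORMAP (black, for instance) maps to index 0, the same index as 'Water'.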

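Data Loading

Random Cropping

Because the shape of the input images is not fixed, and very large images can slow down training or use too much memory, we crop both the images and the labels. torchvision.transforms.RandomCrop.get_params returns the region for a random crop (this step ensures that the image and label regions match), and torchvision.transforms.functional.crop then crops the image and the label with that same region.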

import torchvision

def voc_rand_crop(feature, label, height, width):
    # Sample one crop rectangle (top, left, height, width) and reuse it for both tensors.
    rect = torchvision.transforms.RandomCrop.get_params(feature, (height, width))
    feature = torchvision.transforms.functional.crop(feature, *rect)
    label = torchvision.transforms.functional.crop(label, *rect)
    return feature, label
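The key point of the function above is that the crop rectangle is sampled once and applied to both inputs. The sketch below imitates that idea in plain Python on nested lists. It is an illustration rather than torchvision's implementation, and `rand_crop_params`, `crop` and `paired_rand_crop` are hypothetical names.

```python
import random


def rand_crop_params(img_h, img_w, crop_h, crop_w, rng=random):
    """Pick (top, left, height, width) for a crop that fits inside the image."""
    top = rng.randint(0, img_h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, img_w - crop_w)
    return top, left, crop_h, crop_w


def crop(grid, top, left, height, width):
    """Cut a height x width window out of a 2D nested list."""
    return [row[left:left + width] for row in grid[top:top + height]]


def paired_rand_crop(feature, label, crop_h, crop_w, rng=random):
    """Crop feature and label with the same randomly chosen rectangle."""
    rect = rand_crop_params(len(feature), len(feature[0]), crop_h, crop_w, rng)
    return crop(feature, *rect), crop(label, *rect)


# Toy 4 x 5 "image"; its label is the pixel value plus 100, so alignment is easy to check.
feature = [[r * 5 + c for c in range(5)] for r in range(4)]
label = [[v + 100 for v in row] for row in feature]

f_crop, l_crop = paired_rand_crop(feature, label, 2, 3, random.Random(0))

# Each cropped label pixel still belongs to the cropped image pixel at the same position.
assert all(l == f + 100
           for f_row, l_row in zip(f_crop, l_crop)
           for f, l in zip(f_row, l_row))
```

If the image and the label were cropped with two independent random rectangles instead, the final assertion would generally fail, which is exactly the mismatch the shared rectangle avoids.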

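Data Loading

Here is a brief overview of the data loading class SemanticDataset.

Method	Purpose
`__init__`	Sets the initial parameters
`normalize_image`	Scales the image to the 0-1 range and normalizes it
`pad_params`	Computes the padding parameters for an image
`pad_image`	Pads an image according to the padding parameters
`__getitem__`	Returns the sample at a given index
`__len__`	Returns the number of samples

The class works as follows: it loads an image and its label, normalizes the image (divides by 255, then applies Normalize), pads the image if it is smaller than the crop size, crops it if it is larger, and finally converts the label from an RGB label to a class label using the functions defined earlier.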

import os

class SemanticDataset(torch.utils.data.Dataset):
    def __init__(self, is_train, crop_size, data_dir):
        self.transform = torchvision.transforms.Normalize(
            mean=[0.4813, 0.4844, 0.4919], std=[0.2467, 0.2478, 0.2542])
        self.crop_size = crop_size
        self.data_dir = data_dir
        self.is_train = is_train
        self.colormap2label = voc_colormap2label()
        # train.txt / test.txt list the numeric ids of the samples in each split.
        txt_fname = os.path.join(data_dir, 'train.txt' if self.is_train else 'test.txt')
        with open(txt_fname, 'r') as f:
            self.images = f.read().split()

    def normalize_image(self, img):
        return self.transform(img.float() / 255)

    def pad_params(self, crop_h, crop_w, img_h, img_w):
        # Target size is at least the crop size; the image is centered in it.
        height = max(crop_h, img_h)
        width = max(crop_w, img_w)
        y_s = (height - img_h) // 2
        x_s = (width - img_w) // 2
        return height, width, y_s, x_s

    def pad_image(self, height, width, y_s, x_s, feature):
        # Zero canvas (float32) with the original data pasted at offset (y_s, x_s).
        zeros = torch.zeros((feature.shape[0], height, width))
        zeros[:, y_s:y_s + feature.shape[1], x_s:x_s + feature.shape[2]] = feature
        return zeros

    def __getitem__(self, idx):
        mode = torchvision.io.image.ImageReadMode.RGB
        feature = torchvision.io.read_image(os.path.join(
            self.data_dir, 'images', '{:03d}.jpg'.format(int(self.images[idx]))))
        label = torchvision.io.read_image(os.path.join(
            self.data_dir, 'labels', '{:03d}.png'.format(int(self.images[idx]))), mode)
        c_h, c_w, f_h, f_w = self.crop_size[0], self.crop_size[1], feature.shape[1], feature.shape[2]
        # Pad only when the image is smaller than the crop size in some dimension.
        if f_h < c_h or f_w < c_w:
            height, width, y_s, x_s = self.pad_params(c_h, c_w, f_h, f_w)
            feature = self.pad_image(height, width, y_s, x_s, feature)
            label = self.pad_image(height, width, y_s, x_s, label)
        feature = self.normalize_image(feature)
        feature, label = voc_rand_crop(feature, label, *self.crop_size)
        label = voc_label_indices(label, self.colormap2label)
        return (feature, label)

    def __len__(self):
        return len(self.images)
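The padding arithmetic in pad_params and pad_image is easy to get wrong by one, so here is a standalone sketch of the same computation in plain Python. It works on nested lists rather than tensors, the image sizes are made-up examples, and `pad_grid` is a hypothetical helper that plays the role of pad_image for a single channel.

```python
def pad_params(crop_h, crop_w, img_h, img_w):
    """Return (height, width, y_s, x_s): the padded size and the top-left paste offset."""
    height = max(crop_h, img_h)
    width = max(crop_w, img_w)
    y_s = (height - img_h) // 2
    x_s = (width - img_w) // 2
    return height, width, y_s, x_s


def pad_grid(grid, height, width, y_s, x_s, fill=0):
    """Paste a 2D nested list onto a height x width canvas filled with `fill`."""
    canvas = [[fill] * width for _ in range(height)]
    for r, row in enumerate(grid):
        canvas[y_s + r][x_s:x_s + len(row)] = row
    return canvas


# A 300 x 600 image with a 512 x 512 crop size: only the height needs padding.
params = pad_params(512, 512, 300, 600)
assert params == (512, 600, 106, 0)

# A 2 x 2 image padded to 4 x 4 ends up centered on the canvas.
small = [[1, 2], [3, 4]]
padded = pad_grid(small, *pad_params(4, 4, 2, 2))
```

In the first case the padded image is 512 x 600, so the random crop that follows still has room to choose a horizontal offset. Note that the class pads the RGB label with zeros as well, and black is not one of the colors in VOC_COLORMAP.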

Use torch.utils.data.DataLoader to load the data in batches.

def load_data_voc(batch_size, crop_size, data_dir='./dataset'):
    train_iter = torch.utils.data.DataLoader(
        SemanticDataset(True, crop_size, data_dir), batch_size, shuffle=True, drop_last=True)
    test_iter = torch.utils.data.DataLoader(
        SemanticDataset(False, crop_size, data_dir), batch_size, shuffle=False, drop_last=True)
    return train_iter, test_iter
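Both loaders above are created with drop_last=True, which discards the final batch when it is smaller than batch_size. The sketch below shows the resulting batch count in plain Python. The sample count of 72 is an assumed figure chosen for illustration, not taken from the dataset, and `num_batches` is a hypothetical helper.

```python
def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a loader yields per epoch for the given settings."""
    full, remainder = divmod(num_samples, batch_size)
    if drop_last or remainder == 0:
        return full
    return full + 1  # one extra, smaller batch holds the leftover samples


# Assumed example: 72 samples with a batch size of 16.
with_drop = num_batches(72, 16, drop_last=True)      # 4 full batches, 8 samples skipped
without_drop = num_batches(72, 16, drop_last=False)  # 4 full batches plus one of size 8
```

For training this mainly keeps the batch shape constant. For the test loader it also means that any leftover samples are never evaluated, so drop_last=False may be preferable there if every test image should count.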