
Object Detection Series Tutorial 16: YOLOv5 Source Code Walkthrough 6 (the mosaic data-augmentation function load_mosaic)

😎😎😎 Object Detection Series: full table of contents

Feel free to leave any questions in the comments below.
All code in this article was run in PyCharm.
The companion source code for this article has been uploaded.
Click here to download the source code

9. The load_mosaic function

Mosaic data augmentation: stitch four different images into one large image to increase scene complexity and diversity.

9.1 The load_mosaic function

def load_mosaic(self, index):
    labels4, segments4 = [], []
    s = self.img_size
    yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y
    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    for i, index in enumerate(indices):
        img, _, (h, w) = load_image(self, index)
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
        padw = x1a - x1b
        padh = y1a - y1b
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
        labels4.append(labels)
        segments4.extend(segments)
    labels4 = np.concatenate(labels4, 0)
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)
    return img4, labels4
  1. Define the function, which takes an image index as its argument.
  2. labels4, segments4: lists that will hold the labels and segmentation data of the stitched image.
  3. s: the target size of a single image.
  4. yc, xc: the coordinates of the mosaic center, sampled randomly within a fixed range. Because the center is random, the four images may overlap one another, and anything beyond the canvas boundary is cropped away.
  5. indices: randomly pick three additional image indices and combine them with the current index into one list.
  6. indices now holds four image indices; iterate over this list.
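
The center sampling in step 4 can be sketched in isolation. The values of s and mosaic_border below are assumptions for illustration (YOLOv5's usual defaults are img_size 640 and a border of -s // 2):

```python
import random

s = 640                              # assumed single-image target size
mosaic_border = [-s // 2, -s // 2]   # assumed default, as in the YOLOv5 dataset class

# uniform(-x, 2*s + x) with x = -s//2 samples from [s/2, 3*s/2] on each axis,
# so the mosaic center always lands well inside the 2s x 2s canvas
yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in mosaic_border]
```

With these defaults both coordinates fall in [320, 960], which guarantees every quadrant of the canvas receives at least part of one image.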

For each of the four images, compute its placement on the big canvas and the region to crop from it: initialize a large canvas, work out where the current small image goes on the canvas, and work out which part of it is pasted there. Regions not covered by any image keep the fill value 114. If an image or its labels extend beyond the canvas, the out-of-bounds part of the image is discarded and the boxes are clipped accordingly.

  1. img, _, (h, w): load the image at the current index with load_image, which returns the resized image along with its height and width.
  2. If this is the 1st image, i.e. top left:
  3. Create a canvas of size (s * 2, s * 2) with the same number of channels as img, with every pixel set to 114.
  4. Compute where the 1st image is placed on the mosaic canvas.
  5. Compute the region to crop from the 1st image.
  6. If this is the 2nd image, i.e. top right:
  7. Compute where the 2nd image is placed on the mosaic canvas.
  8. Compute the region to crop from the 2nd image.
  9. If this is the 3rd image, i.e. bottom left:
  10. Compute where the 3rd image is placed on the mosaic canvas.
  11. Compute the region to crop from the 3rd image.
  12. If this is the 4th image, i.e. bottom right:
  13. Compute where the 4th image is placed on the mosaic canvas.
  14. Compute the region to crop from the 4th image.
  15. Paste the cropped region of the current image into the canvas.
  16. padw: the horizontal offset between canvas coordinates and image coordinates.
  17. padh: the vertical offset between canvas coordinates and image coordinates.
  18. Copy the labels and segmentation data for the current image index.
  19. If the current image has labels:
  20. Convert the labels from normalized xywh format to pixel-level xyxy format with xywhn2xyxy, applying the padding offsets.
  21. Apply the same conversion and offsets to the segmentation data with xyn2xy.
  22. Append the current image's labels to the labels4 list.
  23. Extend the segments4 list with the current image's segmentation data.
  24. labels4: concatenate the labels of all images into a single ndarray.
  25. Iterate over all label and segment coordinates in preparation for clipping.
  26. Use np.clip to keep coordinate values within the bounds of the mosaic canvas.
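
The placement arithmetic for the top-left quadrant (steps 4 and 5 above) can be checked with concrete numbers. The center and image size here are hypothetical:

```python
s = 640
xc, yc = 500, 700   # hypothetical mosaic center
h, w = 480, 640     # hypothetical image size after load_image

# Top-left quadrant: the image is anchored so its bottom-right corner sits at (xc, yc);
# max(..., 0) clips whatever would fall off the left/top edge of the canvas.
x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc        # region on the big canvas
x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h        # region cropped from the image

# The destination and source regions always have the same shape, so the
# assignment img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] is valid.
dest_shape = (x2a - x1a, y2a - y1a)
src_shape = (x2b - x1b, y2b - y1b)
```

Here xc - w = -140, so the canvas region starts at x1a = 0 and the leftmost 140 pixel columns of the image are discarded, exactly the out-of-bounds cropping described above.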

Once the big image is assembled, further augmentation can be applied to it (here via a helper function); an alternative design augments each small image first and stitches them afterwards.

  1. Apply a random perspective transform to the mosaic image and its labels with random_perspective, for further data augmentation.
  2. Return the mosaic image and its corresponding labels.

9.2 The load_image function

def load_image(self, index):
    # loads 1 image from dataset, returns img, original hw, resized hw
    img = self.imgs[index]
    if img is None:  # not cached
        path = self.img_files[index]
        img = cv2.imread(path)  # BGR
        assert img is not None, 'Image Not Found ' + path
        h0, w0 = img.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # resize image to img_size
        if r != 1:  # always resize down, only resize up if training with augmentation
            interp = cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR
            img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=interp)
        return img, (h0, w0), img.shape[:2]  # img, hw_original, hw_resized
    else:
        return self.imgs[index], self.img_hw0[index], self.img_hw[index]  # img, hw_original, hw_resized
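
The resize logic can be traced without OpenCV, since only the ratio r matters. The original image size below is hypothetical:

```python
img_size = 640               # assumed dataset img_size
h0, w0 = 1080, 1920          # hypothetical original image size

r = img_size / max(h0, w0)   # scale so the longer side becomes img_size
new_w, new_h = int(w0 * r), int(h0 * r)
# 1920 -> 640 and 1080 -> 360: the aspect ratio is preserved,
# and the image is only ever downscaled here because r < 1
```

Because r is computed from the longer side, neither dimension of the resized image can exceed img_size.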

9.3 The xywhn2xyxy function

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
    return y
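
A single-box trace of this conversion may make the formulas concrete. The image size, offsets, and label values below are hypothetical:

```python
w, h, padw, padh = 640, 480, 100, 50    # hypothetical image size and mosaic offsets
xc, yc, bw, bh = 0.5, 0.5, 0.25, 0.5    # hypothetical normalized [x, y, w, h] label

# Same arithmetic as xywhn2xyxy, written out for one box:
x1 = w * (xc - bw / 2) + padw   # top-left x
y1 = h * (yc - bh / 2) + padh   # top-left y
x2 = w * (xc + bw / 2) + padw   # bottom-right x
y2 = h * (yc + bh / 2) + padh   # bottom-right y
print(x1, y1, x2, y2)           # 340.0 170.0 500.0 410.0
```

A box centered in the image with a quarter of its width and half of its height maps to pixel corners (340, 170) and (500, 410) after the (100, 50) mosaic offset is applied.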

9.4 The xyn2xy function

def xyn2xy(x, w=640, h=640, padw=0, padh=0):
    # Convert normalized segments into pixel segments, shape (n,2)
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * x[:, 0] + padw  # top left x
    y[:, 1] = h * x[:, 1] + padh  # top left y
    return y

9.5 The random_perspective function

def random_perspective(img, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10,
                       perspective=0.0, border=(0, 0)):
    height = img.shape[0] + border[0] * 2  # shape(h,w,c)
    width = img.shape[1] + border[1] * 2

    # Center
    C = np.eye(3)
    C[0, 2] = -img.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -img.shape[0] / 2  # y translation (pixels)

    # Perspective
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    s = random.uniform(1 - scale, 1 + scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

    # Combined transform
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))

    # Transform label coordinates
    n = len(targets)
    if n:
        use_segments = any(x.any() for x in segments)
        new = np.zeros((n, 4))
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                new[i] = segment2box(xy, width, height)
        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # new bounding boxes from the four transformed corners
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip to the output canvas
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)

        # filter degenerate boxes
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return img, targets
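
How the five matrices compose can be shown with fixed toy values in place of the random draws (shear and perspective left as the identity here; the sizes, angle, and scale are assumptions):

```python
import math
import numpy as np

w0, h0 = 1280, 1280                       # hypothetical input mosaic size
border = (-320, -320)                     # as passed from load_mosaic
height, width = h0 + border[0] * 2, w0 + border[1] * 2   # output canvas: 640 x 640

C = np.eye(3)
C[0, 2], C[1, 2] = -w0 / 2, -h0 / 2       # move the image center to the origin

a, s = 10.0, 1.1                          # hypothetical rotation angle and scale
R = np.eye(3)                             # same matrix cv2.getRotationMatrix2D builds
R[:2] = [[s * math.cos(math.radians(a)), s * math.sin(math.radians(a)), 0],
         [-s * math.sin(math.radians(a)), s * math.cos(math.radians(a)), 0]]

S = np.eye(3)                             # shear: identity in this sketch
P = np.eye(3)                             # perspective: identity in this sketch

T = np.eye(3)
T[0, 2], T[1, 2] = 0.5 * width, 0.5 * height   # translate back to the canvas center

M = T @ S @ R @ P @ C   # applied right to left: center, rotate/scale, translate

# The image center is a fixed point of rotation about the origin, so it
# maps exactly to the center of the output canvas, (320, 320).
center = M @ np.array([w0 / 2, h0 / 2, 1.0])
```

Composing the matrices first and warping once is what lets the real function apply a single cv2.warpAffine (or cv2.warpPerspective) instead of five separate warps.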