
YOLOv5 Input Stage (Part 1): Mosaic Data Augmentation | CSDN Creation Check-in

article 2025/9/12 13:18:38

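I'm a beginner writing these notes to record what I've learned. I hope they also help other beginners, and corrections from experienced readers are very welcome~ Will be removed on request in case of infringement.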

I. Principle Analysis

II. Code Analysis

1. Main part: load_mosaic

2. The load_image function

3. The random_perspective() function (see the code comments for details)

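I. Principle Analysis

YOLOv5 uses the same Mosaic data augmentation as YOLOv4.

Main idea: take one selected image plus 3 randomly chosen images, crop them randomly, and stitch them into a single image that is used as training data.

This enriches the image backgrounds. Because four images are stitched together, it also effectively increases the batch size: when batch normalization is computed, the statistics include content from four images per sample.

As a result, YOLOv5 depends less on a large batch_size.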

II. Code Analysis

1. Main part: load_mosaic

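def load_mosaic(self, index):
    labels4, segments4 = [], []
    s = self.img_size  # image size
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
    # random.uniform draws a real number in the range above (half the image size to 1.5x the image size),
    # i.e. a random mosaic center point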

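First, initialize the label lists as empty, then get the image size s.

Then random.uniform() randomly generates the mosaic center point from the image size. Since self.mosaic_border is [-s // 2, -s // 2], each coordinate is drawn from half the image size to 1.5 times the image size.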
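To make the center range concrete, here is a minimal sketch of that one line. It assumes, as in the YOLOv5 dataset class, that `self.mosaic_border` is `[-s // 2, -s // 2]`; the function name `sample_mosaic_center` and its `rng` parameter are hypothetical, added only so the snippet runs on its own.

```python
import random


def sample_mosaic_center(s, mosaic_border=None, rng=random):
    """Sketch: sample the mosaic center (yc, xc) on a 2s x 2s canvas.

    Assumption: mosaic_border defaults to [-s // 2, -s // 2] (as in YOLOv5),
    so each coordinate is drawn from uniform(s // 2, 2 * s - s // 2).
    """
    if mosaic_border is None:
        mosaic_border = [-s // 2, -s // 2]
    # -x = s // 2 and 2 * s + x = 2 * s - s // 2, i.e. 0.5s to 1.5s for even s
    yc, xc = (int(rng.uniform(-x, 2 * s + x)) for x in mosaic_border)
    return yc, xc


if __name__ == "__main__":
    print(sample_mosaic_center(640))  # two ints between 320 and 960
```

With s = 640, both coordinates land in [320, 960], so the center always stays at least s/2 away from the edges of the 2s × 2s canvas.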

    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    # random.choices picks 3 indices from all images in the dataset;
    # together with the selected image they are packed into indices
    random.shuffle(indices)  # put the 4 indices in random order

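random.choices() randomly picks the indices of 3 more images. Together with the selected image, the 4 indices go into the indices list, and random.shuffle() then puts them in random order.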
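A small stand-alone sketch of the index selection (the helper name `pick_mosaic_indices` is hypothetical). One detail worth noting: `random.choices` samples with replacement, so the 3 extra indices can repeat each other and can even repeat the selected image.

```python
import random


def pick_mosaic_indices(index, all_indices, rng=random):
    """Sketch of the index selection: the chosen image plus 3 random extras.

    random.choices samples *with replacement*, so duplicates are possible.
    """
    indices = [index] + rng.choices(all_indices, k=3)  # 3 additional image indices
    rng.shuffle(indices)  # random order -> random quadrant for each image
    return indices


if __name__ == "__main__":
    print(pick_mosaic_indices(7, list(range(100))))
```

Because the loop that follows assigns quadrants in list order (i = 0 top left, 1 top right, 2 bottom left, 3 bottom right), the shuffle is what gives the selected image a random position.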

    for i, index in enumerate(indices):  # loop over the 4 images
        # Load image
        img, _, (h, w) = load_image(self, index)  # load the image and its (resized) height and width

Loop over the 4 images and call load_image() to load each image together with its height and width.

Next up: how the 4 images are placed~

        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            # first create the 2s x 2s gray (114) background canvas
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            # region on the large canvas: ends at the center (xc, yc), clipped at 0 if the tile does not fit
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            # matching crop of the small (original) image

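The first image goes in the top-left corner.

img4 is first initialized with np.full() as the large canvas: 2s × 2s, i.e. big enough for 4 images, filled with the gray value 114.

Then set where this image goes on the large canvas, and the matching crop coordinates on the original (small) image.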

        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

The remaining 3 images (top right, bottom left, bottom right) are placed the same way.
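The four cases can be collected into one function to check a property the code relies on: the canvas region (`a`) and the crop of the small image (`b`) always have the same width and height, so the slice assignment in the next step never hits a shape mismatch. This is a sketch; `quadrant_coords` is a hypothetical helper that only repeats the formulas from the load_mosaic excerpt.

```python
def quadrant_coords(i, xc, yc, w, h, s):
    """Sketch: return ((x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)) for tile i.

    'a' = region on the 2s x 2s canvas, 'b' = matching crop of the h x w image.
    """
    if i == 0:  # top left: the tile's bottom-right corner touches the center
        a = (max(xc - w, 0), max(yc - h, 0), xc, yc)
        b = (w - (a[2] - a[0]), h - (a[3] - a[1]), w, h)
    elif i == 1:  # top right
        a = (xc, max(yc - h, 0), min(xc + w, s * 2), yc)
        b = (0, h - (a[3] - a[1]), min(w, a[2] - a[0]), h)
    elif i == 2:  # bottom left
        a = (max(xc - w, 0), yc, xc, min(s * 2, yc + h))
        b = (w - (a[2] - a[0]), 0, w, min(a[3] - a[1], h))
    elif i == 3:  # bottom right
        a = (xc, yc, min(xc + w, s * 2), min(s * 2, yc + h))
        b = (0, 0, min(w, a[2] - a[0]), min(a[3] - a[1], h))
    else:
        raise ValueError("i must be 0, 1, 2 or 3")
    return a, b


if __name__ == "__main__":
    for i in range(4):
        print(i, quadrant_coords(i, xc=500, yc=700, w=640, h=480, s=640))
```

When a tile would stick out past the canvas, `a` is clipped at 0 or 2s and `b` shrinks by the same amount, so only the part of the image that fits gets copied.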

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax], paste the crop onto the canvas

Paste the corresponding part of the small image onto the large canvas.

        padw = x1a - x1b
        padh = y1a - y1b
        # offset from small-image coordinates to canvas coordinates, used to position the labels after mosaic

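Compute the offset between the small image and the large canvas; it is used to work out where the labels end up after mosaic augmentation.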
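A toy check of the paste and the offset, using a 4 × 4 "image" with one marked pixel (all sizes are made up for illustration, and `paste_tile` is a hypothetical helper). After pasting, the marked pixel at (x, y) in the small image shows up at (x + padw, y + padh) on the canvas, which is exactly the shift later applied to the labels.

```python
import numpy as np


def paste_tile(canvas, img, a, b):
    """Paste img[b] into canvas[a] and return the (padw, padh) offset.

    a = (x1a, y1a, x2a, y2a) on the canvas, b = (x1b, y1b, x2b, y2b) on img.
    A pixel at (x, y) in img lands at (x + padw, y + padh) on the canvas.
    """
    x1a, y1a, x2a, y2a = a
    x1b, y1b, x2b, y2b = b
    canvas[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # rows are y, columns are x
    return x1a - x1b, y1a - y1b


if __name__ == "__main__":
    s = 4
    canvas = np.full((s * 2, s * 2, 3), 114, dtype=np.uint8)  # gray 2s x 2s canvas
    img = np.zeros((4, 4, 3), dtype=np.uint8)
    img[3, 2] = 255  # mark the pixel at x=2, y=3
    padw, padh = paste_tile(canvas, img, (0, 0, 3, 4), (1, 0, 4, 4))
    print(padw, padh, canvas[3 + padh, 2 + padw])
```

Here the crop starts at x1b = 1 while the canvas region starts at x1a = 0, so padw is negative: the leftmost column of the small image was cut off.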

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()  # get the labels
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]  # normalized segment points to pixels
        labels4.append(labels)
        segments4.extend(segments)  # add to the lists

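Then process the labels:

First read the labels of the corresponding image, then convert them from normalized xywh to pixel xyxy format, shifted by the offset.

Convert the segments to pixel coordinates as well.

Then append everything to the label lists prepared earlier.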
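The two conversion helpers are easy to write out. The sketch below is a NumPy-only re-implementation of what `xywhn2xyxy` and `xyn2xy` do in this code, inferred from how they are called here rather than copied from the YOLOv5 utilities: normalized values are multiplied by the tile's width/height and then shifted by `padw`/`padh`.

```python
import numpy as np


def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    """Sketch: normalized [xc, yc, bw, bh] -> pixel [x1, y1, x2, y2], shifted by the pad."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y


def xyn2xy(x, w=640, h=640, padw=0, padh=0):
    """Sketch: normalized polygon points (n, 2) -> pixel points, shifted by the pad."""
    y = np.copy(x)
    y[:, 0] = w * x[:, 0] + padw
    y[:, 1] = h * x[:, 1] + padh
    return y


if __name__ == "__main__":
    box = np.array([[0.5, 0.5, 0.5, 0.5]])  # centered box, half the tile's width and height
    print(xywhn2xyxy(box, 100, 100, padw=10, padh=20))
```

For example, a box centered in a 100 × 100 tile with half its width and height, pasted with padw = 10 and padh = 20, becomes [35, 45, 85, 95] in canvas pixels.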

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)  # stack the labels of all 4 tiles into one array
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective(); keep values within [0, 2s]
    # img4, labels4 = replicate(img4, labels4)  # replicate

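First concatenate the label list into a single array, which is easier to handle in the following steps, and clip the coordinates to the range 0 to 2 × image size.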
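A minimal sketch of the concatenate-and-clip step (`concat_and_clip` is a hypothetical wrapper around the lines in the excerpt). It relies on `labels4[:, 1:]` being a NumPy view, so `np.clip(..., out=x)` modifies `labels4` in place while the class column stays untouched.

```python
import numpy as np


def concat_and_clip(labels4, segments4, s):
    """Sketch: stack per-tile labels and clamp coordinates to the 2s x 2s canvas in place."""
    labels4 = np.concatenate(labels4, 0)  # list of (n_i, 5) arrays -> one (sum n_i, 5) array
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # labels4[:, 1:] is a view, so labels4 itself changes
    return labels4, segments4


if __name__ == "__main__":
    labels = [np.array([[0, -10.0, 5.0, 50.0, 60.0]]), np.array([[1, 100.0, 100.0, 1300.0, 700.0]])]
    out, _ = concat_and_clip(labels, [], s=640)
    print(out)  # class column unchanged, coordinates clamped to [0, 1280]
```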

    # Augment
    # after mosaic, the four images form one image of shape [2*img_size, 2*img_size];
    # random rotation, translation, scaling and shear are applied, and the negative border crops back to img_size
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

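After mosaic, the four images form a single image of shape [2*img_size, 2*img_size].

Copy-Paste augmentation is applied first (with probability hyp['copy_paste']). Then random_perspective applies random rotation, translation, scaling and shear to the mosaic; because border=self.mosaic_border is negative, the output is cropped back to the input size img_size.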

    return img4, labels4

Finally, return the processed image and its labels.

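2. The load_image function

load_image loads an image and resizes it by the ratio between the configured input size and the original image size (the longer side is scaled to img_size).

First, get the image at this index.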

def load_image(self, i):
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # image at this index (None unless it is cached in RAM)

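Check whether the image is already cached in RAM. The code comment says `not cached in ram`: when image caching is enabled, self.imgs[i] already holds the loaded and resized image; otherwise it is None. (This is my reading of the code; corrections in the comments are welcome~)

🎈 If it is not cached:

First look for a cached .npy file on disk.

🌳 If it exists: load the image from it.

🌳 If not: read the image from its original file path; if it cannot be read, raise an error saying the image at that path was not found.

Read the original height and width of the image and compute the resize ratio.

If the ratio is not 1, resize the image.

Finally, return the image, its original height and width, and its resized height and width.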
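The resize decision can be checked without OpenCV. The sketch below (hypothetical `resize_plan`) only reproduces the arithmetic: the ratio that scales the longer side to `img_size`, the resulting shape, and which interpolation flag the excerpt would choose, returned as a string instead of the real `cv2` constant.

```python
def resize_plan(h0, w0, img_size, augment):
    """Sketch of load_image's resize decision (no OpenCV needed).

    Returns the ratio r, the resized (h, w), and the name of the OpenCV
    interpolation flag the excerpt would pick (None if no resize happens).
    """
    r = img_size / max(h0, w0)  # scale so that the longer side equals img_size
    if r == 1:
        return r, (h0, w0), None  # sizes already match, no resize
    # shrinking without augmentation uses INTER_AREA, everything else INTER_LINEAR
    interp = "INTER_AREA" if r < 1 and not augment else "INTER_LINEAR"
    return r, (int(h0 * r), int(w0 * r)), interp


if __name__ == "__main__":
    print(resize_plan(480, 1280, 640, augment=False))
```

Note that `cv2.resize` takes its size as (width, height), which is why the excerpt passes `(int(w0 * r), int(h0 * r))` while the returned shape is (height, width).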

    if im is None:  # not cached in ram
        npy = self.img_npy[i]  # look for a cached .npy file on disk
        if npy and npy.exists():  # load npy
            im = np.load(npy)  # found: load the image from the .npy file
        else:  # read image
            path = self.img_files[i]  # otherwise read the original image file
            im = cv2.imread(path)  # BGR
            assert im is not None, f'Image Not Found {path}'  # error if the image cannot be read
        h0, w0 = im.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # ratio: scale the longer side to img_size
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized

🎈 If it is cached:

Simply return the image, its original height and width, and its resized height and width directly~

    else:
        return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized

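3. The random_perspective() function (see the code comments for details)

It applies a random geometric transformation to the image and its labels.

Method: each point is written as a homogeneous coordinate vector (x, y, 1) and multiplied by a 3 × 3 transformation matrix.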
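A minimal sketch of this "vector times matrix" step, assuming points are stored as an (n, 2) array (`transform_points` is a hypothetical name). Appending a 1 turns each point into homogeneous coordinates, so one 3 × 3 matrix can express translation as well as rotation, scale, shear and perspective; `xy @ M.T` applies M to every row at once, the same pattern the label code uses.

```python
import numpy as np


def transform_points(points, M, perspective=False):
    """Sketch: apply a 3x3 matrix M to (n, 2) pixel points.

    Each point becomes the row [x, y, 1]; xy @ M.T equals M @ [x, y, 1]^T row by row.
    With a perspective M, divide by the third coordinate afterwards.
    """
    xy = np.ones((len(points), 3))
    xy[:, :2] = points
    xy = xy @ M.T
    return xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]


if __name__ == "__main__":
    T = np.eye(3)
    T[0, 2], T[1, 2] = 10, 5  # shift by (+10, +5)
    print(transform_points(np.array([[1.0, 2.0]]), T))
```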

First, get the output height and width: the image size plus the border on both sides.

def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    # output height and width (including the border)
    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2
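The border arithmetic explains why the output returns to `img_size`. In the mosaic call, `border=self.mosaic_border`, which (assuming the usual YOLOv5 setting) is `[-s // 2, -s // 2]`, so a 2s × 2s mosaic gives an output height and width of 2s - s = s. A tiny sketch with a hypothetical helper name:

```python
def output_hw(canvas_shape, border):
    """Sketch: the (height, width) random_perspective produces for a given border."""
    height = canvas_shape[0] + border[0] * 2  # negative border shrinks the output
    width = canvas_shape[1] + border[1] * 2
    return height, width


if __name__ == "__main__":
    s = 640
    print(output_hw((2 * s, 2 * s, 3), (-s // 2, -s // 2)))  # mosaic case
```

So the return to img_size at the end of the mosaic pipeline is really a crop of the warped canvas; the actual scaling comes from the random scale factor in R.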

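Then build the centering matrix C: it translates the image so that its center sits at the origin, so rotation, scaling and shear are applied around the image center.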

    # Center
    C = np.eye(3)  # 3x3 identity matrix
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels): move the image center to x = 0
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels): move the image center to y = 0

Next, prepare the matrix for each transform (perspective, rotation and scaling, shear, translation).

    # Perspective
    P = np.eye(3)  # 3x3 identity matrix
    # random perspective terms in x and y
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)  # random angle in [-degrees, degrees]
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)  # random scale factor
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # 2x3 rotation+scale matrix -> first two rows of R

    # Shear
    S = np.eye(3)  # shear angles in degrees, stored as tangents
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)  # a shift, not a zoom: moves the origin (the image center) to near the middle of the output
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
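To see what ends up in R and S, here is a NumPy-only sketch. `rotation_scale_matrix` writes out the formula the OpenCV documentation gives for `cv2.getRotationMatrix2D` with the center at the origin (alpha = scale·cos(angle), beta = scale·sin(angle), giving [[alpha, beta, 0], [-beta, alpha, 0]]); `shear_matrix` repeats the tangent construction from the excerpt with fixed angles. Both names are hypothetical.

```python
import math

import numpy as np


def rotation_scale_matrix(angle_deg, scale):
    """Sketch: 3x3 equivalent of R after R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)."""
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    R = np.eye(3)
    R[:2] = [[alpha, beta, 0.0], [-beta, alpha, 0.0]]  # center (0, 0): no extra translation terms
    return R


def shear_matrix(shear_x_deg, shear_y_deg):
    """Sketch of S: the off-diagonal tangents shear x by y and y by x."""
    S = np.eye(3)
    S[0, 1] = math.tan(math.radians(shear_x_deg))
    S[1, 0] = math.tan(math.radians(shear_y_deg))
    return S


if __name__ == "__main__":
    print(rotation_scale_matrix(90, 2.0) @ [1, 0, 1])
```

In image coordinates (y pointing down), the 90° case sends the point (1, 0) to (0, -2), i.e. upward on screen, so a positive angle rotates counter-clockwise as displayed.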

Then combine them into one transformation matrix M and warp the image with it.

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT; combined by matrix multiplication
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed (border or non-identity M)
        if perspective:
            # cv2.warpPerspective: straight lines stay straight, but parallel lines may not stay parallel
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            # cv2.warpAffine: rotation, translation, scaling and shear; parallel lines stay parallel
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
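A sketch that composes the matrices with fixed, non-random parameters (perspective left as the identity; `compose_affine` and its parameter names are hypothetical). It illustrates the right-to-left order: C moves the canvas center to the origin, R and S rotate, scale and shear around it, and T moves it to the chosen output point, so the canvas center lands at (tx_frac·width, ty_frac·height) regardless of angle or scale.

```python
import math

import numpy as np


def compose_affine(im_w, im_h, out_w, out_h, angle_deg=0.0, scale=1.0,
                   shear_x_deg=0.0, shear_y_deg=0.0, tx_frac=0.5, ty_frac=0.5):
    """Sketch: build M = T @ S @ R @ C with fixed parameters (P = identity).

    tx_frac / ty_frac stand in for the random.uniform(0.5 - translate, 0.5 + translate) draws.
    """
    C = np.eye(3)
    C[0, 2], C[1, 2] = -im_w / 2, -im_h / 2  # move the input center to the origin
    a = math.radians(angle_deg)
    R = np.eye(3)
    R[:2] = [[scale * math.cos(a), scale * math.sin(a), 0.0],
             [-scale * math.sin(a), scale * math.cos(a), 0.0]]  # getRotationMatrix2D layout, center (0, 0)
    S = np.eye(3)
    S[0, 1] = math.tan(math.radians(shear_x_deg))
    S[1, 0] = math.tan(math.radians(shear_y_deg))
    T = np.eye(3)
    T[0, 2], T[1, 2] = tx_frac * out_w, ty_frac * out_h  # move the origin to the chosen output point
    return T @ S @ R @ C  # applied right to left: C first, T last


if __name__ == "__main__":
    M = compose_affine(1280, 1280, 640, 640, angle_deg=37, scale=1.3, shear_x_deg=5)
    print(M @ [640, 640, 1])  # canvas center -> output center
```

With no rotation or scaling, the canvas's top-left corner (0, 0) maps to (-320, -320), outside the 640 × 640 output, which is how the warp crops the middle of the mosaic.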

Next, transform the label coordinates.

    # Transform label coordinates
    n = len(targets)  # number of targets
    if n:
        use_segments = any(x.any() for x in segments)  # True if any segment has a non-zero point
        new = np.zeros((n, 4))  # new boxes, 4 values (xyxy) per target
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))  # homogeneous coordinates (x, y, 1)
                xy[:, :2] = segment  # first two columns: the segment's polygon points
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                # the column of ones lets the translation part of M apply; the third output
                # column is the perspective denominator (always 1 for an affine M)

                # clip
                new[i] = segment2box(xy, width, height)  # box around the points inside the image

        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip: clamp the new boxes to the image bounds
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
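The box branch can be run in isolation. This sketch assumes plain (n, 4) xyxy boxes without the class column, so the corner indices are [0, 1, 2, 3, 0, 3, 2, 1] instead of [1, 2, 3, 4, 1, 4, 3, 2]; `warp_boxes` is a hypothetical name. It transforms all four corners, takes the axis-aligned box around them, and clamps it to the output image.

```python
import numpy as np


def warp_boxes(boxes, M, width, height, perspective=False):
    """Sketch of the 'warp boxes' branch for (n, 4) pixel xyxy boxes."""
    n = len(boxes)
    xy = np.ones((n * 4, 3))
    xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
    xy = xy @ M.T  # transform all corners
    xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)
    x = xy[:, [0, 2, 4, 6]]
    y = xy[:, [1, 3, 5, 7]]
    # axis-aligned box enclosing the 4 transformed corners
    new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
    new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)  # clamp x to the image
    new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)  # clamp y to the image
    return new


if __name__ == "__main__":
    rot = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 100.0], [0.0, 0.0, 1.0]])  # 90 deg turn, shifted back into view
    print(warp_boxes(np.array([[10.0, 20.0, 30.0, 60.0]]), rot, 100, 100))
```

Transforming only two opposite corners would not be enough once a box is rotated; taking the min/max over all four corners gives the enclosing axis-aligned box.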

Finally, filter the candidate boxes and return.

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        # keep only the boxes that are still valid after the transform
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return im, targets
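`box_candidates` itself is not shown above, so here is a sketch of the filtering idea, written to match the call in the excerpt (box1 = boxes before the transform, box2 = after, both as (4, n) xyxy arrays). The thresholds `wh_thr=2` and `ar_thr=100` are assumed defaults for this sketch. In the call, box1 is multiplied by the random scale s, so the area ratio compares each new box with the area it would have after pure scaling; `area_thr` is 0.01 with segments and 0.10 with boxes.

```python
import numpy as np


def box_candidates(box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16):
    """Sketch of the candidate filter; box1/box2 are (4, n) xyxy arrays (before/after).

    Keep a box if it is wider and taller than wh_thr pixels after augmentation,
    keeps more than area_thr of its (scaled) original area, and its aspect
    ratio stays below ar_thr.
    """
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio, always >= 1
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)


if __name__ == "__main__":
    before = np.array([[0, 0, 100, 100]] * 3, dtype=float).T
    after = np.array([[0, 0, 50, 50], [0, 0, 20, 20], [0, 0, 1, 50]], dtype=float).T
    print(box_candidates(before, after))  # 25% area kept / 4% area kept / too thin
```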

Comments and corrections in the comment section are very welcome, thanks~


http://www.lryc.cn/news/2415166.html