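A Minimal TensorFlow Implementation of YOLO v7
YOLO v7 was released last year and achieved very good performance, and its authors published their PyTorch source code. In several earlier blog posts I analyzed that code in depth to understand YOLO v7's technical details and implementation. Since I have always used TensorFlow, I wanted to try porting the code to TensorFlow.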
Building the dataset
Run the get_coco.sh script from the YOLO v7 source code to download the COCO dataset. The script is as follows:
```bash
#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash ./scripts/get_coco.sh

# Download/unzip labels
d='./' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
echo 'Downloading' $url$f ' ...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background

# Download/unzip images
d='./coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip'   # 1G, 5k images
f3='test2017.zip'  # 7G, 41k images (optional)
for f in $f1 $f2 $f3; do
  echo 'Downloading' $url$f '...'
  curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
done
wait # finish background tasks
```
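After the download finishes, the images and labels directories each contain three subdirectories, train2017, val2017 and test2017, holding the training, validation and test data respectively.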
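Each label file holds one object per line: a class index followed by coordinates normalized to 0-1. With coco2017labels-segments.zip, a line can carry polygon vertices (x1 y1 x2 y2 ...) instead of a single center-format box, and the YOLOv5/v7 loaders derive the box from the polygon in that case. The sketch below shows that parsing plus the normalized-xywh to pixel-xyxy conversion that xywhn2xyxy performs; the function names are hypothetical and the code is a simplified illustration, not the loader used in this project.

```python
def parse_label_line(line):
    """Parse one YOLO label line into (class_id, normalized xywh box, polygon points)."""
    values = [float(v) for v in line.split()]
    cls = int(values[0])
    coords = values[1:]
    if len(coords) == 4:
        # Plain box line: class x_center y_center width height (normalized)
        return cls, tuple(coords), []
    # Segment line: class x1 y1 x2 y2 ... (normalized polygon vertices)
    points = list(zip(coords[0::2], coords[1::2]))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    box = ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
    return cls, box, points


def xywhn_to_xyxy(box, w, h):
    """Normalized center-format box -> pixel corner format for an image of width w, height h."""
    x, y, bw, bh = box
    return (w * (x - bw / 2), h * (y - bh / 2), w * (x + bw / 2), h * (y + bh / 2))
```

For example, `parse_label_line("0 0.5 0.5 0.2 0.4")` returns class 0 with box `(0.5, 0.5, 0.2, 0.4)`, which maps to roughly `(40, 60, 60, 140)` on a 100×200 image.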
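Next we build a TensorFlow training dataset. The training images need augmentation, including Mosaic stitching, random copy-paste of images, random geometric transforms and color adjustment, and the object labels in each image have to be transformed accordingly. For how this works, see my earlier post 解读YOLO v7的代码(二)训练数据的准备 ("Interpreting the YOLO v7 code (2): Preparing the training data") on CSDN.
I defined a Dataloader class that applies these augmentations to the training data. The processing is essentially the same as in the YOLOv7 source, with one small change: after Mosaic stitching, if the random transform shrinks the image, an object's bounding box can end up extending beyond the image. In that case I clip the box using the object's segments data so that it stays inside the image.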
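One way to derive a box that stays inside the image from segments is the approach of the YOLOv5/v7 utilities (resample_segments and segment2box, written with NumPy there): densely resample each polygon, transform the points, drop the points that fall outside the image, and take the min/max of what remains. The pure-Python sketch below assumes that approach and skips the geometric transform itself; the author's actual change lives in dataloader.py and may differ. Dense resampling matters because keeping only the original vertices can collapse a partly visible object into a degenerate box.

```python
def resample_segment(points, n=1000):
    """Linearly interpolate n points along a closed polygon (by vertex index, like YOLOv5/v7)."""
    closed = list(points) + [points[0]]
    last = len(closed) - 1
    out = []
    for j in range(n):
        t = j * last / (n - 1)      # position along the vertex-index axis, 0..last
        i = min(int(t), last - 1)   # index of the edge this sample lies on
        f = t - i                   # fraction along that edge
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out


def segment_to_box(points, width, height):
    """Keep only points inside the image and return their xyxy bounding box (None if none remain)."""
    inside = [(x, y) for x, y in points if 0 <= x <= width and 0 <= y <= height]
    if not inside:
        return None
    xs = [x for x, _ in inside]
    ys = [y for _, y in inside]
    return (min(xs), min(ys), max(xs), max(ys))
```

For a rectangle spanning x = -10..50 on a 100×100 image, the resampled version yields a box starting near x = 0, whereas using only the four vertices yields the degenerate box `(50, 5, 50, 40)`.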
The validation data needs no augmentation: we only resize each image so that its longer side is 640 and pad the remaining area. The TensorFlow dataset is defined as follows:
```python
def map_val_fn(t: tf.Tensor):
    filename = str(t.numpy(), encoding='utf-8')
    imgid = int(filename[20:32])  # 12-digit COCO image id after the 'coco/images/val2017/' prefix
    # Load image
    img, (h0, w0), (h, w) = load_image(filename)
    #augment_hsv(img, hgain=hsv_h, sgain=hsv_s, vgain=hsv_v)
    # Labels
    label_filename = val_label_path + filename.split('/')[-1].split('.')[0] + '.txt'
    labels, _ = load_labels(label_filename)
    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, 0, 0)  # normalized xywh to pixel xyxy format
    labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])  # convert xyxy to xywh
    labels[:, 1:5] /= img_size  # normalize to 0-1 relative to the padded image
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
    img = img / 255.
    img_hw = tf.concat([h0, w0], axis=0)
    return img, labels, img_hw, imgid

dataset_val = tf.data.Dataset.list_files("coco/images/val2017/*.jpg", shuffle=False)
dataset_val = dataset_val.map(
    lambda x: tf.py_function(func=map_val_fn, inp=[x], Tout=[tf.float32, tf.float32, tf.int32, tf.int32]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset_val = dataset_val \
    .padded_batch(val_batch_size,
                  padded_shapes=([3, img_size, img_size], [None, 5], [2], []),
                  padding_values=(144/255., 0., 0, 0)) \
    .prefetch(tf.data.experimental.AUTOTUNE)
```
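Because padded_batch pads at the end of each dimension, the padding (value 144/255) lands on the bottom and right of each image, so box coordinates need no offset, which is why map_val_fn calls xywhn2xyxy with zero padding. The pure-Python sketch below illustrates the resize ratio, the bottom/right padding of one channel, and the label renormalization to the padded square; the function names are hypothetical, and the int truncation of the resized size assumes load_image behaves like YOLOv7's, which resizes with `int(w0 * r), int(h0 * r)`.

```python
def resized_shape(h0, w0, img_size=640):
    """Scale so the longer side becomes img_size (truncating like YOLOv7's load_image)."""
    r = img_size / max(h0, w0)
    return int(h0 * r), int(w0 * r)


def pad_bottom_right(channel, img_size, value):
    """Pad a 2-D list (one H x W channel) to img_size x img_size, like padded_batch does."""
    h, w = len(channel), len(channel[0])
    rows = [list(row) + [value] * (img_size - w) for row in channel]
    rows += [[value] * img_size for _ in range(img_size - h)]
    return rows


def label_to_padded(box_n, h, w, img_size=640):
    """Normalized xywh w.r.t. the resized h x w image -> normalized w.r.t. the padded square."""
    x, y, bw, bh = box_n
    return (x * w / img_size, y * h / img_size, bw * w / img_size, bh * h / img_size)
```

For a 480×640 COCO image no resize is needed, and a box covering the whole image becomes `(0.5, 0.375, 1.0, 0.75)` relative to the 640×640 padded input.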
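For the training dataset, I originally planned to follow the same approach as for the validation set, replacing the map function with the corresponding function from Dataloader (see dataloader.py). This turned out to be inefficient. Because the augmentation pipeline is fairly complex, the CPU spends a lot of time on it, and although the map and prefetch methods of tf.data accept an AUTOTUNE setting to optimize parallelism, the result was still unsatisfactory: the GPU kept waiting for the CPU to finish preparing data. So I wrote my own parallel pipeline using Python's multiprocessing module: while the GPU trains on 100 batches, the CPU prepares the next 100 batches in parallel, which greatly improves throughput.
Concretely, I create a shared memory block accessible to all worker processes, randomly sample a subset of training image filenames, and split them among several workers. Each worker reads its images, applies the augmentations, processes the corresponding label files, and writes the results into its own region of the shared memory. A separate process then merges the data in shared memory, and a dataset can be built directly from the merged data.
The relevant code is as follows: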
```python
import multiprocessing as mp
import pickle
import time
from multiprocessing import shared_memory
from random import sample

import tensorflow as tf

# Augment the given image IDs and write the pickled result into shared memory
def augment_data(imgids, datasize, memory_name, offset, q):
    dataset = Dataloader(img_size, train_image_dir, train_label_dir, imgids, hyp)
    traindata = dataset.generateTrainData(datasize)
    traindata_obj = pickle.dumps(traindata, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    existing_shm.buf[offset:offset+len(traindata_obj)] = traindata_obj
    q.put((offset, offset+len(traindata_obj)))  # report the byte span this worker wrote
    existing_shm.close()

# Merge the results of the augmentation worker processes
def merge_subprocess(q, subprocess_num, memory_name):
    results = []
    while True:
        msg = q.get()
        if msg is not None:
            results.append(msg)
            if len(results) >= subprocess_num:
                break
        else:
            time.sleep(1)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    merge_data = []
    for result in results:
        merge_data.extend(pickle.loads(existing_shm.buf[result[0]:result[1]]))
    merge_data_obj = pickle.dumps(merge_data, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm.buf[:len(merge_data_obj)] = merge_data_obj
    existing_shm.close()
    q.put(len(merge_data_obj))

# Start several worker processes for augmentation, plus one process that merges their results
def prepare_traindata(memory_name):
    sample_imgid = sample(imgid_train, sample_len)  # randomly pick a subset of training image IDs
    subprocess_list = []
    for i in range(subprocess_num):  # each worker handles its own slice of images and labels
        subprocess_list.append(mp.Process(
            target=augment_data,
            args=(sample_imgid[i*imgid_num_process:(i+1)*imgid_num_process],
                  data_size//subprocess_num,
                  memory_name,
                  i*shared_memory_size//subprocess_num,
                  q, )))
    for p in subprocess_list:
        p.start()
    # Start the process that merges the workers' results
    p0 = mp.Process(target=merge_subprocess, args=(q, subprocess_num, memory_name,))
    p0.start()
    return p0

image_cache = shared_memory.SharedMemory(name="dataset", create=True, size=shared_memory_size)  # create the shared memory
merge_proc = prepare_traindata("dataset")
# Wait for the merge process to finish, read the data size from the queue, and deserialize
merge_proc.join()
msg = q.get()
if msg > 0:
    traindata = pickle.loads(image_cache.buf[:msg])
else:
    print("Could not load training data.")
image_cache.close()
image_cache.unlink()  # unlink exactly once; a second unlink would raise FileNotFoundError

def traindata_gen():
    global traindata
    i = 0
    while i < len(traindata):
        yield traindata[i][0]/255., traindata[i][1]
        i += 1

# Build the dataset
dataset = tf.data.Dataset.from_generator(
    traindata_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=((3, img_size, img_size), (None, 5)))
dataset = dataset.padded_batch(batch_size, padded_shapes=([3, img_size, img_size], [None, 5]))
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
```
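The shared-memory protocol above comes down to two steps: each worker pickles its list of samples into its own slot and reports the (start, end) byte span, and the merger unpickles every span and writes the merged pickle back at offset 0. The sketch below replays those steps in a single process so that it is deterministic; write_chunk, merge_chunks and demo are hypothetical names. In the real pipeline each write_chunk-like call runs in its own mp.Process, attaches to the block by name, and reports its span through the queue.

```python
import pickle
from multiprocessing import shared_memory


def write_chunk(buf, offset, obj):
    """Pickle obj into buf starting at offset; return the (start, end) span written."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    end = offset + len(data)
    if end > len(buf):
        raise ValueError("chunk does not fit in the shared-memory block")
    buf[offset:end] = data
    return offset, end


def merge_chunks(buf, spans):
    """Unpickle every span, concatenate the lists, write the merged pickle at offset 0."""
    merged = []
    for start, end in spans:
        merged.extend(pickle.loads(buf[start:end]))  # read everything before overwriting
    data = pickle.dumps(merged, protocol=pickle.HIGHEST_PROTOCOL)
    buf[:len(data)] = data
    return len(data)


def demo(num_workers=3, slot_size=1 << 16):
    shm = shared_memory.SharedMemory(create=True, size=num_workers * slot_size)
    try:
        spans = []
        for i in range(num_workers):
            # Stand-in for one worker's augmented samples: (image placeholder, label placeholder)
            batch = [(f"img{i}_{k}", [i, k]) for k in range(2)]
            spans.append(write_chunk(shm.buf, i * slot_size, batch))
        n = merge_chunks(shm.buf, spans)
        return pickle.loads(shm.buf[:n])
    finally:
        shm.close()
        shm.unlink()
```

Giving each worker a fixed slot (`i * slot_size`, as the post does with `i*shared_memory_size//subprocess_num`) means workers never need a lock; the trade-off is that each slot must be large enough for the worker's pickled output.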
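Model definition
Next we build the YOLO v7 model. For an analysis of the model architecture, see another earlier post of mine, 解读YOLO v7的代码(一)模型结构研究 ("Interpreting the YOLO v7 code (1): A study of the model architecture"), on my CSDN blog.
The file yolo.py defines the model's custom layers and assembles them into the model.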
```python
import tensorflow as tf
from tensorflow import keras
l = tf.keras.layers
from params import *

@tf.keras.utils.register_keras_serializable()
class YoloConv(keras.layers.Layer):
    def __init__(self, filters, kernel_size, strides, padding='same', bias=False, activation='swish', **kwargs):
        super(YoloConv, self).__init__(**kwargs)
        self.activation = activation
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.bias = bias
        self.cv = l.Conv2D(
            filters=self.filters,
            kernel_size=self.kernel_size,
            strides=self.strides,
            padding=self.padding,
            data_format='channels_first',
            use_bias=self.bias,
            kernel_initializer='he_normal',
            kernel_regularizer=tf.keras.regularizers.l2(l=weight_decay))
        self.bn = l.BatchNormalization(axis=1)
        self.swish = l.Activation('swish')

    def call(self, inputs, training):
        output = self.cv(inputs)
        output = self.bn(output, training)
        if self.activation == 'swish':
            output = self.swish(output)
        return output

    def get_config(self):
        config = super(YoloConv, self).get_config()
        config.update({
            "activation": self.activation,
            "filters": self.filters,
            "kernel_size": self.kernel_size,
            "strides": self.strides,
            "padding": self.padding,
            "bias": self.bias})
        return config

@tf.keras.utils.register_keras_serializable()
class Elan(keras.layers.Layer):
    def __init__(self, filters, **kwargs):
        super(Elan, self).__init__(**kwargs)
        self.filters = filters
        self.cv1 = YoloConv(self.filters, 1, 1)
        self.cv2 = YoloConv(self.filters, 1, 1)
        self.cv3 = YoloConv(self.filters, 3, 1)
        self.cv4 = YoloConv(self.filters, 3, 1)
        self.cv5 = YoloConv(self.filters, 3, 1)
        self.cv6 = YoloConv(self.filters, 3, 1)
        self.cv7 = YoloConv(self.filters*4, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv4(self.cv3(output2, training), training)
        output4 = self.cv6(self.cv5(output3, training), training)
        output = self.concat([output1, output2, output3, output4])
        output = self.cv7(output, training)
        return output

    def get_config(self):
        config = super(Elan, self).get_config()
        config.update({"filters": self.filters})
        return config

@tf.keras.utils.register_keras_serializable()
class MP(keras.layers.Layer):
    def __init__(self, filters, k=2):
        super(MP, self).__init__()
        self.filters = filters
        self.k = k
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters, 3, 2)
        self.pool = l.MaxPool2D(pool_size=self.k, strides=self.k, padding='same', data_format='channels_first')
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.pool(inputs)
        output1 = self.cv1(output1, training)
        output2 = self.cv2(inputs, training)
        output2 = self.cv3(output2, training)
        output = self.concat([output1, output2])
        return output

    def get_config(self):
        config = super(MP, self).get_config()
        config.update({"filters": self.filters, "k": self.k})
        return config

@tf.keras.utils.register_keras_serializable()
class SPPCSPC(keras.layers.Layer):
    def __init__(self, filters, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        self.filters = filters
        self.e = e
        self.k = k
        c_ = int(2 * self.filters * self.e)
        self.cv1 = YoloConv(c_, 1, 1)
        self.cv2 = YoloConv(c_, 1, 1)
        self.cv3 = YoloConv(c_, 3, 1)
        self.cv4 = YoloConv(c_, 1, 1)
        self.m = [l.MaxPool2D(pool_size=x, strides=1, padding='same', data_format='channels_first') for x in k]
        self.cv5 = YoloConv(c_, 1, 1)
        self.cv6 = YoloConv(c_, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv4(self.cv3(self.cv1(inputs, training), training), training)
        output2 = self.concat([output1] + [m(output1) for m in self.m])
        output2 = self.cv6(self.cv5(output2, training), training)
        output3 = self.cv2(inputs, training)
        output = self.cv7(self.concat([output2, output3]), training)
        return output

    def get_config(self):
        config = super(SPPCSPC, self).get_config()
        config.update({"filters": self.filters, "k": self.k, "e": self.e})
        return config

@tf.keras.utils.register_keras_serializable()
class Elan_A(keras.layers.Layer):
    def __init__(self, filters):
        super(Elan_A, self).__init__()
        self.filters = filters
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters//2, 3, 1)
        self.cv4 = YoloConv(filters//2, 3, 1)
        self.cv5 = YoloConv(filters//2, 3, 1)
        self.cv6 = YoloConv(filters//2, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv3(output2, training)
        output4 = self.cv4(output3, training)
        output5 = self.cv5(output4, training)
        output6 = self.cv6(output5, training)
        output7 = self.concat([output1, output2, output3, output4, output5, output6])
        output = self.cv7(output7, training)
        return output

    def get_config(self):
        config = super(Elan_A, self).get_config()
        config.update({"filters": self.filters})
        return config

@tf.keras.utils.register_keras_serializable()
class RepConv(keras.layers.Layer):
    def __init__(self, filters):
        super(RepConv, self).__init__()
        self.filters = filters
        self.cv1 = YoloConv(filters, 3, 1, activation=None)
        self.cv2 = YoloConv(filters, 1, 1, activation=None)
        self.swish = l.Activation('swish')

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output = self.swish(output1 + output2)
        return output

    def get_config(self):
        config = super(RepConv, self).get_config()
        config.update({"filters": self.filters})
        return config

@tf.keras.utils.register_keras_serializable()
class IDetect(keras.layers.Layer):
    def __init__(self, shape, no, na, grids):
        super(IDetect, self).__init__()
        # Implicit knowledge: a is added before the conv, m multiplies its output.
        # Note: PyTorch YOLOv7 initializes ImplicitM with mean 1.0; here m uses mean 0.0.
        self.a = tf.Variable(tf.random.normal((1, shape, 1, 1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
        self.m = tf.Variable(tf.random.normal((1, no*na, 1, 1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
        self.cv = YoloConv(no*na, 1, 1, bias=True, activation=None)
        self.shape = shape
        self.no = no
        self.na = na
        self.grids = grids
        self.reshape = l.Reshape([self.na, self.no, self.grids*self.grids])
        self.permute = l.Permute([1, 3, 2])
        self.activation = l.Activation('linear', dtype='float32')

    def call(self, inputs, training):
        output = inputs + self.a
        output = self.cv(output, training)
        output = self.m * output
        output = self.reshape(output)      # [batch, na, no, grids*grids]
        output = self.permute(output)      # [batch, na, grids*grids, no]
        output = self.activation(output)
        return output

    def get_config(self):
        config = super(IDetect, self).get_config()
        config.update({"no": self.no, "na": self.na, "grids": self.grids, "shape": self.shape})
        return config

def create_model():
    inputs = keras.Input(shape=(3, img_size, img_size))
    x = YoloConv(32, 3, 1)(inputs)   #[32, img_size, img_size]
    x = YoloConv(64, 3, 2)(x)        #[64, img_size/2, img_size/2]
    x = YoloConv(64, 3, 1)(x)        #[64, img_size/2, img_size/2]
    x = YoloConv(128, 3, 2)(x)       #[128, img_size/4, img_size/4]
    x = Elan(64)(x)                  #11
    x = MP(128)(x)                   #16
    route1 = Elan(128)(x)            #24
    x = MP(256)(route1)              #29
    route2 = Elan(256)(x)            #37
    x = MP(512)(route2)              #42
    x = Elan(256)(x)                 #50
    route3 = SPPCSPC(512)(x)         #51
    x = YoloConv(256, 1, 1)(route3)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(256, 1, 1)(route2)])
    route4 = Elan_A(256)(x)          #63
    x = YoloConv(128, 1, 1)(route4)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(128, 1, 1)(route1)])
    route5 = Elan_A(128)(x)          #75, Connect to Detector 1
    x = MP(128)(route5)
    x = l.Concatenate(axis=1)([x, route4])
    route6 = Elan_A(256)(x)          #88, Connect to Detector 2
    x = MP(256)(route6)
    x = l.Concatenate(axis=1)([x, route3])
    route7 = Elan_A(512)(x)          #101, Connect to Detector 3
    detect1 = RepConv(256)(route5)
    detect2 = RepConv(512)(route6)
    detect3 = RepConv(1024)(route7)
    output1 = IDetect(256, 85, 3, 80)(detect1)
    output2 = IDetect(512, 85, 3, 40)(detect2)
    output3 = IDetect(1024, 85, 3, 20)(detect3)
    output = l.Concatenate(axis=-2)([output1, output2, output3])
    output = l.Activation('linear', dtype='float32')(output)
    model = keras.Model(inputs=inputs, outputs=output, name="yolov7_model")
    return model
```
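With img_size = 640 and strides 8, 16 and 32, the three IDetect heads produce 80×80, 40×40 and 20×20 grids, so the concatenated model output has shape [batch, 3, 8400, 85] (4 box values, 1 objectness, 80 classes). The helpers below (hypothetical names) compute this layout and map an index along the 8400 axis back to (layer, grid y, grid x). They assume the row-major flattening produced by the Reshape/Permute in IDetect, which is consistent with the meshgrid built in tf_predict_func.

```python
def detection_layout(img_size=640, strides=(8, 16, 32), na=3, no=85):
    """Grid size, cell count and start offset of each detection layer along the concatenated axis."""
    grids = [img_size // s for s in strides]
    cells = [g * g for g in grids]
    offsets = [sum(cells[:i]) for i in range(len(cells))]
    return grids, cells, offsets, (na, sum(cells), no)


def locate(index, img_size=640, strides=(8, 16, 32)):
    """Map an index on the concatenated cell axis to (layer, grid_y, grid_x)."""
    grids, cells, offsets, _ = detection_layout(img_size, strides)
    for layer, (g, n, start) in enumerate(zip(grids, cells, offsets)):
        if start <= index < start + n:
            gy, gx = divmod(index - start, g)  # row-major: index = gy * g + gx
            return layer, gy, gx
    raise IndexError(index)
```

For example, `locate(6441)` falls in the 40×40 layer at grid cell (1, 1), and the last index 8399 is cell (19, 19) of the 20×20 layer.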
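Loss function definition
For an explanation of how YOLOv7 defines its loss, see my other post 解读YOLO v7的代码(三)损失函数 ("Interpreting the YOLO v7 code (3): The loss function") on my CSDN blog.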
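For reference, this is how tf_loss_func in loss.py combines the terms, written out from the code. The symbols are my own labels: B is batch_size; λ_box, λ_obj, λ_cls are loss_box, loss_obj, loss_cls; w_l is balance[l]; γ is gr; the sums run over the three detection layers; and the small eps terms are omitted.

$$\mathcal{L} = B\Big(\lambda_{box}\sum_{l}\operatorname{mean}\big(1-\mathrm{CIoU}\big)_l+\lambda_{obj}\sum_{l} w_l\,\operatorname{mean}\,\mathrm{BCE}\big(\hat{o}_l,\,t^{obj}_l\big)+\lambda_{cls}\sum_{l}\operatorname{mean}\,\mathrm{BCE}\big(\hat{c}_l,\,t^{cls}_l\big)\Big)$$

$$t^{obj} = (1-\gamma) + \gamma\,\max(\mathrm{CIoU}, 0)\ \text{at matched cells},\qquad t^{obj} = 0\ \text{elsewhere}$$

$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2}{c^2} - \alpha v,\qquad v = \frac{4}{\pi^2}\Big(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\Big)^2,\qquad \alpha = \frac{v}{v - \mathrm{IoU} + 1}$$

Here ρ is the distance between the predicted and target box centers and c is the diagonal of the smallest box enclosing both, matching rho2 and c2 in bbox_ciou.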
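The implementation is in loss.py. I rewrote it in TensorFlow following the way the YOLOv7 code handles the loss, and wrapped the functions with tf.function to speed up computation. The code is as follows: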
```python
import tensorflow as tf
import math
from test1 import batch_size, na, nl, img_size, stride, balance
from test1 import loss_box, loss_obj, loss_cls
from test1 import batch_no_constant, anchor_no_constant, anchors_reshape, anchor_t, anchors_constant, layer_no_constant
from test1 import val_batch_no_constant, val_layer_no_constant
from util import *
from params import *

#In param:
#  p - predictions of the model, list of three detection level.
#  labels - the label of the object, dimension [batch, boxnum, 5(class, xywh)]
#Out param:
#  results - list of the suggest positive samples for three detection level.
#            dimension for each element: [sample_number, 6(batch_no, anch_no, y, x, class, layer_no)]
#  anch - list of the anchor wh ratio for the positive samples
#         dimension for each element: [sample_number, anchor_w, anchor_h]
@tf.function(
    input_signature=([tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)])
)
def tf_find_3_positive(labels):
    batch_no = tf.zeros_like(labels)[...,0:1] + batch_no_constant
    targets = tf.concat((batch_no, labels), axis=-1)  #targets dim [batch,box_num,6]
    targets = tf.reshape(targets, [batch_size, 1, -1, 6])  #targets dim [batch,1,box_num,6]
    targets = tf.tile(targets, [1,na,1,1])
    anchor_no = anchor_no_constant + tf.reshape(tf.zeros_like(batch_no), [batch_size, 1, -1, 1])
    targets = tf.concat([targets,anchor_no], axis=-1)  #targets dim [batch,na,box_num,7(batch_no, cls, xywh, anchor_no)]

    g = 0.5  # bias
    offsets = tf.expand_dims(tf.constant([[0.,0.], [-1.,0.], [0.,-1.], [1.,0.], [0.,1.]]), axis=0)  #offset dim [1,5,2]
    gain = tf.constant([[1.,1.,80.,80.,80.,80.,1.], [1.,1.,40.,40.,40.,40.,1.], [1.,1.,20.,20.,20.,20.,1.]])

    results = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    for i in tf.range(nl):
        t = targets * tf.gather(gain, i)
        r = t[..., 4:6] / tf.gather(anchors_reshape, i)
        r_reciprocal = tf.math.reciprocal_no_nan(r)  #1/r
        r_max = tf.reduce_max(tf.math.maximum(r, r_reciprocal), axis=-1)
        mask_t = tf.logical_and(r_max<anchor_t, r_max>0)
        t = t[mask_t]

        # Offsets
        gxy = t[:, 2:4]  # grid xy
        gxi = tf.gather(gain, i)[2:4] - gxy  # inverse
        mask_xy = tf.concat([
            tf.ones([tf.shape(t)[0], 1], dtype=tf.bool),
            ((gxy % 1. < g) & (gxy > 1.)),
            ((gxi % 1. < g) & (gxi > 1.))
        ], axis=1)
        t = tf.repeat(tf.expand_dims(t, axis=1), 5, axis=1)[mask_xy]
        offsets_xy = (tf.expand_dims(tf.zeros_like(gxy, dtype=tf.float32), axis=1) + offsets)[mask_xy]
        xy = t[...,2:4] + offsets_xy
        from_which_layer = tf.ones_like(t[...,0:1]) * tf.dtypes.cast(i, tf.float32)
        results = results.write(i, tf.dtypes.cast(tf.concat([t[...,0:1], t[...,-1:], xy[...,1:2], xy[...,0:1], t[...,1:2], from_which_layer], axis=-1), tf.int32))
        anch = anch.write(i, tf.gather(tf.gather(anchors_constant, i), tf.dtypes.cast(t[...,-1], tf.int32)))
    return results.concat(), anch.concat()

@tf.function(
    input_signature=([tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
                      tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
)
def box_iou(box1, box2):
    area1 = (box1[:,2]-box1[:,0])*(box1[:,3]-box1[:,1])
    area2 = (box2[:,2]-box2[:,0])*(box2[:,3]-box2[:,1])
    intersect_wh = tf.math.minimum(box1[:,None,2:], box2[:,2:]) - tf.math.maximum(box1[:,None,:2], box2[:,:2])
    intersect_wh = tf.clip_by_value(intersect_wh, clip_value_min=0, clip_value_max=img_size)
    intersect_area = intersect_wh[...,0]*intersect_wh[...,1]
    iou = intersect_area/(area1[:,None]+area2-intersect_area)
    return iou

@tf.function(
    input_signature=([tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
                      tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
)
def bbox_ciou(box1, box2):
    eps = 1e-7
    b1_x1, b1_x2 = box1[:,0]-box1[:,2]/2, box1[:,0]+box1[:,2]/2
    b1_y1, b1_y2 = box1[:,1]-box1[:,3]/2, box1[:,1]+box1[:,3]/2
    b2_x1, b2_x2 = box2[:,0]-box2[:,2]/2, box2[:,0]+box2[:,2]/2
    b2_y1, b2_y2 = box2[:,1]-box2[:,3]/2, box2[:,1]+box2[:,3]/2

    # Intersection area
    inter = tf.clip_by_value(tf.math.minimum(b1_x2, b2_x2) - tf.math.maximum(b1_x1, b2_x1), clip_value_min=0, clip_value_max=tf.float32.max) * \
            tf.clip_by_value(tf.math.minimum(b1_y2, b2_y2) - tf.math.maximum(b1_y1, b2_y1), clip_value_min=0, clip_value_max=tf.float32.max)

    # Union Area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps

    iou = inter / union

    cw = tf.math.maximum(b1_x2, b2_x2) - tf.math.minimum(b1_x1, b2_x1)  # convex (smallest enclosing box) width
    ch = tf.math.maximum(b1_y2, b2_y2) - tf.math.minimum(b1_y1, b2_y1)  # convex height
    c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
            (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared
    v = (4 / math.pi ** 2) * tf.math.pow(tf.math.atan(w2 / (h2 + eps)) - tf.math.atan(w1 / (h1 + eps)), 2)
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)

@tf.function(
    input_signature=([tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
                      tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)])
)
def tf_build_targets(p, labels):
    results, anch = tf_find_3_positive(labels)
    grids = tf.dtypes.cast(img_size/stride, tf.int32)
    pxyxys = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    p_obj = tf.TensorArray(tf.float32, size=nl, dynamic_size=True, element_shape=[None, 1])
    p_cls = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    all_idx = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    from_which_layer = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    all_anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)

    matching_idxs = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
    matching_targets = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_anchs = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_layers = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)

    for i in tf.range(nl):
        idx_mask = results[...,-1]==i
        idx = tf.boolean_mask(results, idx_mask)
        layer_mask = layer_no_constant[...,0]==i
        grid_no = tf.gather(grids, i)
        pl = tf.boolean_mask(p, layer_mask)
        pl = tf.reshape(pl, [batch_size, na, grid_no, grid_no, -1])
        pi = tf.gather_nd(pl, idx[...,0:4])
        anchors_p = tf.boolean_mask(anch, idx_mask)
        p_obj = p_obj.write(i, pi[...,4:5])
        p_cls = p_cls.write(i, pi[...,5:])
        gij = tf.dtypes.cast(tf.concat([idx[...,3:4], idx[...,2:3]], axis=-1), tf.float32)
        pxy = (tf.math.sigmoid(pi[...,:2])*2-0.5+gij)*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pwh = (tf.math.sigmoid(pi[...,2:4])*2)**2*anchors_p*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pxywh = tf.concat([pxy, pwh], axis=-1)
        pxyxy = xywh2xyxy(pxywh)
        pxyxys = pxyxys.write(i, pxyxy)
        all_idx = all_idx.write(i, idx[...,0:4])
        from_which_layer = from_which_layer.write(i, idx[..., -1:])
        all_anch = all_anch.write(i, tf.boolean_mask(anch, idx_mask))

    pxyxys = pxyxys.concat()
    p_obj = p_obj.concat()
    p_cls = p_cls.concat()
    all_idx = all_idx.concat()
    from_which_layer = from_which_layer.concat()
    all_anch = all_anch.concat()

    for i in tf.range(batch_size):
        batch_mask = all_idx[...,0]==i
        if tf.math.reduce_sum(tf.dtypes.cast(batch_mask, tf.int32)) > 0:
            pxyxy_i = tf.boolean_mask(pxyxys, batch_mask)
            target_mask = labels[i][...,3]>0
            target = tf.boolean_mask(labels[i], target_mask)
            txywh = target[...,1:] * img_size
            txyxy = xywh2xyxy(txywh)
            pair_wise_iou = box_iou(txyxy, pxyxy_i)
            pair_wise_iou_loss = -tf.math.log(pair_wise_iou + 1e-8)

            top_k, _ = tf.math.top_k(pair_wise_iou, tf.math.minimum(10, tf.shape(pair_wise_iou)[1]))
            dynamic_ks = tf.clip_by_value(tf.dtypes.cast(tf.math.reduce_sum(top_k, axis=-1), tf.int32), clip_value_min=1, clip_value_max=10)

            gt_cls_per_image = tf.tile(tf.expand_dims(tf.one_hot(tf.dtypes.cast(target[...,0], tf.int32), nc), axis=1), [1, tf.shape(pxyxy_i)[0], 1])

            num_gt = tf.shape(target)[0]
            cls_preds_ = (
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_cls, batch_mask), 0), [num_gt, 1, 1])) *
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_obj, batch_mask), 0), [num_gt, 1, 1]))
            )  #dimension [labels_number, positive_targets_number, 80]
            y = tf.math.sqrt(cls_preds_)
            pair_wise_cls_loss = tf.math.reduce_sum(
                tf.nn.sigmoid_cross_entropy_with_logits(labels=gt_cls_per_image, logits=tf.math.log(y/(1-y))),
                axis=-1)

            cost = (pair_wise_cls_loss + 3.0 * pair_wise_iou_loss)
            matching_matrix = tf.zeros_like(cost)  #dimension [labels_number, positive_targets_number]
            matching_idx = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
            for gt_idx in tf.range(num_gt):
                _, pos_idx = tf.math.top_k(-cost[gt_idx], k=dynamic_ks[gt_idx], sorted=True)
                X, Y = tf.meshgrid(gt_idx, pos_idx)
                matching_idx = matching_idx.write(gt_idx, tf.dtypes.cast(tf.concat([X, Y], axis=-1), tf.int64))

            matching_idx = matching_idx.concat()
            '''
            matching_matrix = tf.scatter_nd(matching_idx, tf.ones(tf.shape(matching_idx)[0]), tf.dtypes.cast(tf.shape(cost), tf.int64))
            '''
            matching_matrix = tf.sparse.to_dense(
                tf.sparse.reorder(
                    tf.sparse.SparseTensor(
                        indices=tf.dtypes.cast(matching_idx, tf.int64),
                        values=tf.ones(tf.shape(matching_idx)[0]),
                        dense_shape=tf.dtypes.cast(tf.shape(cost), tf.int64))))

            anchor_matching_gt = tf.reduce_sum(matching_matrix, axis=0)  #dimension [positive_targets_number]
            mask_1 = anchor_matching_gt>1  #it means one target match to several ground truths

            if tf.reduce_sum(tf.dtypes.cast(mask_1, tf.int32)) > 0:  #There is at least one positive target that predict several ground truth
                #Get the lowest cost of the several ground truths of the target
                #For example, there are 100 targets and 10 ground truths.
                #The #5 target match to the #2 and #3 ground truth, the related cost are 10 for #2 and 20 for #3
                #Then it will select #2 ground truth for the #5 target.
                #mask_1 dimension [positive_targets_number]
                #tf.boolean_mask(cost, mask_1, axis=1), dimension [ground_truth_number, targets_predict_several_GT_number]
                cost_argmin = tf.math.argmin(tf.boolean_mask(cost, mask_1, axis=1), axis=0)  #in above example, the cost_argmin is [2]
                m = tf.dtypes.cast(mask_1, tf.float32)
                _, target_indices = tf.math.top_k(m, k=tf.dtypes.cast(tf.math.reduce_sum(m), tf.int32))  #in above example, the target_indices is [5]
                #So will set the index [2,5] of matching_matrix to 1, and set the other elements of [:,5] to 0
                target_matching_gt_indices = tf.concat([tf.reshape(tf.dtypes.cast(cost_argmin, tf.int32), [-1,1]), tf.reshape(target_indices, [-1,1])], axis=1)
                matching_matrix = tf.multiply(matching_matrix, tf.repeat(tf.reshape(tf.dtypes.cast(anchor_matching_gt<=1, tf.float32), [1,-1]), tf.shape(cost)[0], axis=0))
                target_value = tf.sparse.to_dense(
                    tf.sparse.reorder(
                        tf.sparse.SparseTensor(
                            indices=tf.dtypes.cast(target_matching_gt_indices, tf.int64),
                            values=tf.ones(tf.shape(target_matching_gt_indices)[0]),
                            dense_shape=tf.dtypes.cast(tf.shape(matching_matrix), tf.int64))))
                matching_matrix = tf.add(matching_matrix, target_value)

            fg_mask_inboxes = tf.math.reduce_sum(matching_matrix, axis=0)>0.  #The mask for the targets that will use to predict
            if tf.shape(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1))[0]>0:
                matched_gt_inds = tf.math.argmax(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1), axis=0)  #Get the related gt number for the target
                all_idx_i = tf.boolean_mask(tf.boolean_mask(all_idx, batch_mask), fg_mask_inboxes)
                from_which_layer_i = tf.boolean_mask(tf.boolean_mask(from_which_layer, batch_mask), fg_mask_inboxes)
                all_anch_i = tf.boolean_mask(tf.boolean_mask(all_anch, batch_mask), fg_mask_inboxes)
                matching_idxs = matching_idxs.write(i, all_idx_i)
                matching_layers = matching_layers.write(i, from_which_layer_i)
                matching_anchs = matching_anchs.write(i, all_anch_i)
                matching_targets = matching_targets.write(i, tf.gather(target, matched_gt_inds))
            else:
                matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
                matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
                matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
                matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
        else:
            matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
            matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
            matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
            matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))

    matching_idxs = matching_idxs.concat()
    matching_layers = matching_layers.concat()
    matching_anchs = matching_anchs.concat()
    matching_targets = matching_targets.concat()
    filter_mask = matching_idxs[:,0]!=-1
    matching_idxs = tf.boolean_mask(matching_idxs, filter_mask)
    matching_layers = tf.boolean_mask(matching_layers, filter_mask)
    matching_anchs = tf.boolean_mask(matching_anchs, filter_mask)
    matching_targets = tf.boolean_mask(matching_targets, filter_mask)
    return matching_idxs, matching_layers, matching_anchs, matching_targets

@tf.function(
    input_signature=([tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
                      tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)])
)
def tf_loss_func(p, labels):
    matching_idxs, matching_layers, matching_anchs, matching_targets = tf_build_targets(p, labels)
    lcls, lbox, lobj = tf.zeros(1), tf.zeros(1), tf.zeros(1)
    grids = img_size//stride
    for i in tf.range(nl):
        layer_mask = layer_no_constant[...,0]==i
        grid = tf.gather(grids, i)
        pi = tf.reshape(tf.boolean_mask(p, layer_mask), [batch_size, na, grid, grid, -1])
        matching_layer_mask = matching_layers[:,0]==i
        if tf.reduce_sum(tf.dtypes.cast(matching_layer_mask, tf.int32))==0:
            continue
        m_idxs = tf.boolean_mask(matching_idxs, matching_layer_mask)
        if tf.shape(m_idxs)[0]==0:
            continue
        m_targets = tf.boolean_mask(matching_targets, matching_layer_mask)
        m_anchs = tf.boolean_mask(matching_anchs, matching_layer_mask)
        ps = tf.gather_nd(pi, m_idxs)
        pxy = tf.math.sigmoid(ps[:,:2])*2-0.5
        pwh = (tf.math.sigmoid(ps[:,2:4])*2)**2*m_anchs
        pbox = tf.concat([pxy,pwh], axis=-1)
        selected_tbox = m_targets[:, 1:]
        selected_tbox = tf.multiply(selected_tbox, tf.dtypes.cast(grid, tf.float32))
        tbox_grid = tf.concat([
            tf.dtypes.cast(m_idxs[:,3:4], tf.float32),
            tf.dtypes.cast(m_idxs[:,2:3], tf.float32),
            tf.zeros((tf.shape(m_idxs)[0],2))], axis=-1)
        selected_tbox = tf.subtract(selected_tbox, tbox_grid)
        iou = bbox_ciou(pbox, selected_tbox)
        lbox += tf.math.reduce_mean(1.0 - iou)  # iou loss

        # Objectness
        tobj = tf.sparse.to_dense(
            tf.sparse.reorder(
                tf.sparse.SparseTensor(
                    indices=tf.dtypes.cast(m_idxs, tf.int64),
                    values=(1.0 - gr) + gr * tf.clip_by_value(tf.stop_gradient(iou), clip_value_min=0, clip_value_max=tf.float32.max),
                    dense_shape=tf.dtypes.cast(tf.shape(pi[..., 0]), tf.int64))),
            validate_indices=False)

        # Classification
        tcls = tf.one_hot(indices=tf.dtypes.cast(m_targets[:,0], tf.int32), depth=80, dtype=tf.float32)
        lcls += tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=tcls, logits=ps[:, 5:]))
        '''
        lcls += tf.math.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.dtypes.cast(m_targets[:,0], tf.int32),
                logits=ps[:, 5:]))
        '''
        obji = tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=tobj, logits=pi[..., 4]))
        lobj += obji * tf.gather(balance, i)

    lbox *= loss_box
    lobj *= loss_obj
    lcls *= loss_cls
    loss = (lbox + lobj + lcls) * batch_size
    return loss

@tf.function(
    input_signature=([tf.TensorSpec(shape=[None, na, 8400, 85], dtype=tf.float32),
                      tf.TensorSpec(shape=[None, None, 5], dtype=tf.float32),
                      tf.TensorSpec(shape=[None, 2], dtype=tf.int32),
                      tf.TensorSpec(shape=[None], dtype=tf.int32)])
)
def tf_predict_func(predictions, labels, imgs_hw, imgs_id):
    grids = img_size // stride
    batch_size = tf.shape(predictions)[0]
    confidence_threshold = 0.2
    probabilty_threshold = 0.8
    all_predict_result = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    boxes_result = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    imgs_info = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    for i in tf.range(nl):
        grid = tf.gather(grids, i)
        grid_x, grid_y = tf.meshgrid(tf.range(grid, dtype=tf.float32), tf.range(grid, dtype=tf.float32))
        grid_x = tf.reshape(grid_x, [-1, 1])
        grid_y = tf.reshape(grid_y, [-1, 1])
        grid_xy = tf.concat([grid_x, grid_y], axis=-1)
        grid_xy = tf.reshape(grid_xy, [1,1,-1,2])
        layer_mask = val_layer_no_constant[...,0]==i
        predict_layer = tf.boolean_mask(predictions, layer_mask)
        predict_layer = tf.reshape(predict_layer, [batch_size, na, -1, 85])
        predict_conf = tf.math.sigmoid(predict_layer[...,4:5])
        predict_xy = (tf.math.sigmoid(predict_layer[...,:2])*2-0.5 + \
                      tf.dtypes.cast(grid_xy, tf.float32))*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_wh = (tf.math.sigmoid(predict_layer[...,2:4])*2)**2 * \
                     tf.reshape(tf.gather(anchors_constant, i), [1,na,1,2]) * \
                     tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_xywh = tf.concat([predict_xy, predict_wh], axis=-1)
        predict_xyxy = xywh2xyxy(predict_xywh)
        predict_cls = tf.reshape(tf.argmax(predict_layer[...,5:], axis=-1), [batch_size, na, -1, 1])
        predict_cls = tf.dtypes.cast(predict_cls, tf.float32)
        predict_proba = tf.nn.sigmoid(tf.reduce_max(predict_layer[...,5:], axis=-1, keepdims=True))
        batch_no = tf.expand_dims(tf.tile(tf.gather(val_batch_no_constant, tf.range(batch_size)), [1,na,grid*grid]), -1)
        # Each row: [batch_no, conf, x1, y1, x2, y2, cls, prob]
        predict_result = tf.concat([batch_no, predict_conf, predict_xyxy, predict_cls, predict_proba], axis=-1)
        mask = tf.math.logical_and(predict_result[...,1]>=confidence_threshold, predict_result[...,-1]>=probabilty_threshold)
        predict_result = tf.boolean_mask(predict_result, mask)
        if tf.shape(predict_result)[0] > 0:
            all_predict_result = all_predict_result.write(i, predict_result)
        else:
            all_predict_result = all_predict_result.write(i, tf.zeros(shape=[1,8]))
    all_predict_result = all_predict_result.concat()

    for i in tf.range(batch_size):
        batch_mask = tf.math.logical_and(
            all_predict_result[...,0]==tf.dtypes.cast(i, tf.float32),
            all_predict_result[...,1]>0)
        predict_true_box = tf.boolean_mask(all_predict_result, batch_mask)
        if tf.shape(predict_true_box)[0]==0:
            continue
        original_hw = tf.dtypes.cast(tf.gather(imgs_hw, i), tf.float32)
        ratio = tf.dtypes.cast(tf.reduce_max(original_hw/img_size), tf.float32)
        predict_classes, _ = tf.unique(predict_true_box[:,6])
        for j in tf.range(tf.shape(predict_classes)[0]):
            class_mask = tf.math.equal(predict_true_box[:, 6], tf.gather(predict_classes, j))
            predict_true_box_class = tf.boolean_mask(predict_true_box, class_mask)
            predict_true_box_xy = predict_true_box_class[:, 2:6]
            predict_true_box_score = predict_true_box_class[:, 7]*predict_true_box_class[:, 1]
            selected_indices = tf.image.non_max_suppression(
                predict_true_box_xy,
                predict_true_box_score,
                100,
                iou_threshold=0.2)
            selected_boxes = tf.gather(predict_true_box_class, selected_indices)  #Shape [box_num, 8]
            # Scale back to the original image and clip to its borders
            boxes_xyxy = selected_boxes[:,2:6]*ratio
            boxes_x1 = tf.clip_by_value(boxes_xyxy[:,0:1], 0., original_hw[1])
            boxes_x2 = tf.clip_by_value(boxes_xyxy[:,2:3], 0., original_hw[1])
            boxes_y1 = tf.clip_by_value(boxes_xyxy[:,1:2], 0., original_hw[0])
            boxes_y2 = tf.clip_by_value(boxes_xyxy[:,3:4], 0., original_hw[0])
            boxes_w = boxes_x2 - boxes_x1
            boxes_h = boxes_y2 - boxes_y1
            # Each row: [batch_no, conf, x1, y1, w, h, cls, prob]
            boxes = tf.concat([selected_boxes[:,0:2], boxes_x1, boxes_y1, boxes_w, boxes_h, selected_boxes[:,6:8]], axis=-1)
            boxes_result = boxes_result.write(boxes_result.size(), boxes)
        img_id = tf.gather(imgs_id, i)
        imgs_info = imgs_info.write(imgs_info.size(), tf.reshape(tf.stack([i, img_id]), [-1,2]))
    if boxes_result.size()==0:
        boxes_result = boxes_result.write(0, tf.zeros(shape=[1,8]))
    if imgs_info.size()==0:
        imgs_info = imgs_info.write(0, tf.dtypes.cast(tf.zeros(shape=[1,2]), tf.int32))
    return boxes_result.concat(), imgs_info.concat()
```
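The dynamic-k matching in tf_build_targets (the SimOTA-style assignment that YOLOv7 adopts from YOLOX) is hard to follow in graph-mode TensorFlow. The pure-Python sketch below restates the same rule on small lists: for each ground truth, k is the integer part of the sum of its top-10 IoUs, clipped to 1..10; the k lowest-cost candidates are selected; and a candidate claimed by several ground truths keeps only the cheapest one. The function name is hypothetical, and the sketch assumes the per-pair classification cost is already computed (the real code derives it from sqrt(sigmoid(cls) · sigmoid(obj))).

```python
import math


def dynamic_k_matching(ious, cls_costs, topk=10, iou_weight=3.0):
    """ious[g][p], cls_costs[g][p] per ground truth g and candidate p.
    Returns {candidate index: assigned ground-truth index}."""
    num_gt = len(ious)
    num_pred = len(ious[0]) if num_gt else 0
    # cost = classification cost + 3 * (-log IoU), as in tf_build_targets
    cost = [[cls_costs[g][p] - iou_weight * math.log(ious[g][p] + 1e-8)
             for p in range(num_pred)] for g in range(num_gt)]
    claims = {}  # candidate -> ground truths that selected it
    for g in range(num_gt):
        # dynamic k: integer part of the sum of the top-10 IoUs, clipped to [1, topk]
        top = sorted(ious[g], reverse=True)[:min(topk, num_pred)]
        k = max(1, min(topk, int(sum(top))))
        cheapest = sorted(range(num_pred), key=lambda p: cost[g][p])[:k]
        for p in cheapest:
            claims.setdefault(p, []).append(g)
    # A candidate claimed by several ground truths keeps only the lowest-cost one
    return {p: min(gts, key=lambda g: cost[g][p]) for p, gts in claims.items()}
```

In the example `ious = [[0.9, 0.8, 0.5], [0.1, 0.95, 0.9]]` with zero class cost, the first ground truth gets k = 2 and claims candidates 0 and 1, the second gets k = 1 and claims candidate 1; candidate 1 ends up with the second ground truth because its IoU cost there is lower, giving `{0: 0, 1: 1}`.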
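Training and validation
Finally, the model is trained and validated. Training follows the YOLOv7 implementation, and validation uses the pycocotools package to compute mAP. See train.py for details.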
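pycocotools expects detections as a list of dicts with image_id, category_id, bbox in [x, y, width, height] pixel format, and score. The sketch below converts the two outputs of tf_predict_func for one batch into that format. Two parts are assumptions rather than details from the post: the 80 contiguous class indices are mapped to the original COCO category ids (1 to 90 with ten ids unused), as YOLOv5/v7's coco80_to_coco91_class does, and the score is conf × class probability, the same product tf_predict_func uses for NMS. The function and constant names are hypothetical.

```python
# COCO category ids run from 1 to 90 with ten ids unused; the 80 classes map onto the rest in order.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
COCO80_TO_91 = [i for i in range(1, 91) if i not in UNUSED_COCO_IDS]


def to_coco_detections(boxes, imgs_info):
    """boxes rows: [batch_no, conf, x1, y1, w, h, cls, prob]; imgs_info rows: [batch_no, image_id]."""
    image_id_of = {int(b): int(img_id) for b, img_id in imgs_info}
    detections = []
    for batch_no, conf, x1, y1, w, h, cls, prob in boxes:
        if conf <= 0:
            continue  # all-zero placeholder row written when nothing was detected
        detections.append({
            "image_id": image_id_of[int(batch_no)],
            "category_id": COCO80_TO_91[int(cls)],
            "bbox": [float(x1), float(y1), float(w), float(h)],
            "score": float(conf) * float(prob),  # assumption: same score as used for NMS
        })
    return detections
```

The batch_no values restart at 0 in every batch, so the conversion has to run per batch (for example on `boxes.numpy()` and `imgs_info.numpy()`); the accumulated list can then be passed to `COCO.loadRes` and evaluated with `COCOeval`.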
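Because the model is trained on 640×640 images, it needs a lot of GPU memory. On my local 2080 Ti with 11 GB of memory and mixed precision enabled, the batch size could only be set to 8, and the training results were not very good. So I rented a V100 GPU with 32 GB of memory on the AutoDL platform (2.28 yuan per hour) and set the batch size to 32; the batch size appears to have a considerable effect on training quality. After a little over 20 epochs, at slightly more than one hour per epoch (about one day in total), the results were as follows: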
```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.270
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.162
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.476
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.661
```
Below are the predictions on some images from the validation set.
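According to the YOLOv7 paper, after 300 epochs of training the model reaches 51.4% AP on COCO test-dev, so further training should still improve accuracy considerably. Given my limited time and resources, I stopped at this point for now.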
Finally, all of my source code is in the GitHub repository: GitHub - gzroy/yolov7_tf2: Yolov7 implementation on tensorflow 2.x