August 2025

Object Detection

The task of classifying and localizing multiple objects in an image is called object detection. A common approach is to take a CNN trained to classify and localize a single object, then slide it across the image.

This technique is fairly straightforward, but it will detect the same object multiple times, at slightly different positions. Some post-processing is then needed to get rid of all the unnecessary bounding boxes. A common approach for this is called non-max suppression. Here is how it works:

  1. First, you need to add an extra objectness score (confidence) output to your CNN, to estimate the probability that a flower really is present in the image (alternatively, you could add a "no flower" class, but this usually does not work as well). It must use the sigmoid activation function, and you can train it with binary cross-entropy loss. Then delete all the bounding boxes whose objectness score is below some threshold: this drops all the boxes that don't actually contain a flower.
  2. Find the bounding box with the highest objectness score, and delete all the other bounding boxes that overlap a lot with it (e.g., with an IoU greater than 60%).
  3. Repeat step 2 until there are no more bounding boxes to delete (a code sketch follows this list).
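
Here is a minimal sketch of steps 2 and 3 using TensorFlow's built-in tf.image.non_max_suppression() op; the box coordinates and scores below are made-up values:

import tensorflow as tf

# Boxes are [y1, x1, y2, x2] in normalized coordinates; scores are objectness scores
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5],
                     [0.05, 0.05, 0.55, 0.55],  # IoU with the first box is about 0.68
                     [0.5, 0.5, 1.0, 1.0]])
scores = tf.constant([0.9, 0.75, 0.8])

# Keeps the highest-scoring box and drops any box overlapping it by more than 60% IoU,
# repeating until there is nothing left to remove
selected = tf.image.non_max_suppression(boxes, scores,
                                        max_output_size=10, iou_threshold=0.6)
selected_boxes = tf.gather(boxes, selected)  # boxes 0 and 2 survive here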

This simple approach to object detection works quite well, but it requires running the CNN many times, so it is quite slow. Fortunately, there is a much faster way to slide a CNN across an image: using a fully convolutional network (FCN).

Classification and Localization

Localizing an object in an image can be expressed as a regression task: to predict a bounding box around the object, a common approach is to predict the horizontal and vertical coordinates of the object's center, as well as its height and width. This means we have four numbers to predict. It does not require much change to the model; we just need to add a second dense output layer with four units (typically on top of the global average pooling layer), and it can be trained with the MSE loss:

import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds

(test_set, valid_set, train_set), info = tfds.load('tf_flowers', split=['train[:10%]', 'train[10%:25%]', 'train[25%:]'],
                                                   as_supervised=True, with_info=True)
dataset_size = info.splits['train'].num_examples
class_names = info.features['label'].names
n_classes = info.features['label'].num_classes


def preprocess(image, label):
    resize_image = tf.image.resize(image, [224, 224])
    final_image = keras.applications.xception.preprocess_input(resize_image)
    return final_image, label


batch_size = 16
train_set = train_set.shuffle(1000)
train_set = train_set.map(preprocess).batch(batch_size).prefetch(1)
valid_set = valid_set.map(preprocess).batch(batch_size).prefetch(1)
test_set = test_set.map(preprocess).batch(batch_size).prefetch(1)
base_model = keras.applications.xception.Xception(weights='imagenet', include_top=False)
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
class_output = keras.layers.Dense(n_classes, activation='softmax')(avg)
loc_output = keras.layers.Dense(4)(avg)
optimizer = keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)
model = keras.Model(inputs=base_model.input, outputs=[class_output, loc_output])
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], loss_weights=[0.8, 0.2],
              optimizer=optimizer, metrics=['accuracy'])

The flowers dataset does not come with bounding boxes around the flowers, so you need to add them yourself. This is often one of the hardest and most expensive parts of a machine learning project: getting the labels. It pays to spend time looking for the right tools. To annotate images with bounding boxes, you may want to use an open source image labeling tool such as VGG Image Annotator, LabelImg, OpenLabeler, or ImageLab, or a commercial tool such as LabelBox or Supervisely. If there are a large number of images to annotate, you may also want to consider a crowdsourcing platform such as Amazon Mechanical Turk. However, setting up a crowdsourcing job takes real effort: you have to prepare the form to send to the workers, supervise them, and make sure the quality of the bounding boxes they produce is good, so make sure it is worth it. If there are just a few thousand images to label, it may be better to do it yourself.

Suppose you have obtained a bounding box for each image in the flowers dataset (assuming a single bounding box per image for now). You then need to create a dataset whose items are batches of preprocessed images along with their class labels and their bounding boxes. Each item should be a tuple of the following form: (images, (class_labels, bounding_boxes)).

The bounding boxes should be normalized so that the horizontal and vertical coordinates, as well as the height and width, all range from 0 to 1. Also, it is common to predict the square root of the height and width rather than the values directly: this way, a 10-pixel error for a large bounding box will not be penalized as much as a 10-pixel error for a small bounding box, as the sketch below illustrates.
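
As a concrete illustration, here is a minimal sketch of such a mapping function. The bbox format ([x-center, y-center, width, height] in pixels) and the function names are hypothetical; adapt them to however your annotations are stored:

import tensorflow as tf

def normalize_bbox(bbox, img_width, img_height):
    # bbox is assumed to be [x_center, y_center, width, height] in pixels (hypothetical format)
    x, y, w, h = bbox[0], bbox[1], bbox[2], bbox[3]
    return tf.stack([x / img_width,            # coordinates scaled to the 0-1 range
                     y / img_height,
                     tf.sqrt(w / img_width),   # sqrt targets: a 10-pixel error matters
                     tf.sqrt(h / img_height)]) # less for large boxes than for small ones

def add_targets(image, label, bbox):
    height = tf.cast(tf.shape(image)[0], tf.float32)
    width = tf.cast(tf.shape(image)[1], tf.float32)
    return image, (label, normalize_bbox(bbox, width, height))

# Dataset items then have the form (image, (class_label, bounding_box)):
# dataset = dataset.map(add_targets)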

The MSE often works fairly well as a cost function to train the model, but it is not a great metric to evaluate how well the model can predict bounding boxes. The most common metric for this is Intersection over Union (IoU): the area of overlap between the predicted bounding box and the target bounding box, divided by the area of their union. In tf.keras, it is implemented by the tf.keras.metrics.MeanIoU class; a small stand-alone IoU helper for boxes is sketched below.
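
To make the metric concrete, here is a small illustrative helper (separate from the tf.keras class) that computes the IoU of two boxes given as [x1, y1, x2, y2]:

import tensorflow as tf

def iou(box_a, box_b):
    # Intersection rectangle (an empty intersection clamps to zero area)
    x1 = tf.maximum(box_a[0], box_b[0])
    y1 = tf.maximum(box_a[1], box_b[1])
    x2 = tf.minimum(box_a[2], box_b[2])
    y2 = tf.minimum(box_a[3], box_b[3])
    inter = tf.maximum(x2 - x1, 0.0) * tf.maximum(y2 - y1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)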

Pretrained Models for Transfer Learning

If you want to build an image classifier but do not have enough training data, it is often a good idea to reuse the lower layers of a pretrained model. For example, let's train a model to classify pictures of flowers, reusing a pretrained Xception model. First, load the dataset using TensorFlow Datasets:

import tensorflow_datasets as tfds

(test_set, valid_set, train_set), info = tfds.load('tf_flowers', split=['train[:10%]', 'train[10%:25%]', 'train[25%:]'],
                                                   as_supervised=True, with_info=True)
dataset_size = info.splits['train'].num_examples
class_names = info.features['label'].names
n_classes = info.features['label'].num_classes

You can get information about the dataset by setting with_info=True. Here, we get the dataset size and the names of the classes. There is only a "train" dataset, no test set or validation set, so we need to split the training set ourselves, using the split argument of the load function: for example, 10% for testing, 15% for validation, and 75% for training.

Next, the images must be preprocessed. The CNN expects $224\times224$ images, so they must be resized. The images also need to be run through Xception's preprocess_input() function:

import tensorflow as tf
from tensorflow import keras


def preprocess(image, label):
    resize_image = tf.image.resize(image, [224, 224])
    final_image = keras.applications.xception.preprocess_input(resize_image)
    return final_image, label

Apply this preprocessing function to all three datasets, shuffle the training set, and add batching and prefetching to every dataset:

batch_size = 16
train_set = train_set.shuffle(1000)
train_set = train_set.map(preprocess).batch(batch_size).prefetch(1)
valid_set = valid_set.map(preprocess).batch(batch_size).prefetch(1)
test_set = test_set.map(preprocess).batch(batch_size).prefetch(1)

If you want to perform some data augmentation, change the preprocessing function for the training set, adding some random transformations to the training images. For example, use tf.image.random_crop() to randomly crop the images and tf.image.random_flip_left_right() to randomly flip them horizontally, as in the sketch below.
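
A minimal sketch of such an augmented preprocessing function (the 90% crop ratio is an illustrative choice, not a value from the text):

import tensorflow as tf
from tensorflow import keras

def preprocess_with_augmentation(image, label):
    # Randomly crop a square covering 90% of the shorter side (illustrative figure),
    # then flip and resize to the input size Xception expects
    min_dim = tf.reduce_min(tf.shape(image)[:2])
    crop_size = tf.cast(tf.cast(min_dim, tf.float32) * 0.9, tf.int32)
    image = tf.image.random_crop(image, [crop_size, crop_size, 3])
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize(image, [224, 224])
    return keras.applications.xception.preprocess_input(image), label

# Use this in place of preprocess when mapping the training set:
# train_set = train_set.map(preprocess_with_augmentation).batch(batch_size).prefetch(1)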

Next, load an Xception model pretrained on ImageNet. Exclude the top of the network by setting include_top=False: this excludes the global average pooling layer and the dense output layer. Then add your own global average pooling layer based on the base model's output, followed by a dense output layer with one unit per class, using the softmax activation function. Finally, create the Keras model:

base_model = keras.applications.xception.Xception(weights='imagenet',
                                                  include_top=False)
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
output = keras.layers.Dense(n_classes, activation='softmax')(avg)
model = keras.Model(inputs=base_model.input, outputs=output)
# It is usually a good idea to freeze the weights of the pretrained layers at the start of training, to avoid damaging them
for layer in base_model.layers:
    layer.trainable = False
# Since the model uses the base model's layers directly, rather than the base_model object itself, setting base_model.trainable=False would have no effect
# Finally, compile the model and start training:
optimizer = keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
history = model.fit(train_set, epochs=5, validation_data=valid_set)
Epoch 5/5
172/172 [==============================] - 35s 205ms/step - loss: 0.0274 - accuracy: 0.9920 - val_loss: 0.3030 - val_accuracy: 0.9111


model.summary()
Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_4 (InputLayer)            [(None, None, None,  0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, None, None, 3 864         input_4[0][0]                    
__________________________________________________________________________________________________
block1_conv1_bn (BatchNormaliza (None, None, None, 3 128         block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_conv1_act (Activation)   (None, None, None, 3 0           block1_conv1_bn[0][0]            
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, None, None, 6 18432       block1_conv1_act[0][0]           
__________________________________________________________________________________________________
block1_conv2_bn (BatchNormaliza (None, None, None, 6 256         block1_conv2[0][0]               
__________________________________________________________________________________________________
block1_conv2_act (Activation)   (None, None, None, 6 0           block1_conv2_bn[0][0]            
__________________________________________________________________________________________________
block2_sepconv1 (SeparableConv2 (None, None, None, 1 8768        block1_conv2_act[0][0]           
__________________________________________________________________________________________________
block2_sepconv1_bn (BatchNormal (None, None, None, 1 512         block2_sepconv1[0][0]            
__________________________________________________________________________________________________
block2_sepconv2_act (Activation (None, None, None, 1 0           block2_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block2_sepconv2 (SeparableConv2 (None, None, None, 1 17536       block2_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block2_sepconv2_bn (BatchNormal (None, None, None, 1 512         block2_sepconv2[0][0]            
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, None, None, 1 8192        block1_conv2_act[0][0]           
__________________________________________________________________________________________________
block2_pool (MaxPooling2D)      (None, None, None, 1 0           block2_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, None, None, 1 512         conv2d_12[0][0]                  
__________________________________________________________________________________________________
add_36 (Add)                    (None, None, None, 1 0           block2_pool[0][0]                
                                                                 batch_normalization_12[0][0]     
__________________________________________________________________________________________________
block3_sepconv1_act (Activation (None, None, None, 1 0           add_36[0][0]                     
__________________________________________________________________________________________________
block3_sepconv1 (SeparableConv2 (None, None, None, 2 33920       block3_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block3_sepconv1_bn (BatchNormal (None, None, None, 2 1024        block3_sepconv1[0][0]            
__________________________________________________________________________________________________
block3_sepconv2_act (Activation (None, None, None, 2 0           block3_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block3_sepconv2 (SeparableConv2 (None, None, None, 2 67840       block3_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block3_sepconv2_bn (BatchNormal (None, None, None, 2 1024        block3_sepconv2[0][0]            
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, None, None, 2 32768       add_36[0][0]                     
__________________________________________________________________________________________________
block3_pool (MaxPooling2D)      (None, None, None, 2 0           block3_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, None, None, 2 1024        conv2d_13[0][0]                  
__________________________________________________________________________________________________
add_37 (Add)                    (None, None, None, 2 0           block3_pool[0][0]                
                                                                 batch_normalization_13[0][0]     
__________________________________________________________________________________________________
block4_sepconv1_act (Activation (None, None, None, 2 0           add_37[0][0]                     
__________________________________________________________________________________________________
block4_sepconv1 (SeparableConv2 (None, None, None, 7 188672      block4_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block4_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block4_sepconv1[0][0]            
__________________________________________________________________________________________________
block4_sepconv2_act (Activation (None, None, None, 7 0           block4_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block4_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block4_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block4_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block4_sepconv2[0][0]            
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, None, None, 7 186368      add_37[0][0]                     
__________________________________________________________________________________________________
block4_pool (MaxPooling2D)      (None, None, None, 7 0           block4_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, None, None, 7 2912        conv2d_14[0][0]                  
__________________________________________________________________________________________________
add_38 (Add)                    (None, None, None, 7 0           block4_pool[0][0]                
                                                                 batch_normalization_14[0][0]     
__________________________________________________________________________________________________
block5_sepconv1_act (Activation (None, None, None, 7 0           add_38[0][0]                     
__________________________________________________________________________________________________
block5_sepconv1 (SeparableConv2 (None, None, None, 7 536536      block5_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block5_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block5_sepconv1[0][0]            
__________________________________________________________________________________________________
block5_sepconv2_act (Activation (None, None, None, 7 0           block5_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block5_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block5_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block5_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block5_sepconv2[0][0]            
__________________________________________________________________________________________________
block5_sepconv3_act (Activation (None, None, None, 7 0           block5_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
block5_sepconv3 (SeparableConv2 (None, None, None, 7 536536      block5_sepconv3_act[0][0]        
__________________________________________________________________________________________________
block5_sepconv3_bn (BatchNormal (None, None, None, 7 2912        block5_sepconv3[0][0]            
__________________________________________________________________________________________________
add_39 (Add)                    (None, None, None, 7 0           block5_sepconv3_bn[0][0]         
                                                                 add_38[0][0]                     
__________________________________________________________________________________________________
block6_sepconv1_act (Activation (None, None, None, 7 0           add_39[0][0]                     
__________________________________________________________________________________________________
block6_sepconv1 (SeparableConv2 (None, None, None, 7 536536      block6_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block6_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block6_sepconv1[0][0]            
__________________________________________________________________________________________________
block6_sepconv2_act (Activation (None, None, None, 7 0           block6_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block6_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block6_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block6_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block6_sepconv2[0][0]            
__________________________________________________________________________________________________
block6_sepconv3_act (Activation (None, None, None, 7 0           block6_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
block6_sepconv3 (SeparableConv2 (None, None, None, 7 536536      block6_sepconv3_act[0][0]        
__________________________________________________________________________________________________
block6_sepconv3_bn (BatchNormal (None, None, None, 7 2912        block6_sepconv3[0][0]            
__________________________________________________________________________________________________
add_40 (Add)                    (None, None, None, 7 0           block6_sepconv3_bn[0][0]         
                                                                 add_39[0][0]                     
__________________________________________________________________________________________________
block7_sepconv1_act (Activation (None, None, None, 7 0           add_40[0][0]                     
__________________________________________________________________________________________________
block7_sepconv1 (SeparableConv2 (None, None, None, 7 536536      block7_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block7_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block7_sepconv1[0][0]            
__________________________________________________________________________________________________
block7_sepconv2_act (Activation (None, None, None, 7 0           block7_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block7_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block7_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block7_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block7_sepconv2[0][0]            
__________________________________________________________________________________________________
block7_sepconv3_act (Activation (None, None, None, 7 0           block7_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
block7_sepconv3 (SeparableConv2 (None, None, None, 7 536536      block7_sepconv3_act[0][0]        
__________________________________________________________________________________________________
block7_sepconv3_bn (BatchNormal (None, None, None, 7 2912        block7_sepconv3[0][0]            
__________________________________________________________________________________________________
add_41 (Add)                    (None, None, None, 7 0           block7_sepconv3_bn[0][0]         
                                                                 add_40[0][0]                     
__________________________________________________________________________________________________
block8_sepconv1_act (Activation (None, None, None, 7 0           add_41[0][0]                     
__________________________________________________________________________________________________
block8_sepconv1 (SeparableConv2 (None, None, None, 7 536536      block8_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block8_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block8_sepconv1[0][0]            
__________________________________________________________________________________________________
block8_sepconv2_act (Activation (None, None, None, 7 0           block8_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block8_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block8_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block8_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block8_sepconv2[0][0]            
__________________________________________________________________________________________________
block8_sepconv3_act (Activation (None, None, None, 7 0           block8_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
block8_sepconv3 (SeparableConv2 (None, None, None, 7 536536      block8_sepconv3_act[0][0]        
__________________________________________________________________________________________________
block8_sepconv3_bn (BatchNormal (None, None, None, 7 2912        block8_sepconv3[0][0]            
__________________________________________________________________________________________________
add_42 (Add)                    (None, None, None, 7 0           block8_sepconv3_bn[0][0]         
                                                                 add_41[0][0]                     
__________________________________________________________________________________________________
block9_sepconv1_act (Activation (None, None, None, 7 0           add_42[0][0]                     
__________________________________________________________________________________________________
block9_sepconv1 (SeparableConv2 (None, None, None, 7 536536      block9_sepconv1_act[0][0]        
__________________________________________________________________________________________________
block9_sepconv1_bn (BatchNormal (None, None, None, 7 2912        block9_sepconv1[0][0]            
__________________________________________________________________________________________________
block9_sepconv2_act (Activation (None, None, None, 7 0           block9_sepconv1_bn[0][0]         
__________________________________________________________________________________________________
block9_sepconv2 (SeparableConv2 (None, None, None, 7 536536      block9_sepconv2_act[0][0]        
__________________________________________________________________________________________________
block9_sepconv2_bn (BatchNormal (None, None, None, 7 2912        block9_sepconv2[0][0]            
__________________________________________________________________________________________________
block9_sepconv3_act (Activation (None, None, None, 7 0           block9_sepconv2_bn[0][0]         
__________________________________________________________________________________________________
block9_sepconv3 (SeparableConv2 (None, None, None, 7 536536      block9_sepconv3_act[0][0]        
__________________________________________________________________________________________________
block9_sepconv3_bn (BatchNormal (None, None, None, 7 2912        block9_sepconv3[0][0]            
__________________________________________________________________________________________________
add_43 (Add)                    (None, None, None, 7 0           block9_sepconv3_bn[0][0]         
                                                                 add_42[0][0]                     
__________________________________________________________________________________________________
block10_sepconv1_act (Activatio (None, None, None, 7 0           add_43[0][0]                     
__________________________________________________________________________________________________
block10_sepconv1 (SeparableConv (None, None, None, 7 536536      block10_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block10_sepconv1_bn (BatchNorma (None, None, None, 7 2912        block10_sepconv1[0][0]           
__________________________________________________________________________________________________
block10_sepconv2_act (Activatio (None, None, None, 7 0           block10_sepconv1_bn[0][0]        
__________________________________________________________________________________________________
block10_sepconv2 (SeparableConv (None, None, None, 7 536536      block10_sepconv2_act[0][0]       
__________________________________________________________________________________________________
block10_sepconv2_bn (BatchNorma (None, None, None, 7 2912        block10_sepconv2[0][0]           
__________________________________________________________________________________________________
block10_sepconv3_act (Activatio (None, None, None, 7 0           block10_sepconv2_bn[0][0]        
__________________________________________________________________________________________________
block10_sepconv3 (SeparableConv (None, None, None, 7 536536      block10_sepconv3_act[0][0]       
__________________________________________________________________________________________________
block10_sepconv3_bn (BatchNorma (None, None, None, 7 2912        block10_sepconv3[0][0]           
__________________________________________________________________________________________________
add_44 (Add)                    (None, None, None, 7 0           block10_sepconv3_bn[0][0]        
                                                                 add_43[0][0]                     
__________________________________________________________________________________________________
block11_sepconv1_act (Activatio (None, None, None, 7 0           add_44[0][0]                     
__________________________________________________________________________________________________
block11_sepconv1 (SeparableConv (None, None, None, 7 536536      block11_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block11_sepconv1_bn (BatchNorma (None, None, None, 7 2912        block11_sepconv1[0][0]           
__________________________________________________________________________________________________
block11_sepconv2_act (Activatio (None, None, None, 7 0           block11_sepconv1_bn[0][0]        
__________________________________________________________________________________________________
block11_sepconv2 (SeparableConv (None, None, None, 7 536536      block11_sepconv2_act[0][0]       
__________________________________________________________________________________________________
block11_sepconv2_bn (BatchNorma (None, None, None, 7 2912        block11_sepconv2[0][0]           
__________________________________________________________________________________________________
block11_sepconv3_act (Activatio (None, None, None, 7 0           block11_sepconv2_bn[0][0]        
__________________________________________________________________________________________________
block11_sepconv3 (SeparableConv (None, None, None, 7 536536      block11_sepconv3_act[0][0]       
__________________________________________________________________________________________________
block11_sepconv3_bn (BatchNorma (None, None, None, 7 2912        block11_sepconv3[0][0]           
__________________________________________________________________________________________________
add_45 (Add)                    (None, None, None, 7 0           block11_sepconv3_bn[0][0]        
                                                                 add_44[0][0]                     
__________________________________________________________________________________________________
block12_sepconv1_act (Activatio (None, None, None, 7 0           add_45[0][0]                     
__________________________________________________________________________________________________
block12_sepconv1 (SeparableConv (None, None, None, 7 536536      block12_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block12_sepconv1_bn (BatchNorma (None, None, None, 7 2912        block12_sepconv1[0][0]           
__________________________________________________________________________________________________
block12_sepconv2_act (Activatio (None, None, None, 7 0           block12_sepconv1_bn[0][0]        
__________________________________________________________________________________________________
block12_sepconv2 (SeparableConv (None, None, None, 7 536536      block12_sepconv2_act[0][0]       
__________________________________________________________________________________________________
block12_sepconv2_bn (BatchNorma (None, None, None, 7 2912        block12_sepconv2[0][0]           
__________________________________________________________________________________________________
block12_sepconv3_act (Activatio (None, None, None, 7 0           block12_sepconv2_bn[0][0]        
__________________________________________________________________________________________________
block12_sepconv3 (SeparableConv (None, None, None, 7 536536      block12_sepconv3_act[0][0]       
__________________________________________________________________________________________________
block12_sepconv3_bn (BatchNorma (None, None, None, 7 2912        block12_sepconv3[0][0]           
__________________________________________________________________________________________________
add_46 (Add)                    (None, None, None, 7 0           block12_sepconv3_bn[0][0]        
                                                                 add_45[0][0]                     
__________________________________________________________________________________________________
block13_sepconv1_act (Activatio (None, None, None, 7 0           add_46[0][0]                     
__________________________________________________________________________________________________
block13_sepconv1 (SeparableConv (None, None, None, 7 536536      block13_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block13_sepconv1_bn (BatchNorma (None, None, None, 7 2912        block13_sepconv1[0][0]           
__________________________________________________________________________________________________
block13_sepconv2_act (Activatio (None, None, None, 7 0           block13_sepconv1_bn[0][0]        
__________________________________________________________________________________________________
block13_sepconv2 (SeparableConv (None, None, None, 1 752024      block13_sepconv2_act[0][0]       
__________________________________________________________________________________________________
block13_sepconv2_bn (BatchNorma (None, None, None, 1 4096        block13_sepconv2[0][0]           
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, None, None, 1 745472      add_46[0][0]                     
__________________________________________________________________________________________________
block13_pool (MaxPooling2D)     (None, None, None, 1 0           block13_sepconv2_bn[0][0]        
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, None, None, 1 4096        conv2d_15[0][0]                  
__________________________________________________________________________________________________
add_47 (Add)                    (None, None, None, 1 0           block13_pool[0][0]               
                                                                 batch_normalization_15[0][0]     
__________________________________________________________________________________________________
block14_sepconv1 (SeparableConv (None, None, None, 1 1582080     add_47[0][0]                     
__________________________________________________________________________________________________
block14_sepconv1_bn (BatchNorma (None, None, None, 1 6144        block14_sepconv1[0][0]           
__________________________________________________________________________________________________
block14_sepconv1_act (Activatio (None, None, None, 1 0           block14_sepconv1_bn[0][0]        
__________________________________________________________________________________________________
block14_sepconv2 (SeparableConv (None, None, None, 2 3159552     block14_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block14_sepconv2_bn (BatchNorma (None, None, None, 2 8192        block14_sepconv2[0][0]           
__________________________________________________________________________________________________
block14_sepconv2_act (Activatio (None, None, None, 2 0           block14_sepconv2_bn[0][0]        
__________________________________________________________________________________________________
global_average_pooling2d_3 (Glo (None, 2048)         0           block14_sepconv2_act[0][0]       
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 5)            10245       global_average_pooling2d_3[0][0] 
==================================================================================================
Total params: 20,871,725
Trainable params: 20,817,197
Non-trainable params: 54,528
__________________________________________________________________________________________________

Using Pretrained Models from Keras

In general, you won't have to implement standard models like GoogLeNet or ResNet manually, since pretrained networks are readily available with a single line of code in the keras.applications package.

For example, you can load the ResNet-50 model, pretrained on ImageNet, with the following code:

from tensorflow import keras
model = keras.applications.resnet50.ResNet50(weights='imagenet')

This creates a ResNet-50 model and downloads the weights pretrained on the ImageNet dataset. To use it, you first need to make sure the images have the right size. A ResNet-50 model expects $224\times224$-pixel images (other models may expect other sizes, such as $299\times299$). You can use tf.image.resize() to resize the images loaded earlier:

import tensorflow as tf
images_resized = tf.image.resize(images, [224, 224])

tf.image.resize() does not preserve the aspect ratio. If that is a problem, try cropping the images to the target aspect ratio before resizing. Both operations can be done in one shot with tf.image.crop_and_resize(), as sketched below.
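
An illustrative use of that op, assuming images is a float tensor of shape [batch, height, width, 3]; the crop box values are made up:

import tensorflow as tf

# Crop a centered region (given as a normalized [y1, x1, y2, x2] box) from each
# image in the batch and resize it to 224x224 in a single op
boxes = tf.constant([[0.0, 0.125, 1.0, 0.875]])  # one crop box per output image
box_indices = tf.constant([0])                   # which batch image each box applies to
cropped = tf.image.crop_and_resize(images, boxes, box_indices, crop_size=[224, 224])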

The pretrained models assume that the images are preprocessed in a specific way. In some cases they may expect the inputs to be scaled from 0 to 1, or from -1 to 1, and so on. Each model provides a preprocess_input() function that you can use to preprocess your images. These functions assume that the pixel values range from 0 to 255, so we must multiply by 255 (since we previously scaled them to the 0-1 range):

inputs = keras.applications.resnet50.preprocess_input(images_resized * 255)

Now we can use the pretrained model to make predictions:

Y_proba = model.predict(inputs)

The output Y_proba is a matrix with one row per image and one column per class (in this case, there are 1,000 classes). If you want to display the top K predictions, including the class name and the estimated probability of each predicted class, use the decode_predictions() function. For each image, it returns an array containing the top K predictions, where each prediction is represented as an array containing the class identifier, its name, and the corresponding confidence score:

top_K = keras.applications.resnet50.decode_predictions(Y_proba, top=3)
for image_index in range(len(images)):
    print('Image #{}'.format(image_index))
    for class_id, name, y_proba in top_K[image_index]:
        print('    {} - {:12s} {:.2f}%'.format(class_id, name, y_proba * 100))
    print()

It is very easy to create a pretty good image classifier using a pretrained model. Other vision models are available in keras.applications, including several ResNet variants, GoogLeNet variants, VGGNet variants, as well as MobileNet and MobileNetV2 (lightweight models for mobile applications).

But what if you want to use an image classifier for classes of images that are not part of ImageNet? In that case, you can still benefit from the pretrained models by using them for transfer learning.

CNN Architectures

Typical CNN architectures stack a few convolutional layers (each one generally followed by a ReLU layer), then a pooling layer, then another few convolutional layers (+ReLU), then another pooling layer, and so on. The image gets smaller and smaller as it progresses through the network, but thanks to the convolutional layers it also typically gets deeper and deeper (i.e., with more feature maps). At the top of the stack, a regular feedforward neural network is added, composed of a few fully connected layers (+ReLU), and the final layer outputs the prediction (e.g., a softmax layer that outputs estimated class probabilities).

A common mistake is to use convolution kernels that are too large. For example, instead of a convolutional layer with a $5\times5$ kernel, stack two layers with $3\times3$ kernels: this uses fewer parameters, requires fewer computations, and usually performs better (see the parameter-count check below). One exception is the first convolutional layer: it can typically have a large kernel with a stride of 2 or more, which shrinks the spatial dimensions of the image without losing too much information; since the input image usually has only three channels, this does not cost much.
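
A quick way to check the parameter claim is to build both options in Keras and compare count_params(); the channel counts below are illustrative:

from tensorflow import keras

# One 5x5 conv layer versus two stacked 3x3 layers, 64 channels in and out
five = keras.models.Sequential([
    keras.layers.Conv2D(64, 5, padding='same', input_shape=(28, 28, 64))])
two_threes = keras.models.Sequential([
    keras.layers.Conv2D(64, 3, padding='same', input_shape=(28, 28, 64)),
    keras.layers.Conv2D(64, 3, padding='same')])
print(five.count_params())        # 5*5*64*64 + 64     = 102,464
print(two_threes.count_params())  # 2*(3*3*64*64 + 64) =  73,856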

Here is how you can implement a simple CNN to tackle the Fashion MNIST dataset:

import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Add a channel axis and scale the pixel values to the 0-1 range
train_images = train_images[..., tf.newaxis] / 255.0
test_images = test_images[..., tf.newaxis] / 255.0
model = keras.models.Sequential([
    keras.layers.Conv2D(64, 7, activation='relu',
                        padding='same', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.Conv2D(256, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(.5),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(.5),
    keras.layers.Dense(10, activation='softmax')
])


model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 64)        3200      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 128)       73856     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 256)         295168    
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 256)         590080    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 256)         0         
_________________________________________________________________
flatten (Flatten)            (None, 2304)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               295040    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 1,266,250
Trainable params: 1,266,250
Non-trainable params: 0
_________________________________________________________________


model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels), batch_size=32)
Epoch 1/10
1875/1875 [==============================] - 13s 5ms/step - loss: 0.7189 - accuracy: 0.7408 - val_loss: 0.3981 - val_accuracy: 0.8490
Epoch 2/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.4162 - accuracy: 0.8581 - val_loss: 0.3345 - val_accuracy: 0.8808
Epoch 3/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.3519 - accuracy: 0.8819 - val_loss: 0.2957 - val_accuracy: 0.8928
Epoch 4/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.3161 - accuracy: 0.8937 - val_loss: 0.3076 - val_accuracy: 0.8939
Epoch 5/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2877 - accuracy: 0.9029 - val_loss: 0.2882 - val_accuracy: 0.8986
Epoch 6/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2702 - accuracy: 0.9091 - val_loss: 0.2671 - val_accuracy: 0.9047
Epoch 7/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2505 - accuracy: 0.9147 - val_loss: 0.2730 - val_accuracy: 0.9053
Epoch 8/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2381 - accuracy: 0.9185 - val_loss: 0.2787 - val_accuracy: 0.9049
Epoch 9/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2241 - accuracy: 0.9245 - val_loss: 0.2976 - val_accuracy: 0.9036
Epoch 10/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2116 - accuracy: 0.9259 - val_loss: 0.2853 - val_accuracy: 0.9110

  • The first layer uses 64 fairly large filters ($7\times7$) but no stride, because the input images are not very large. It also sets input_shape=(28, 28, 1), since the images are $28\times28$ pixels with a single color channel (i.e., grayscale).
  • Next is a max pooling layer with a pool size of 2, so it divides each spatial dimension by a factor of 2.
  • The same structure is repeated twice: two convolutional layers followed by a max pooling layer. For larger images, this structure could be repeated more times.
  • The number of filters grows as the CNN stretches toward the output layer (it is initially 64, then 128, then 256): it makes sense for it to grow, since the number of low-level features is often fairly small, but there are many different ways to combine them into higher-level features. It is common practice to double the number of filters after each pooling layer: since a pooling layer divides each spatial dimension by a factor of 2, we can afford to double the number of feature maps in the next layer without fear of exploding the number of parameters, memory usage, or computational load.
  • Next comes the fully connected network, composed of two hidden dense layers and a dense output layer. The inputs must be flattened before the first dense layer, since a dense network expects a 1D array of features for each instance. Two dropout layers, each with a dropout rate of 50%, are also added to reduce overfitting.

This CNN reaches over 91% accuracy on the test set. It's not the best, but it is quite good.

Over the years, variants of this fundamental architecture have been developed, leading to amazing advances in the field. A good measure of this progress is the error rate in competitions.

LeNet-5

The LeNet-5 architecture is perhaps the most widely known CNN architecture. It was created by Yann LeCun in 1998 and has been widely used for handwritten digit recognition (MNIST).

| Layer | Type            | Maps | Size         | Kernel size | Stride | Activation |
|-------|-----------------|------|--------------|-------------|--------|------------|
| Out   | Fully connected | -    | 10           | -           | -      | RBF        |
| F6    | Fully connected | -    | 84           | -           | -      | tanh       |
| C5    | Convolution     | 120  | $1\times1$   | $5\times5$  | 1      | tanh       |
| S4    | Avg pooling     | 16   | $5\times5$   | $2\times2$  | 2      | tanh       |
| C3    | Convolution     | 16   | $10\times10$ | $5\times5$  | 1      | tanh       |
| S2    | Avg pooling     | 6    | $14\times14$ | $2\times2$  | 2      | tanh       |
| C1    | Convolution     | 6    | $28\times28$ | $5\times5$  | 1      | tanh       |
| In    | Input           | 1    | $32\times32$ | -           | -      | -          |
  • MNIST images are $28\times28$ pixels, but they are zero-padded to $32\times32$ pixels and normalized before being fed to the network. The rest of the network does not use any padding, which is why the size keeps shrinking as the image progresses through the network.
  • The average pooling layers are slightly more complex than usual: each neuron computes the mean of its inputs, then multiplies the result by a learnable coefficient (one per map), adds a learnable bias term (again, one per map), and finally applies the activation function.
  • Most neurons in the C3 maps are connected to neurons in only three or four S2 maps (instead of all six S2 maps).
  • The output layer: each output neuron outputs the square of the Euclidean distance between its input vector and its weight vector, rather than computing the matrix multiplication of the inputs and the weight vector. Each output measures how much the image belongs to a particular digit class. The cross-entropy cost function is now preferred, as it penalizes bad predictions much more, producing larger gradients and converging faster. A minimal Keras sketch of this architecture follows.
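
Below is a rough, modern Keras approximation of the table above, for illustration only: the original's learnable pooling coefficients and RBF output layer have no standard Keras equivalents, so plain average pooling and a softmax output stand in for them.

from tensorflow import keras

lenet5 = keras.models.Sequential([
    keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),   # C1
    keras.layers.AveragePooling2D(2),                                        # S2
    keras.layers.Conv2D(16, 5, activation='tanh'),                           # C3
    keras.layers.AveragePooling2D(2),                                        # S4
    keras.layers.Conv2D(120, 5, activation='tanh'),                          # C5
    keras.layers.Flatten(),
    keras.layers.Dense(84, activation='tanh'),                               # F6
    keras.layers.Dense(10, activation='softmax')                             # Out
])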

AlexNet

The AlexNet CNN architecture won the 2012 ImageNet ILSVRC challenge by a large margin: it achieved a top-5 error rate of 17%, while the runner-up achieved only 26%. It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It is similar to LeNet-5, only much larger and deeper, and it was the first to stack convolutional layers directly on top of one another, instead of stacking a pooling layer on top of each convolutional layer.

| Layer | Type            | Maps    | Size           | Kernel size  | Stride | Padding | Activation |
|-------|-----------------|---------|----------------|--------------|--------|---------|------------|
| Out   | Fully connected | -       | 1,000          | -            | -      | -       | Softmax    |
| F10   | Fully connected | -       | 4,096          | -            | -      | -       | ReLU       |
| F9    | Fully connected | -       | 4,096          | -            | -      | -       | ReLU       |
| S8    | Max pooling     | 256     | $6\times6$     | $3\times3$   | 2      | valid   | -          |
| C7    | Convolution     | 256     | $13\times13$   | $3\times3$   | 1      | same    | ReLU       |
| C6    | Convolution     | 384     | $13\times13$   | $3\times3$   | 1      | same    | ReLU       |
| C5    | Convolution     | 384     | $13\times13$   | $3\times3$   | 1      | same    | ReLU       |
| S4    | Max pooling     | 256     | $13\times13$   | $3\times3$   | 2      | valid   | -          |
| C3    | Convolution     | 256     | $27\times27$   | $5\times5$   | 1      | same    | ReLU       |
| S2    | Max pooling     | 96      | $27\times27$   | $3\times3$   | 2      | valid   | -          |
| C1    | Convolution     | 96      | $55\times55$   | $11\times11$ | 4      | valid   | ReLU       |
| In    | Input           | 3 (RGB) | $227\times227$ | -            | -      | -       | -          |

To reduce overfitting, the authors used two regularization techniques. First, they applied dropout with a 50% dropout rate during training to the outputs of layers F9 and F10. Second, they performed data augmentation by randomly shifting the training images by various offsets, flipping them horizontally, and changing the lighting conditions.

Data Augmentation

Data augmentation artificially increases the size of the training set by generating many variants of each training instance. This reduces overfitting, making it a regularization technique. The generated instances should be as realistic as possible: ideally, given an image from the augmented training set, a human should not be able to tell whether it was augmented or not. Simply adding white noise will not help; the modifications should be learnable (white noise is not).

For example, you can slightly shift, rotate, and resize every picture in the training set (by amounts that vary from picture to picture) and add the resulting pictures to the training set. This forces the model to be more tolerant of variations in the position, orientation, and size of the objects in the pictures. For a model that is more tolerant of different lighting conditions, you can similarly generate many images with various contrasts. In general, you can also flip the pictures horizontally (except for text and other asymmetrical objects). By combining these transformations, you can greatly increase the size of your training set.

AlexNet also uses a competitive normalization step immediately after the ReLU step of layers C1 and C3, called local response normalization (LRN): the most strongly activated neurons inhibit other neurons located at the same position in neighboring feature maps (such competitive activation has been observed in biological neurons). This encourages different feature maps to specialize, pushing them apart and forcing them to explore a wider range of features, ultimately improving generalization.

$$ b_i=a_i\left(k+\alpha\sum_{j=j_{low}}^{j_{high}}a_j^2\right)^{-\beta} $$

where:

$$ j_{high}=\min\left(i+\frac r2,f_n-1\right) $$

$$ j_{low}=\max\left(0,i-\frac r2\right) $$

  • $b_i$ is the normalized output of the neuron located in feature map $i$, at some row $u$ and column $v$ (note that in this equation we consider only neurons located at this row and column, so $u$ and $v$ are not shown).
  • $a_i$ is the activation of that neuron after the ReLU step, but before normalization.
  • $k$, $\alpha$, $\beta$, and $r$ are hyperparameters. $k$ is called the bias, and $r$ is called the depth radius.
  • $f_n$ is the number of feature maps.

For example, if $r=2$ and a neuron has a strong activation, it will inhibit the activation of the neurons located in the feature maps immediately above and below its own.

In AlexNet, the hyperparameters are set as follows: $r=2$, $\alpha=0.00002$, $\beta=0.75$, and $k=1$. This step can be implemented using the tf.nn.local_response_normalization() function (to use it in a Keras model, wrap it in a Lambda layer, as sketched below).
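
For example, a minimal sketch of such a Lambda wrapper with AlexNet's hyperparameters ($r$ maps to the depth_radius argument and $k$ to bias):

import tensorflow as tf
from tensorflow import keras

lrn_layer = keras.layers.Lambda(
    lambda X: tf.nn.local_response_normalization(
        X, depth_radius=2, bias=1, alpha=0.00002, beta=0.75))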

GoogLeNet

The GoogLeNet architecture was developed by Christian Szegedy et al. from Google Research. It won the 2014 ILSVRC challenge by pushing the top-5 error rate below 7%. This great performance came in large part from the fact that the network was much deeper than previous CNNs. This was made possible by subnetworks called inception modules, which allow GoogLeNet to use parameters much more efficiently than previous architectures: GoogLeNet actually has about one-tenth as many parameters as AlexNet.

Why do inception modules have convolutional layers with $1\times1$ kernels? Surely these layers cannot capture any features, since they look at only one pixel at a time? In fact, these layers serve several purposes:

  • Although they cannot capture spatial patterns, they can capture patterns along the depth dimension.
  • They output fewer feature maps than their inputs, so they serve as bottleneck layers, meaning they reduce dimensionality. This cuts the computational cost and the number of parameters, speeding up training and improving generalization.
  • Each pair of convolutional layers ([$1\times1$, $3\times3$] and [$1\times1$, $5\times5$]) acts like a single powerful convolutional layer, capable of capturing more complex patterns. Instead of sweeping a simple linear classifier across the image, this pair of layers sweeps a two-layer neural network across the image.

In short, an inception module is able to output feature maps that capture complex patterns at various scales, as the sketch below illustrates.
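
Here is a sketch of a single inception module built with the functional API. The argument names are made up; the filter counts in the usage comment follow commonly cited values for GoogLeNet's first inception module:

from tensorflow import keras

def inception_module(inputs, f1, f3_reduce, f3, f5_reduce, f5, f_pool):
    b1 = keras.layers.Conv2D(f1, 1, activation='relu', padding='same')(inputs)
    b2 = keras.layers.Conv2D(f3_reduce, 1, activation='relu', padding='same')(inputs)
    b2 = keras.layers.Conv2D(f3, 3, activation='relu', padding='same')(b2)
    b3 = keras.layers.Conv2D(f5_reduce, 1, activation='relu', padding='same')(inputs)
    b3 = keras.layers.Conv2D(f5, 5, activation='relu', padding='same')(b3)
    b4 = keras.layers.MaxPooling2D(3, strides=1, padding='same')(inputs)
    b4 = keras.layers.Conv2D(f_pool, 1, activation='relu', padding='same')(b4)
    return keras.layers.Concatenate()([b1, b2, b3, b4])  # stack the branches along depth

# Usage, e.g.: x = inception_module(x, 64, 96, 128, 16, 32, 32)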

VGGNet

The runner-up in the ILSVRC 2014 challenge was VGGNet, developed by Karen Simonyan and Andrew Zisserman from the Visual Geometry Group (VGG) research lab at Oxford University. It had a very simple and classical architecture, with 2 or 3 convolutional layers and a pooling layer, then again 2 or 3 convolutional layers and a pooling layer, and so on (reaching a total of 16 or 19 convolutional layers, depending on the VGG variant), plus a final dense network with 2 hidden layers and the output layer. It used only $3\times3$ filters, but many of them.

ResNet

Kaiming He et al. won the ILSVRC 2015 challenge using a Residual Network (ResNet), which delivered an astounding top-5 error rate under 3.6%. The winning variant used an extremely deep CNN composed of 152 layers (other variants had 34, 50, and 101 layers). It confirmed the general trend: models are getting deeper and deeper, with fewer and fewer parameters. The key to being able to train such a deep network is skip connections (also called shortcut connections): the signal feeding into a layer is also added to the output of a layer located a bit higher up the stack.

When training a neural network, the goal is to make it model a target function $h(x)$. If you add the input $x$ to the output of the network (i.e., add a skip connection), the network will be forced to model $f(x)=h(x)-x$ rather than $h(x)$. This is called residual learning.

When you initialize a regular neural network, its weights are close to zero, so the network just outputs values close to zero. If you add a skip connection, the resulting network instead outputs a copy of its inputs; in other words, it initially models the identity function, which is often fairly close to the target function, so training can proceed much faster.

Moreover, if you add many skip connections, the network can start making progress even if several layers have not started learning yet. Thanks to skip connections, the signal can easily make its way across the whole network. A deep residual network can be seen as a stack of residual units (RUs), where each residual unit is a small neural network with a skip connection, as sketched below.
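
Here is a sketch of a residual unit as a custom Keras layer, following the standard design: two $3\times3$ convolutional layers plus a skip connection, with a $1\times1$ convolutional layer resizing the skip path whenever the stride is greater than 1 so the shapes match.

from tensorflow import keras

class ResidualUnit(keras.layers.Layer):
    def __init__(self, filters, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.main_layers = [
            keras.layers.Conv2D(filters, 3, strides=strides, padding='same', use_bias=False),
            keras.layers.BatchNormalization(),
            keras.layers.Activation('relu'),
            keras.layers.Conv2D(filters, 3, strides=1, padding='same', use_bias=False),
            keras.layers.BatchNormalization()]
        self.skip_layers = []
        if strides > 1:
            # Resize the skip path so it can be added to the main path's output
            self.skip_layers = [
                keras.layers.Conv2D(filters, 1, strides=strides, padding='same', use_bias=False),
                keras.layers.BatchNormalization()]

    def call(self, inputs):
        z = inputs
        for layer in self.main_layers:
            z = layer(z)
        skip = inputs
        for layer in self.skip_layers:
            skip = layer(skip)
        return keras.activations.relu(z + skip)  # add the input back in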

Xception

Another notable variant of the GoogLeNet architecture is Xception (which stands for Extreme Inception). It was proposed by François Chollet in 2016, and it significantly outperformed Inception-v3 on a huge vision task (350 million images and 17,000 classes). Just like Inception-v4, it merges the ideas of GoogLeNet and ResNet, but it replaces the inception modules with a special type of layer called a depthwise separable convolution layer. These layers had been used before in some CNN architectures, but they were not as central as in the Xception architecture. While a regular convolutional layer uses filters that try to simultaneously capture spatial patterns and cross-channel patterns, a separable convolution layer makes the strong assumption that spatial patterns and cross-channel patterns can be modeled separately. Thus, it is composed of two parts: the first part applies a single spatial filter to each input feature map, then the second part looks exclusively for cross-channel patterns; it is a regular convolutional layer with $1\times1$ filters. A sketch follows.
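
In Keras, this type of layer is available as keras.layers.SeparableConv2D. A minimal sketch (the filter counts are illustrative; note that the first layer is a regular Conv2D, since separable convolutions bring little benefit when there are only a few input channels):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, strides=2, activation='relu', input_shape=(224, 224, 3)),
    # Each SeparableConv2D applies one spatial filter per input map,
    # then a 1x1 convolution across channels
    keras.layers.SeparableConv2D(64, 3, activation='relu', padding='same'),
    keras.layers.SeparableConv2D(128, 3, activation='relu', padding='same'),
])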

SENet

The winning architecture in the ILSVRC 2017 challenge was the Squeeze-and-Excitation Network (SENet). This architecture extends existing architectures such as inception networks and ResNets and boosts their performance, which allowed SENet to win the competition with an astonishing top-5 error rate of 2.25%. The extended versions of inception networks and ResNets are called SE-Inception and SE-ResNet, respectively. The boost comes from the fact that a SENet adds a small neural network, called an SE block, to every unit in the original architecture (i.e., every inception module or every residual unit).

An SE block analyzes the output of the unit it is attached to, focusing exclusively on the depth dimension (it does not look for any spatial pattern), and it learns which features are usually most active together. It then uses this information to recalibrate the feature maps. For example, an SE block may learn that mouths, noses, and eyes usually appear together in pictures: if you see a mouth and a nose, you should expect to see eyes as well. So if the block sees a strong activation in the mouth and nose feature maps but only a mild activation in the eye feature map, it will boost the eye feature map (more precisely, it will reduce the irrelevant feature maps). If the eyes were somewhat confused with something else, this feature-map recalibration helps resolve the ambiguity.

An SE block is composed of just three layers: a global average pooling layer, a hidden dense layer using the ReLU activation function, and a dense output layer using the sigmoid activation function, as sketched below.
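
A minimal functional-API sketch of an SE block (the reduction ratio of 16 is a commonly used value, not taken from the text above):

from tensorflow import keras

def se_block(inputs, n_filters, ratio=16):
    se = keras.layers.GlobalAveragePooling2D()(inputs)            # squeeze: one value per map
    se = keras.layers.Dense(n_filters // ratio, activation='relu')(se)
    se = keras.layers.Dense(n_filters, activation='sigmoid')(se)  # one weight per feature map
    se = keras.layers.Reshape((1, 1, n_filters))(se)
    return keras.layers.Multiply()([inputs, se])                  # recalibrate the feature maps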