MobileNetV3详细原理（含torch源码）

文章列表

作者：爱笑的男孩。

个人简介：打工人。

持续分享：机器学习、深度学习、python相关内容、日常BUG解决方法及Windows&Linux实践小技巧。

如发现文章有误，麻烦请指出，我会及时去纠正。有其他需要可以私信我或者发我邮箱:zhilong666@foxmail.com

MobileNetV3原理

MobileNetV3创新点

MobileNetV3对比MobileNetV2

MobileNetV3源码（torch版）

MobileNetV3-large

MobileNetV3-small

训练10个epoch的效果

MobileNetV3原理

MobileNetV3是Google在2019年提出的一款深度神经网络模型，其在移动端图像分类、目标检测、语义分割等任务上均取得了优秀的表现。MobileNetV3采用了很多新的技术，包括针对通道注意力的Squeeze-and-Excitation模块、NAS搜索方法等，这些方法都有利于进一步提升网络的性能。

MobileNetV3的整体架构基本沿用了MobileNetV2的设计，采用了轻量级的深度可分离卷积和残差块等结构，依然是由多个模块组成，但是每个模块得到了优化和升级，包括瓶颈结构、SE模块和NL模块。

首先是瓶颈结构，MobileNetV3中的瓶颈结构和MobileNetV2类似，但是在MobileNetV3中，每个瓶颈结构加入了一个关键的组件，即非线性激活函数。在MobileNetV2中，瓶颈中的Max-Pooling层只是一种简单的预处理方式，而MobileNetV3中引入了ReLU6非线性激活函数来激活瓶颈层，从而能够更好地提升模型的非线性表达能力。

其次是Squeeze-and-Excitation模块，MobileNetV3引入了Squeeze-and-Excitation(SE)模块来有效地增强网络中的通道交互和上下文信息的传递。SE模块通过自适应地学习通道的处理权重，将通道之间的相关性进行特异性加强，进而提高特征图的鉴别力和语义表达能力。

最后是NAS搜索方法，MobileNetV3可以采用神经结构搜索(NAS)方法，该方法能够根据数据集的特点，自动得到一种最优的网络架构。在MobileNetV3中，采用了一种基于RL策略的NAS搜索方法，使用强化学习策略来搜索网络的结构，从而能够自适应地预测最优网络架构。

综合来说，MobileNetV3是一种高效、准确和灵活的深度学习模型，它适用于各种移动设备上的图像分类和识别任务。

MobileNetV3创新点

MobileNetV3的主要创新在于使用了两个重要的组件：候选块和内积激活函数。

1.候选块

MobileNetV3使用候选块来替代基于固定卷积单元的基础框架。候选块是一个由若干个卷积层、批标准化层和激活函数组成的一个基本构建单元。候选块的输入和输出都是四维张量(n, h, w, c)，其中n表示batch size，h和w表示图像的高度和宽度，c表示通道数。

MobileNetV3提出了三种不同的候选块：小型、中型和大型。这三个候选块的结构相同，但是它们的通道数和宽度倍增的比例不同。这样可以在不增加运算量的情况下，提高模型的表现力。

2.内积激活函数

MobileNetV3使用了内积激活函数。内积激活函数将输入的特征图和可学习的参数进行内积运算，得到一个标量，然后将该标量通过非线性激活函数进行映射，从而得到特征图的输出。

内积激活函数相比传统的卷积激活函数具有如下优点：

降低了计算量。内积运算可以使用点乘或卷积来实现，比起传统的卷积操作，计算量更小。
降低了内存占用。内积激活函数可以将输入特征图的通道数降低到1，从而减少存储和传输开销。
提高了模型表现力。内积激活函数可以使模型在特定的任务上获得更好的表现力。

MobileNetV3在实验中证明了其在ImageNet上图像分类任务上的有效性，同时也在COCO数据集上进行了检测和分割等任务的实验，取得了较好的结果。

补充：

Adaptive Inverted Residual Block（AIR Block）：MobileNetV3首先引入了AIR Block来替代MobileNetV2中的Inverted Residual Block（IR Block）。AIR Block的特点是具有自适应的非线性结构，可以提高模型的表达能力和精度。同时，AIR Block引入了多步卷积结构，可以减少模型参数量和计算量，从而提升模型的运行速度。
激活函数优化：MobileNetV3中对ReLU6进行了优化，引入了Swish激活函数，可以有效提高模型的精度。 Swish激活函数是一种非常流行的激活函数，其具有类似于sigmoid激活函数的平滑曲线，可以在容易梯度消失的区域内提供更好的梯度。同时，Swish激活函数可以保持较高的计算效率。
Hard-Sigmoid门控：MobileNetV3中提出了一种新的门控机制，即Hard-Sigmoid门控。 Hard-Sigmoid门控可以在保证计算效率的前提下，提高模型的表达能力，从而提高模型的精度。
Network Architecture Search（NAS）技术：MobileNetV3采用了NAS技术进行模型搜索，可以自动化地发现适合移动端场景的网络结构，并自动优化其超参数。这一方法可以有效地降低模型的人工设计成本，并提高模型的精度和效率。

MobileNetV3在神经网络结构设计、激活函数优化、门控机制提升以及NAS技术应用等方面都有比较明显的创新点，其提高了移动端神经网络模型的表达能力和计算效率，具有非常重要的应用价值。

MobileNetV3对比MobileNetV2

MobileNetV3相较于MobileNetV2，主要有以下改进和优化：

更高的准确率：MobileNetV3在ImageNet数据集上的Top-1准确率达到了75.2%，比MobileNetV2的72.0%要高。
更快的推理速度：MobileNetV3通过增加Squeeze-and-Excitation模块和hard-swish非线性激活函数，提高了模型的计算效率，加快了推理速度。
更少的参数和计算量：MobileNetV3在保持准确率不变的情况下，参数数量和计算量比MobileNetV2都要少。
支持多种模型结构：MobileNetV3除了提供标准版模型外，还提供了Small模型和Large模型，可以根据不同的场景和需求选择合适的模型。
支持自适应网络：MobileNetV3加入了Squeeze-and-Excite模块和自适应网络结构，可以根据输入图像的尺寸和分辨率自适应地调整模型的结构和参数，从而实现更好的模型泛化能力和适应性。
支持权重重要性筛选：MobileNetV3提供了权重重要性筛选工具，可以定量地筛选出对模型性能影响最大的参数，从而实现更高效的模型压缩和优化。

总的来说，MobileNetV3相对于MobileNetV2，在准确率、推理速度、参数数量、计算量、模型结构和适应性等方面都有了显著的提升和改进。

MobileNetV3源码（torch版）

MobileNetV3-large

from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from torchvision.transforms import transforms
from torch.autograd import Variableimport torch.nn as nn
import torch.nn.functional as F
import torchclass hswish(nn.Module):def __init__(self, inplace=True):super(hswish, self).__init__()self.inplace = inplacedef forward(self, x):f = nn.functional.relu6(x + 3., inplace=self.inplace) / 6.return x * fclass hsigmoid(nn.Module):def __init__(self, inplace=True):super(hsigmoid, self).__init__()self.inplace = inplacedef forward(self, x):f = nn.functional.relu6(x + 3., inplace=self.inplace) / 6.return fclass SeModule(nn.Module):def __init__(self, in_channels, se_ratio=0.25):super(SeModule, self).__init__()self.se_reduce = nn.Conv2d(in_channels, int(in_channels * se_ratio), kernel_size=1, stride=1, padding=0)self.se_expand = nn.Conv2d(int(in_channels * se_ratio), in_channels, kernel_size=1, stride=1, padding=0)def forward(self, x):s = nn.functional.adaptive_avg_pool2d(x, 1)s = self.se_expand(nn.functional.relu(self.se_reduce(s), inplace=True))return x * s.sigmoid()class ConvBlock(nn.Module):def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups=1):super(ConvBlock, self).__init__()self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, groups=groups, bias=False)self.bn = nn.BatchNorm2d(out_channels)self.act = hswish()def forward(self, x):return self.act(self.bn(self.conv(x)))class SqueezeExcitation(nn.Module):def __init__(self, in_channel, out_channel, reduction=4):super(SqueezeExcitation, self).__init__()self.pool = nn.AdaptiveAvgPool2d(1)self.fc1 = nn.Conv2d(in_channel, out_channel // reduction, kernel_size=1, stride=1)self.relu = nn.ReLU(inplace=True)self.fc2 = nn.Conv2d(out_channel // reduction, out_channel, kernel_size=1, stride=1)self.sigmoid = nn.Sigmoid()def forward(self, x):out = self.pool(x)out = self.fc1(out)out = self.relu(out)out = self.fc2(out)out = self.sigmoid(out)return outclass ResidualBlock(nn.Module):def __init__(self, in_channels, out_channels, kernel_size, stride, use_se=True):super(ResidualBlock, self).__init__()self.conv1 = ConvBlock(in_channels, out_channels, kernel_size, stride, kernel_size//2)self.conv2 = ConvBlock(out_channels, out_channels, kernel_size, 1, kernel_size//2)self.use_se = use_seif use_se:self.se = SqueezeExcitation(out_channels, out_channels)self.shortcut = nn.Sequential()if stride != 1 or in_channels != out_channels:self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),nn.BatchNorm2d(out_channels))def forward(self, x):out = self.conv1(x)out = self.conv2(out)if self.use_se:out = out * self.se(out)out += self.shortcut(x)out = nn.functional.relu(out, inplace=True)return outclass MobileNetV3Large(nn.Module):def __init__(self, num_classes=1000):super(MobileNetV3Large, self).__init__()#self.conv1 = ConvBlock(3, 16, 3, 2, 1)     # 1/2self.bottlenecks = nn.Sequential(ResidualBlock(16, 16, 3, 1, False),ResidualBlock(16, 24, 3, 2, False),     # 1/4ResidualBlock(24, 24, 3, 1, False),ResidualBlock(24, 40, 5, 2, True),      # 1/8ResidualBlock(40, 40, 5, 1, True),ResidualBlock(40, 40, 5, 1, True),ResidualBlock(40, 80, 3, 2, False),     # 1/16ResidualBlock(80, 80, 3, 1, False),ResidualBlock(80, 80, 3, 1, False),ResidualBlock(80, 112, 5, 1, True),ResidualBlock(112, 112, 5, 1, True),ResidualBlock(112, 160, 5, 2, True),    # 1/32ResidualBlock(160, 160, 5, 1, True),ResidualBlock(160, 160, 5, 1, True))self.conv2 = ConvBlock(160, 960, 1, 1, 0)self.pool = nn.AdaptiveAvgPool2d(1)self.fc = nn.Sequential(nn.Linear(960, 1280),nn.BatchNorm1d(1280),nn.Hardswish(inplace=True),nn.Linear(1280, num_classes),)def forward(self, x):out = self.conv1(x)out = self.bottlenecks(out)out = self.conv2(out)out = self.pool(out)out = out.reshape(out.size(0), -1)out = self.fc(out)return outif __name__ == '__main__':train_data = CIFAR10('cifar', train=True, transform=transforms.ToTensor())data = DataLoader(train_data, batch_size=148, shuffle=True)device = torch.device('cuda')net = MobileNetV3Large(num_classes=10).to(device)print(net)cross = nn.CrossEntropyLoss().to(device)optimizer = torch.optim.Adam(net.parameters(), 0.001)for epoch in range(10):for img, label in data:img = Variable(img).to(device)label = Variable(label).to(device)output = net.forward(img)loss = cross(output, label)loss.backward()optimizer.zero_grad()optimizer.step()pre = torch.argmax(output, 1)num = (pre == label).sum().item()acc = num / img.shape[0]print("epoch:", epoch + 1)print("loss:", loss.item())print("Accuracy:", acc)

MobileNetV3-small

from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from torchvision.transforms import transforms
from torch.autograd import Variableimport torch.nn as nn
import torch.nn.functional as F
import torchclass hswish(nn.Module):def __init__(self, inplace=True):super(hswish, self).__init__()self.inplace = inplacedef forward(self, x):f = nn.functional.relu6(x + 3., inplace=self.inplace) / 6.return x * fclass hsigmoid(nn.Module):def __init__(self, inplace=True):super(hsigmoid, self).__init__()self.inplace = inplacedef forward(self, x):f = nn.functional.relu6(x + 3., inplace=self.inplace) / 6.return fclass SeModule(nn.Module):def __init__(self, in_channels, se_ratio=0.25):super(SeModule, self).__init__()self.se_reduce = nn.Conv2d(in_channels, int(in_channels * se_ratio), kernel_size=1, stride=1, padding=0)self.se_expand = nn.Conv2d(int(in_channels * se_ratio), in_channels, kernel_size=1, stride=1, padding=0)def forward(self, x):s = nn.functional.adaptive_avg_pool2d(x, 1)s = self.se_expand(nn.functional.relu(self.se_reduce(s), inplace=True))return x * s.sigmoid()class ConvBlock(nn.Module):def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups=1):super(ConvBlock, self).__init__()self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, groups=groups, bias=False)self.bn = nn.BatchNorm2d(out_channels)self.act = hswish()def forward(self, x):return self.act(self.bn(self.conv(x)))class SqueezeExcitation(nn.Module):def __init__(self, in_channel, out_channel, reduction=4):super(SqueezeExcitation, self).__init__()self.pool = nn.AdaptiveAvgPool2d(1)self.fc1 = nn.Conv2d(in_channel, out_channel // reduction, kernel_size=1, stride=1)self.relu = nn.ReLU(inplace=True)self.fc2 = nn.Conv2d(out_channel // reduction, out_channel, kernel_size=1, stride=1)self.sigmoid = nn.Sigmoid()def forward(self, x):out = self.pool(x)out = self.fc1(out)out = self.relu(out)out = self.fc2(out)out = self.sigmoid(out)return outclass ResidualBlock(nn.Module):def __init__(self, in_channels, out_channels, kernel_size, stride, use_se=True):super(ResidualBlock, self).__init__()self.conv1 = ConvBlock(in_channels, out_channels, kernel_size, stride, kernel_size//2)self.conv2 = ConvBlock(out_channels, out_channels, kernel_size, 1, kernel_size//2)self.use_se = use_seif use_se:self.se = SqueezeExcitation(out_channels, out_channels)self.shortcut = nn.Sequential()if stride != 1 or in_channels != out_channels:self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),nn.BatchNorm2d(out_channels))def forward(self, x):out = self.conv1(x)out = self.conv2(out)if self.use_se:out = out * self.se(out)out += self.shortcut(x)out = nn.functional.relu(out, inplace=True)return outclass MobileNetV3Small(nn.Module):def __init__(self, num_classes=1000):super(MobileNetV3Small, self).__init__()self.conv1 = ConvBlock(3, 16, 3, 2, 1)     # 1/2self.bottlenecks = nn.Sequential(ResidualBlock(16, 16, 3, 2, False),     # 1/4ResidualBlock(16, 72, 3, 2, False),     # 1/8ResidualBlock(72, 72, 3, 1, False),ResidualBlock(72, 72, 3, 1, True),ResidualBlock(72, 96, 3, 2, True),      # 1/16ResidualBlock(96, 96, 3, 1, True),ResidualBlock(96, 96, 3, 1, True),ResidualBlock(96, 240, 5, 2, True),     # 1/32ResidualBlock(240, 240, 5, 1, True),ResidualBlock(240, 240, 5, 1, True),ResidualBlock(240, 480, 5, 1, True),ResidualBlock(480, 480, 5, 1, True),ResidualBlock(480, 480, 5, 1, True),)self.conv2 = ConvBlock(480, 576, 1, 1, 0, groups=2)self.conv3 = nn.Conv2d(576, 1024, kernel_size=1, stride=1, padding=0, bias=False)self.bn = nn.BatchNorm2d(1024)self.act = hswish()self.pool = nn.AdaptiveAvgPool2d(1)self.fc = nn.Linear(1024, num_classes)def forward(self, x):out = self.conv1(x)out = self.bottlenecks(out)out = self.conv2(out)out = self.conv3(out)out = self.bn(out)out = self.act(out)out = self.pool(out)out = out.reshape(out.size(0), -1)out = self.fc(out)return outif __name__ == '__main__':train_data = CIFAR10('cifar', train=True, transform=transforms.ToTensor())data = DataLoader(train_data, batch_size=148, shuffle=True)device = torch.device("cuda")net = MobileNetV3Small(num_classes=10).to(device)print(net)cross = nn.CrossEntropyLoss().to(device)optimizer = torch.optim.Adam(net.parameters(), 0.01)for epoch in range(10):for img, label in data:img = Variable(img).to(device)label = Variable(label).to(device)output = net.forward(img)loss = cross(output, label)loss.backward()optimizer.zero_grad()optimizer.step()pre = torch.argmax(output, 1)num = (pre == label).sum().item()acc = num / img.shape[0]print("epoch:", epoch + 1)print("loss:", loss.item())print("Accuracy:", acc)

MobileNetV3详细原理（含torch源码）

MobileNetV3原理

MobileNetV3创新点

补充：

MobileNetV3对比MobileNetV2

MobileNetV3源码（torch版）

MobileNetV3-large

MobileNetV3-small

训练10个epoch的效果

公告

DeepSeek全套部署资料免费下载

免费可商用字体批量下载

标签

MobileNetV3详细原理（含torch源码）

MobileNetV3原理

MobileNetV3创新点

补充：

MobileNetV3对比MobileNetV2

MobileNetV3源码（torch版）

MobileNetV3-large

MobileNetV3-small

训练10个epoch的效果

相关问题

公告

DeepSeek全套部署资料免费下载

免费可商用字体批量下载

标签