Understanding nn.Conv1d in Simple Terms
1. Definition in the official documentation
In the simplest case, the output value of the layer with input size $(N, C_{\text{in}}, L)$ and output $(N, C_{\text{out}}, L_{\text{out}})$ can be precisely described as:

$$
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
$$

where $\star$ is the valid cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, and $L$ is the length of the signal sequence.
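To make the formula concrete, here is a minimal sketch that evaluates it with explicit loops and compares the result with what `nn.Conv1d` returns. The sizes (`N`, `C_in`, `C_out`, `L`, `K`) and the name `manual` are illustrative assumptions, not part of the official documentation; the layer uses the default stride, padding, and dilation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is reproducible

N, C_in, C_out, L, K = 2, 3, 4, 10, 3  # small illustrative sizes (assumed)
conv = nn.Conv1d(C_in, C_out, kernel_size=K)  # stride=1, padding=0, dilation=1
x = torch.rand(N, C_in, L)

L_out = L - K + 1  # "valid" cross-correlation: the kernel never leaves the input
manual = torch.zeros(N, C_out, L_out)
with torch.no_grad():
    for i in range(N):                # batch index N_i
        for j in range(C_out):        # output channel C_out_j
            manual[i, j] = conv.bias[j]
            for k in range(C_in):     # sum over the input channels
                for t in range(L_out):
                    # slide the kernel without flipping it (cross-correlation)
                    window = x[i, k, t:t + K]
                    manual[i, j, t] += (conv.weight[j, k] * window).sum()
    reference = conv(x)

# True when the loop version agrees with the layer up to float rounding
matches = torch.allclose(manual, reference, atol=1e-5)
print(matches)
```

The four nested loops are far slower than the built-in layer; they are only meant to show which weights and inputs contribute to each output value.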
This module supports TensorFloat32.

* `stride` controls the stride for the cross-correlation, a single number or a one-element tuple.
* `padding` controls the amount of implicit zero-padding on both sides for `padding` number of points.
* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the official documentation links to a nice visualization of what `dilation` does.
* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example,
  * At groups=1, all inputs are convolved to all outputs.
  * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
  * At groups=`in_channels`, each input channel is convolved with its own set of filters, of size $\left\lfloor \frac{\text{out\_channels}}{\text{in\_channels}} \right\rfloor$.
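The effect of `groups` is easiest to see in the shape of the weight tensor. The sketch below assumes 4 input channels, 8 output channels, and a kernel size of 3 (arbitrary illustrative values), prints the weight shape for each setting, and then checks the "two conv layers side by side" description for groups=2 by splitting the weights by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 4, 10)  # (N, C_in, L) with C_in = 4

weight_shapes = {}
for groups in (1, 2, 4):
    conv = nn.Conv1d(4, 8, kernel_size=3, groups=groups)
    # weight has shape (out_channels, in_channels // groups, kernel_size)
    weight_shapes[groups] = tuple(conv.weight.shape)
    print(groups, weight_shapes[groups], tuple(conv(x).shape))
# 1 (8, 4, 3) (1, 8, 8)
# 2 (8, 2, 3) (1, 8, 8)
# 4 (8, 1, 3) (1, 8, 8)

# groups=2: two independent convolutions, each on half of the channels
grouped = nn.Conv1d(4, 8, kernel_size=3, groups=2)
with torch.no_grad():
    first = F.conv1d(x[:, :2], grouped.weight[:4], grouped.bias[:4])
    second = F.conv1d(x[:, 2:], grouped.weight[4:], grouped.bias[4:])
    side_by_side = torch.cat([first, second], dim=1)
    same = torch.allclose(grouped(x), side_by_side, atol=1e-5)
print(same)
```

With groups=4 (equal to `in_channels`), each input channel gets 8 / 4 = 2 filters of its own, which matches the size given in the last bullet.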
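1.1 Parameter explanation
- Input: $(N, C_{in}, L_{in})$
- Output: $(N, C_{out}, L_{out})$

where, as described above:
- $N$ is the batch size.
- $C$ is the number of channels. For a sequence, it is the dimension of each column vector.
  $C_{in}$: the encoding dimension of each column vector in the input sequence.
  $C_{out}$: the desired encoding dimension of each column vector in the output sequence.
- $L$ is the sequence length, i.e. how many column vectors the sequence contains.
  $L_{in}$: the number of column vectors in the input sequence.
  $L_{out}$: the number of column vectors in the output sequence.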
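As a small illustration of these three dimensions, the sketch below builds a batch of 2 sequences, each made of 5 column vectors of dimension 4, and asks the layer for 8-dimensional output vectors. All of these sizes, and the kernel size of 3, are arbitrary assumptions chosen for readability.

```python
import torch
import torch.nn as nn

# N = 2 sequences, C_in = 4 (dimension of each column vector), L_in = 5 vectors
x = torch.rand(2, 4, 5)

# C_out = 8: each output column vector should have 8 dimensions
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3)
y = conv(x)

n, c_out, l_out = y.shape
print(n, c_out, l_out)  # 2 8 3
```

The batch size is unchanged, the channel dimension becomes `out_channels`, and the sequence shrinks from 5 to 3 column vectors because a kernel of size 3 fits at only 3 positions without padding.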
The output length is computed as:

$$
L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor
$$
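The formula translates directly into a few lines of plain Python. The helper name `conv1d_out_len` is hypothetical (it is not a PyTorch function); its defaults mirror those of `nn.Conv1d`.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution, following the formula above."""
    numerator = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # Adding the integer 1 after flooring is the same as flooring (x + 1)
    return numerator // stride + 1


print(conv1d_out_len(50, 3, stride=2))    # 24
print(conv1d_out_len(50, 3, padding=1))   # 50
print(conv1d_out_len(50, 3, dilation=2))  # 46
```

The second call shows a common choice: with a kernel size of 3 and stride 1, a padding of 1 keeps the output the same length as the input.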
1.2 Worked example
Here `padding` defaults to 0, `dilation` defaults to 1, and `groups` defaults to 1.
The output length is computed with the formula above.
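    import torch
    import torch.nn as nn

    # 16 input channels, 33 output channels, kernel size 3, stride 2
    m = nn.Conv1d(16, 33, 3, stride=2)
    # Batch of 20 sequences, 16 channels each, length 50
    x = torch.rand(20, 16, 50)
    output = m(x)
    print(output.shape)

Output:

    torch.Size([20, 33, 24])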
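The length 24 in the printed shape can be checked by hand. Substituting $L_{in} = 50$, padding 0, dilation 1, kernel size 3, and stride 2 into the output-length formula from section 1.1 gives:

$$
L_{out} = \left\lfloor \frac{50 + 2 \times 0 - 1 \times (3 - 1) - 1}{2} + 1 \right\rfloor = \left\lfloor \frac{47}{2} + 1 \right\rfloor = \left\lfloor 24.5 \right\rfloor = 24
$$

The batch size 20 passes through unchanged, and the channel dimension becomes the 33 output channels requested when the layer was created.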