[CVPR 2020] Regularizing Class-Wise Predictions via Self-Knowledge Distillation
Contents
- Introduction
- Class-wise self-knowledge distillation (CS-KD)
  - Class-wise regularization
  - Effects of class-wise regularization
- Experiments
  - Classification accuracy
- References
Introduction
- To alleviate overfitting, the authors propose Class-wise self-knowledge distillation (CS-KD), which distills the predicted class probabilities of other samples from the same class into the model itself, so that the model produces more meaningful and more consistent predictions.
Class-wise self-knowledge distillation (CS-KD)
Class-wise regularization
- Class-wise regularization loss: it pushes the predicted class distributions of samples from the same class toward each other, which amounts to distilling the model's own dark knowledge (i.e., the knowledge on wrong predictions) into itself:
$$\mathcal{L}_{\mathrm{cls}}(\mathbf{x}, \mathbf{x}'; \theta, T) := KL\big(P(y \mid \mathbf{x}'; \tilde{\theta}, T) \,\|\, P(y \mid \mathbf{x}; \theta, T)\big)$$
Here, $\mathbf{x}, \mathbf{x}'$ are different samples belonging to the same class, $P(y \mid \mathbf{x}; \theta, T) = \frac{\exp\left(f_y(\mathbf{x}; \theta)/T\right)}{\sum_{i=1}^{C} \exp\left(f_i(\mathbf{x}; \theta)/T\right)}$ is the softened class probability, and $T$ is the temperature parameter. Note that $\tilde{\theta}$ is a fixed copy of the parameters $\theta$: gradients are not back-propagated through $\tilde{\theta}$, which avoids model collapse (cf. Miyato et al.).
- Total training loss: the standard cross-entropy loss plus the class-wise regularization term weighted by $\lambda_{\mathrm{cls}}$ (a minimal PyTorch sketch is given below):
$$\mathcal{L}_{\mathrm{tot}}(\mathbf{x}, \mathbf{x}', y; \theta, T) = \mathcal{L}_{\mathrm{CE}}(\mathbf{x}, y; \theta) + \lambda_{\mathrm{cls}} \cdot T^2 \cdot \mathcal{L}_{\mathrm{cls}}(\mathbf{x}, \mathbf{x}'; \theta, T)$$
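The following is a minimal PyTorch sketch of this objective, not the official alinlab/cs-kd implementation; the names `cls_kd_loss`, `cs_kd_total_loss`, `x_same_class`, `lamda`, and the default `temperature=4.0` are illustrative assumptions, and the same-class pairing of samples is assumed to be provided by the data loader.

```python
import torch
import torch.nn.functional as F


def cls_kd_loss(logits, logits_other, temperature=4.0):
    """KL(P(y | x'; theta~, T) || P(y | x; theta, T)), averaged over the batch."""
    # Student side: softened log-probabilities for sample x.
    log_p = F.log_softmax(logits / temperature, dim=1)
    # Teacher side: softened probabilities for the same-class sample x',
    # detached so no gradient flows through the fixed copy theta~.
    q = F.softmax(logits_other.detach() / temperature, dim=1)
    # Scaling by T^2 keeps the gradient magnitude comparable across
    # temperatures (the usual knowledge-distillation convention).
    return F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)


def cs_kd_total_loss(model, x, x_same_class, y, lamda=1.0, temperature=4.0):
    """Cross-entropy on (x, y) plus the class-wise regularization term.

    `x_same_class` holds, for each element of `x`, a different sample
    drawn from the same class (e.g. via a pair-sampling data loader).
    """
    logits = model(x)
    with torch.no_grad():  # the second forward pass plays the role of theta~
        logits_other = model(x_same_class)
    ce = F.cross_entropy(logits, y)
    return ce + lamda * cls_kd_loss(logits, logits_other, temperature)
```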
Effects of class-wise regularization
- Reducing the intra-class variations.
- Preventing overconfident predictions. CS-KD avoids overconfident predictions by using the predicted class distribution of another sample from the same class as a soft target; such soft labels are more ‘realistic’ than those produced by plain label smoothing, since they reflect the model's learned class similarities (a toy comparison is sketched below).
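As a toy illustration of this point, the snippet below contrasts a label-smoothing target with a CS-KD-style soft target for a 4-class problem; the values of `eps`, `T`, and `logits_other` are hypothetical and chosen only for the example.

```python
import torch
import torch.nn.functional as F

num_classes, eps, T = 4, 0.1, 4.0
true_class = 2

# Label smoothing: a fixed, class-agnostic mixture of the one-hot target
# and the uniform distribution -> [0.025, 0.025, 0.925, 0.025].
one_hot = F.one_hot(torch.tensor(true_class), num_classes).float()
ls_target = (1 - eps) * one_hot + eps / num_classes

# CS-KD: the soft target is the model's own (softened) prediction on a
# different sample of the same class, so the probability mass on wrong
# classes reflects learned class similarities rather than a flat constant.
logits_other = torch.tensor([1.0, 0.5, 3.0, -1.0])  # hypothetical f(x'; theta~)
cskd_target = F.softmax(logits_other / T, dim=0)
```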
Experiments
Classification accuracy
- Comparison with output regularization methods.
- Comparison with self-distillation methods.
- Evaluation on large-scale datasets.
- Compatibility with other regularization methods.
- Ablation study.
(1) Feature embedding analysis.
(2) Hierarchical image classification.
- Calibration effects.
References
- Yun, Sukmin, et al. “Regularizing Class-Wise Predictions via Self-Knowledge Distillation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- code: https://github.com/alinlab/cs-kd