CVPR 2023 Paper Roundup
CVPR2023
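According to official statistics, CVPR 2023 received 9,155 submissions, a record high and 12% more than last year. Of these, 2,360 papers were accepted, for an acceptance rate of 25.78%. For comparison, last year saw more than 8,100 valid submissions and 2,067 accepted papers, an acceptance rate of about 25%.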
https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers
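Below, I sort the accepted papers by keyword into the directions I am interested in. This is a filtered selection, not the full list.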
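The keyword-based sorting can be sketched roughly as follows. This is only a minimal illustration, not the exact procedure used for this post: the `KEYWORDS` map, the `group_by_keyword` helper, and the placeholder title are all hypothetical, and the real selection also involved manual filtering.

```python
from collections import defaultdict

# Hypothetical keyword map: category name -> lowercase substrings to look for.
KEYWORDS = {
    "Vision and Language / Multimodal": ["vision-language", "multimodal"],
}


def group_by_keyword(titles, keywords=KEYWORDS):
    """Return {category: [titles]} for titles containing any keyword of that category."""
    groups = defaultdict(list)
    for title in titles:
        lowered = title.lower()  # case-insensitive substring match
        for category, needles in keywords.items():
            if any(needle in lowered for needle in needles):
                groups[category].append(title)
    return dict(groups)


titles = [
    "Task Residual for Tuning Vision-Language Models",
    "Efficient Multimodal Fusion via Interactive Prompting",
    "Some Unrelated Paper Title",  # placeholder, not a real paper
]
grouped = group_by_keyword(titles)
```

Plain substring matching is crude (it misses titles that describe a topic without naming it), which is why a manual pass over the result is still needed.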
1. Vision and Language / Multimodal
Paper | Summary |
---|---|
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles | |
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training | |
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | |
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks | |
CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
Task Residual for Tuning Vision-Language Models | |
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! | |
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks | |
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining | |
Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator | |
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | |
FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training | |
Accelerating Vision-Language Pretraining with Free Language Modeling | |
Leveraging per Image-Token Consistency for Vision-Language Pre-training | |
Position-guided Text Prompt for Vision-Language Pre-training | |
IFSeg: Image-free Semantic Segmentation via Vision-Language Model | |
Enhanced Multimodal Representation Learning with Cross-modal KD | |
Efficient Multimodal Fusion via Interactive Prompting | |
Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data | |
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | |
Align and Attend: Multimodal Summarization with Dual Contrastive Losses | |
Multimodal Prompting with Missing Modalities for Visual Recognition |