Compiled by: AI算法与图像处理
CVPR 2022 papers and code: https://github.com/DWCTOD/CVPR2022-Papers-with-Code-Demo
ECCV 2022 papers and code: https://github.com/DWCTOD/ECCV2022-Papers-with-Code-Demo
ECCV 2022 | XMem: High-Quality Long-Term Video Segmentation
Strong results on long videos!
Title: XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Paper: https://arxiv.org/pdf/2207.07115.pdf
Code: https://github.com/hkchengrex/XMem
Abstract:
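We present XMem, a video object segmentation architecture for long videos with unified feature memory stores, inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, and therefore sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay in long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (which do not work on long videos) on short-video datasets.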
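The three-store idea in the abstract can be illustrated with a toy sketch. This is not the XMem implementation (the linked repository is the reference for that). The class name `ToyThreeStoreMemory`, the capacities, the per-element usage counter, and the nearest-neighbor read are all assumptions made purely for illustration. The sketch only shows the general pattern: a sensory slot overwritten every frame, a bounded working memory, and a consolidation step that moves the most frequently read working-memory elements into a compact long-term store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryElement:
    key: tuple          # toy feature key (a tuple of floats)
    value: str          # toy payload stored with the key
    usage: float = 0.0  # how many times this element has been read


@dataclass
class ToyThreeStoreMemory:
    # All capacities are hypothetical values chosen for the example.
    working_capacity: int = 4
    keep_top_k: int = 2
    long_term_capacity: int = 6
    sensory: Optional[tuple] = None
    working: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def update_sensory(self, frame_feature: tuple) -> None:
        # Sensory memory: overwritten on every frame.
        self.sensory = frame_feature

    def add_to_working(self, key: tuple, value: str) -> None:
        # Working memory: grows until it exceeds its capacity,
        # then triggers consolidation.
        self.working.append(MemoryElement(key, value))
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def read(self, query: tuple) -> Optional[str]:
        # Toy read: return the value whose key is closest to the query
        # (squared Euclidean distance) and count the access as usage.
        candidates = self.working + self.long_term
        if not candidates:
            return None
        best = min(
            candidates,
            key=lambda e: sum((a - b) ** 2 for a, b in zip(e.key, query)),
        )
        best.usage += 1
        return best.value

    def consolidate(self) -> None:
        # Move the most-used older elements into long-term memory,
        # discard the rest, and keep only the newest element in
        # working memory.
        older, newest = self.working[:-1], self.working[-1:]
        ranked = sorted(older, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[: self.keep_top_k])
        # Keep long-term memory bounded by dropping the least-used items.
        self.long_term.sort(key=lambda e: e.usage, reverse=True)
        del self.long_term[self.long_term_capacity:]
        self.working = newest
```

In this sketch the total number of stored elements never exceeds `working_capacity + long_term_capacity`, regardless of how many frames are added, which is the property the abstract refers to as avoiding memory explosion.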
PoserNet: Refining Relative Camera Poses Exploiting Object Detections
Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos
RCLane: Relay Chain Prediction for Lane Detection
Rethinking IoU-based Optimization for Single-stage 3D Object Detection
Deep Semantic Statistics Matching (D2SM) Denoising Network
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform
Action Quality Assessment with Temporal Parsing Transformer
Image Super-Resolution with Deep Dictionary
NDF: Neural Deformable Fields for Dynamic Human Modelling
Self-Supervision Can Be a Good Few-Shot Learner
Single Stage Virtual Try-on via Deformable Attention Flows
FedX: Unsupervised Federated Learning with Cross Knowledge Distillation
Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
What Matters for 3D Scene Flow Network
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild
MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
Box-supervised Instance Segmentation with Level Set Evolution
ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation
Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation
SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Prior-Guided Adversarial Initialization for Fast Adversarial Training
Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach
Prior Knowledge Guided Unsupervised Domain Adaptation
Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective
Balanced Contrastive Learning for Long-Tailed Visual Recognition