cs.CV: 31 papers today
Transformer (2 papers)
【1】 The channel-spatial attention-based vision transformer network for automated, accurate prediction of crop nitrogen status from UAV imagery Link: https://arxiv.org/abs/2111.06839
Authors: Xin Zhang, Liangxiu Han, Tam Sobeih, Lewis Lappin, Mark Lee, Andew Howard, Aron Kisdi Abstract: Nitrogen (N) fertiliser is routinely applied by farmers to increase crop yields. At present, farmers often over-apply N fertiliser in some locations or at some timepoints because they do not have high-resolution crop N status data. N-use efficiency can be low, with the remaining N lost to the environment, resulting in high production costs and environmental pollution. Accurate and timely estimation of N status in crops is crucial to improving cropping systems' economic and environmental sustainability. The conventional approaches based on tissue analysis in the laboratory for estimating N status in plants are time-consuming and destructive. Recent advances in remote sensing and machine learning have shown promise in addressing the aforementioned challenges in a non-destructive way. We propose a novel deep learning framework: a channel-spatial attention-based vision transformer (CSVT) for estimating crop N status from large images collected from a UAV in a wheat field. Unlike existing work, the proposed CSVT introduces a Channel Attention Block (CAB) and a Spatial Interaction Block (SIB), which allow capturing nonlinear characteristics of spatial-wise and channel-wise features from UAV digital aerial imagery, for accurate N status prediction in wheat crops. Moreover, since acquiring labeled data is time-consuming and costly, local-to-global self-supervised learning is introduced to pre-train the CSVT with extensive unlabelled data. The proposed CSVT has been compared with state-of-the-art models, and tested and validated on both testing and independent datasets. The proposed approach achieved high accuracy (0.96) with good generalizability and reproducibility for wheat N status estimation.
【2】 Transformer-based Image Compression Link: https://arxiv.org/abs/2111.06707
Authors: Ming Lu, Peiyao Guo, Huiqing Shi, Chuntong Cao, Zhan Ma Affiliations: Nanjing University, Jiangsu Longyuan Zhenhua Marine Engineering Co. Abstract: A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders are composed of a sequence of neural transformation units (NTUs) that analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to generate a pixel-domain image reconstruction from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information; meanwhile, a causal attention module (CAM) is devised for adaptive context modeling of latent features to utilize both hyper and autoregressive priors. TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN) based learnt image coding (LIC) methods and the handcrafted rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard, and requires far fewer model parameters, e.g., up to a 45% reduction relative to the leading-performance LIC.
Detection (5 papers)
【1】 Multimodal Virtual Point 3D Detection Link: https://arxiv.org/abs/2111.06881
Authors: Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl Affiliations: UT Austin Comments: NeurIPS 2021, code available at this https URL Abstract: Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud. These virtual points naturally integrate into any standard Lidar-based 3D detector along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches. Code and more visualizations are available at https://tianweiy.github.io/mvp/
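As a rough illustration of how 2D detections could yield 3D virtual points, the sketch below lifts pixels sampled inside a detection into camera coordinates with a standard pinhole model. The abstract does not say how depth is assigned; borrowing the depth of the nearest projected Lidar point is an assumption made here, and `unproject`, `virtual_points`, and the intrinsics `fx, fy, cx, cy` are hypothetical names, not the authors' API.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]
LidarPixel = Tuple[float, float, float]  # (u, v, depth) of a Lidar point projected into the image
Point3D = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Pinhole unprojection of pixel (u, v) at a known depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def virtual_points(pixels: List[Pixel], lidar_px: List[LidarPixel],
                   fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Create one virtual 3D point per sampled pixel inside a 2D detection.

    Assumption: each pixel borrows the depth of the nearest projected Lidar
    point (in image space) that falls inside the same detection.
    """
    points = []
    for u, v in pixels:
        nearest = min(lidar_px, key=lambda p: (p[0] - u) ** 2 + (p[1] - v) ** 2)
        points.append(unproject(u, v, nearest[2], fx, fy, cx, cy))
    return points
```

The resulting points can simply be concatenated with the real Lidar points before they are passed to a 3D detector, which is the sense in which the abstract says they "naturally integrate".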
【2】 Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images Link: https://arxiv.org/abs/2111.06812
Authors: Hasan Nasrallah, Ali J. Ghandour Affiliations: Lebanese University, Hadath, Lebanon; National Center for Remote Sensing - National Council for Scientific Research (CNRS), Beirut, Lebanon Abstract: Building segmentation is a fundamental task in the field of earth observation and aerial imagery analysis. Most existing deep learning based algorithms in the literature apply to imagery with a fixed or narrow range of spatial resolutions. In practical scenarios, users deal with a wide spectrum of image resolutions and thus often need to resample a given aerial image to match the spatial resolution of the dataset used to train the deep learning model. This, however, would result in a severe degradation in the quality of the output segmentation masks. To deal with this issue, we propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions. Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations. We compared the performance of our proposed model against several state-of-the-art models on the Open Cities AI dataset, and showed that Sci-Net provides a steady improvement margin in performance across all resolutions available in the dataset.
【3】 AlphaRotate: A Rotation Detection Benchmark using TensorFlow Link: https://arxiv.org/abs/2111.06677
Authors: Xue Yang, Yue Zhou, Junchi Yan Affiliations: Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China Comments: 7 pages, 1 figure, 1 table Abstract: AlphaRotate is an open-source TensorFlow benchmark for performing scalable rotation detection on various datasets. It currently provides more than 18 popular rotation detection models under a single, well-documented API designed for use by both practitioners and researchers. AlphaRotate regards high performance, robustness, sustainability and scalability as its core design concepts, and all models are covered by unit testing, continuous integration, code coverage, maintainability checks, and visual monitoring and analysis. AlphaRotate can be installed from PyPI and is released under the Apache-2.0 License. Source code is available at https://github.com/yangxue0827/RotationDetection.
【4】 Attention Guided Cosine Margin For Overcoming Class-Imbalance in Few-Shot Road Object Detection Link: https://arxiv.org/abs/2111.06639
Authors: Ashutosh Agarwal, Anay Majee, Anbumani Subramanian, Chetan Arora Affiliations: IIT Delhi, Intel Corporation Comments: 8 pages, 4 figures Abstract: Few-shot object detection (FSOD) localizes and classifies objects in an image given only a few data samples. Recent trends in FSOD research show the adoption of metric and meta-learning techniques, which are prone to catastrophic forgetting and class confusion. To overcome these pitfalls in metric learning based FSOD techniques, we introduce Attention Guided Cosine Margin (AGCM), which facilitates the creation of tighter and well-separated class-specific feature clusters in the classification head of the object detector. Our novel Attentive Proposal Fusion (APF) module minimizes catastrophic forgetting by reducing the intra-class variance among co-occurring classes. At the same time, the proposed Cosine Margin Cross-Entropy loss increases the angular margin between confusing classes to overcome the challenge of class confusion between already learned (base) and newly added (novel) classes. We conduct our experiments on the challenging India Driving Dataset (IDD), which presents a real-world class-imbalanced setting, alongside the popular FSOD benchmark PASCAL-VOC. Our method outperforms State-of-the-Art (SoTA) approaches by up to 6.4 mAP points on the IDD-OS and up to 2.0 mAP points on the IDD-10 splits for the 10-shot setting. On the PASCAL-VOC dataset, we outperform existing SoTA approaches by up to 4.9 mAP points.
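To make the idea of a margin-based cross-entropy concrete, here is a minimal sketch of a generic additive cosine-margin loss (in the style of large-margin cosine losses). It is not the paper's exact Cosine Margin Cross-Entropy formulation, which the abstract does not spell out; the function names and the default `margin` and `scale` values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_margin_ce(feature: Sequence[float],
                     class_weights: List[Sequence[float]],
                     target: int,
                     margin: float = 0.35,
                     scale: float = 30.0) -> float:
    """Cross-entropy over scaled cosine logits, with a margin subtracted
    from the target-class cosine (illustrative values, not the paper's)."""
    logits = []
    for k, w in enumerate(class_weights):
        c = cosine(feature, w)
        if k == target:
            c -= margin  # the true class must win by at least `margin`
        logits.append(scale * c)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]
```

Because the margin lowers only the target logit, a feature that sits equally close to two class prototypes incurs a larger loss than under plain cross-entropy, which pushes class clusters apart during training.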
【5】 Self-supervised GAN Detector Link: https://arxiv.org/abs/2111.06575
Authors: Yonghyun Jeong, Doyeon Kim, Pyounggeon Kim, Youngmin Ro, Jongwon Choi Affiliations: Samsung SDS, Chung-Ang University Abstract: Although the recent advancement in generative models brings diverse advantages to society, it can also be abused for malicious purposes, such as fraud, defamation, and fake news. To prevent such cases, vigorous research is being conducted to distinguish generated images from real images, but it remains challenging to distinguish unseen generated images outside of the training settings. Such limitations occur due to data dependency arising from the model's overfitting to the training data generated by specific GANs. To overcome this issue, we adopt a self-supervised scheme to propose a novel framework. Our proposed method is composed of an artificial fingerprint generator, which reconstructs high-quality artificial fingerprints of GAN images for detailed analysis, and a GAN detector, which distinguishes GAN images by learning the reconstructed artificial fingerprints. To improve the generalization of the artificial fingerprint generator, we build multiple autoencoders with different numbers of upconvolution layers. Numerous ablation studies validate the robust generalization of our method, which outperforms the previous state-of-the-art algorithms even without using the GAN images of the training dataset.
Classification | Recognition (3 papers)
【1】 Improving Structured Text Recognition with Regular Expression Biasing Link: https://arxiv.org/abs/2111.06738
Authors: Baoguang Shi, Wenfeng Cheng, Yijuan Lu, Cha Zhang, Dinei Florencio Affiliations: Microsoft Abstract: We study the problem of recognizing structured text, i.e. text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes text that matches the specified regexes with significantly improved accuracy, at the cost of a generally small degradation on other text. The biasing is realized by modeling regexes as a Weighted Finite-State Transducer (WFST) and injecting it into the decoder via dynamic replacement. A single hyperparameter controls the biasing strength. The method is useful for recognizing text lines with known formats or containing words from a domain vocabulary. Examples include driver license numbers, drug names in prescriptions, etc. We demonstrate the efficacy of regex biasing on datasets of printed and handwritten structured text and measure its side effects.
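The effect of regex biasing can be illustrated with a much simpler stand-in than the WFST injection the abstract describes: rescoring an n-best list and adding a bonus to hypotheses that match the pattern. This is a deliberately simplified sketch, not the paper's decoder; `rescore_with_regex`, the example pattern, and the scores are all hypothetical, with `bias` playing the role of the single strength hyperparameter.

```python
import re
from typing import List, Tuple


def rescore_with_regex(hypotheses: List[Tuple[str, float]],
                       pattern: str,
                       bias: float = 2.0) -> str:
    """Return the best hypothesis after adding `bias` to the log-score of
    every candidate that fully matches `pattern`.

    hypotheses: (text, log_score) pairs from a recognizer's n-best list.
    """
    regex = re.compile(pattern)
    rescored = [
        (text, score + (bias if regex.fullmatch(text) else 0.0))
        for text, score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]


# A made-up licence-number format: two capital letters followed by five digits.
candidates = [("AB1234S", -1.2), ("AB12345", -1.5)]
best_biased = rescore_with_regex(candidates, r"[A-Z]{2}\d{5}")
best_unbiased = rescore_with_regex(candidates, r"[A-Z]{2}\d{5}", bias=0.0)
```

With the bias enabled, the well-formed candidate overtakes the slightly higher-scoring one that confuses `5` with `S`; with `bias=0.0` the recognizer's original ranking is kept, which mirrors the trade-off the abstract mentions for text that does not match the regex.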
【2】 Multiple Hypothesis Hypergraph Tracking for Posture Identification in Embryonic Caenorhabditis elegans Link: https://arxiv.org/abs/2111.06425
Authors: Andrew Lauziere, Evan Ardiel, Stephen Xu, Hari Shroff Affiliations: Department of Mathematics, University of Maryland, College Park, College Park, MD; Department of Molecular Biology at MGH, Harvard Medical School, Boston, MA; Laboratory of High Resolution Optical Imaging, National Institutes of Health, Bethesda, MD Abstract: Current methods in multiple object tracking (MOT) rely on independent object trajectories undergoing predictable motion to effectively track large numbers of objects. Adversarial conditions such as volatile object motion and imperfect detections create a challenging tracking landscape in which established methods may yield inadequate results. Multiple hypothesis hypergraph tracking (MHHT) is developed to perform MOT among interdependent objects amid noisy detections. The method extends traditional multiple hypothesis tracking (MHT) via hypergraphs to model correlated object motion, allowing for robust tracking in challenging scenarios. MHHT is applied to perform seam cell tracking during late-stage embryogenesis in embryonic C. elegans.
【3】 Selective Synthetic Augmentation with HistoGAN for Improved Histopathology Image Classification Link: https://arxiv.org/abs/2111.06399
Authors: Yuan Xue, Jiarong Ye, Qianying Zhou, Rodney Long, Sameer Antani, Zhiyun Xue, Carl Cornwell, Richard Zaino, Keith Cheng, Xiaolei Huang Affiliations: College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA; Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA Abstract: Histopathological analysis is the present gold standard for precancerous lesion diagnosis. The goal of automated histopathological classification from digital images requires supervised training, which requires a large number of expert annotations that can be expensive and time-consuming to collect. Meanwhile, accurate classification of image patches cropped from whole-slide images is essential for standard sliding window based histopathology slide classification methods. To mitigate these issues, we propose a carefully designed conditional GAN model, namely HistoGAN, for synthesizing realistic histopathology image patches conditioned on class labels. We also investigate a novel synthetic augmentation framework that selectively adds new synthetic image patches generated by our proposed HistoGAN, rather than directly expanding the training set with synthetic images. By selecting synthetic images based on the confidence of their assigned labels and their feature similarity to real labeled images, our framework provides quality assurance to synthetic augmentation. Our models are evaluated on two datasets: a cervical histopathology image dataset with limited annotations, and another dataset of lymph node histopathology images with metastatic cancer. Here, we show that leveraging HistoGAN generated images with selective augmentation results in significant and consistent improvements of classification performance (6.7% and 2.8% higher accuracy, respectively) for cervical histopathology and metastatic cancer datasets.
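The selection step described in this abstract, keeping a synthetic patch only if its assigned label is confident and its features resemble real labeled images, could be sketched roughly as follows. The abstract gives no thresholds or similarity measure, so the cosine similarity to a per-class centroid, the cut-off values, and the dictionary layout are all assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_synthetic(samples: List[dict],
                     real_centroids: Dict[str, Sequence[float]],
                     min_confidence: float = 0.9,
                     min_similarity: float = 0.8) -> List[dict]:
    """Filter synthetic samples before adding them to the training set.

    Each sample is a dict with keys 'id', 'label', 'confidence' (classifier
    confidence in the assigned label) and 'feature' (an embedding).
    real_centroids maps each label to the mean feature of real images
    of that class (a hypothetical choice of reference).
    """
    kept = []
    for sample in samples:
        if sample["confidence"] < min_confidence:
            continue  # label is not trusted enough
        centroid = real_centroids[sample["label"]]
        if cosine_similarity(sample["feature"], centroid) < min_similarity:
            continue  # features drift too far from real data
        kept.append(sample)
    return kept
```

Only the samples that pass both checks would be appended to the real training set, in contrast to naive augmentation that adds every generated patch.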
Zero/Few-Shot | Transfer | Domain Adaptation | Adaptive (2 papers)
【1】 Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data Link: https://arxiv.org/abs/2111.06849
Authors: Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy Affiliations: S-Lab, Nanyang Technological University, SenseTime Research Comments: NeurIPS 2021. Code: this https URL Project page: this https URL Abstract: Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively. Extensive experiments demonstrate the effectiveness of APA in improving synthesis quality in the low-data regime. We provide a theoretical analysis to examine the convergence and rationality of our new training strategy. APA is simple and effective. It can be added seamlessly to powerful contemporary GANs, such as StyleGAN2, with negligible computational cost.
【2】 Neuromuscular Control of the Face-Head-Neck Biomechanical Complex With Learning-Based Expression Transfer From Images and Videos Link: https://arxiv.org/abs/2111.06517
Authors: Xiao S. Zeng, Surya Dwarakanath, Wuyue Lu, Masaki Nakada, Demetri Terzopoulos Affiliations: University of California Comments: 12 pages, 7 figures, 2 tables Abstract: The transfer of facial expressions from people to 3D face models is a classic computer graphics problem. In this paper, we present a novel, learning-based approach to transferring facial expressions and head movements from images and videos to a biomechanical model of the face-head-neck complex. Leveraging the Facial Action Coding System (FACS) as an intermediate representation of the expression space, we train a deep neural network to take in FACS Action Units (AUs) and output suitable facial muscle and jaw activation signals for the musculoskeletal model. Through biomechanical simulation, the activations deform the facial soft tissues, thereby transferring the expression to the model. Our approach has advantages over previous approaches. First, the facial expressions are anatomically consistent as our biomechanical model emulates the relevant anatomy of the face, head, and neck. Second, by training the neural network using data generated from the biomechanical model itself, we eliminate the manual effort of data collection for expression transfer. The success of our approach is demonstrated through experiments involving the transfer onto our face-head-neck model of facial expressions and head poses from a range of facial images and videos.
Temporal | Action Recognition | Pose | Video | Motion Estimation (2 papers)
【1】 Robust Analytics for Video-Based Gait Biometrics Link: https://arxiv.org/abs/2111.06670
Authors: Ebenezer R. H. P. Isaac Comments: Ph.D. Thesis, Anna University, Chennai, Feb. 2018 Abstract: Gait analysis is the study of the systematic methods that assess and quantify animal locomotion. Gait holds a unique importance among the many state-of-the-art biometric systems since it does not require the subject's cooperation to the extent required by other modalities. Hence, by nature, it is an unobtrusive biometric. This thesis discusses both hard and soft biometric characteristics of gait. It shows how to identify gender based on gait alone through the Pose-Based Voting scheme. It then describes improving gait recognition accuracy using Genetic Template Segmentation. Members of a wide population can be authenticated using Multiperson Signature Mapping. Finally, the mapping can be improved in a smaller population using Bayesian Thresholding. All methods proposed in this thesis have outperformed the existing state of the art with adequate experimentation and results.
【2】 Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation Link: https://arxiv.org/abs/2111.06500
Authors: John Yang, Yash Bhalgat, Simyung Chang, Fatih Porikli, Nojun Kwak Affiliations: Seoul National University, Seoul, Korea; Qualcomm AI Research, Qualcomm Technologies, Inc., San Diego, CA, US; Qualcomm AI Research, Qualcomm Korea YH, Seoul, Korea Abstract: While hand pose estimation is a critical component of most interactive extended reality and gesture recognition systems, contemporary approaches are not optimized for computational and memory efficiency. In this paper, we propose a tiny deep neural network in which some layers are applied recursively to refine its previous estimates. During its iterative refinements, we employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model. Our network is trained to be aware of the uncertainty in its current predictions so that it can gate efficiently at each iteration, estimating variances after each loop for its keypoint estimates. Additionally, we investigate the effectiveness of end-to-end and progressive training protocols for our recursive structure in maximizing the model capacity. With the proposed setting, our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency on widely used benchmarks.
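The control flow of such a refinement loop, re-applying one shared block and exiting once the predicted uncertainty is low enough, can be sketched as below. The paper uses learned gating criteria; the fixed variance threshold here is a stand-in, and `iterative_refine`, `toy_step`, and their parameters are hypothetical names for illustration only.

```python
from typing import Callable, Tuple


def iterative_refine(initial: float,
                     step: Callable[[float], Tuple[float, float]],
                     max_iters: int = 8,
                     var_threshold: float = 0.01) -> Tuple[float, int]:
    """Apply the same refinement step repeatedly (weight sharing) and exit
    early when the step's own variance estimate drops below the threshold.

    Returns the final estimate and the number of iterations actually used.
    """
    estimate = initial
    used = 0
    for _ in range(max_iters):
        estimate, variance = step(estimate)
        used += 1
        if variance < var_threshold:  # stand-in for a learned gate
            break
    return estimate, used


def toy_step(estimate: float, target: float = 1.0) -> Tuple[float, float]:
    """Toy refinement: move halfway to the target and report the squared
    remaining error as the 'predicted variance'."""
    new_estimate = estimate + 0.5 * (target - estimate)
    return new_estimate, (target - new_estimate) ** 2
```

Easy samples leave the loop after a few iterations while hard ones use more, which is where the per-sample compute savings in such a design would come from.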
Medical (3 papers)
【1】 Stacked U-Nets with Self-Assisted Priors Towards Robust Correction of Rigid Motion Artifact in Brain MRI Link: https://arxiv.org/abs/2111.06401
Authors: Mohammed A. Al-masni, Seul Lee, Jaeuk Yi, Sewook Kim, Sung-Min Gho, Young Hun Choi, Dong-Hyun Kim Affiliations: Department of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul, Republic of Korea; GE Healthcare, Seoul, Republic of Korea; Department of Radiology, Seoul National University Hospital, Seoul, Republic of Korea Comments: 24 pages, 10 figures, 3 tables Abstract: In this paper, we develop an efficient retrospective deep learning method called stacked U-Nets with self-assisted priors to address the problem of rigid motion artifacts in MRI. The proposed work exploits additional prior knowledge from the corrupted images themselves, without the need for additional contrast data. The proposed network learns missed structural details by sharing auxiliary information from the contiguous slices of the same distorted subject. We further design refinement stacked U-Nets that help preserve the spatial details of the image and hence improve the pixel-to-pixel dependency. To perform network training, simulation of MRI motion artifacts is inevitable. We present an intensive analysis using various types of image priors: the proposed self-assisted priors and priors from other image contrasts of the same subject. The experimental analysis demonstrates the effectiveness and feasibility of our self-assisted priors, since they do not require any further data scans.
【2】 Fast T2w/FLAIR MRI Acquisition by Optimal Sampling of Information Complementary to Pre-acquired T1w MRI Link: https://arxiv.org/abs/2111.06400
Authors: Junwei Yang, Xiao-Xin Li, Feihong Liu, Dong Nie, Pietro Lio, Haikun Qi, Dinggang Shen Affiliations: ShanghaiTech University; School of Information Science and Technology, Northwest University Abstract: Recent studies on T1-assisted MRI reconstruction for under-sampled images of other modalities have demonstrated the potential of further accelerating MRI acquisition of other modalities. Most of the state-of-the-art approaches have achieved improvement through the development of network architectures for fixed under-sampling patterns, without fully exploiting the complementary information between modalities. Although existing under-sampling pattern learning algorithms can be simply modified to allow the fully-sampled T1-weighted MR image to assist the pattern learning, no significant improvement on the reconstruction task can be achieved. To this end, we propose an iterative framework to optimize the under-sampling pattern for MRI acquisition of another modality that can complement the fully-sampled T1-weighted MR image at different under-sampling factors, while jointly optimizing the T1-assisted MRI reconstruction model. Specifically, our proposed method exploits the difference of latent information between the two modalities to determine the sampling patterns that maximize the assistance power of the T1-weighted MR image in improving the MRI reconstruction. We have demonstrated superior performance of our learned under-sampling patterns on a public dataset, compared to commonly used under-sampling patterns and state-of-the-art methods that can jointly optimize both the reconstruction network and the under-sampling pattern, up to an 8-fold under-sampling factor.
【3】 A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis Link: https://arxiv.org/abs/2111.06398
Authors: Jiarong Ye, Yuan Xue, Peter Liu, Richard Zaino, Keith Cheng, Xiaolei Huang Affiliations: College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA Comments: MICCAI 2021 Abstract: Generative models have been applied in the medical imaging domain for various image recognition and synthesis tasks. However, a more controllable and interpretable image synthesis model is still lacking yet necessary for important applications such as assisting in medical training. In this work, we leverage efficient self-attention and contrastive learning modules and build upon state-of-the-art generative adversarial networks (GANs) to achieve an attribute-aware image synthesis model, termed AttributeGAN, which can generate high-quality histopathology images based on multi-attribute inputs. In comparison to existing single-attribute conditional generative models, our proposed model better reflects input attributes and enables smoother interpolation among attribute values. We conduct experiments on a histopathology dataset containing stained H&E images of urothelial carcinoma and demonstrate the effectiveness of our proposed model via comprehensive quantitative and qualitative comparisons with state-of-the-art models as well as different variants of our model. Code is available at https://github.com/karenyyy/MICCAI2021AttributeGAN.
Autonomous Driving | Vehicles | Lane Detection, etc. (1 paper)
【1】 Expert Human-Level Driving in Gran Turismo Sport Using Deep Reinforcement Learning with Image-based Representation Link: https://arxiv.org/abs/2111.06449
Authors: Ryuji Imamura, Takuma Seno, Kenta Kawamoto, Michael Spranger Affiliations: Tokyo, Sony AI Inc. Comments: Accepted at Deep Reinforcement Learning Workshop at Neural Information Processing Systems 2021 Abstract: When humans play virtual racing games, they use visual environmental information on the game screen to understand the rules within the environments. In contrast, a state-of-the-art realistic racing game AI agent that outperforms human players does not use image-based environmental information but the compact and precise measurements provided by the environment. In this paper, a vision-based control algorithm is proposed and compared with human player performances under the same conditions in realistic racing scenarios using Gran Turismo Sport (GTS), which is known as a high-fidelity realistic racing simulator. In the proposed method, the environmental information that constitutes part of the observations in conventional state-of-the-art methods is replaced with feature representations extracted from game screen images. We demonstrate that the proposed method performs expert human-level vehicle control under high-speed driving scenarios even with game screen images as high-dimensional inputs. Additionally, it outperforms the built-in AI in GTS in a time trial task, and its score places it among the top 10% of approximately 28,000 human players.
Face | Crowd Counting (2 papers)
【1】 Diversity-Promoting Human Motion Interpolation via Conditional Variational Auto-Encoder Link: https://arxiv.org/abs/2111.06762
Authors: Chunzhi Gu, Shuofeng Zhao, Chao Zhang Affiliations: University of Fukui, Fukui, Japan; Wenzhou Medical University, Wenzhou, China Abstract: In this paper, we present a deep generative model-based method to generate diverse human motion interpolation results. We resort to the Conditional Variational Auto-Encoder (CVAE) to learn human motion conditioned on a pair of given start and end motions, by leveraging the Recurrent Neural Network (RNN) structure for both the encoder and the decoder. Additionally, we introduce a regularization loss to further promote sample diversity. Once trained, our method is able to generate multiple plausible coherent motions by repeatedly sampling from the learned latent space. Experiments on a publicly available dataset demonstrate the effectiveness of our method in terms of sample plausibility and diversity.
【2】 Meta-Teacher For Face Anti-Spoofing Link: https://arxiv.org/abs/2111.06638
Authors: Yunxiao Qin, Zitong Yu, Longbin Yan, Zezheng Wang, Chenxu Zhao, Zhen Lei Comments: Accepted by IEEE TPAMI-2021 Abstract: Face anti-spoofing (FAS) secures face recognition from presentation attacks (PAs). Existing FAS methods usually supervise PA detectors with handcrafted binary or pixel-wise labels. However, handcrafted labels may not be the most adequate way to supervise PA detectors in learning sufficient and intrinsic spoofing cues. Instead of using the handcrafted labels, we propose a novel Meta-Teacher FAS (MT-FAS) method to train a meta-teacher for supervising PA detectors more effectively. The meta-teacher is trained in a bi-level optimization manner to learn the ability to supervise the PA detectors in learning rich spoofing cues. The bi-level optimization contains two key components: 1) a lower-level training in which the meta-teacher supervises the detector's learning process on the training set; and 2) a higher-level training in which the meta-teacher's teaching performance is optimized by minimizing the detector's validation loss. Our meta-teacher differs significantly from existing teacher-student models because the meta-teacher is explicitly trained for better teaching the detector (student), whereas existing teachers are trained for outstanding accuracy, neglecting teaching ability. Extensive experiments on five FAS benchmarks show that with the proposed MT-FAS, the trained meta-teacher 1) provides better-suited supervision than both handcrafted labels and existing teacher-student models; and 2) significantly improves the performance of PA detectors.
Super-Resolution | Denoising | Deblurring | Dehazing (1 paper)
【1】 Small or Far Away? Exploiting Deep Super-Resolution and Altitude Data for Aerial Animal Surveillance Link: https://arxiv.org/abs/2111.06830
Authors: Mowen Xue, Theo Greenslade, Majid Mirmehdi, Tilo Burghardt Affiliations: Dept of Computer Science, University of Bristol, Bristol, UK Comments: 11 pages, 7 figures, 2 tables Abstract: Visuals captured by high-flying aerial drones are increasingly used to assess biodiversity and animal population dynamics around the globe. Yet, challenging acquisition scenarios and tiny animal depictions in airborne imagery, despite ultra-high resolution cameras, have so far been limiting factors for applying computer vision detectors successfully with high confidence. In this paper, we address the problem for the first time by combining deep object detectors with super-resolution techniques and altitude data. In particular, we show that the integration of a holistic attention network based super-resolution approach and a custom-built altitude data exploitation network into standard recognition pipelines can considerably increase the detection efficacy in real-world settings. We evaluate the system on two public, large aerial-capture animal datasets, SAVMAP and AED. We find that the proposed approach can consistently improve over ablated baselines and the state-of-the-art performance for both datasets. In addition, we provide a systematic analysis of the relationship between animal resolution and detection performance. We conclude that super-resolution and altitude knowledge exploitation techniques can significantly increase benchmarks across settings and, thus, should be used routinely when detecting minutely resolved animals in aerial imagery.
Other Neural Networks | Deep Learning | Models | Modeling (4 papers)
【1】 Monte Carlo dropout increases model repeatability Link: https://arxiv.org/abs/2111.06754
Authors: Andreanne Lemay, Katharina Hoebel, Christopher P. Bridge, Didem Egemen, Ana Cecilia Rodriguez, Mark Schiffman, John Peter Campbell, Jayashree Kalpathy-Cramer Affiliations: Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, USA; NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Canada; Mila, Quebec AI Institute, Canada Comments: Machine Learning for Health (ML4H) at NeurIPS 2021 - Extended Abstract Abstract: The integration of artificial intelligence into clinical workflows requires reliable and robust models. Among the main features of robustness is repeatability. Much attention is given to classification performance without assessing the model repeatability, leading to the development of models that turn out to be unusable in practice. In this work, we evaluate the repeatability of four model types on images from the same patient that were acquired during the same visit. We study the performance of binary, multi-class, ordinal, and regression models on three medical image analysis tasks: cervical cancer screening, breast density estimation, and retinopathy of prematurity classification. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increased repeatability for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 17 percentage points.
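To make the two quantities in this abstract concrete, here is a minimal sketch of averaging Monte Carlo dropout passes at test time and of computing Bland-Altman 95% limits of agreement between two acquisitions of the same patient. The linear model, the dropout rate, the sample count, and all names (`toy_dropout_model`, `mc_dropout_predict`, `limits_of_agreement`) are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_dropout_model(features, weights, keep_prob=0.5):
    """One stochastic forward pass: inverted dropout on the inputs, then a linear score.
    Hypothetical stand-in for a network whose dropout stays active at test time."""
    mask = rng.random(features.shape) < keep_prob
    return (features * mask / keep_prob) @ weights

def mc_dropout_predict(features, weights, n_samples=50):
    """Average n_samples stochastic passes (the Monte Carlo dropout prediction)."""
    samples = np.stack([toy_dropout_model(features, weights) for _ in range(n_samples)])
    return samples.mean(axis=0)

def limits_of_agreement(pred_a, pred_b):
    """Bland-Altman 95% limits of agreement: mean difference +/- 1.96 * SD of differences."""
    diff = np.asarray(pred_a, dtype=float) - np.asarray(pred_b, dtype=float)
    mean, sd = diff.mean(), diff.std(ddof=1)
    return mean - 1.96 * sd, mean + 1.96 * sd

# Synthetic "same patient, same visit" pairs: identical features plus small acquisition noise.
n_patients, n_features = 200, 64
weights = rng.normal(size=n_features) / np.sqrt(n_features)
visit_a = rng.normal(size=(n_patients, n_features))
visit_b = visit_a + 0.01 * rng.normal(size=visit_a.shape)

single = limits_of_agreement(toy_dropout_model(visit_a, weights),
                             toy_dropout_model(visit_b, weights))
mc = limits_of_agreement(mc_dropout_predict(visit_a, weights),
                         mc_dropout_predict(visit_b, weights))
width_single = single[1] - single[0]
width_mc = mc[1] - mc[0]
print(f"LoA width, single pass: {width_single:.3f}; MC dropout mean: {width_mc:.3f}")
```

On this synthetic data the averaged predictions give narrower limits of agreement simply because averaging reduces the variance injected by dropout; the size of the effect on real models is what the paper measures.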
【2】 Frequency learning for structured CNN filters with Gaussian fractional derivatives Link: https://arxiv.org/abs/2111.06660
Authors: Nikhil Saldanha, Silvia L. Pintea, Jan C. van Gemert, Nergis Tomen Affiliations: Computer Vision Lab, Delft University of Technology, Delft, Netherlands Comments: Accepted at BMVC 2021 Abstract: Frequency information lies at the base of discriminating between textures, and therefore between different objects. Classical CNN architectures limit the frequency learning through fixed filter sizes, and lack a way of explicitly controlling it. Here, we build on the structured receptive field filters with Gaussian derivative basis. Yet, rather than using predetermined derivative orders, which typically result in fixed frequency responses for the basis functions, we learn these. We show that by learning the order of the basis we can accurately learn the frequency of the filters, and hence adapt to the optimal frequencies for the underlying learning task. We investigate the well-founded mathematical formulation of fractional derivatives to adapt the filter frequencies during training. Our formulation leads to parameter savings and data efficiency when compared to the standard CNNs and the Gaussian derivative CNN filter networks that we build upon.
【3】 Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash Link: https://arxiv.org/abs/2111.06628
Authors: Lukas Struppek, Dominik Hintersdorf, Daniel Neider, Kristian Kersting Affiliations: Department of Computer Science, TU Darmstadt, Darmstadt, Germany; Max Planck Institute for Software Systems, Kaiserslautern, Germany; Centre for Cognitive Science, TU Darmstadt, and Hessian Center for AI (hessian.AI) Comments: 22 pages, 15 figures, 5 tables Abstract: Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes in images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks permit malicious actors to easily exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. In our view, based on our results, deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.
【4】 Deep-learning in the bioimaging wild: Handling ambiguous data with deepflash2 Link: https://arxiv.org/abs/2111.06693
Authors: Matthias Griebel, Dennis Segebarth, Nikolai Stein, Nina Schukraft, Philip Tovote, Robert Blum, Christoph M. Flath Affiliations: Department of Business and Economics, University of Würzburg, Germany; Institute of Clinical Neurobiology, University Hospital Würzburg, Germany; Center for Mental Health, University Hospital Würzburg, Germany Abstract: We present deepflash2, a deep learning solution that facilitates the objective and reliable segmentation of ambiguous bioimages through multi-expert annotations and integrated quality assurance. Thereby, deepflash2 addresses typical challenges that arise during training, evaluation, and application of deep learning models in bioimaging. The tool is embedded in an easy-to-use graphical user interface and offers best-in-class predictive performance for semantic and instance segmentation under economical usage of computational resources.
Other (6 papers)
【1】 Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases Link: https://arxiv.org/abs/2111.06838
Authors: Jan Bednarik, Noam Aigerman, Vladimir G. Kim, Siddhartha Chaudhuri, Shaifali Parashar, Mathieu Salzmann, Pascal Fua Affiliations: School of Computer and Communication Sciences Comments: 21 pages. arXiv admin note: substantial text overlap with arXiv:2104.06950 Abstract: We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds. It yields dense and semantically meaningful correspondences between frames. We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames. The key to making these correspondences semantically meaningful is to guarantee that the metric tensors computed at corresponding points are as similar as possible. We have devised an optimization strategy that makes our method robust to noise and global motions, without a priori correspondences or pre-alignment steps. As a result, our approach outperforms state-of-the-art ones on several challenging datasets. The code is available at https://github.com/bednarikjan/temporally_coherent_surface_reconstruction.
【2】 NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset Link: https://arxiv.org/abs/2111.06827
Authors: Ashkan Ebadi, Patrick Paul, Sofia Auer, Stéphane Tremblay Affiliations: National Research Council Canada, Montreal, QC, Canada; National Research Council Canada, Ottawa, ON, Canada Comments: 12 pages, 7 figures, 1 table Abstract: Automatic meter reading technology is not yet widespread. Reading of gas, electricity, or water accumulation meters is mostly done manually on-site, either by an operator or by the homeowner. In some countries, the operator takes a picture as reading proof to confirm the reading by checking offline with another operator and/or using it as evidence in case of conflicts or complaints. The whole process is time-consuming, expensive, and prone to errors. Automation can optimize and facilitate such labor-intensive and human error-prone processes. With the recent advances in the fields of artificial intelligence and computer vision, automatic meter reading systems are becoming more viable than ever. Motivated by the recent advances in the field of artificial intelligence and inspired by open-source open-access initiatives in the research community, we introduce a novel large benchmark dataset of real-life gas meter images, named the NRC-GAMMA dataset. The data were collected from an Itron 400A diaphragm gas meter on January 20, 2020, between 00:05 am and 11:59 pm. We employed a systematic approach to label the images, validate the labellings, and assure the quality of the annotations. The dataset contains 28,883 images of the entire gas meter along with 57,766 cropped images of the left and the right dial displays. We hope the NRC-GAMMA dataset helps the research community to design and implement accurate, innovative, intelligent, and reproducible automatic gas meter reading solutions.
【3】 Identifying On-road Scenarios Predictive of ADHD using Driving Simulator Time Series Data Link: https://arxiv.org/abs/2111.06774
Authors: David Grethlein, Aleksanteri Sladek, Santiago Ontañón Affiliations: Drexel University, Philadelphia, Pennsylvania, USA; University of Pennsylvania Abstract: In this paper we introduce a novel algorithm called Iterative Section Reduction (ISR) to automatically identify sub-intervals of spatiotemporal time series that are predictive of a target classification task. Specifically, using data collected from a driving simulator study, we identify which spatial regions (dubbed "sections") along the simulated routes tend to manifest driving behaviors that are predictive of the presence of Attention Deficit Hyperactivity Disorder (ADHD). Identifying these sections is important for two main reasons: (1) to improve predictive accuracy of the trained models by filtering out non-predictive time series sub-intervals, and (2) to gain insights into which on-road scenarios (dubbed "events") elicit distinctly different driving behaviors from patients undergoing treatment for ADHD versus those who are not. Our experimental results show both improved performance over prior efforts (+10% accuracy) and good alignment between the predictive sections identified and scripted on-road events in the simulator (negotiating turns and curves).
【4】 A comprehensive study of clustering a class of 2D shapes Link: https://arxiv.org/abs/2111.06662
Authors: Agnieszka Kaliszewska, Monika Syga Abstract: The paper concerns clustering with respect to the shape and size of 2D contours that are boundaries of cross-sections of 3D objects of revolution. We propose a number of similarity measures based on combined disparate Procrustes analysis (PA) and Dynamic Time Warping (DTW) distances. The motivation and the main application for this study come from archaeology. The performed computational experiments refer to the clustering of archaeological pottery.
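As a rough illustration of one ingredient named in this abstract, the sketch below computes a textbook Dynamic Time Warping distance between two sampled contours. It is not the authors' combined PA/DTW measure, which the abstract does not specify; the function name `dtw_distance` and the Euclidean point cost are assumptions made for the example.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW with a Euclidean point-to-point cost.
    Accepts 1D sequences or arrays of shape (length, dims), e.g. sampled 2D contours."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A contour sampled at a different rate warps onto the original at zero cost.
profile = [0.0, 1.0, 2.0]
resampled = [0.0, 1.0, 1.0, 2.0]
print(dtw_distance(profile, resampled))
```

A pairwise matrix of such distances could then be passed to any clustering method that accepts precomputed dissimilarities.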
【5】 Fully Automatic Page Turning on Real Scores Link: https://arxiv.org/abs/2111.06643
Authors: Florian Henkel, Stephanie Schwaiger, Gerhard Widmer Affiliations: Institute of Computational Perception, Johannes Kepler University, Linz, Austria; LIT Artificial Intelligence Lab, Linz Institute of Technology, Austria Comments: ISMIR 2021 Late Breaking/Demo Abstract: We present a prototype of an automatic page turning system that works directly on real scores, i.e., sheet images, without any symbolic representation. Our system is based on a multi-modal neural network architecture that observes a complete sheet image page as input, listens to an incoming musical performance, and predicts the corresponding position in the image. Using the position estimation of our system, we use a simple heuristic to trigger a page turning event once a certain location within the sheet image is reached. As a proof of concept, we further combine our system with an actual machine that will physically turn the page on command.
【6】 Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction Link: https://arxiv.org/abs/2111.06636
Authors: Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Michael Psenka, Xiaojun Yuan, Heung Yeung Shum, Yi Ma Affiliations: University of California, Berkeley; University of Electronic Science and Technology of China; Tsinghua-Berkeley Shenzhen Institute (TBSI); International Digital Economy Academy (IDEA); Johns Hopkins University; Harvard University Comments: 37 pages Abstract: This work proposes a new computational framework for learning an explicit generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder are competitive and often better than existing methods based on GAN, VAE, or a combination of both. We notice that the so learned features of different classes are explicitly mapped onto approximately independent principal subspaces in the feature space; and diverse visual attributes within each class are modeled by the independent principal components within each subspace.
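The abstract names rate reduction as the utility of the minimax game but does not state its formula. The sketch below assumes the coding-rate definition commonly used in the rate-reduction literature, R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T) for a d x n feature matrix Z, and measures the "distance" between two feature sets as the rate of their union minus the size-weighted rates of the parts. The function names, the value of `eps`, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Assumed coding rate of d x n features Z: 0.5 * logdet(I + d/(n*eps^2) * Z Z^T)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet

def rate_reduction_distance(Z1, Z2, eps=0.5):
    """Rate of the union minus the size-weighted average rate of the two sets.
    Zero when both sets share the same second moment, larger when they occupy
    different subspaces."""
    n1, n2 = Z1.shape[1], Z2.shape[1]
    union = np.concatenate([Z1, Z2], axis=1)
    parts = (n1 * coding_rate(Z1, eps) + n2 * coding_rate(Z2, eps)) / (n1 + n2)
    return coding_rate(union, eps) - parts

rng = np.random.default_rng(0)
# Two toy classes living in orthogonal 2D subspaces of a 4D feature space.
class_a = np.vstack([rng.normal(size=(2, 100)), np.zeros((2, 100))])
class_b = np.vstack([np.zeros((2, 100)), rng.normal(size=(2, 100))])

same = rate_reduction_distance(class_a, class_a)
different = rate_reduction_distance(class_a, class_b)
print(f"same set: {same:.6f}, orthogonal subspaces: {different:.6f}")
```

In the game described by the abstract, one player would try to increase such a quantity and the other to decrease it; how the paper assigns those roles and sums over classes is not given in the abstract and is not modeled here.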