人工智能学术速递[12.14]

公众号-arXiv每日学术速递
发布于 2021-12-17 08:12:22

cs.AI人工智能,共计78篇

【1】 A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform to Facilitate Healthcare AI Research 标题:一种促进医疗人工智能研究的可扩展、协作且资源高效的平台的方法论 链接:https://arxiv.org/abs/2112.06883

作者:Raphael Y. Cohen,Vesela P. Kovacheva 机构:Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women’s Hospital, Boston, MA , Harvard Medical School 摘要:医疗AI具有提高患者安全性、提高效率和改善患者预后的潜力,但研究往往受到数据访问、队列管理和分析工具的限制。电子健康记录数据、实时数据和实时高分辨率设备数据的收集和转换可能具有挑战性且耗时。开发真实世界的人工智能工具需要克服数据采集、医院资源稀缺和对数据治理的高需求等方面的挑战。这些瓶颈可能导致人工智能系统的研究和开发需要大量的资源和长期的延迟。我们提出了一个系统和方法来加速数据采集、数据集开发和分析以及AI模型开发。我们创建了一个交互式平台,该平台依赖于可扩展的微服务后端。该系统每小时可接收15000个患者记录,其中每个记录代表数千个多模式测量、文本注释和高分辨率数据。总的来说,这些记录可以接近1 TB的数据量。系统可在2-5分钟内进一步执行队列生成和初步数据集分析。因此,多个用户可以同时协作,实时迭代数据集和模型。我们预计,这种方法将推动现实世界的人工智能模型开发,并从长远来看,有意义地改善医疗服务。 摘要:Healthcare AI holds the potential to increase patient safety, augment efficiency and improve patient outcomes, yet research is often limited by data access, cohort curation, and tooling for analysis. Collection and translation of electronic health record data, live data, and real-time high resolution device data can be challenging and time-consuming. The development of real-world AI tools requires overcoming challenges in data acquisition, scarce hospital resources and high needs for data governance. These bottlenecks may result in resource-heavy needs and long delays in research and development of AI systems. We present a system and methodology to accelerate data acquisition, dataset development and analysis, and AI model development. We created an interactive platform that relies on a scalable microservice backend. This system can ingest 15,000 patient records per hour, where each record represents thousands of multimodal measurements, text notes, and high resolution data. Collectively, these records can approach a terabyte of data. The system can further perform cohort generation and preliminary dataset analysis in 2-5 minutes. As a result, multiple users can collaborate simultaneously to iterate on datasets and models in real time. We anticipate that this approach will drive real-world AI model development, and, in the long run, meaningfully improve healthcare delivery.

【2】 Frontiers in Collective Intelligence: A Workshop Report 标题:集体智能前沿:研讨会报告 链接:https://arxiv.org/abs/2112.06864

作者:Tyler Millhouse,Melanie Moses,Melanie Mitchell 机构:Santa Fe Institute, University of New Mexico 摘要:2021年8月,圣达菲研究所举办了一场集体智能研讨会,这是其"智能基础"(Foundations of Intelligence)项目的一部分。该项目旨在通过促进关于智能本质的跨学科研究来推进人工智能领域。研讨会汇集了计算机科学家、生物学家、哲学家、社会科学家等,分享他们关于智能如何从多个主体之间的交互中涌现的见解——无论这些主体是机器、动物还是人类。在本报告中,我们总结了每一场报告和随后的讨论,并提炼出若干关键主题,指出了未来研究的重要前沿。 摘要:In August of 2021, the Santa Fe Institute hosted a workshop on collective intelligence as part of its Foundations of Intelligence project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists, biologists, philosophers, social scientists, and others to share their insights about how intelligence can emerge from interactions among multiple agents--whether those agents be machines, animals, or human beings. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.

【3】 VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks 标题:VL-Adapter:视觉和语言任务的参数高效迁移学习 链接:https://arxiv.org/abs/2112.06825

作者:Yi-Lin Sung,Jaemin Cho,Mohit Bansal 机构:UNC Chapel Hill 备注:13 pages 摘要:最近,对在大型文本语料库上预训练的语言模型进行微调,已经在视觉和语言(V&L)任务以及纯语言任务上带来了巨大的改进。然而,由于模型规模增长迅速,微调预训练模型的全部参数变得不切实际。因此,在本文中,我们将基于适配器(adapter)的参数高效迁移学习技术引入VL-BART和VL-T5等V&L模型。我们在一个统一的多任务设置中,在四种不同的V&L任务(VQAv2、GQA、NLVR2和MSCOCO图像字幕)上评估我们的方法。通过仔细的训练和充分的实验,我们将三种流行的基于适配器的方法(Adapter、Hyperformer、Compacter)与标准的完全微调以及最近提出的提示调优(prompt-tuning)方法进行了对比。我们还通过共享适配器的权重来跨任务获取知识,从而提高适配器的效率和性能。我们的结果表明,使用权重共享技术训练适配器(仅占总参数的4.4%)即可达到与微调整个模型相当的性能。最后,我们给出了全面的分析,包括适配器与任务特定提示的组合,以及V&L预训练对适配器的影响。我们的代码可从以下网址获得:https://github.com/ylsung/VL_adapter. 摘要:Recently, fine-tuning language models pre-trained on large text corpora have provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained models becomes impractical since the model size is growing rapidly. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. We evaluate our methods in a unified multi-task setup on four diverse V&L tasks: VQAv2, GQA, NLVR2 , and MSCOCO image captioning. With careful training and thorough experiments, we benchmark three popular adapter-based methods (Adapter, Hyperformer, Compacter) against the standard full fine-tuning and the recently proposed prompt-tuning approach. We also enhance the efficiency and performance of adapters by sharing their weights to attain knowledge across tasks. Our results demonstrate that training the adapter with the weight-sharing technique (4.4% of total parameters) can match the performance of fine-tuning the entire model. Lastly, we present a comprehensive analysis including the combination of adapter and task-specific prompts and the impact of V&L pre-training on adapters. Our code is available at: https://github.com/ylsung/VL_adapter.
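
下面是一个体现上述 adapter 思路的极简 PyTorch 草图(并非论文官方实现;瓶颈维度、冻结方式与权重共享方式均为示例假设),演示"冻结预训练主干、只训练小容量 adapter,并在多个任务间共享 adapter 权重"的参数高效微调方式:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """瓶颈结构的 adapter:down-projection -> 非线性 -> up-projection,带残差连接。"""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # 残差连接

class FrozenBlockWithAdapter(nn.Module):
    """在冻结的预训练层之后插入 adapter;多个任务可传入同一个 adapter 实例以共享权重。"""
    def __init__(self, frozen_layer: nn.Module, adapter: Adapter):
        super().__init__()
        self.frozen_layer = frozen_layer
        for p in self.frozen_layer.parameters():
            p.requires_grad = False          # 冻结预训练参数
        self.adapter = adapter               # 只有 adapter 参与训练

    def forward(self, x):
        return self.adapter(self.frozen_layer(x))

# 用法示意:两个任务共享同一个 adapter(权重共享),只训练很小比例的参数;
# 此处用 nn.Linear 代替真实的预训练 Transformer 层,仅作演示
hidden = 768
shared_adapter = Adapter(hidden)
task_a = FrozenBlockWithAdapter(nn.Linear(hidden, hidden), shared_adapter)
task_b = FrozenBlockWithAdapter(nn.Linear(hidden, hidden), shared_adapter)
x = torch.randn(2, 10, hidden)
print(task_a(x).shape, task_b(x).shape)
```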

【4】 Explanation Container in Case-Based Biomedical Question-Answering 标题:基于案例的生物医学问答中的解释容器 链接:https://arxiv.org/abs/2112.06780

作者:Prateek Goel,Adam J. Johs,Manil Shrestha,Rosina O. Weber 机构:Dept. of Information Science, Drexel University, PHL , USA , Dept. of Computer Science, Drexel University, PHL , USA 摘要:国家转化科学促进中心(NCATS)生物医学数据翻译器(Translator)旨在缓解转化科学家面临的问题。Translator是一种多代理体系结构,由六个自主中继代理(ARA)和八个知识提供者(KPs)组成。在本文中,我们介绍了解释代理(xARA)的设计,这是一种基于案例的ARA,通过访问多个KPs、排名结果和解释结果排名来回答生物医学查询。解释代理设计有五个知识容器,其中包括四个原始知识容器和一个额外的解释容器——解释容器。解释容器是基于案例的,并使用自己的知识容器进行设计。 摘要:The National Center for Advancing Translational Sciences(NCATS) Biomedical Data Translator (Translator) aims to attenuate problems faced by translational scientists. Translator is a multi-agent architecture consisting of six autonomous relay agents (ARAs) and eight knowledge providers (KPs). In this paper, we present the design of the Explanatory Agent (xARA), a case-based ARA that answers biomedical queries by accessing multiple KPs, ranking results, and explaining the ranking of results. The Explanatory Agent is designed with five knowledge containers that include the four original knowledge containers and one additional container for explanation - the Explanation Container. The Explanation Container is case-based and designed with its own knowledge containers.

【5】 Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning 标题:基于超图卷积的值函数分解在协作式多智能体强化学习中的应用 链接:https://arxiv.org/abs/2112.06771

作者:Yunpeng Bai,Chen Gong,Bin Zhang,Guoliang Fan,Xinwen Hou 机构:Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences, School of Computing and Information Systems, Singapore Management University 备注:6 pages, 3 figures 摘要:近年来,多agent系统(MAS)中agent之间的协作已成为研究的热点,许多基于集中训练和分散执行(CTDE)的算法如VDN和QMIX也被提出。但是,这些方法忽略了隐藏在单个动作值中的信息。在本文中,我们提出了超图卷积混合(HGCN-MIX),一种结合超图卷积和值分解的方法。通过将动作值视为信号,HGCN-MIX旨在通过自学习超图探索这些信号之间的关系。实验结果表明,HGCN-MIX在各种情况下,尤其是在具有多个代理的情况下,与星际争霸II多代理挑战(SMAC)基准中的最新技术相匹配或超过。 摘要:Cooperation between agents in a multi-agent system (MAS) has become a hot topic in recent years, and many algorithms based on centralized training with decentralized execution (CTDE), such as VDN and QMIX, have been proposed. However, these methods disregard the information hidden in the individual action values. In this paper, we propose HyperGraph CoNvolution MIX (HGCN-MIX), a method that combines hypergraph convolution with value decomposition. By treating action values as signals, HGCN-MIX aims to explore the relationship between these signals via a self-learning hypergraph. Experimental results present that HGCN-MIX matches or surpasses state-of-the-art techniques in the StarCraft II multi-agent challenge (SMAC) benchmark on various situations, notably those with a number of agents.
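
下面给出一个超图卷积层的示意性实现(采用 HGNN 的常见形式 X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ;可学习的超边权重对应"自学习超图"的思想,但具体细节与论文未必一致,仅作概念演示):

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """极简超图卷积层,输入为每个智能体的信号(如动作值)与关联矩阵 H。"""
    def __init__(self, in_dim, out_dim, n_edges):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)
        # 超边权重 W 设为可学习参数,粗略对应"自学习超图"(示例假设)
        self.edge_weight = nn.Parameter(torch.ones(n_edges))

    def forward(self, x, incidence):
        # x: [n_agents, in_dim];incidence: [n_agents, n_edges],H[i,e]=1 表示智能体 i 属于超边 e
        w = torch.relu(self.edge_weight) + 1e-6          # 保证超边权重非负
        de = incidence.sum(dim=0) + 1e-6                 # 超边度 De
        dv = incidence @ w + 1e-6                        # 顶点度 Dv(按边权加权)
        h = incidence * w                                # H W
        msg = (h / de) @ (incidence.t() @ (x / dv.sqrt().unsqueeze(1)))
        return self.theta(msg / dv.sqrt().unsqueeze(1))

# 用法示意:把每个智能体的动作值视作节点信号,经超图卷积后再简单求和得到联合值
n_agents, n_edges = 5, 3
H = (torch.rand(n_agents, n_edges) > 0.5).float()
q_values = torch.randn(n_agents, 1)
conv = HypergraphConv(1, 1, n_edges)
q_tot = conv(q_values, H).sum()
print(q_tot.item())
```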

【6】 Adaptation through prediction: multisensory active inference torque control 标题:通过预测实现自适应:多感官主动推理力矩控制 链接:https://arxiv.org/abs/2112.06752

作者:Cristian Meo,Giovanni Franzese,Corrado Pezzato,Max Spahn,Pablo Lanillos 机构: Department of Cognitive Robotics, Delft University of Technology, Donders Institute for Brain, Department ofArtificial Intelligence, Radboud University 备注:arXiv admin note: text overlap with arXiv:2103.04412 摘要:对不确定环境中的机器人系统而言,适应外部和内部变化至关重要。本文为工业机械臂提出了一种新颖的多感官主动推理力矩控制器,展示了如何利用预测来实现自适应。我们的控制器受"预测性大脑"假设的启发,通过引入学习以及对低维和高维传感器输入(如原始图像)的多模态融合,在简化体系结构的同时提升了现有主动推理方法的能力。我们在7自由度Franka Emika Panda机械臂上对模型进行了系统评估,将其行为与先前的主动推理基线和经典控制器进行比较,从定性和定量两方面分析了自适应能力与控制精度。结果表明,得益于多模态滤波,该方法在目标导向的到达任务中具有更高的控制精度和更强的噪声抑制能力,并且能够在无需重新学习模型或重新调参的情况下,适应动态惯性变化、弹性约束和人为干扰。 摘要:Adaptation to external and internal changes is major for robotic systems in uncertain environments. Here we present a novel multisensory active inference torque controller for industrial arms that shows how prediction can be used to resolve adaptation. Our controller, inspired by the predictive brain hypothesis, improves the capabilities of current active inference approaches by incorporating learning and multimodal integration of low and high-dimensional sensor inputs (e.g., raw images) while simplifying the architecture. We performed a systematic evaluation of our model on a 7DoF Franka Emika Panda robot arm by comparing its behavior with previous active inference baselines and classic controllers, analyzing both qualitatively and quantitatively adaptation capabilities and control accuracy. Results showed improved control accuracy in goal-directed reaching with high noise rejection due to multimodal filtering, and adaptability to dynamical inertial changes, elasticity constraints and human disturbances without the need to relearn the model nor parameter retuning.

【7】 Role of Human-AI Interaction in Selective Prediction 标题:人与AI交互在选择性预测中的作用 链接:https://arxiv.org/abs/2112.06751

作者:Elizabeth Bondi,Raphael Koster,Hannah Sheahan,Martin Chadwick,Yoram Bachrach,Taylan Cemgil,Ulrich Paquet,Krishnamurthy Dvijotham 机构:Google Brain (work done at DeepMind) 备注:To be published in AAAI 2022 摘要:最近的工作显示了选择性预测系统的潜在好处:当人工智能的预测不可靠时,这类系统可以学会把决策移交(defer)给人类,从而提高人工智能系统在医疗保健或自然保护等高风险应用中的可靠性。然而,以往的大多数工作假设,人类作为人机团队的一员去解决预测任务时,其行为与独自完成任务时保持不变。我们通过实验在选择性预测的背景下量化人与人工智能的交互,证明情况并非如此。特别地,我们研究了就"AI系统决定移交"这一事实向人类传达不同类型信息所产生的影响。我们使用真实世界的自然保护数据,以及一个期望准确率高于人类或AI单独工作的选择性预测系统,结果表明这种信息传递方式对人类判断的准确性有显著影响。我们的结果考察了消息传递策略的两个组成部分:1)人类是否被告知AI系统的预测;2)人类是否被告知选择性预测系统做出了移交决定。通过操纵这些消息组件,我们表明:告知人类"该样本已被移交"、但不透露AI的预测,可以显著提升人类的表现。因此,在设计选择性预测系统时,必须认真考虑移交决定以何种方式传达给人类,并且必须在人在回路(human-in-the-loop)框架下仔细评估人机团队的综合准确率。 摘要:Recent work has shown the potential benefit of selective prediction systems that can learn to defer to a human when the predictions of the AI are unreliable, particularly to improve the reliability of AI systems in high-stakes applications like healthcare or conservation. However, most prior work assumes that human behavior remains unchanged when they solve a prediction task as part of a human-AI team as opposed to by themselves. We show that this is not the case by performing experiments to quantify human-AI interaction in the context of selective prediction. In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. Our results study two components of the messaging strategy: 1) Whether humans are informed about the prediction of the AI system and 2) Whether they are informed about the decision of the selective prediction system to defer. By manipulating these messaging components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework.
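
下面用几行代码示意"选择性预测 + 消息传递"的两个可操纵组件(阈值、消息内容均为示例假设,与论文实验设置无关):

```python
import numpy as np

def selective_predict(ai_probs, threshold=0.8,
                      show_ai_prediction=False, show_defer_notice=True):
    """极简选择性预测草图:置信度低于阈值时移交(defer)给人类。
    返回呈现给人类标注者的消息,对应文中研究的两个消息组件:
    1) 是否展示 AI 的预测;2) 是否告知"系统已将该样本移交给你"。"""
    pred = int(np.argmax(ai_probs))
    conf = float(np.max(ai_probs))
    defer = conf < threshold
    message = {"needs_human": defer}
    if defer and show_defer_notice:
        message["notice"] = "该样本由系统移交给人工判断"
    if show_ai_prediction:
        message["ai_prediction"] = pred
        message["ai_confidence"] = round(conf, 3)
    if not defer:
        message["final_prediction"] = pred   # 高置信度时直接采用 AI 结果
    return message

# 用法示意:文中发现"告知移交、但不展示 AI 预测"这一组合对人类判断最有利
print(selective_predict(np.array([0.55, 0.45]), show_ai_prediction=False, show_defer_notice=True))
print(selective_predict(np.array([0.95, 0.05])))
```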

【8】 Probability Density Estimation Based Imitation Learning 标题:基于概率密度估计的模仿学习 链接:https://arxiv.org/abs/2112.06746

作者:Yang Liu,Yongzhe Chang,Shilei Jiang,Xueqian Wang,Bin Liang,Bo Yuan 机构:Shenzhen International Graduate School, Tsinghua University, Department of Automation, Tsinghua University 摘要:模仿学习(IL)是一种利用智能体与环境之间交互的有效学习范式。它不需要显式的奖励信号,而是尝试利用专家演示来恢复期望的策略。一般来说,IL方法可分为行为克隆(BC)和逆强化学习(IRL)。本文为IRL提出了一种新的基于概率密度估计的奖励函数,可以显著降低现有IRL方法的复杂度。此外,我们证明了只要专家策略是确定性的,由我们的奖励函数导出的理论最优策略就与专家策略一致。因此,IRL问题可以被优雅地转化为概率密度估计问题。基于所提出的奖励函数,我们提出了一个"观察-尝试-学习"(watch-try-learn)式的框架,称为基于概率密度估计的模仿学习(PDEIL),它可以在离散和连续动作空间中工作。最后,在Gym环境中进行的综合实验表明,在恢复接近真实值(ground truth)的奖励方面,PDEIL比现有算法高效得多。 摘要:Imitation Learning (IL) is an effective learning paradigm exploiting the interactions between agents and environments. It does not require explicit reward signals and instead tries to recover desired policies using expert demonstrations. In general, IL methods can be categorized into Behavioral Cloning (BC) and Inverse Reinforcement Learning (IRL). In this work, a novel reward function based on probability density estimation is proposed for IRL, which can significantly reduce the complexity of existing IRL methods. Furthermore, we prove that the theoretically optimal policy derived from our reward function is identical to the expert policy as long as it is deterministic. Consequently, an IRL problem can be gracefully transformed into a probability density estimation problem. Based on the proposed reward function, we present a "watch-try-learn" style framework named Probability Density Estimation based Imitation Learning (PDEIL), which can work in both discrete and continuous action spaces. Finally, comprehensive experiments in the Gym environment show that PDEIL is much more efficient than existing algorithms in recovering rewards close to the ground truth.
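
下面是把"奖励函数 = 专家 (状态, 动作) 分布的密度估计"这一思路写成代码的最小草图(使用 scikit-learn 的核密度估计;具体奖励形式、带宽等均为示例假设,并非论文的精确定义):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_expert_density(expert_states, expert_actions, bandwidth=0.3):
    """用核密度估计拟合专家 (state, action) 联合分布,作为奖励函数的基础。"""
    sa = np.concatenate([expert_states, expert_actions], axis=1)
    return KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(sa)

def density_reward(kde, states, actions):
    """奖励 = 专家密度的函数:越接近专家行为,奖励越高(具体形式为示例假设)。"""
    sa = np.concatenate([states, actions], axis=1)
    log_p = kde.score_samples(sa)          # 返回每个样本的对数密度
    return np.exp(log_p)                   # 映射回非负奖励

# 用法示意(二维状态、一维动作的玩具数据)
rng = np.random.default_rng(0)
expert_s = rng.normal(0.0, 1.0, size=(500, 2))
expert_a = np.tanh(expert_s[:, :1])        # 假想的确定性专家策略
kde = fit_expert_density(expert_s, expert_a)
agent_s = rng.normal(0.0, 1.0, size=(5, 2))
good_a, bad_a = np.tanh(agent_s[:, :1]), rng.uniform(-3, 3, size=(5, 1))
print(density_reward(kde, agent_s, good_a) > density_reward(kde, agent_s, bad_a))
```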

【9】 Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding 标题:多轮端到端口语理解的细心语境传递 链接:https://arxiv.org/abs/2112.06743

作者:Kai Wei,Thanh Tran,Feng-Ju Chang,Kanthashree Mysore Sathyendra,Thejaswi Muniyappa,Jing Liu,Anirudh Raju,Ross McGowan,Nathan Susanj,Ariya Rastrow,Grant P. Strimel 机构:Alexa Speech, Amazon 备注:None 摘要:近年来,端到端(E2E)口语理解(SLU)系统取得了重大进展,该系统直接从口语音频预测意图和时隙。虽然对话历史已经被用来改进传统的基于文本的自然语言理解系统,但当前的E2E SLU方法还没有在多回合和面向任务的对话中纳入这些关键的上下文信号。在这项工作中,我们提出了一个上下文E2E SLU模型架构,该架构在多回合对话的编码的先前话语和对话行为(语音助手采取的行动)上使用了一个多头部注意机制。我们详细介绍了将这些上下文集成到最先进的递归和基于转换器的模型中的替代方法。将该方法应用于语音助手收集的大量未识别的话语数据集,平均单词错误率和语义错误率分别降低了10.8%和12.6%。我们还展示了一个公开的数据集上的结果,并表明我们的方法在非文本基线上显著提高了性能 摘要:Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-oriented dialogues. In this work, we propose a contextual E2E SLU model architecture that uses a multi-head attention mechanism over encoded previous utterances and dialogue acts (actions taken by the voice assistant) of a multi-turn dialogue. We detail alternative methods to integrate these contexts into the state-ofthe-art recurrent and transformer-based models. When applied to a large de-identified dataset of utterances collected by a voice assistant, our method reduces average word and semantic error rates by 10.8% and 12.6%, respectively. We also present results on a publicly available dataset and show that our method significantly improves performance over a noncontextual baseline
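
下面用 PyTorch 的多头注意力给出"当前语音编码作为 query、历史语句与对话动作的编码作为 key/value"的上下文融合草图(维度与融合方式均为示例假设):

```python
import torch
import torch.nn as nn

class ContextCarryover(nn.Module):
    """多头注意力上下文融合的极简草图,用于把多轮对话历史注入当前轮的编码。"""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, cur_enc, history_enc):
        # cur_enc: [B, T, D] 当前轮音频编码器输出
        # history_enc: [B, H, D] 前几轮语句与对话动作的编码
        ctx, _ = self.attn(query=cur_enc, key=history_enc, value=history_enc)
        return self.fuse(torch.cat([cur_enc, ctx], dim=-1))  # 拼接后投影回原维度

# 用法示意
layer = ContextCarryover()
cur = torch.randn(2, 50, 256)       # 2 条样本、50 帧
hist = torch.randn(2, 6, 256)       # 每条样本 6 个历史上下文向量
print(layer(cur, hist).shape)        # torch.Size([2, 50, 256])
```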

【10】 Understanding and Improving the Exemplar-based Generation for Open-domain Conversation 标题:理解和改进基于样本的开放领域会话生成 链接:https://arxiv.org/abs/2112.06723

作者:Seungju Han,Beomsu Kim,Seokjun Seo,Enkhbayar Erdenee,Buru Chang 机构:Hyperconnect 摘要:基于范例的开放域会话生成模型利用生成模型和检索模型,基于检索器提供的范例生成响应。然而,他们在生成响应时通常忽略检索到的示例,或者生成与检索到的示例过匹配的响应。在本文中,我们认为这些缺点是源于开放域会话的一对多问题。当检索到的范例与给定的上下文相关但与gold响应显著不同时,基于范例的生成模型被训练为忽略范例,因为范例对生成gold响应没有帮助。另一方面,当检索到的范例在词汇上与gold响应相似时,生成模型被训练为高度依赖范例。因此,我们提出了一种训练方法,选择语义上与gold响应相关但词汇上与gold响应相距较远的样本,以缓解上述缺点。在训练阶段,我们提出的训练方法首先使用gold响应而不是对话上下文作为查询来选择语义上与gold响应相关的示例。然后,它消除了词汇上与gold响应相似的示例,以减轻生成模型对该示例的依赖。其余的示例可能与给定的上下文无关,因为它们是根据gold响应进行搜索的。因此,我们提出的训练方法进一步利用给定上下文和样本之间的相关性得分来惩罚不相关的样本。大量的实验表明,我们提出的训练方法缓解了现有基于范例的生成模型的缺点,并在适当性和信息性方面显著提高了性能。 摘要:Exemplar-based generative models for open-domain conversation produce responses based on the exemplars provided by the retriever, taking advantage of generative models and retrieval models. However, they often ignore the retrieved exemplars while generating responses or produce responses over-fitted to the retrieved exemplars. In this paper, we argue that these drawbacks are derived from the one-to-many problem of the open-domain conversation. When the retrieved exemplar is relevant to the given context yet significantly different from the gold response, the exemplar-based generative models are trained to ignore the exemplar since the exemplar is not helpful for generating the gold response. On the other hand, when the retrieved exemplar is lexically similar to the gold response, the generative models are trained to rely on the exemplar highly. Therefore, we propose a training method selecting exemplars that are semantically relevant to the gold response but lexically distanced from the gold response to mitigate the above disadvantages. In the training phase, our proposed training method first uses the gold response instead of dialogue context as a query to select exemplars that are semantically relevant to the gold response. And then, it eliminates the exemplars that lexically resemble the gold responses to alleviate the dependency of the generative models on that exemplars. The remaining exemplars could be irrelevant to the given context since they are searched depending on the gold response. Thus, our proposed training method further utilizes the relevance scores between the given context and the exemplars to penalize the irrelevant exemplars. Extensive experiments demonstrate that our proposed training method alleviates the drawbacks of the existing exemplar-based generative models and significantly improves the performance in terms of appropriateness and informativeness.
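
论文提出的训练期范例选择逻辑("语义上与 gold response 相关、词汇上与其保持距离")可以用下面的小草图直观表示(句向量函数、阈值均为示例假设):

```python
import numpy as np

def jaccard(a: str, b: str) -> float:
    """词级 Jaccard 相似度,衡量词汇层面的重合程度。"""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def select_exemplars(gold_response, candidates, encode, sem_top_k=5, lex_max=0.5):
    """训练阶段的范例选择草图:
    1) 用 gold response(而非对话上下文)作为查询,按语义相似度取 top-k;
    2) 剔除与 gold response 词汇上过于相似(Jaccard 过高)的候选。
    encode 为任意句向量函数(此处作为假设传入)。"""
    g = encode(gold_response)
    sims = [float(np.dot(g, encode(c)) /
                  (np.linalg.norm(g) * np.linalg.norm(encode(c)) + 1e-8)) for c in candidates]
    ranked = sorted(zip(candidates, sims), key=lambda x: -x[1])[:sem_top_k]
    return [c for c, _ in ranked if jaccard(c, gold_response) < lex_max]

# 用法示意:用词袋向量充当句向量,仅作演示
vocab = ["i", "love", "hiking", "in", "the", "mountains", "enjoy", "outdoor", "walks", "pizza"]
def bow(s):
    return np.array([s.split().count(w) for w in vocab], dtype=float)

gold = "i love hiking in the mountains"
cands = ["i love hiking in the mountains", "i enjoy outdoor walks", "i love pizza"]
print(select_exemplars(gold, cands, bow))   # 与 gold 完全相同的候选会被剔除
```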

【11】 Learning Semantic-Aligned Feature Representation for Text-based Person Search 标题:用于基于文本的人物搜索的学习语义对齐特征表示 链接:https://arxiv.org/abs/2112.06714

作者:Shiping Li,Min Cao,Min Zhang 机构:School of Computer Science and Technology, Soochow University, China 备注:5 pages, 3 figures, 3 tables 摘要:基于文本的人员搜索旨在通过文本描述检索特定行人的图像。这项任务的关键挑战是消除模态间的差距,实现模态间的特征对齐。在本文中,我们提出了一种基于文本的人物搜索的语义对齐嵌入方法,该方法通过自动学习语义对齐的视觉特征和文本特征来实现跨模式的特征对齐。首先,我们引入两个基于转换器的主干来编码图像和文本的鲁棒特征表示。其次,我们设计了一个语义一致的特征聚合网络,自适应地选择具有相同语义的特征,并将其聚合为零件感知特征,该网络由一个受跨模态零件对齐损失和多样性损失约束的多头部注意模块实现。在中大PEDES和Flickr30K数据集上的实验结果表明,我们的方法达到了最先进的性能。 摘要:Text-based person search aims to retrieve images of a certain pedestrian by a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve the feature alignment across modalities. In this paper, we propose a semantic-aligned embedding method for text-based person search, in which the feature alignment across modalities is achieved by automatically learning the semantic-aligned visual features and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network to adaptively select and aggregate features with the same semantics into part-aware features, which is achieved by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performances.

【12】 Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching 标题:利用误差可控的近似键缓存加速深度学习分类 链接:https://arxiv.org/abs/2112.06671

作者:Alessandro Finamore,James Roberts,Massimo Gallo,Dario Rossi 机构:HUAWEI Technologies, France 备注:Accepted at IEEE Infocom 2022 摘要:虽然深度学习(DL)技术是解决可映射为分类任务的网络问题的一种很有前景的工具,但相对于实时流量测量的要求而言,其计算复杂度仍然过高。为了降低DL推理的开销,我们提出了一种新的缓存范式,称为近似键缓存(approximate-key caching):对于选定输入的查找,直接基于已缓存的DL推理结果返回近似结果。近似缓存命中减轻了DL推理负载并提高了系统吞吐量,但也引入了近似误差。因此,我们将近似键缓存与一个有理论依据的误差修正算法相结合,称之为自动刷新(auto-refresh)。我们对经典LRU缓存和理想缓存下的缓存系统性能进行了解析建模,对期望性能进行了跟踪驱动(trace-driven)评估,并将所提方法与最先进的相似性缓存进行了比较——印证了该方案的实用价值。 摘要:While Deep Learning (DL) technologies are a promising tool to solve networking problems that map to classification tasks, their computational complexity is still too high with respect to real-time traffic measurements requirements. To reduce the DL inference cost, we propose a novel caching paradigm, that we named approximate-key caching, which returns approximate results for lookups of selected input based on cached DL inference results. While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error. As such, we couple approximate-key caching with an error-correction principled algorithm, that we named auto-refresh. We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching -- testifying the practical interest of our proposal.
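
下面是一个近似键缓存的概念性草图:查询向量与缓存中某个键足够相似即返回缓存的推理结果,并按固定概率强制刷新以控制误差(相似度阈值、刷新策略均为示例假设,远比论文中的分析模型简化):

```python
import numpy as np
from collections import OrderedDict

class ApproximateKeyCache:
    """近似键缓存 + LRU 淘汰 + 概率性"自动刷新"的极简草图。"""
    def __init__(self, capacity=1000, sim_threshold=0.95, refresh_prob=0.05, seed=0):
        self.capacity = capacity
        self.sim_threshold = sim_threshold
        self.refresh_prob = refresh_prob
        self.store = OrderedDict()          # key_id -> (key_vector, cached_result)
        self.rng = np.random.default_rng(seed)

    def _most_similar(self, q):
        best_id, best_sim = None, -1.0
        for kid, (vec, _) in self.store.items():
            sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec) + 1e-8))
            if sim > best_sim:
                best_id, best_sim = kid, sim
        return best_id, best_sim

    def lookup(self, key_id, key_vec, dl_infer):
        """dl_infer 为开销较大的深度模型推理函数,仅在未命中或被刷新时调用。"""
        kid, sim = self._most_similar(key_vec)
        if kid is not None and sim >= self.sim_threshold and self.rng.random() > self.refresh_prob:
            self.store.move_to_end(kid)      # LRU:命中则移到队尾
            return self.store[kid][1], True  # 近似命中
        result = dl_infer(key_vec)           # 未命中(或被强制刷新):执行真实推理
        self.store[key_id] = (key_vec, result)
        self.store.move_to_end(key_id)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # 淘汰最久未使用的条目
        return result, False

# 用法示意:用"特征向量之和的符号"假装是一个分类器
cache = ApproximateKeyCache(sim_threshold=0.99)
infer = lambda v: int(v.sum() > 0)
v = np.array([0.3, 0.7, -0.1])
print(cache.lookup("flow-1", v, infer))          # 首次查询:真实推理
print(cache.lookup("flow-2", v + 1e-4, infer))   # 近似命中:复用缓存结果
```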

【13】 Detecting Emotion Carriers by Combining Acoustic and Lexical Representations 标题:声学和词汇相结合的情感载体检测方法 链接:https://arxiv.org/abs/2112.06603

作者:Sebastian P. Bayerl,Aniruddha Tammewar,Korbinian Riedhammer,Giuseppe Riccardi 机构: Technische Hochschule N¨urnberg Georg Simon Ohm, Germany, Signals and Interactive Systems Lab, University of Trento 备注:Accepted at ASRU 2021 this https URL 摘要:个人叙述(PN)——口头或书面——是对事实、人物、事件和个人经历的回忆。情绪识别和情绪分析任务通常在话语或文档级别定义。然而,在这项工作中,我们将重点放在情感载体(EC)上,它被定义为最能解释叙述者情感状态的片段(言语或文本)(“失去父亲”,“让我选择”)。一旦提取出来,这种EC可以提供更丰富的用户状态表示,以改进自然语言理解和对话建模。在以前的工作中,已经证明了使用词汇特征可以识别EC。然而,口语叙事应该提供更丰富的上下文描述和用户的情绪状态。在本文中,我们利用基于单词的声学和文本嵌入以及早期和晚期融合技术来检测口语叙事中的ECs。对于声学词级表示,我们使用残差神经网络(ResNet)对单独的语音情感语料库进行预训练,并对其进行微调以检测EC。使用不同融合和系统组合策略的实验表明,后期融合可以显著改善该任务。 摘要:Personal narratives (PN) - spoken or written - are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC) defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such EC can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. In previous work, it has been shown that EC can be identified using lexical features. However, spoken narratives should provide a richer description of the context and the users' emotional state. In this paper, we leverage word-based acoustic and textual embeddings as well as early and late fusion techniques for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect EC. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.

【14】 Multi-agent Soft Actor-Critic Based Hybrid Motion Planner for Mobile Robots 标题:基于多智能体软角色-批评者的移动机器人混合运动规划器 链接:https://arxiv.org/abs/2112.06594

作者:Zichen He,Lu Dong,Chunwei Song,Changyin Sun 机构: SoutheastUniversity, Sun is with the School of Automation 摘要:本文提出了一种新型的混合式多机器人运动规划器,可应用于非通信和局部可观测条件下。该规划器是无模型的,可以实现多机器人状态和观测信息的端到端映射到最终平滑连续的轨迹。planner是一个前端和后端分离的体系结构。前端协同航路点搜索模块的设计是在集中训练和分散执行图的基础上,基于多智能体软参与者评判算法。后端轨迹优化模块的设计基于带安全区约束的最小捕捉法。该模块可以输出最终的动态可行可执行轨迹。最后,多组实验结果验证了该运动规划器的有效性。 摘要:In this paper, a novel hybrid multi-robot motion planner that can be applied under non-communication and local observable conditions is presented. The planner is model-free and can realize the end-to-end mapping of multi-robot state and observation information to final smooth and continuous trajectories. The planner is a front-end and back-end separated architecture. The design of the front-end collaborative waypoints searching module is based on the multi-agent soft actor-critic algorithm under the centralized training with decentralized execution diagram. The design of the back-end trajectory optimization module is based on the minimal snap method with safety zone constraints. This module can output the final dynamic-feasible and executable trajectories. Finally, multi-group experimental results verify the effectiveness of the proposed motion planner.

【15】 Geometric Path Enumeration for Equivalence Verification of Neural Networks 标题:神经网络等价性验证的几何路径枚举 链接:https://arxiv.org/abs/2112.06582

作者:Samuel Teuber,Marko Kleine Büning,Philipp Kern,Carsten Sinz 机构:Department of Theoretical Computer Science, Karlsruhe Institute of Technology (KIT), Germany, ©, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including 备注:None 摘要:随着神经网络(NNs)越来越多地被引入安全关键领域,在部署前对NNs进行正式验证的需求也越来越大。在这项工作中,我们重点研究了神经网络等价性的形式验证问题,旨在证明两个神经网络(如原始和压缩版本)表现出等价的行为。针对这一问题提出了两种方法:混合整数线性规划和区间传播。虽然第一种方法缺乏可伸缩性,但后一种方法仅适用于重量变化较小的结构相似的神经网络。本文的贡献分为四个部分。首先,我们证明了epsilon等价问题是coNP完备的,从而给出了一个理论结果。其次,我们将Tran等人的单NN几何路径枚举算法扩展到具有多个NN的设置。在第三步中,我们实现了等价性验证的扩展算法,并评估了其实际使用所需的优化。最后,我们进行了一次比较评估,展示了我们的方法在等价性验证和反例发现方面优于先前最新技术的用例。 摘要:As neural networks (NNs) are increasingly introduced into safety-critical domains, there is a growing need to formally verify NNs before deployment. In this work we focus on the formal verification problem of NN equivalence which aims to prove that two NNs (e.g. an original and a compressed version) show equivalent behavior. Two approaches have been proposed for this problem: Mixed integer linear programming and interval propagation. While the first approach lacks scalability, the latter is only suitable for structurally similar NNs with small weight changes. The contribution of our paper has four parts. First, we show a theoretical result by proving that the epsilon-equivalence problem is coNP-complete. Secondly, we extend Tran et al.'s single NN geometric path enumeration algorithm to a setting with multiple NNs. In a third step, we implement the extended algorithm for equivalence verification and evaluate optimizations necessary for its practical use. Finally, we perform a comparative evaluation showing use-cases where our approach outperforms the previous state of the art, both, for equivalence verification as well as for counter-example finding.
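
为了直观展示文中提到的区间传播这一类方法,下面用 numpy 写一个对两个小型全连接 ReLU 网络分别做区间传播、再给出输出差异保守上界的草图(这只是区间法思路的演示,并非论文提出的几何路径枚举算法):

```python
import numpy as np

def interval_forward(weights, biases, lower, upper):
    """区间传播(IBP):给定输入区间,逐层计算全连接 ReLU 网络输出的上下界。"""
    for i, (W, b) in enumerate(zip(weights, biases)):
        center, radius = (lower + upper) / 2.0, (upper - lower) / 2.0
        center = W @ center + b
        radius = np.abs(W) @ radius
        lower, upper = center - radius, center + radius
        if i < len(weights) - 1:                 # 隐藏层过 ReLU,输出层不加激活
            lower, upper = np.maximum(lower, 0), np.maximum(upper, 0)
    return lower, upper

def epsilon_equivalent_bound(net1, net2, in_lower, in_upper):
    """对两个网络分别做区间传播,给出 |f1(x)-f2(x)| 的一个保守上界。"""
    l1, u1 = interval_forward(*net1, in_lower, in_upper)
    l2, u2 = interval_forward(*net2, in_lower, in_upper)
    return float(np.max(np.maximum(np.abs(u1 - l2), np.abs(u2 - l1))))

# 用法示意:net2 是 net1 权重被轻微扰动(如压缩)后的版本
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)
net1 = ([W1, W2], [b1, b2])
net2 = ([W1 + 1e-3, W2], [b1, b2])
bound = epsilon_equivalent_bound(net1, net2, np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
print("epsilon upper bound:", bound)   # 若该上界 <= epsilon,则可证明 epsilon-等价
```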

【16】 Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs 标题:拓扑不平衡对生物医学知识图表征学习的启示 链接:https://arxiv.org/abs/2112.06567

作者:Stephen Bonner,Ufuk Kirik,Ola Engkvist,Jian Tang,Ian P Barrett 机构:Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK, Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden, HEC Montreal, Canada, Mila - Quebec AI Institute, Montreal, Canada 摘要:改善疾病护理标准取决于更好的治疗,而更好的治疗又依赖于发现和开发新药。然而,药物发现是一个复杂而昂贵的过程。采用机器学习的方法产生了药物发现知识图,该图利用了该领域固有的相互关联性质。基于图的数据建模与知识图嵌入相结合,提供了更直观的领域表示,适用于推理任务,如预测缺失链接。一个这样的例子是为特定疾病生成可能相关基因的排序列表,通常称为目标发现。因此,至关重要的是,这些预测不仅具有相关性,而且具有生物学意义。然而,知识图可能会直接由于集成的底层数据源而产生偏差,或者由于图构造中的建模选择而产生偏差,其结果之一是某些实体可能在拓扑上过度呈现。我们展示了知识图嵌入模型如何受到这种结构不平衡的影响,从而导致紧密连接的实体在任何上下文中都具有较高的排名。我们在不同的数据集、模型和预测任务中为这种观察提供支持。此外,我们还展示了如何通过随机的、生物学上无意义的信息来干扰图形拓扑,从而人为地改变基因的等级。这表明,这种模型更受实体频率的影响,而不是关系中编码的生物信息,当实体频率不是底层数据的真实反映时,就会产生问题。我们的结果强调了数据建模选择的重要性,并强调了从业者在解释模型输出和知识图组合时需要注意这些问题。 摘要:Improving on the standard of care for diseases is predicated on better treatments, which in turn relies on finding and developing new drugs. However, drug discovery is a complex and costly process. Adoption of methods from machine learning has given rise to creation of drug discovery knowledge graphs which utilize the inherent interconnected nature of the domain. Graph-based data modelling, combined with knowledge graph embeddings provide a more intuitive representation of the domain and are suitable for inference tasks such as predicting missing links. One such example would be producing ranked lists of likely associated genes for a given disease, often referred to as target discovery. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, knowledge graphs can be biased either directly due to the underlying data sources that are integrated or due to modeling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We show how knowledge graph embedding models can be affected by this structural imbalance, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models and predictive tasks. Further, we show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information. This suggests that such models can be more influenced by the frequency of entities rather than biological information encoded in the relations, creating issues when entity frequency is not a true reflection of underlying data. Our results highlight the importance of data modeling choices and emphasizes the need for practitioners to be mindful of these issues when interpreting model outputs and during knowledge graph composition.
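
文中"排名被高连接度实体主导"的现象,可以用实体度数与模型打分之间的秩相关做一个快速体检(下面的玩具示例刻意让得分几乎只由度数决定,仅作演示):

```python
import numpy as np
from scipy.stats import spearmanr

def degree_rank_bias(entity_degrees, entity_scores):
    """计算实体度数与模型打分的 Spearman 相关系数:
    相关性越强,说明排名越可能反映拓扑频率而非关系中编码的生物学信息。"""
    rho, pval = spearmanr(entity_degrees, entity_scores)
    return rho, pval

# 用法示意:构造一个得分几乎只由度数决定的玩具例子
rng = np.random.default_rng(42)
degrees = rng.integers(1, 500, size=200)
scores = np.log(degrees) + rng.normal(0, 0.2, size=200)   # 得分 ~ 度数(加少量噪声)
rho, pval = degree_rank_bias(degrees, scores)
print(f"Spearman rho = {rho:.3f}, p = {pval:.1e}")
```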

【17】 MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning 标题:MAGIC:针对不同和不成对的基于文本的图像字幕的多模态关系图对抗性推理 链接:https://arxiv.org/abs/2112.06558

作者:Wenqiao Zhang,Haochen Shi,Jiannan Guo,Shengyu Zhang,Qingpeng Cai,Juncheng Li,Sihui Luo,Yueting Zhuang 机构: Zhejiang University, Universit´e de Montr´eal, National University of Singapore, Ningbo University 摘要:基于文本的图像字幕(TextCap)要求同时理解视觉内容和阅读图像文本,以生成自然语言描述。尽管一项任务可以教会机器进一步理解复杂的人类环境,因为文本在我们的日常环境中无处不在,但它在正常字幕设置方面带来了额外的挑战。基于文本的图像直观地包含丰富而复杂的多模态关系内容,也就是说,图像细节可以从多视图而不是单个标题中进行多样化描述。当然,我们可以引入额外的成对训练数据来显示图像描述的多样性,对于带有额外文本的TextCap对注释来说,这个过程既费时又费力。基于上述观点,我们研究了如何使用一种不成对的训练范式来生成关注不同图像部分的不同字幕。我们提出了多模态关系图对抗性推理(MAGIC)框架,用于多样性和不成对的TextCap。该框架可以自适应地构造多个图像的多模态关系图,并对图之间的复杂关系进行建模,以表示描述的多样性。此外,从建模图中开发了一个级联生成对抗网络,以推断图像-句子特征对齐和语言连贯水平下的不成对字幕生成。我们验证了MAGIC从图像的不同关系信息项生成不同字幕的有效性。实验结果表明,MAGIC可以在不使用任何图像字幕训练对的情况下产生非常有希望的结果。 摘要:Text-based image captioning (TextCap) requires simultaneous comprehension of visual content and reading the text of images to generate a natural language description. Although a task can teach machines to understand the complex human environment further given that text is omnipresent in our daily surroundings, it poses additional challenges in normal captioning. A text-based image intuitively contains abundant and complex multimodal relational content, that is, image details can be described diversely from multiview rather than a single caption. Certainly, we can introduce additional paired training data to show the diversity of images' descriptions, this process is labor-intensive and time-consuming for TextCap pair annotations with extra texts. Based on the insight mentioned above, we investigate how to generate diverse captions that focus on different image parts using an unpaired training paradigm. We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap. This framework can adaptively construct multiple multimodal relational graphs of images and model complex relationships among graphs to represent descriptive diversity. Moreover, a cascaded generative adversarial network is developed from modeled graphs to infer the unpaired caption generation in image-sentence feature alignment and linguistic coherence levels. We validate the effectiveness of MAGIC in generating diverse captions from different relational information items of an image. Experimental results show that MAGIC can generate very promising outcomes without using any image-caption training pairs.

【18】 Centroid-UNet: Detecting Centroids in Aerial Images 标题:Centroid-UNET:航空图像中的质心检测 链接:https://arxiv.org/abs/2112.06530

作者:N. Lakmal Deshapriya,Dan Tran,Sriram Reddy,Kavinda Gunasekara 机构:Geoinformatics Center, Asian Institute of Technology, P.O. Box , Klong Luang, Pathumthani , Thailand, KEY WORDS: deep-learning, satellite-imagery, centroids, building-footprint, tree-canopy 备注:None 摘要:在航空/卫星图像分析(遥感)的许多应用中,生成物体的精确形状是一项繁琐的任务。在大多数遥感应用中,如物体计数,只需要对物体进行位置估计。因此,在航空/卫星图像中定位物体质心对于不需要物体精确形状的任务来说是一个简单的解决方案。因此,本研究的重点是评估使用深度神经网络在卫星图像中定位目标质心的可行性。我们的模型名为质心UNet。质心UNet模型基于经典的U-Net语义分割体系结构。我们修改并调整了U-Net语义分割架构,使其成为一个质心检测模型,保持了原始模型的简单性。此外,我们还通过两个涉及航空/卫星图像的案例研究对我们的模型进行了测试和评估。这两个案例研究分别是建筑质心检测案例研究和椰子树质心检测案例研究。与其他方法相比,我们的评估结果达到了相当好的准确性,并且提供了简单性。根据本研究开发的代码和模型也可在Centroid UNet GitHub存储库中获得:https://github.com/gicait/centroid-unet 摘要:In many applications of aerial/satellite image analysis (remote sensing), the generation of exact shapes of objects is a cumbersome task. In most remote sensing applications such as counting objects requires only location estimation of objects. Hence, locating object centroids in aerial/satellite images is an easy solution for tasks where the object's exact shape is not necessary. Thus, this study focuses on assessing the feasibility of using deep neural networks for locating object centroids in satellite images. Name of our model is Centroid-UNet. The Centroid-UNet model is based on classic U-Net semantic segmentation architecture. We modified and adapted the U-Net semantic segmentation architecture into a centroid detection model preserving the simplicity of the original model. Furthermore, we have tested and evaluated our model with two case studies involving aerial/satellite images. Those two case studies are building centroid detection case study and coconut tree centroid detection case study. Our evaluation results have reached comparably good accuracy compared to other methods, and also offer simplicity. The code and models developed under this study are also available in the Centroid-UNet GitHub repository: https://github.com/gicait/centroid-unet
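
质心检测模型通常把质心坐标渲染成高斯热力图作为训练目标,再从预测热力图中取局部极大值得到质心。下面是这一前/后处理的最小草图(sigma、阈值等均为示例假设,与 Centroid-UNet 的具体实现无关):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def centroids_to_heatmap(centroids, shape, sigma=3.0):
    """把质心坐标渲染成高斯热力图,作为 U-Net 风格质心检测模型的训练目标。"""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    heat = np.zeros(shape, dtype=np.float32)
    for cy, cx in centroids:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)           # 多个目标取逐像素最大值
    return heat

def heatmap_to_centroids(heat, threshold=0.5, window=7):
    """从模型输出的热力图中提取局部极大值作为预测质心(后处理步骤)。"""
    peaks = (heat == maximum_filter(heat, size=window)) & (heat > threshold)
    return np.argwhere(peaks)                # 返回 (row, col) 坐标

# 用法示意:两个"建筑/椰子树"质心
gt = [(20, 30), (60, 80)]
hm = centroids_to_heatmap(gt, shape=(100, 100))
print(heatmap_to_centroids(hm))              # 应接近 [[20 30] [60 80]]
```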

【19】 Ex-Model: Continual Learning from a Stream of Trained Models 标题:EX-Model:从训练有素的模型流中持续学习 链接:https://arxiv.org/abs/2112.06511

作者:Antonio Carta,Andrea Cossu,Vincenzo Lomonaco,Davide Bacciu 机构:University of Pisa, Scuola Normale Superiore 摘要:从非平稳数据流中不断学习是近几年来日益流行的一个具有挑战性的研究课题。能够以高效、有效和可扩展的方式不断学习、适应和推广是人工智能系统可持续发展的基础。然而,以代理为中心的持续学习观点要求直接从原始数据中学习,这限制了独立代理之间的交互、当前方法的效率和隐私。相反,我们认为,持续学习系统应该以经过训练的模型的形式利用压缩信息的可用性。在本文中,我们介绍并形式化了一种新的范例“Ex-Model连续学习”(ExML),其中agent从一系列先前训练的模型中学习,而不是从原始数据中学习。我们还提供了三种ex-model连续学习算法和一个由三个数据集(MNIST、CIFAR-10和CORe50)组成的经验设置,以及八个场景,其中对提出的算法进行了广泛测试。最后,我们强调了前模型范式的特点,并指出了有趣的未来研究方向。 摘要:Learning continually from non-stationary data streams is a challenging research topic of growing popularity in the last few years. Being able to learn, adapt, and generalize continually in an efficient, effective, and scalable way is fundamental for a sustainable development of Artificial Intelligent systems. However, an agent-centric view of continual learning requires learning directly from raw data, which limits the interaction between independent agents, the efficiency, and the privacy of current approaches. Instead, we argue that continual learning systems should exploit the availability of compressed information in the form of trained models. In this paper, we introduce and formalize a new paradigm named "Ex-Model Continual Learning" (ExML), where an agent learns from a sequence of previously trained models instead of raw data. We further contribute with three ex-model continual learning algorithms and an empirical setting comprising three datasets (MNIST, CIFAR-10 and CORe50), and eight scenarios, where the proposed algorithms are extensively tested. Finally, we highlight the peculiarities of the ex-model paradigm and we point out interesting future research directions.
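
"只从已训练模型学习、不接触原始数据"可以用知识蒸馏来示意。下面是一个极简草图(代理数据用随机张量代替、专家模型仅随机初始化,蒸馏损失为标准 KL 蒸馏,均为示例假设,并非论文提出的三种 ExML 算法本身):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def exmodel_update(student, expert_model, batch_x, optimizer, temperature=2.0):
    """Ex-Model 持续学习的一步蒸馏草图:不访问原始标注数据,
    仅用(无标签的)输入和先前训练好的专家模型的输出来更新学生模型。"""
    expert_model.eval()
    with torch.no_grad():
        teacher_logits = expert_model(batch_x)
    student_logits = student(batch_x)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# 用法示意:依次从一串专家模型(模型流)中蒸馏;实际中专家应为各经验上已训练好的模型
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
expert_stream = [nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)) for _ in range(3)]
for expert in expert_stream:
    for _ in range(10):
        x = torch.randn(16, 32)              # 代理数据;实际中可用生成样本或公共数据
        exmodel_update(student, expert, x, optimizer)
print("finished ex-model distillation over", len(expert_stream), "experts")
```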

【20】 Split GCN: Effective Interactive Annotation for Segmentation of Disconnected Instance 标题:Split-GCN:面向非连通实例分割的高效交互式标注 链接:https://arxiv.org/abs/2112.06454

作者:Namgil Kim,Barom Kang,Yeonok Cho 机构:Department of Data Science, Ajou University, SelectStar AI Research 备注:11 pages 摘要:人工注释对象边界需要较高的成本。最近,基于多边形的人机交互注释方法已经显示出成功的性能。然而,鉴于连通顶点拓扑,这些方法难以预测对象中的断开组件。本文介绍了一种基于多边形方法和自我注意机制的新型结构splitgcn。通过提供方向信息,Split GCN使多边形顶点能够更精确地移动到对象边界。我们的模型通过使用关于顶点依赖关系的上下文交换来转换初始拓扑,成功地预测了对象的断开组件。Split GCN展示了与城市景观上最先进模型的竞争性能,以及与基线模型的更高性能。在四个跨域数据集上,我们验证了模型的泛化能力。 摘要:Annotating object boundaries by humans demands high costs. Recently, polygon-based annotation methods with human interaction have shown successful performance. However, given the connected vertex topology, these methods exhibit difficulty predicting the disconnected components in an object. This paper introduces Split-GCN, a novel architecture based on the polygon approach and self-attention mechanism. By offering the direction information, Split-GCN enables the polygon vertices to move more precisely to the object boundary. Our model successfully predicts disconnected components of an object by transforming the initial topology using the context exchange about the dependencies of vertices. Split-GCN demonstrates competitive performance with the state-of-the-art models on Cityscapes and even higher performance with the baseline models. On four cross-domain datasets, we confirm our model's generalization ability.

【21】 Generate Point Clouds with Multiscale Details from Graph-Represented Structures 标题:由图表示的结构生成具有多尺度细节的点云 链接:https://arxiv.org/abs/2112.06433

作者:Ximing Yang,Cheng Jin 机构:Fudan University, China, Shanghai 备注:8 pages, 6 figures 摘要:从结构生成点云是控制点云生成的一种非常有价值的方法。基于结构的可控点云生成的主要问题之一是缺乏对细节的可控性,因为在大多数现有的结构表示中缺少细节。可以看出,细节和结构的定义是主观的。细节可以被视为小规模的结构。为了同时表示不同尺度的结构,我们提出了一种基于图的结构表示方法,称为多尺度结构图(MSG)。通过将细节视为小尺度结构,可以在不同的尺度、位置、密度和角度上找到类似的局部结构模式。从模式中学习到的知识可以转移到其他尺度的类似模式中。提出了一种基于多尺度结构的点云生成器(MSPCG)的编码和生成机制,用于从MSG生成密集点云,该机制可以同时学习具有各种空间特性的局部模式。我们的MSPCG还具有很强的泛化能力和可扩展性。在ShapeNet数据集上训练的MSPCG可以在点云上启用多尺度编辑,为看不见的类别生成点云,并从给定结构生成室内场景。实验结果表明,我们的方法明显优于基线方法。 摘要:Generating point clouds from structures is a highly valued method to control the generation of point clouds.One of the major problems in structure-based controllable point cloud generation is the lack of controllability to details, as details are missing in most existing representations of structures.It can be observed that definitions of details and structures are subjective.Details can be treated as structures on small scale.To represent structures in different scales at the same time, we present a graph-based representation of structures called the Multiscale Structure Graph(MSG).By treating details as small-scale structures, similar patterns of local structures can be found at different scales, places, densities, and angles.The knowledge learned from a pattern can be transferred to similar patterns in other scales.An encoding and generation mechanism, namely the Multiscale Structure-based Point Cloud Generator(MSPCG), for generating dense point clouds from the MSG is proposed, which can simultaneously learn local patterns with miscellaneous spatial properties.Our MSPCG also has great generalization ability and scalability.An MSPCG trained on the ShapeNet dataset can enable multi-scale edition on point clouds, generate point clouds for unseen categories, and generate indoor scenes from a given structure. The experimental results show that our method significantly outperforms baseline methods.

【22】 GM Score: Incorporating inter-class and intra-class generator diversity, discriminability of disentangled representation, and sample fidelity for evaluating GANs 标题:GM评分:结合类间和类内生成器多样性、解缠表示的可区分性和样本保真度来评估GAN 链接:https://arxiv.org/abs/2112.06431

作者:Harshvardhan GM,Aanchal Sahu,Mahendra Kumar Gourisaria 备注:21 pages, 9 figures 摘要:与其他生成模型(如变分自动编码器(VAE)和玻尔兹曼机器)相比,生成对抗网络(GAN)因其较高的样本质量而广受欢迎,但它们在评估生成样本时也面临同样的困难。必须牢记各个方面,如生成样本的质量、类的多样性(类内和类间)、分离潜在空间的使用、所述评估指标与人类感知的一致性等。在本文中,我们提出了一个新分数,即GM分数,该模型考虑了样本质量、非纠缠表示、类内和类间多样性等多种因素,并采用了精度、召回率和F1分数等其他指标,对深层信念网络(DBN)和受限玻尔兹曼机(RBM)的潜在空间进行了判别。对在基准MNIST数据集上训练的不同GAN(GAN、DCGAN、BiGAN、CGAN、CoupledGAN、LSGAN、SGAN、WGAN和WGAN改进型)进行评估。 摘要:While generative adversarial networks (GAN) are popular for their higher sample quality as opposed to other generative models like the variational autoencoders (VAE) and Boltzmann machines, they suffer from the same difficulty of the evaluation of generated samples. Various aspects must be kept in mind, such as the quality of generated samples, the diversity of classes (within a class and among classes), the use of disentangled latent spaces, agreement of said evaluation metric with human perception, etc. In this paper, we propose a new score, namely, GM Score, which takes into various factors such as sample quality, disentangled representation, intra-class and inter-class diversity, and other metrics such as precision, recall, and F1 score are employed for discriminability of latent space of deep belief network (DBN) and restricted Boltzmann machine (RBM). The evaluation is done for different GANs (GAN, DCGAN, BiGAN, CGAN, CoupledGAN, LSGAN, SGAN, WGAN, and WGAN Improved) trained on the benchmark MNIST dataset.

【23】 Stacked Generative Machine Learning Models for Fast Approximations of Steady-State Navier-Stokes Equations 标题:用于稳态Navier-Stokes方程快速逼近的层叠式生成式机器学习模型 链接:https://arxiv.org/abs/2112.06419

作者:Shen Wang,Mehdi Nikfar,Joshua C. Agar,Yaling Liu 机构:Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, USA., Department of Materials Science and Engineering, Lehigh University, Bethlehem, PA, USA, Department of Bioengineering, Lehigh University, Bethlehem, PA, USA 备注:Under Review 摘要:计算流体动力学(CFD)模拟在工程和物理领域有着广泛的应用。流体动力学的标准描述要求在不同流态下求解Navier-Stokes(N-S)方程。然而,CFD模拟的应用受到高性能计算的可用性、速度和并行性的限制。为了提高计算效率,机器学习技术被用于为CFD构建加速的数据驱动近似。此类方法大多依赖于大规模带标注的CFD数据集,而要获得构建稳健数据驱动模型所需规模的数据,代价十分高昂。我们提出了一种弱监督方法,利用包含边界和几何条件的多通道输入,在各种边界条件下求解稳态N-S方程。我们不使用任何带标注的模拟数据,而是借助自定义的数据驱动与物理约束(physics-informed)损失函数,并利用小规模解对模型进行初始化(prime),即可达到最先进的结果。为了提高分辨率和预测能力,我们训练复杂度逐级增加的堆叠模型来生成N-S方程的数值解。在无需昂贵计算的情况下,我们的模型在各种障碍物和边界条件下都具有很高的预测能力。得益于其高度的灵活性,该模型可以在普通台式计算机上于5 ms内生成64 x 64计算域上的解,比常规CFD求解器快1000倍。将交互式CFD模拟迁移到本地消费级计算硬件上,可以在数据传输开销过高的物联网设备上实现实时预测等新应用,并有望在规模、速度和计算成本方面拓展边值流体问题的求解能力。 摘要:Computational fluid dynamics (CFD) simulations are broadly applied in engineering and physics. A standard description of fluid dynamics requires solving the Navier-Stokes (N-S) equations in different flow regimes. However, applications of CFD simulations are computationally-limited by the availability, speed, and parallelism of high-performance computing. To improve computational efficiency, machine learning techniques have been used to create accelerated data-driven approximations for CFD. A majority of such approaches rely on large labeled CFD datasets that are expensive to obtain at the scale necessary to build robust data-driven models. We develop a weakly-supervised approach to solve the steady-state N-S equations under various boundary conditions, using a multi-channel input with boundary and geometric conditions. We achieve state-of-the-art results without any labeled simulation data, but using a custom data-driven and physics-informed loss function by using and small-scale solutions to prime the model to solve the N-S equations. To improve the resolution and predictability, we train stacked models of increasing complexity generating the numerical solutions for N-S equations. Without expensive computations, our model achieves high predictability with a variety of obstacles and boundary conditions. Given its high flexibility, the model can generate a solution on a 64 x 64 domain within 5 ms on a regular desktop computer which is 1000 times faster than a regular CFD solver. Translation of interactive CFD simulation on local consumer computing hardware enables new applications in real-time predictions on the internet of things devices where data transfer is prohibitive and can increase the scale, speed, and computational cost of boundary-value fluid problems.

【24】 A Survey of Toxic Comment Classification Methods 标题:有毒评论分类方法综述 链接:https://arxiv.org/abs/2112.06412

作者:Kehan Wang,Jiaxi Yang,Hongjun Wu 备注:5 pages, 3 figures, 2 tables, for Cornell Tech Applied Machine Learning 摘要:虽然在现实生活中,每个人至少在一定程度上都会约束自己的言行,但要期望人们在互联网上同样举止得体则困难得多,因为在网上向他人发布有毒内容几乎不会受到约束,也几乎没有后果。然而,对于处在另一端的人来说,有毒文本往往会造成严重的心理伤害。检测此类有毒文本具有挑战性。在本文中,我们尝试使用包括CNN、朴素贝叶斯模型以及LSTM在内的机器学习方法构建毒性检测器。虽然前人已经打下了许多基础,但我们的目标是建立比以往更精确的模型。我们使用LSTM和CNN得到了精度很高的模型,并将它们与语言处理中的常用基线(朴素贝叶斯模型)进行了比较。我们还采用了词嵌入方法来提升模型的准确率。 摘要:While in real life everyone behaves themselves at least to some extent, it is much more difficult to expect people to behave themselves on the internet, because there are few checks or consequences for posting something toxic to others. Yet, for people on the other side, toxic texts often lead to serious psychological consequences. Detecting such toxic texts is challenging. In this paper, we attempt to build a toxicity detector using machine learning methods including CNN, Naive Bayes model, as well as LSTM. While there has been numerous groundwork laid by others, we aim to build models that provide higher accuracy than the predecessors. We produced very high accuracy models using LSTM and CNN, and compared them to the go-to solutions in language processing, the Naive Bayes model. A word embedding approach is also applied to empower the accuracy of our models.
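
作为对比基线的朴素贝叶斯模型,其训练与预测流程可以用 scikit-learn 几行代码勾勒(下面只是 TF-IDF + MultinomialNB 的玩具示例,并非论文的实验配置;LSTM/CNN 模型可在同一流程中替换分类器):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 玩具数据,仅演示训练/预测流程;真实实验应使用 Jigsaw 等公开有毒评论数据集
texts = [
    "you are a wonderful person",
    "thanks for the helpful answer",
    "you are an idiot and nobody likes you",
    "shut up, you worthless troll",
]
labels = [0, 0, 1, 1]                      # 0 = 非有毒, 1 = 有毒

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["what a great idea", "nobody likes you, idiot"]))  # 期望: [0 1]
```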

【25】 Local and Global Point Cloud Reconstruction for 3D Hand Pose Estimation 标题:用于三维手势估计的局部和全局点云重建 链接:https://arxiv.org/abs/2112.06389

作者:Ziwei Yu,Linlin Yang,Shicheng Chen,Angela Yao 机构: National University of Singapore, University of Bonn, Germany 备注:The British Machine Vision Conference (BMVC) 摘要:本文讨论了从单个RGB图像重建人手的三维点云和三维姿势估计。为此,我们提出了一种新的管道,用于局部和全局点云重建,使用三维手模板,同时学习潜在的姿势估计表示。为了演示我们的方法,我们引入了一个新的多视图手姿势数据集,以获得真实世界中完整的手的三维点云。在我们新提出的数据集和四个公共基准上的实验证明了该模型的优势。在重建逼真的完整三维手点云时,我们的方法在三维姿态估计方面优于竞争对手。 摘要:This paper addresses the 3D point cloud reconstruction and 3D pose estimation of the human hand from a single RGB image. To that end, we present a novel pipeline for local and global point cloud reconstruction using a 3D hand template while learning a latent representation for pose estimation. To demonstrate our method, we introduce a new multi-view hand posture dataset to obtain complete 3D point clouds of the hand in the real world. Experiments on our newly proposed dataset and four public benchmarks demonstrate the model's strengths. Our method outperforms competitors in 3D pose estimation while reconstructing realistic-looking complete 3D hand point clouds.

【26】 Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer 标题:使用统一文本到文本转换器的法律判决预测的依赖学习 链接:https://arxiv.org/abs/2112.06370

作者:Yunyun Huang,Xiaoyu Shen,Chuanyi Li,Jidong Ge,Bin Luo 机构: Software Institute, Nanjing University 备注:The first two authors contributed equally 摘要:鉴于案件的实际情况,法律判决预测(LJP)涉及到一系列子任务,如预测违法条款、指控和处罚期限。我们建议为LJP利用统一的文本到文本转换器,其中子任务之间的依赖关系可以在自回归解码器中自然建立。与以前的工作相比,它有三个优点:(1)它适合于掩蔽语言模型的预训练模式,因此可以受益于每个子任务的语义提示,而不是将它们视为原子标签;(2)它使用单一的统一体系结构,允许在所有子任务之间完全共享参数,(3)它可以包含分类子任务和生成子任务。我们表明,这种统一的转换器,尽管对一般域文本进行了预训练,但优于专门为法律域定制的预训练模型。通过大量的实验,我们发现捕获依赖关系的最佳顺序不同于人类的直觉,人类最合理的逻辑顺序对于模型来说可能是次优的。我们还包括另外两个辅助任务:法庭视图生成和文章内容预测,表明它们不仅可以提高预测精度,而且可以在出现错误时为模型输出提供可解释的解释。在最佳配置下,我们的模型大大优于以前的SOTA和单任务版本的统一Transformer。 摘要:Given the fact of a case, Legal Judgment Prediction (LJP) involves a series of sub-tasks such as predicting violated law articles, charges and term of penalty. We propose leveraging a unified text-to-text Transformer for LJP, where the dependencies among sub-tasks can be naturally established within the auto-regressive decoder. Compared with previous works, it has three advantages: (1) it fits in the pretraining pattern of masked language models, and thereby can benefit from the semantic prompts of each sub-task rather than treating them as atomic labels, (2) it utilizes a single unified architecture, enabling full parameter sharing across all sub-tasks, and (3) it can incorporate both classification and generative sub-tasks. We show that this unified transformer, albeit pretrained on general-domain text, outperforms pretrained models tailored specifically for the legal domain. Through an extensive set of experiments, we find that the best order to capture dependencies is different from human intuitions, and the most reasonable logical order for humans can be sub-optimal for the model. We further include two more auxiliary tasks: court view generation and article content prediction, showing they can not only improve the prediction accuracy, but also provide interpretable explanations for model outputs even when an error is made. With the best configuration, our model outperforms both previous SOTA and a single-tasked version of the unified transformer by a large margin.

【27】 Neural Point Process for Learning Spatiotemporal Event Dynamics 标题:学习时空事件动力学的神经点过程 链接:https://arxiv.org/abs/2112.06351

作者:Zihao Zhou,Xingyi Yang,Ryan Rossi,Handong Zhao,Rose Yu 机构:UC San Diego, National University of Singapore, Adobe Research 摘要:学习时空事件的动力学是一个基本问题。神经点过程通过深层神经网络增强点过程模型的表达能力。然而,大多数现有的方法只考虑时间动态而没有空间建模。我们提出了深时空点过程(DeepSTPP),这是一个整合时空点过程的深动力学模型。我们的方法灵活、高效,能够准确地预测空间和时间上的不规则采样事件。我们的方法的关键结构是由潜在过程控制的非参数时空强度函数。强度函数对密度具有闭合形式积分。潜在过程捕获事件序列的不确定性。我们使用摊销变分推理来推断具有深层网络的潜在过程。使用合成数据集,我们验证了我们的模型能够准确地学习真实的强度函数。在真实的基准数据集上,我们的模型显示了优于最先进基线的性能。 摘要:Learning the dynamics of spatiotemporal events is a fundamental problem. Neural point processes enhance the expressivity of point process models with deep neural networks. However, most existing methods only consider temporal dynamics without spatial modeling. We propose Deep Spatiotemporal Point Process (DeepSTPP), a deep dynamics model that integrates spatiotemporal point processes. Our method is flexible, efficient, and can accurately forecast irregularly sampled events over space and time. The key construction of our approach is the nonparametric space-time intensity function, governed by a latent process. The intensity function enjoys closed-form integration for the density. The latent process captures the uncertainty of the event sequence. We use amortized variational inference to infer the latent process with deep networks. Using synthetic datasets, we validate our model can accurately learn the true intensity function. On real-world benchmark datasets, our model demonstrates superior performance over state-of-the-art baselines.
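
时空点过程的核心对象是条件强度函数。下面给出一个非参数核形式的强度函数草图(核宽与权重在论文中由潜变量过程和深度网络推断,这里固定为常数仅作演示):

```python
import numpy as np

def st_intensity(s, t, event_locs, event_times, weights, sigma_s=0.5, sigma_t=1.0):
    """时空点过程条件强度的非参数草图:
    lambda(s, t) = sum_i w_i * k_space(s - s_i) * k_time(t - t_i),仅对 t_i < t 的历史事件求和。"""
    mask = event_times < t                          # 只考虑已发生的事件
    if not np.any(mask):
        return 1e-8
    ds = np.linalg.norm(event_locs[mask] - s, axis=1)
    dt = t - event_times[mask]
    k_space = np.exp(-ds ** 2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    k_time = np.exp(-dt / sigma_t) / sigma_t        # 指数时间核,随时间衰减
    return float(np.sum(weights[mask] * k_space * k_time))

# 用法示意:三个历史事件,查询某个时空位置的强度
locs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
times = np.array([0.5, 1.0, 3.0])
w = np.ones(3)
print(st_intensity(np.array([0.2, 0.1]), t=2.0, event_locs=locs, event_times=times, weights=w))
```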

【28】 ValueNet: A New Dataset for Human Value Driven Dialogue System 标题:ValueNet:一种新的人类价值驱动对话系统数据集 链接:https://arxiv.org/abs/2112.06346

作者:Liang Qiu,Yizhou Zhao,Jinchao Li,Pan Lu,Baolin Peng,Jianfeng Gao,Song-Chun Zhu 机构:UCLA Center for Vision, Cognition, Learning, and Autonomy, Microsoft Research, Redmond 备注:Paper accepted by AAAI 2022 摘要:构建一个具有社会智能的代理涉及到许多挑战,其中之一就是教代理像人一样在其价值观的指导下说话。然而,价值驱动的聊天机器人在对话系统领域仍然没有得到充分的研究。大多数现有数据集集中于常识推理或社会规范建模。在这项工作中,我们提出了一个新的大规模人类价值数据集ValueNet,其中包含21374个文本场景中的人类态度。该数据集按十个维度组织,符合跨文化研究中的基本人类价值理论。我们进一步在ValueNet上开发了基于Transformer的价值回归模型,以了解效用分布。综合实证结果表明,学习价值模型可以使广泛的对话任务受益。例如,通过使用强化学习和价值模型的奖励教授生成代理,我们的方法在个性化对话生成数据集:Persona Chat上实现了最先进的性能。现有的情感识别模型以值作为附加特征,能够捕获上下文中丰富的人类情感,从而进一步提高移情对话数据集中移情反应生成的性能。据我们所知,ValueNet是第一个用于人类价值建模的大型文本数据集,我们也是第一个尝试将价值模型纳入情感智能对话系统的人。该数据集可在https://liang-qiu.github.io/ValueNet/. 摘要:Building a socially intelligent agent involves many challenges, one of which is to teach the agent to speak guided by its value like a human. However, value-driven chatbots are still understudied in the area of dialogue systems. Most existing datasets focus on commonsense reasoning or social norm modeling. In this work, we present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios. The dataset is organized in ten dimensions that conform to the basic human value theory in intercultural research. We further develop a Transformer-based value regression model on ValueNet to learn the utility distribution. Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks. For example, by teaching a generative agent with reinforcement learning and the rewards from the value model, our method attains state-of-the-art performance on the personalized dialog generation dataset: Persona-Chat. With values as additional features, existing emotion recognition models enable capturing rich human emotions in the context, which further improves the empathetic response generation performance in the EmpatheticDialogues dataset. To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first one trying to incorporate a value model into emotionally intelligent dialogue systems. The dataset is available at https://liang-qiu.github.io/ValueNet/.

【29】 A Survey on Societal Event Forecasting with Deep Learning 标题:基于深度学习的社会事件预测研究综述 链接:https://arxiv.org/abs/2112.06345

作者:Songgaojun Deng,Yue Ning 机构: Stevens Institute of Technology 备注:31 pages, 12 figures, 4 tables 摘要:人口层面的社会事件,如内乱和犯罪,往往对我们的日常生活产生重大影响。预测此类事件对于决策和资源配置具有重要意义。由于缺乏关于事件发生的真正原因和潜在机制的知识,事件预测历来具有挑战性。近年来,由于两个主要原因,事件预测的研究取得了重大进展:(1)机器学习和深度学习算法的发展;(2)公共数据(如社交媒体、新闻来源、博客、经济指标和其他元数据来源)的可访问性。数据的爆炸式增长和软件/硬件技术的显著进步导致了深度学习技术在社会事件研究中的应用。本文致力于为社会事件预测的深度学习技术提供一个系统和全面的概述。我们关注社会事件的两个领域:内乱(civil unrest)和犯罪(crime)。我们首先介绍如何将事件预测问题表述为机器学习预测任务。然后,我们总结了针对这些问题的数据资源、传统方法和深度学习模型的最新发展。最后,我们讨论了社会事件预测面临的挑战,并对未来的研究提出了一些有希望的方向。 摘要:Population-level societal events, such as civil unrest and crime, often have a significant impact on our daily life. Forecasting such events is of great importance for decision-making and resource allocation. Event prediction has traditionally been challenging due to the lack of knowledge regarding the true causes and underlying mechanisms of event occurrence. In recent years, research on event forecasting has made significant progress due to two main reasons: (1) the development of machine learning and deep learning algorithms and (2) the accessibility of public data such as social media, news sources, blogs, economic indicators, and other meta-data sources. The explosive growth of data and the remarkable advancement in software/hardware technologies have led to applications of deep learning techniques in societal event studies. This paper is dedicated to providing a systematic and comprehensive overview of deep learning technologies for societal event predictions. We focus on two domains of societal events: civil unrest and crime. We first introduce how event forecasting problems are formulated as a machine learning prediction task. Then, we summarize data resources, traditional methods, and recent development of deep learning models for these problems. Finally, we discuss the challenges in societal event forecasting and put forward some promising directions for future research.

【30】 Representing Knowledge as Predictions (and State as Knowledge) 标题:将知识表示为预测(将状态表示为知识) 链接:https://arxiv.org/abs/2112.06336

作者:Mark Ring 备注:Other than a few edits suggested by generous colleagues, this paper has not changed since roughly 2013. Thus, some aspects of it are now dated; for example, GVFs (aka "forecasts") and off-policy learning are now well known. Nevertheless, I believe this paper still has useful insights to offer the community, especially the growing community of enthusiastic researchers in Continual Learning 摘要:这篇文章展示了一个单一的机制是如何让知识直接从一个代理的原始感觉运动流逐层构建的。这种机制,即一般价值函数(GVF)或“预测”,将高级抽象知识捕获为一组关于现有特征和知识的预测,完全基于代理的低级感觉和行为。因此,预测提供了一种表示方法,可以将原始的感觉运动数据组织成无数层的有用抽象——这是人工智能和认知科学长期追求的目标。本文的核心是一个详细的思维实验,提供了一个具体的、一步一步的形式化说明,说明人工智能体如何仅从原始的感觉运动经验中构建真实、有用、抽象的知识。知识表示为一组关于代理行为的观察结果的分层预测(预测)。此图显示了十二个独立的层:最低层由原始像素、触摸和力传感器以及少量动作组成;更高层的抽象不断增加,最终形成了关于代理世界的丰富知识,大致对应于门口、墙壁、房间和楼层平面图。然后,我认为,这种普遍的机制可能允许代表广泛的日常人类知识。 摘要:This paper shows how a single mechanism allows knowledge to be constructed layer by layer directly from an agent's raw sensorimotor stream. This mechanism, the General Value Function (GVF) or "forecast," captures high-level, abstract knowledge as a set of predictions about existing features and knowledge, based exclusively on the agent's low-level senses and actions. Thus, forecasts provide a representation for organizing raw sensorimotor data into useful abstractions over an unlimited number of layers--a long-sought goal of AI and cognitive science. The heart of this paper is a detailed thought experiment providing a concrete, step-by-step formal illustration of how an artificial agent can build true, useful, abstract knowledge from its raw sensorimotor experience alone. The knowledge is represented as a set of layered predictions (forecasts) about the agent's observed consequences of its actions. This illustration shows twelve separate layers: the lowest consisting of raw pixels, touch and force sensors, and a small number of actions; the higher layers increasing in abstraction, eventually resulting in rich knowledge about the agent's world, corresponding roughly to doorways, walls, rooms, and floor plans. I then argue that this general mechanism may allow the representation of a broad spectrum of everyday human knowledge.

【31】 Weakly Supervised Mapping of Natural Language to SQL through Question Decomposition 标题:基于问题分解的自然语言到SQL的弱监督映射 链接:https://arxiv.org/abs/2112.06311

作者:Tomer Wolfson,Jonathan Berant,Daniel Deutch 机构:Tel Aviv University, Allen Institute for AI 备注:Preprint 摘要数据库的自然语言接口(NLIDB),其中用户以自然语言(NL)提出查询,对于使非专家能够从数据中获得见解至关重要。相比之下,开发这样的接口依赖于专家,他们经常编写启发式代码来将NL映射到SQL。或者,基于机器学习模型的NLIDB依赖于NL到SQL映射(NL-SQL对)的监督示例作为训练数据。这样的例子再次使用专家获取,这通常不仅仅是一次性的互动。即,部署NLIDB的每个数据域可能具有不同的特征,因此需要专用的启发式或特定于域的训练示例。为此,我们提出了一种基于机器学习的NLIDB训练的替代方法,即使用弱监督。我们使用最近提出的问题分解表示法QDMR,它是NL和形式查询语言之间的中间语言。最近的工作表明,非专家通常能够成功地将NL翻译成QDMR。因此,我们使用NL-QDMR对以及问题答案作为自动合成SQL查询的监督。然后使用NL问题和合成SQL来训练NL到SQL模型,并在五个基准数据集上进行测试。大量的实验表明,我们的解决方案不需要专家注释,与在专家注释数据上训练的模型相比具有竞争力。 摘要:Natural Language Interfaces to Databases (NLIDBs), where users pose queries in Natural Language (NL), are crucial for enabling non-experts to gain insights from data. Developing such interfaces, by contrast, is dependent on experts who often code heuristics for mapping NL to SQL. Alternatively, NLIDBs based on machine learning models rely on supervised examples of NL to SQL mappings (NL-SQL pairs) used as training data. Such examples are again procured using experts, which typically involves more than a one-off interaction. Namely, each data domain in which the NLIDB is deployed may have different characteristics and therefore require either dedicated heuristics or domain-specific training examples. To this end, we propose an alternative approach for training machine learning-based NLIDBs, using weak supervision. We use the recently proposed question decomposition representation called QDMR, an intermediate between NL and formal query languages. Recent work has shown that non-experts are generally successful in translating NL to QDMR. We consequently use NL-QDMR pairs, along with the question answers, as supervision for automatically synthesizing SQL queries. The NL questions and synthesized SQL are then used to train NL-to-SQL models, which we test on five benchmark datasets. Extensive experiments show that our solution, requiring zero expert annotations, performs competitively with models trained on expert annotated data.

【32】 Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer 标题:时空融合BNN:变分贝叶斯特征层 链接:https://arxiv.org/abs/2112.06281

作者:Shiye Lei,Zhuozhuo Tu,Leszek Rutkowski,Feng Zhou,Li Shen,Fengxiang He,Dacheng Tao 机构: Rutkowski is with the Department of Artificial Intelligence, University of Social Sciences 摘要:贝叶斯神经网络(BNN)已成为缓解深度学习中过度自信预测的主要方法,但由于分布参数的大量存在,贝叶斯神经网络往往存在尺度问题。在本文中,我们发现,当单独重新训练时,深度网络的第一层具有多个不同的最优解。这表明,当第一层被贝叶斯层改变时,后验方差较大,这促使我们设计时空融合BNN(STF-BNN),以便有效地将BNN扩展到大型模型:(1)首先正常地从头开始训练神经网络,以实现快速训练;(2)将第一层转换为贝叶斯模型,并采用随机变分推理进行推理,而其他层是固定的。与普通的贝叶斯网络相比,该方法可以大大减少训练时间和参数数量,从而有效地扩展贝叶斯网络。我们进一步为STF-BNN的普遍性和缓解过度自信的能力提供了理论保证。综合实验表明,STF-BNN(1)在预测和不确定度量化方面达到了最新水平;(2) 显著提高对抗鲁棒性和隐私保护;(3)大大减少了训练时间和内存成本。 摘要:Bayesian neural networks (BNNs) have become a principal approach to alleviate overconfident predictions in deep learning, but they often suffer from scaling issues due to a large number of distribution parameters. In this paper, we discover that the first layer of a deep network possesses multiple disparate optima when solely retrained. This indicates a large posterior variance when the first layer is altered by a Bayesian layer, which motivates us to design a spatial-temporal-fusion BNN (STF-BNN) for efficiently scaling BNNs to large models: (1) first normally train a neural network from scratch to realize fast training; and (2) the first layer is converted to Bayesian and inferred by employing stochastic variational inference, while other layers are fixed. Compared to vanilla BNNs, our approach can greatly reduce the training time and the number of parameters, which contributes to scale BNNs efficiently. We further provide theoretical guarantees on the generalizability and the capability of mitigating overconfidence of STF-BNN. Comprehensive experiments demonstrate that STF-BNN (1) achieves the state-of-the-art performance on prediction and uncertainty quantification; (2) significantly improves adversarial robustness and privacy preservation; and (3) considerably reduces training time and memory costs.
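下面是按摘要所述两步流程写出的一个极简PyTorch示意(并非原文实现;贝叶斯层采用均值场高斯加重参数化这一常见做法,网络结构、先验与超参数均为假设):先正常预训练整个网络,然后仅把第一层替换为贝叶斯层并用随机变分推断训练,其余层冻结。

```python
import torch, torch.nn as nn, torch.nn.functional as F

class BayesLinear(nn.Module):
    """均值场高斯变分贝叶斯线性层(重参数化采样),先验取N(0,1)。"""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_mu  = nn.Parameter(torch.zeros(d_out, d_in))
        self.w_rho = nn.Parameter(torch.full((d_out, d_in), -3.0))
        self.bias  = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        std = F.softplus(self.w_rho)
        w = self.w_mu + std * torch.randn_like(std)                       # 重参数化采样
        self.kl = 0.5 * (std.pow(2) + self.w_mu.pow(2) - 1 - 2 * std.log()).sum()
        return F.linear(x, w, self.bias)

# 步骤1:按常规方式从头训练一个普通网络(此处省略常规训练循环)
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# 步骤2:仅将第一层替换为贝叶斯层(用预训练权重初始化均值),冻结其余层
first = BayesLinear(20, 64)
first.w_mu.data.copy_(net[0].weight.data)
first.bias.data.copy_(net[0].bias.data)
model = nn.Sequential(first, net[1], net[2])
for p in net[2].parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(first.parameters(), lr=1e-3)
x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))                  # 示意数据
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y) + first.kl / x.size(0)            # ELBO的近似
    loss.backward()
    opt.step()
```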

【33】 SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification 标题:SparseFed:利用稀疏化技术缓解联邦学习中的模型中毒攻击 链接:https://arxiv.org/abs/2112.06274

作者:Ashwinee Panda,Saeed Mahloujifar,Arjun N. Bhagoji,Supriyo Chakraborty,Prateek Mittal 机构:Princeton University, University of Chicago, IBM 摘要:联邦学习天生容易受到模型中毒攻击,因为它的分散性允许攻击者参与受损设备。在模型中毒攻击中,攻击者通过上传“中毒”更新来降低模型在目标子任务(例如,将飞机分类为鸟类)上的性能。在本报告中,我们介绍了SparseFed,这是一种新的防御方法,使用全局top-k更新稀疏化和设备级梯度裁剪来减轻模型中毒攻击。我们提出了一个用于分析防御方法抗中毒攻击鲁棒性的理论框架,并对我们的算法进行了鲁棒性和收敛性分析。为了验证其经验有效性,我们在计算机视觉和联邦学习的多个基准数据集上进行了大规模开源评估。 摘要:Federated learning is inherently vulnerable to model poisoning attacks because its decentralized nature allows attackers to participate with compromised devices. In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks (e.g. classifying planes as birds) by uploading "poisoned" updates. In this report we introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks. We propose a theoretical framework for analyzing the robustness of defenses against poisoning attacks, and provide robustness and convergence analysis of our algorithm. To validate its empirical efficacy we conduct an open-source evaluation at scale across multiple benchmark datasets for computer vision and federated learning.
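下面用NumPy给出摘要所述两个核心操作的一个示意(并非官方实现,k值、裁剪阈值与聚合方式均为示意性假设):先对每个设备的更新做范数裁剪,服务器求平均后只保留绝对值最大的全局top-k坐标。

```python
import numpy as np

def clip_update(u, max_norm=1.0):
    """设备级更新的范数裁剪。"""
    n = np.linalg.norm(u)
    return u if n <= max_norm else u * (max_norm / n)

def topk_sparsify(v, k):
    """只保留绝对值最大的k个坐标,其余置零(全局top-k稀疏化)。"""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def server_aggregate(client_updates, k, max_norm=1.0):
    clipped = [clip_update(u, max_norm) for u in client_updates]
    return topk_sparsify(np.mean(clipped, axis=0), k)

# 用法示意:10个客户端、1000维模型、只保留前5%的坐标
rng = np.random.default_rng(0)
updates = [rng.normal(size=1000) for _ in range(10)]
agg = server_aggregate(updates, k=50)
print(np.count_nonzero(agg))   # 50
```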

【34】 Up to 100x Faster Data-free Knowledge Distillation 标题:无数据知识提炼速度最高可提高100倍 链接:https://arxiv.org/abs/2112.06253

作者:Gongfan Fang,Kanya Mo,Xinchao Wang,Jie Song,Shitao Bei,Haofei Zhang,Mingli Song 机构:National University of Singapore, Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies 摘要:无数据知识蒸馏(DFKD)由于其仅使用合成数据即可压缩模型的能力,近年来越来越受到研究界的关注。尽管取得了令人鼓舞的成果,但最先进的DFKD方法仍然存在数据合成效率低下的问题,使得无数据训练过程非常耗时,因此不适用于大规模任务。在这项工作中,我们介绍了一种有效的方案,称为FastDFKD,它允许我们将DFKD加速数个数量级。我们方法的核心是一种新的策略,即重用训练数据中的共享公共特征,从而合成不同的数据实例。与以前独立优化一组数据的方法不同,我们建议学习一个元合成器,该元合成器寻找共同特征作为快速数据合成的初始化。因此,FastDFKD只需几个步骤即可实现数据合成,显著提高了无数据训练的效率。在CIFAR、NYUv2和ImageNet上的实验表明,所提出的FastDFKD在保持与现有最优方法相当性能的同时,实现了10倍甚至100倍的加速。 摘要:Data-free knowledge distillation (DFKD) has recently been attracting increasing attention from research communities, attributed to its capability to compress a model only using synthetic data. Despite the encouraging results achieved, state-of-the-art DFKD methods still suffer from the inefficiency of data synthesis, making the data-free training process extremely time-consuming and thus inapplicable for large-scale tasks. In this work, we introduce an efficacious scheme, termed as FastDFKD, that allows us to accelerate DFKD by a factor of orders of magnitude. At the heart of our approach is a novel strategy to reuse the shared common features in training data so as to synthesize different data instances. Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features as the initialization for the fast data synthesis. As a result, FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training. Experiments over CIFAR, NYUv2, and ImageNet demonstrate that the proposed FastDFKD achieves 10$\times$ and even 100$\times$ acceleration while preserving performances on par with state of the art.

【35】 DeepFIB: Self-Imputation for Time Series Anomaly Detection 标题:DeepFIB:用于时间序列异常检测的自归罪算法 链接:https://arxiv.org/abs/2112.06247

作者:Minhao Liu,Zhijian Xu,Qiang Xu 摘要:时间序列(TS)异常检测(AD)在各种应用中起着至关重要的作用,例如金融和医疗监控中的欺诈检测。由于异常固有的不可预测性和高度多样性以及历史数据中缺乏异常标签,AD问题通常被描述为无监督学习问题。现有解决方案的性能往往不令人满意,尤其是在数据稀缺的情况下。为了解决这个问题,我们提出了一种新的时间序列AD自监督学习技术,即DeepFIB。我们通过掩蔽TS中的某些元素并利用其余元素对其进行插补,将问题建模为一个“填空”(Fill In the Blank)博弈。考虑到TS数据中两种常见的异常形状(点异常或序列异常),我们使用许多自生成的训练样本实现了两种掩蔽策略。与现有的AD解决方案相比,相应的自插补网络可以提取出更稳健的时间关系,并有效地促进识别两种类型的异常。对于连续异常值,我们还提出了一种异常定位算法,该算法可以显著减少AD错误。在各种真实TS数据集上的实验表明,DeepFIB的表现大大优于最先进的方法,F1分数的相对提升高达65.2%。 摘要:Time series (TS) anomaly detection (AD) plays an essential role in various applications, e.g., fraud detection in finance and healthcare monitoring. Due to the inherently unpredictable and highly varied nature of anomalies and the lack of anomaly labels in historical data, the AD problem is typically formulated as an unsupervised learning problem. The performance of existing solutions is often not satisfactory, especially in data-scarce scenarios. To tackle this problem, we propose a novel self-supervised learning technique for AD in time series, namely \emph{DeepFIB}. We model the problem as a \emph{Fill In the Blank} game by masking some elements in the TS and imputing them with the rest. Considering the two common anomaly shapes (point- or sequence-outliers) in TS data, we implement two masking strategies with many self-generated training samples. The corresponding self-imputation networks can extract more robust temporal relations than existing AD solutions and effectively facilitate identifying the two types of anomalies. For continuous outliers, we also propose an anomaly localization algorithm that dramatically reduces AD errors. Experiments on various real-world TS datasets demonstrate that DeepFIB outperforms state-of-the-art methods by a large margin, achieving up to $65.2\%$ relative improvement in F1-score.
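下面给出摘要中两种掩蔽策略(点掩蔽与连续片段掩蔽)的一个示意实现(非原文代码,掩蔽比例、片段长度与占位值均为示意性假设),用于从一条时间序列自动生成“填空”式自监督训练样本;模型在被掩蔽位置上的重建误差即可作为异常分数的基础。

```python
import numpy as np

def point_mask(ts, ratio=0.1, rng=None):
    """随机掩蔽若干单个时间点,用于模拟点异常的自插补训练样本。"""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(len(ts)) < ratio
    x = ts.copy()
    x[mask] = 0.0          # 被掩蔽位置置零(也可换成均值等占位值)
    return x, mask

def segment_mask(ts, seg_len=10, rng=None):
    """掩蔽一段连续区间,用于模拟序列异常的自插补训练样本。"""
    if rng is None:
        rng = np.random.default_rng()
    start = rng.integers(0, len(ts) - seg_len)
    mask = np.zeros(len(ts), dtype=bool)
    mask[start:start + seg_len] = True
    x = ts.copy()
    x[mask] = 0.0
    return x, mask

# 用法示意
rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.normal(size=200)
x_pt, m_pt = point_mask(ts, ratio=0.1, rng=rng)
x_sg, m_sg = segment_mask(ts, seg_len=10, rng=rng)
```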

【36】 Video as Conditional Graph Hierarchy for Multi-Granular Question Answering 标题:用于多粒度问答的视频作为条件图层次结构 链接:https://arxiv.org/abs/2112.06197

作者:Junbin Xiao,Angela Yao,Zhiyuan Liu,Yicong Li,Wei Ji,Tat-Seng Chua 机构:Department of Computer Science, National University of Singapore 备注:Accepted to prepresent at AAAI'22 摘要:视频问答要求模型理解并推理复杂的视频和语言数据,以正确得出答案。现有的工作集中在设计复杂的跨模态交互,以融合来自两种模态的信息,同时将视频和问题整体编码为帧和单词序列。尽管这些方法取得了成功,但它们基本上都是围绕着视频和问题内容的顺序性展开的,对问题的回答缺乏洞察,也缺乏可解释性。在这项工作中,我们认为,虽然视频是以帧顺序呈现的,但视觉元素(例如,对象、动作、活动和事件)在语义空间中不是顺序的,而是分层的。为了与语言查询中语言概念的多粒度本质保持一致,我们建议将视频建模为一个条件图层次结构,在相应文本线索的指导下,以层次方式将不同粒度的视觉事实编织在一起。尽管简单,我们的大量实验证明了这种条件层次图结构的优越性,与以前的方法相比,性能有了明显的改进,并且在不同类型的问题上也有了更好的泛化。进一步的分析也巩固了模型的可靠性,因为它为预测的答案提供了有意义的视觉文本证据。 摘要:Video question answering requires models to understand and reason about both complex video and language data to correctly derive answers. Existing efforts focus on designing sophisticated cross-modal interactions to fuse the information from two modalities, while encoding the video and question holistically as frame and word sequences. Despite their success, these methods are essentially revolving around the sequential nature of video- and question-contents, providing little insight to the problem of question-answering and lacking interpretability as well. In this work, we argue that while video is presented in frame sequence, the visual elements (eg, objects, actions, activities and events) are not sequential but rather hierarchical in semantic space. To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues. Despite the simplicity, our extensive experiments demonstrate the superiority of such conditional hierarchical graph architecture, with clear performance improvements over prior methods and also better generalization across different type of questions. Further analyses also consolidate the model's reliability as it shows meaningful visual-textual evidences for the predicted answers.

【37】 Predicting Above-Sentence Discourse Structure using Distant Supervision from Topic Segmentation 标题:基于主题切分的远距离监督预测句上语篇结构 链接:https://arxiv.org/abs/2112.06196

作者:Patrick Huber,Linzi Xing,Giuseppe Carenini 机构:Department of Computer Science, University of British Columbia, Vancouver, BC, Canada, V,T ,Z 备注:AAAI 2022 摘要:RST风格的语篇分析在许多NLP任务中起着至关重要的作用,揭示了潜在复杂多样文档的潜在语义/语用结构。尽管它很重要,但现代语篇分析中最普遍的限制之一是缺乏大规模数据集。为了克服数据稀疏性问题,最近提出了情感分析和摘要等任务的远程监督方法。在这里,我们通过利用话题分割的远程监控来扩展这一研究领域,这可以为高层话语结构提供一个强大且经常互补的信号。在两个人类注释话语树库上的实验证实,我们的方案在句子和段落层面上生成了准确的树结构,在句子到文档任务上始终优于以前的远程监督模型,并且偶尔在句子到段落层面上获得更高的分数。 摘要:RST-style discourse parsing plays a vital role in many NLP tasks, revealing the underlying semantic/pragmatic structure of potentially complex and diverse documents. Despite its importance, one of the most prevailing limitations in modern day discourse parsing is the lack of large-scale datasets. To overcome the data sparsity issue, distantly supervised approaches from tasks like sentiment analysis and summarization have been recently proposed. Here, we extend this line of research by exploiting distant supervision from topic segmentation, which can arguably provide a strong and oftentimes complementary signal for high-level discourse structures. Experiments on two human-annotated discourse treebanks confirm that our proposal generates accurate tree structures on sentence and paragraph level, consistently outperforming previous distantly supervised models on the sentence-to-document task and occasionally reaching even higher scores on the sentence-to-paragraph level.

【38】 MPLR: a novel model for multi-target learning of logical rules for knowledge graph reasoning 标题:MPLR:一种新的知识图推理逻辑规则多目标学习模型 链接:https://arxiv.org/abs/2112.06189

作者:Yuliang Wei,Haotian Li,Guodong Xin,Yao Wang,Bailing Wang 机构:School of Computer Science and Technology, Harbin Institute of Technology at Weihai, China 备注:Submitted to the journal of Information Sciences for possible publication 摘要:大规模知识图(KG)提供了人类知识的结构化表示。然而,由于不可能包含所有知识,KG通常是不完整的。基于现有事实的推理为发现缺失的事实铺平了道路。在本文中,我们研究了在知识图上学习逻辑规则以完成缺失事实三元组的推理问题。学习逻辑规则使模型具有很强的可解释性以及推广到类似任务的能力。我们提出了一个称为MPLR的模型,该模型改进了现有模型,以充分利用训练数据,并考虑了多目标场景。此外,考虑到在评估模型性能和挖掘规则质量方面的不足,我们进一步提出了两个新的指标来帮助解决这个问题。实验结果表明,在五个基准数据集上,我们的MPLR模型优于最先进的方法。结果也证明了指标的有效性。 摘要:Large-scale knowledge graphs (KGs) provide structured representations of human knowledge. However, as it is impossible to contain all knowledge, KGs are usually incomplete. Reasoning based on existing facts paves a way to discover missing facts. In this paper, we study the problem of learning logic rules for reasoning on knowledge graphs for completing missing factual triplets. Learning logic rules equips a model with strong interpretability as well as the ability to generalize to similar tasks. We propose a model called MPLR that improves the existing models to fully use training data and multi-target scenarios are considered. In addition, considering the deficiency in evaluating the performance of models and the quality of mined rules, we further propose two novel indicators to help with the problem. Experimental results empirically demonstrate that our MPLR model outperforms state-of-the-art methods on five benchmark datasets. The results also prove the effectiveness of the indicators.

【39】 Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward 标题:基于危险仲裁奖励的自动驾驶多Agent漏洞发现 链接:https://arxiv.org/abs/2112.06185

作者:Weilin Liu,Ye Mu,Chao Yu,Xuefei Ning,Zhong Cao,Yi Wu,Shuang Liang,Huazhong Yang,Yu Wang 机构: Cao is with the School of Vehicle and Mobility 摘要:发现危险场景对于测试和进一步改进驾驶政策至关重要。然而,进行有效的驾驶政策测试面临两个关键挑战。一方面,在测试训练有素的自动驾驶策略时,自然遇到危险场景的概率较低。因此,通过纯粹的真实道路测试发现这些场景的成本极高。另一方面,这项任务需要正确确定事故责任。收集错误归因责任的场景将导致过度保守的自主驾驶策略。更具体地说,我们的目标是发现与自动驾驶车辆(AV)相关的危险场景,即测试驾驶政策的漏洞。为此,本工作提出了一个基于多智能体强化学习的安全测试框架,通过寻找Av责任场景(STAR)。STARS通过引入危险仲裁奖励(HAR),引导其他交通参与者产生Av责任场景,并使测试中的驾驶政策行为不当。HAR使我们的框架能够发现多样化、复杂和与AV相关的危险场景。在三种环境中针对四种不同驾驶策略的实验结果表明,STARS能够有效地发现与AV相关的危险场景。这些场景确实对应于测试中驾驶政策的漏洞,因此对其进一步改进具有重要意义。 摘要:Discovering hazardous scenarios is crucial in testing and further improving driving policies. However, conducting efficient driving policy testing faces two key challenges. On the one hand, the probability of naturally encountering hazardous scenarios is low when testing a well-trained autonomous driving strategy. Thus, discovering these scenarios by purely real-world road testing is extremely costly. On the other hand, a proper determination of accident responsibility is necessary for this task. Collecting scenarios with wrong-attributed responsibilities will lead to an overly conservative autonomous driving strategy. To be more specific, we aim to discover hazardous scenarios that are autonomous-vehicle responsible (AV-responsible), i.e., the vulnerabilities of the under-test driving policy. To this end, this work proposes a Safety Test framework by finding Av-Responsible Scenarios (STARS) based on multi-agent reinforcement learning. STARS guides other traffic participants to produce Av-Responsible Scenarios and make the under-test driving policy misbehave via introducing Hazard Arbitration Reward (HAR). HAR enables our framework to discover diverse, complex, and AV-responsible hazardous scenarios. Experimental results against four different driving policies in three environments demonstrate that STARS can effectively discover AV-responsible hazardous scenarios. These scenarios indeed correspond to the vulnerabilities of the under-test driving policies, thus are meaningful for their further improvements.

【40】 Semi-supervised Domain Adaptive Structure Learning 标题:半监督领域自适应结构学习 链接:https://arxiv.org/abs/2112.06161

作者:Can Qin,Lichen Wang,Qianqian Ma,Yu Yin,Huan Wang,Yun Fu 机构: Boston University 摘要:半监督域自适应(SSDA)是一个相当具有挑战性的问题,需要方法克服1)对注释不良的数据的过度拟合和2)跨域的分布转移。不幸的是,由于训练数据偏向于标记样本,域自适应(DA)和半监督学习(SSL)方法的简单组合往往无法同时实现这两个目标。在本文中,我们介绍了一种自适应结构学习方法来规范SSL和DA的合作。受多视图学习的启发,我们提出的框架由一个共享的特征编码器网络和两个分类器网络组成,用于相互矛盾的目的。其中,一个分类器用于对目标特征进行分组,以提高类内密度,扩大分类聚类的差距,实现鲁棒表示学习。同时,另一个分类器作为正则化器,试图分散源特征以增强决策边界的平滑度。目标聚类和源扩展的迭代使目标特征很好地封闭在相应源点的扩展边界内。为了同时处理跨域特征对齐与部分标注数据学习,我们应用最大平均差异(MMD)距离最小化和自训练(ST),将相互矛盾的结构投影到共享视图中,以做出可靠的最终决策。在标准SSDA基准(包括DomainNet和Office-Home)上的实验结果表明,我们的方法在准确性和鲁棒性方面均优于最先进的方法。 摘要:Semi-supervised domain adaptation (SSDA) is quite a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains. Unfortunately, a simple combination of domain adaptation (DA) and semi-supervised learning (SSL) methods often fail to address such two objects because of training data bias towards labeled samples. In this paper, we introduce an adaptive structure learning method to regularize the cooperation of SSL and DA. Inspired by the multi-views learning, our proposed framework is composed of a shared feature encoder network and two classifier networks, trained for contradictory purposes. Among them, one of the classifiers is applied to group target features to improve intra-class density, enlarging the gap of categorical clusters for robust representation learning. Meanwhile, the other classifier, serviced as a regularizer, attempts to scatter the source features to enhance the smoothness of the decision boundary. The iterations of target clustering and source expansion make the target features being well-enclosed inside the dilated boundary of the corresponding source points. For the joint address of cross-domain features alignment and partially labeled data learning, we apply the maximum mean discrepancy (MMD) distance minimization and self-training (ST) to project the contradictory structures into a shared view to make the reliable final decision. The experimental results over the standard SSDA benchmarks, including DomainNet and Office-home, demonstrate both the accuracy and robustness of our method over the state-of-the-art approaches.
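摘要中提到用最大平均差异(MMD)最小化来对齐跨域特征;下面给出RBF核下经验MMD^2的一个标准示意实现(与论文的具体实现细节无关,带宽采用中值启发式属于示意性假设),可作为附加损失项加入训练。

```python
import torch

def mmd2_rbf(x, y, bandwidth=None):
    """经验MMD^2(RBF核):E[k(x,x')] + E[k(y,y')] - 2E[k(x,y)](有偏估计)。"""
    z = torch.cat([x, y], dim=0)
    d2 = torch.cdist(z, z).pow(2)
    if bandwidth is None:                      # 中值启发式选带宽(示意)
        bandwidth = d2[d2 > 0].median()
    k = torch.exp(-d2 / (2 * bandwidth))
    n = x.size(0)
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

# 用法示意:将源域与目标域特征之间的MMD作为对齐损失
src = torch.randn(64, 128)        # 源域特征(示意)
tgt = torch.randn(64, 128) + 0.5  # 目标域特征(示意)
print(mmd2_rbf(src, tgt).item())
```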

【41】 PRNet: A Periodic Residual Learning Network for Crowd Flow Forecasting 标题:PRNet:一种用于人群流量预测的周期性残差学习网络 链接:https://arxiv.org/abs/2112.06132

作者:Chengxin Wang,Yuxuan Liang,Gary Tan 机构:National University of Singapore 摘要:人群流量预测,例如预测进入或离开特定区域的人群,对于现实世界的城市应用具有重要意义。人流数据的关键特性之一是周期性:以固定时间间隔出现的模式,如每周模式。为了捕捉这种周期性,现有的研究要么基于周期隐藏状态显式地对其建模,要么通过将所有周期段反馈到神经网络来隐式地学习它。在本文中,我们设计了一种新的周期剩余学习网络(PRNet),以更好地模拟人流数据的周期性。与现有方法不同,PRNet通过对输入(前一时间段)和输出(未来时间段)之间的偏差进行建模,将人群流量预测作为一个周期性剩余学习问题。与直接预测高度动态的人群流相比,学习这种平稳偏差要容易得多,从而便于模型训练。此外,学习到的偏差使网络能够在每个时间间隔产生未来条件与其相应每周观测值之间的残差,因此有助于更好的预测。我们进一步提出了一种轻量级的空间信道增强编码器,通过联合捕获全局空间相关性和时间相关性来构建更强大的区域表示。在两个真实数据集上的实验结果表明,PRNet在准确性和鲁棒性方面都优于最先进的方法。 摘要:Crowd flow forecasting, e.g., predicting the crowds entering or leaving certain regions, is of great importance to real-world urban applications. One of the key properties of crowd flow data is periodicity: a pattern that occurs at regular time intervals, such as a weekly pattern. To capture such periodicity, existing studies either explicitly model it based on the periodic hidden states or implicitly learn it by feeding all periodic segments into neural networks. In this paper, we devise a novel periodic residual learning network (PRNet) for better modeling the periodicity in crowd flow data. Differing from existing methods, PRNet frames the crowd flow forecasting as a periodic residual learning problem by modeling the deviation between the input (the previous time period) and the output (the future time period). As compared to predicting highly dynamic crowd flows directly, learning such stationary deviation is much easier, which thus facilitates the model training. Besides, the learned deviation enables the network to produce the residual between future conditions and its corresponding weekly observations at each time interval, and therefore contributes to substantially better predictions. We further propose a lightweight Spatial-Channel Enhanced Encoder to build more powerful region representations, by jointly capturing global spatial correlations and temporal dependencies. Experimental results on two real-world datasets demonstrate that PRNet outperforms the state-of-the-art methods in terms of both accuracy and robustness.
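摘要的核心是把预测目标从“未来人流本身”改为“未来人流相对于上一周期(例如上周同一时刻)观测值的残差”;下面用NumPy给出这一训练目标构造方式的示意(并非论文实现,周期长度、输入窗口与预测步长均为示意性假设)。

```python
import numpy as np

def build_periodic_residual_samples(series, period, horizon):
    """构造(输入片段, 残差目标, 参考值)三元组:残差 = 未来值 - 上一周期同时刻观测值。"""
    X, R, ref = [], [], []
    for t in range(period, len(series) - horizon):
        X.append(series[t - period:t])                        # 历史输入(示意:取一个完整周期)
        future = series[t:t + horizon]
        last_period = series[t - period:t - period + horizon]
        R.append(future - last_period)                        # 模型学习的平稳残差
        ref.append(last_period)                               # 预测时再把参考值加回去
    return np.array(X), np.array(R), np.array(ref)

# 用法示意:小时粒度数据、以一周(168小时)为周期、预测未来6步
rng = np.random.default_rng(0)
series = 100 + 20 * np.sin(2 * np.pi * np.arange(24 * 21) / 168) + rng.normal(0, 2, 24 * 21)
X, R, ref = build_periodic_residual_samples(series, period=168, horizon=6)
# 最终预测 = 模型输出的残差 + ref
```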

【42】 Online Adaptation of Neural Network Models by Modified Extended Kalman Filter for Customizable and Transferable Driving Behavior Prediction 标题:改进的扩展卡尔曼过滤在线自适应神经网络模型用于可定制和可转移的驾驶行为预测 链接:https://arxiv.org/abs/2112.06129

作者:Letian Wang,Yeping Hu,Changliu Liu 机构:Carnegie Mellon University, Pittsburgh, PA, USA , University of California, Berkeley, Berkeley, CA, USA 备注:Accepted by the AAAI Conference on Artificial Intelligence 2022, Human-Centric Self-Supervised Learning Workshop. Dec 2021. arXiv admin note: text overlap with arXiv:2111.00788 摘要:人类驾驶员的高保真行为预测对于自动驾驶车辆的高效、安全部署至关重要,但由于人类行为的随机性、异质性和时变性,这一任务颇具挑战。一方面,训练后的预测模型只能捕获平均意义上的运动模式,而个体之间的细微差别很难反映出来。另一方面,在训练集上训练的预测模型可能无法推广到处于不同场景或数据分布中的测试集,从而导致低可转移性和可推广性。在本文中,我们将$\tau$步修正的扩展卡尔曼滤波参数自适应算法(MEKF$_\lambda$)应用到驾驶行为预测任务中,这在文献中尚未被研究过。借助观测轨迹的反馈,该算法被应用到基于神经网络的模型中,以提高不同人类主体和场景下驾驶行为预测的性能。我们提出了一套新的指标,用于系统评估在线自适应在降低不同个体和场景预测误差方面的性能。我们还对模型中最适合进行自适应的层以及自适应所需的观测步数进行了实证研究。 摘要:High fidelity behavior prediction of human drivers is crucial for efficient and safe deployment of autonomous vehicles, which is challenging due to the stochasticity, heterogeneity, and time-varying nature of human behaviors. On one hand, the trained prediction model can only capture the motion pattern in an average sense, while the nuances among individuals can hardly be reflected. On the other hand, the prediction model trained on the training set may not generalize to the testing set which may be in a different scenario or data distribution, resulting in low transferability and generalizability. In this paper, we applied a $\tau$-step modified Extended Kalman Filter parameter adaptation algorithm (MEKF$_\lambda$) to the driving behavior prediction task, which has not been studied before in literature. With the feedback of the observed trajectory, the algorithm is applied to neural-network-based models to improve the performance of driving behavior predictions across different human subjects and scenarios. A new set of metrics is proposed for systematic evaluation of online adaptation performance in reducing the prediction error for different individuals and scenarios. Empirical studies on the best layer in the model and steps of observation to adapt are also provided.

【43】 Real-world challenges for reinforcement learning in building control 标题:强化学习在建筑控制中的现实挑战 链接:https://arxiv.org/abs/2112.06127

作者:Zoltan Nagy,Kingsley Nweye 机构:Department of Civil, Environmental and Architectural Engineering, The University of Texas at Austin, Austin, Texas, USA 摘要:在此之前的研究基础上,强调了建筑控制研究标准化环境的必要性,并受最近引入的现实生活强化学习控制基准的启发,我们提出了强化学习建筑控制器的非详尽的九个现实世界挑战。我们认为,除了为可重复性提供标准化环境外,建筑控制研究还应在此框架中表达。先进的控制器,如模型预测控制和强化学习控制,既有优点,也有缺点,使它们无法在现实建筑中实现。两者之间的比较很少,而且往往是有偏见的。通过关注基准问题和挑战,我们可以调查控制器在各种情况下的性能,并进行公平比较。最后,我们呼吁研究界做出更跨学科的努力,以应对现实世界的挑战,并挖掘先进建筑控制器的潜力。 摘要:Building upon prior research that highlighted the need for standardizing environments for building control research, and inspired by recently introduced benchmarks for real life reinforcement learning control, here we propose a non-exhaustive nine real world challenges for reinforcement learning building controller. We argue that building control research should be expressed in this framework in addition to providing a standardized environment for repeatability. Advanced controllers such as model predictive control and reinforcement learning control have both advantages and disadvantages that prevent them from being implemented in real world buildings. Comparisons between the two are seldom, and often biased. By focusing on the benchmark problems and challenges, we can investigate the performance of the controllers under a variety of situations and generate a fair comparison. Lastly, we call for a more interdisciplinary effort of the research community to address the real world challenges, and unlock the potentials of advanced building controllers.

【44】 Extending AdamW by Leveraging Its Second Moment and Magnitude 标题:利用AdamW的第二时刻和量级扩展AdamW 链接:https://arxiv.org/abs/2112.06125

作者:Guoqiang Zhang,Niwa Kenta,W. Bastiaan Kleijn 备注:9 pages 摘要:最近的工作[4]分析了Adam在二次可微函数最优解邻域内的局部收敛性,发现学习率必须足够小,才能确保最优解的局部稳定性;上述收敛结果同样适用于AdamW。在这项工作中,我们提出了一种新的自适应优化方法,从两个方面对AdamW进行扩展,以放宽局部稳定性对小学习率的要求,我们称之为Aida。首先,我们考虑跟踪梯度幅值p次幂的二阶矩r_t;当p=2时,r_t退化为AdamW中的v_t。设{m_t}为AdamW的一阶矩。已知AdamW(或Adam)的更新方向m_{t+1}/(v_{t+1}+epsilon)^0.5(或m_{t+1}/(v_{t+1}^0.5+epsilon))可以分解为符号向量sign(m_{t+1})与幅值向量|m_{t+1}|/(v_{t+1}+epsilon)^0.5(或|m_{t+1}|/(v_{t+1}^0.5+epsilon))的逐元素乘积。Aida将幅值的q次幂计算为|m_{t+1}|^q/(r_{t+1}+epsilon)^(q/p)(或|m_{t+1}|^q/((r_{t+1})^(q/p)+epsilon)),当(p,q)=(2,1)时退化为AdamW的形式。假设原点0是某二次可微函数的局部最优解。理论上发现,当Aida中q>1且p>1时,只有当权重衰减非零时,原点0才是局部稳定的。我们在十个玩具优化问题上以及两个深度学习(DL)任务(训练Transformer与Swin-Transformer)上进行了实验。实证研究表明,在许多场景(包括这两个DL任务)中,采用(p,q)不等于(2,1)的特定设置的Aida优于AdamW对应的(p,q)=(2,1)设置。 摘要:Recent work [4] analyses the local convergence of Adam in a neighbourhood of an optimal solution for a twice-differentiable function. It is found that the learning rate has to be sufficiently small to ensure local stability of the optimal solution. The above convergence results also hold for AdamW. In this work, we propose a new adaptive optimisation method by extending AdamW in two aspects with the purpose to relax the requirement on small learning rate for local stability, which we refer to as Aida. Firstly, we consider tracking the 2nd moment r_t of the pth power of the gradient-magnitudes. r_t reduces to v_t of AdamW when p=2. Suppose {m_t} is the first moment of AdamW. It is known that the update direction m_{t+1}/(v_{t+1}+epsilon)^0.5 (or m_{t+1}/(v_{t+1}^0.5+epsilon)) of AdamW (or Adam) can be decomposed as the sign vector sign(m_{t+1}) multiplied elementwise by a vector of magnitudes |m_{t+1}|/(v_{t+1}+epsilon)^0.5 (or |m_{t+1}|/(v_{t+1}^0.5+epsilon)). Aida is designed to compute the qth power of the magnitude in the form of |m_{t+1}|^q/(r_{t+1}+epsilon)^(q/p) (or |m_{t+1}|^q/((r_{t+1})^(q/p)+epsilon)), which reduces to that of AdamW when (p,q)=(2,1). Suppose the origin 0 is a local optimal solution of a twice-differentiable function. It is found theoretically that when q>1 and p>1 in Aida, the origin 0 is locally stable only when the weight-decay is non-zero. Experiments are conducted for solving ten toy optimisation problems and training Transformer and Swin-Transformer for two deep learning (DL) tasks. The empirical study demonstrates that in a number of scenarios (including the two DL tasks), Aida with particular setups of (p,q) not equal to (2,1) outperforms the setup (p,q)=(2,1) of AdamW.
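依据摘要给出的表达式,可以写出Aida单步更新的一个极简示意(仅根据摘要公式重建,并非作者代码;省略偏差修正等细节属于简化假设):当(p, q)=(2, 1)时,它退化为AdamW风格的更新。

```python
import numpy as np

def aida_step(theta, g, m, r, lr=1e-3, beta1=0.9, beta2=0.999,
              p=3.0, q=2.0, eps=1e-8, weight_decay=1e-2):
    """依据摘要公式的Aida单步更新示意(未含偏差修正,属于简化假设)。"""
    m = beta1 * m + (1 - beta1) * g                        # 一阶矩
    r = beta2 * r + (1 - beta2) * np.abs(g) ** p           # 梯度幅值p次幂的二阶(广义)矩
    direction = np.sign(m) * np.abs(m) ** q / (r + eps) ** (q / p)
    theta = theta - lr * direction - lr * weight_decay * theta   # 解耦权重衰减(AdamW风格)
    return theta, m, r

# 用法示意:最小化 f(theta) = ||theta||^2,参数应逐步趋向原点附近
theta = np.array([5.0, -3.0])
m, r = np.zeros(2), np.zeros(2)
for _ in range(2000):
    g = 2 * theta                                          # f 的梯度
    theta, m, r = aida_step(theta, g, m, r, lr=1e-2)
print(theta)
```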

【45】 Stereoscopic Universal Perturbations across Different Architectures and Datasets 标题:跨不同体系结构和数据集的立体通用扰动 链接:https://arxiv.org/abs/2112.06116

作者:Zachary Berger,Parth Agrawal,Tian Yu Liu,Stefano Soatto,Alex Wong 机构:UCLA Vision Lab 摘要:我们研究了在视差估计任务中,图像对抗性扰动对深度立体匹配网络的影响。我们提出了一种方法来制作一组扰动,当添加到数据集中的任何立体图像对时,这些扰动可以欺骗立体网络显著改变感知的场景几何体。我们的扰动图像是“通用”的,因为它们不仅破坏了对优化数据集上网络的估计,而且还推广到不同数据集上具有不同体系结构的立体网络。我们在多个公共基准数据集上评估了我们的方法,结果表明,我们的扰动会将最先进的立体声网络的D1错误(类似于愚弄率)从1%增加到87%。我们研究扰动对估计场景几何体的影响,并确定最易受攻击的对象类。我们对左右图像之间的注册点激活的分析使我们发现,某些体系结构组件,即可变形卷积和显式匹配,可以提高对抗对手的鲁棒性。我们证明,通过简单地设计具有此类组件的网络,可以将对手的影响降低高达60.5%,这与通过昂贵的对手数据增强进行微调的网络的健壮性相媲美。 摘要:We study the effect of adversarial perturbations of images on deep stereo matching networks for the disparity estimation task. We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network to significantly alter the perceived scene geometry. Our perturbation images are "universal" in that they not only corrupt estimates of the network on the dataset they are optimized for, but also generalize to stereo networks with different architectures across different datasets. We evaluate our approach on multiple public benchmark datasets and show that our perturbations can increase D1-error (akin to fooling rate) of state-of-the-art stereo networks from 1% to as much as 87%. We investigate the effect of perturbations on the estimated scene geometry and identify object classes that are most vulnerable. Our analysis on the activations of registered points between left and right images led us to find that certain architectural components, i.e. deformable convolution and explicit matching, can increase robustness against adversaries. We demonstrate that by simply designing networks with such components, one can reduce the effect of adversaries by up to 60.5%, which rivals the robustness of networks fine-tuned with costly adversarial data augmentation.

【46】 Controlled-rearing studies of newborn chicks and deep neural networks 标题:初生雏鸡的控制饲养研究与深度神经网络 链接:https://arxiv.org/abs/2112.06106

作者:Donsuk Lee,Pranav Gujarathi,Justin N. Wood 机构:Department of Informatics, Indiana University, Bloomington, IN , Department of Computer Science, Departments of Informatics, Psychology, Neuroscience 备注:NeurIPS 2021 Workshop on Shared Visual Representations in Human & Machine Intelligence 摘要:卷积神经网络(CNN)现在可以在具有挑战性的目标识别任务上实现人类水平的性能。CNN也是预测视觉识别任务中神经和行为反应的主要定量模型。然而,CNN模型有一个广为接受的批评:与新生动物学习迅速有效不同,CNN被认为是“数据饥饿”,需要大量的训练数据来开发准确的对象识别模型。这种批评挑战了CNN作为视觉发展模型的前景。在这里,我们通过对新生小鸡和CNN进行平行对照饲养实验,直接检验CNN是否比新生动物更需要数据。我们在严格控制的视觉环境中饲养新生小鸡,然后通过在视频游戏引擎中构建虚拟动物室来模拟该环境中可用的训练数据。我们记录了在虚拟室中移动的代理获取的视觉图像,并使用这些图像来训练CNN。当CNN接收到与小鸡相似的视觉训练数据时,CNN成功地解决了与小鸡相同的具有挑战性的视图不变对象识别任务。因此,CNN并不比动物更需要数据:CNN和小鸡都成功地从单个对象的训练数据开发出健壮的对象模型。 摘要:Convolutional neural networks (CNNs) can now achieve human-level performance on challenging object recognition tasks. CNNs are also the leading quantitative models in terms of predicting neural and behavioral responses in visual recognition tasks. However, there is a widely accepted critique of CNN models: unlike newborn animals, which learn rapidly and efficiently, CNNs are thought to be "data hungry," requiring massive amounts of training data to develop accurate models for object recognition. This critique challenges the promise of using CNNs as models of visual development. Here, we directly examined whether CNNs are more data hungry than newborn animals by performing parallel controlled-rearing experiments on newborn chicks and CNNs. We raised newborn chicks in strictly controlled visual environments, then simulated the training data available in that environment by constructing a virtual animal chamber in a video game engine. We recorded the visual images acquired by an agent moving through the virtual chamber and used those images to train CNNs. When CNNs received similar visual training data as chicks, the CNNs successfully solved the same challenging view-invariant object recognition tasks as the chicks. Thus, the CNNs were not more data hungry than animals: both CNNs and chicks successfully developed robust object models from training data of a single object.

【47】 Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection 标题:为历史地图文本检测提供无限训练数据的合成地图生成 链接:https://arxiv.org/abs/2112.06104

作者:Zekun Li,Runyu Guan,Qianmu Yu,Yao-Yi Chiang,Craig A. Knoblock 机构:University of Minnesota, Minneapolis, USA, University of Southern California, Los Angeles, USA 摘要:许多历史地图页可公开用于需要长期历史地理数据的研究。这些地图的制图设计包括地图符号和文字标签的组合。从地图图像中自动读取文本标签可以大大加快地图解释速度,并有助于生成描述地图内容的丰富元数据。已经提出了许多文本检测算法来自动定位地图图像中的文本区域,但大多数算法都是在域外数据集(例如,风景图像)上训练的。训练数据决定了机器学习模型的质量,而在地图图像中手动注释文本区域既费时又费力。另一方面,现有的地理数据源,如开放式街道地图(OSM),包含机器可读的地图层,这使我们能够分离文本层并轻松获得文本标签注释。然而,OSM地图分幅和历史地图之间的制图风格存在显著差异。本文提出了一种自动生成无限量注释历史地图图像的方法,用于训练文本检测模型。我们使用样式转换模型将当代地图图像转换为历史样式,并在其上放置文本标签。我们表明,最先进的文本检测模型(如PSENet)可以从合成历史地图中获益,并实现历史地图文本检测的显著改进。 摘要:Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up the map interpretation and helps generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-ofdomain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-extensive and time-consuming. On the other hand, existing geographic data sources, such as Open- StreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles between OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.

【48】 Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts 标题:基于单语文本的神经机器翻译领域内并行句子选择 链接:https://arxiv.org/abs/2112.06096

作者:Javad Pourmostafa Roshan Sharami,Dimitar Shterionov,Pieter Spronck 机构:∗Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands 备注:Accepted to the CLIN Journal on Dec 6, 2021 摘要:不断增长的数据量导致更大的通用模型。特定用例通常被忽略,因为通用模型在特定领域的用例中往往表现不佳。我们的工作通过一种从通用领域(平行文本)语料库中选择领域内数据的方法来解决这一差距,用于机器翻译任务。该方法根据句子与单语领域特定数据集的余弦相似性,对并行领域数据中的句子进行排序。然后,我们选择相似度最高的前K个句子来训练一个新的机器翻译系统,该系统针对特定的领域内数据进行调整。我们的实验结果表明,基于域内数据训练的模型优于基于泛型或泛型与域数据混合训练的模型。也就是说,我们的方法以较低的计算成本和数据量选择高质量的领域特定训练实例。 摘要:Continuously-growing data volumes lead to larger generic models. Specific use-cases are usually left out, since generic models tend to perform poorly in domain-specific cases. Our work addresses this gap with a method for selecting in-domain data from generic-domain (parallel text) corpora, for the task of machine translation. The proposed method ranks sentences in parallel general-domain data according to their cosine similarity with a monolingual domain-specific data set. We then select the top K sentences with the highest similarity score to train a new machine translation system tuned to the specific in-domain data. Our experimental results show that models trained on this in-domain data outperform models trained on generic or a mixture of generic and domain data. That is, our method selects high-quality domain-specific training instances at low computational cost and data size.
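摘要描述的选择流程可以用一个通用示意来说明(并非论文原实现;这里用TF-IDF向量与领域数据质心的余弦相似度代替论文中具体的句子表示方式,属于简化假设):对通用域平行语料中的每个句子打分,并选出相似度最高的前K句用于训练领域内翻译模型。

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_in_domain(generic_sentences, domain_sentences, k):
    """按与领域单语数据质心的余弦相似度,从通用语料中选出前K句。"""
    vec = TfidfVectorizer().fit(generic_sentences + domain_sentences)
    G = vec.transform(generic_sentences)
    D = vec.transform(domain_sentences)
    centroid = np.asarray(D.mean(axis=0))            # 领域数据的质心向量
    scores = cosine_similarity(G, centroid).ravel()
    top = np.argsort(-scores)[:k]
    return [generic_sentences[i] for i in top], scores[top]

# 用法示意(医疗领域的小例子,句子均为虚构)
generic = ["the cat sat on the mat", "the patient received a dose of aspirin",
           "stock prices fell sharply", "the doctor examined the patient"]
domain  = ["the patient was given medication", "doctors monitor patient symptoms"]
sents, scores = select_in_domain(generic, domain, k=2)
print(sents)
```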

【49】 Convergence of Generalized Belief Propagation Algorithm on Graphs with Motifs 标题:有模图上广义置信传播算法的收敛性 链接:https://arxiv.org/abs/2112.06087

作者:Yitao Chen,Deepanshu Vasal 机构:Qualcomm, USA, Northwestern University 备注:10 pages 2 figures 摘要:信念传播是机器学习中许多应用的基本消息传递算法。众所周知,信任传播算法在树图上是精确的。然而,在大多数应用中,信念传播是在循环图上运行的。因此,理解循环图上信念传播的行为已经成为不同领域研究人员的主要课题。本文研究了模体图(三角形、环等)上广义信念传播算法的收敛性,证明了在一定的初始化条件下,模体图上铁磁Ising模型的广义信念传播收敛到Bethe自由能的全局最优解。 摘要:Belief propagation is a fundamental message-passing algorithm for numerous applications in machine learning. It is known that belief propagation algorithm is exact on tree graphs. However, belief propagation is run on loopy graphs in most applications. So, understanding the behavior of belief propagation on loopy graphs has been a major topic for researchers in different areas. In this paper, we study the convergence behavior of generalized belief propagation algorithm on graphs with motifs (triangles, loops, etc.) We show under a certain initialization, generalized belief propagation converges to the global optimum of the Bethe free energy for ferromagnetic Ising models on graphs with motifs.
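作为背景,成对马尔可夫随机场(如铁磁Ising模型,其中ψ_i(x_i)=exp(h_i x_i)、ψ_{ij}(x_i,x_j)=exp(J_{ij} x_i x_j))上标准的信念传播消息更新与信念计算可写成如下教科书式的通用形式,便于理解摘要所讨论的收敛对象;这并非论文中针对模体图的广义BP推导本身:

```latex
m_{i\to j}(x_j) \;\propto\; \sum_{x_i} \psi_i(x_i)\,\psi_{ij}(x_i, x_j)
\prod_{k\in N(i)\setminus\{j\}} m_{k\to i}(x_i),
\qquad
b_i(x_i) \;\propto\; \psi_i(x_i) \prod_{k\in N(i)} m_{k\to i}(x_i).
```

BP的不动点对应Bethe自由能的驻点;广义BP则把消息定义在区域(例如三角形等模体)之上,这正是摘要中收敛性结论所针对的设置。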

【50】 UPV at TREC Health Misinformation Track 2021 Ranking with SBERT and Quality Estimators 标题:TREC 2021年健康错误信息轨道上的UPV与SBERT和Quality Estimators一起排名 链接:https://arxiv.org/abs/2112.06080

作者:Ipek Baris Schlicht,Angel Felipe Magnossão de Paula,Paolo Rosso 机构:Universitat Politècnica de València, Spain 备注:6 pages; presented at the TREC 2021 摘要:搜索引擎上的健康错误信息是一个可能对个人或公共健康产生负面影响的重大问题。为了缓解这个问题,TREC组织了一个健康错误信息跟踪。本文介绍了我们对这条赛道的意见。我们使用BM25和特定领域的语义搜索引擎来检索初始文档。随后,我们检查了一个用于质量评估的健康新闻模式,并将其应用于对文档重新排序。我们使用倒数秩融合将来自不同分量的分数合并。最后,我们讨论了结果,并对未来的工作进行了总结。 摘要:Health misinformation on search engines is a significant problem that could negatively affect individuals or public health. To mitigate the problem, TREC organizes a health misinformation track. This paper presents our submissions to this track. We use a BM25 and a domain-specific semantic search engine for retrieving initial documents. Later, we examine a health news schema for quality assessment and apply it to re-rank documents. We merge the scores from the different components by using reciprocal rank fusion. Finally, we discuss the results and conclude with future works.
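摘要中用于合并不同检索组件结果的“倒数秩融合”(Reciprocal Rank Fusion, RRF)是一个标准公式:文档d的融合得分为其在各排序列表中1/(k + rank)之和(常取k=60)。下面给出一个通用示意实现(与参赛系统的具体实现无关)。

```python
def reciprocal_rank_fusion(rankings, k=60):
    """rankings: 若干个按相关性降序排列的文档ID列表;返回按RRF得分降序的(文档, 得分)。"""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: -item[1])

# 用法示意:融合BM25与语义检索两个排序列表
bm25_run     = ["d3", "d1", "d7", "d2"]
semantic_run = ["d1", "d3", "d9", "d7"]
print(reciprocal_rank_fusion([bm25_run, semantic_run]))
```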

【51】 MedAttacker: Exploring Black-Box Adversarial Attacks on Risk Prediction Models in Healthcare 标题:MedAttacker:探索医疗保健领域风险预测模型的黑箱对抗性攻击 链接:https://arxiv.org/abs/2112.06063

作者:Muchao Ye,Junyu Luo,Guanjie Zheng,Cao Xiao,Ting Wang,Fenglong Ma 机构:The Pennsylvania State University, edu)†Shanghai Jiao Tong University 摘要:深度神经网络(DNN)已被广泛应用于健康风险预测,以提供医疗诊断和治疗。为了评估其稳健性,现有研究在可访问模型参数的白/灰盒环境中进行对抗性攻击。然而,一个更现实的黑箱对抗性攻击被忽略了,即使大多数真实世界的模型都是使用私有数据进行训练并作为黑箱服务在云端发布的。为了填补这一空白,我们提出了第一种针对健康风险预测模型的黑盒对抗性攻击方法MedAttacker,以调查其漏洞。MedAttacker通过两个步骤解决EHR数据带来的挑战:在强化学习(RL)框架中选择受攻击位置的分层位置选择和使用基于分数的原则识别替代品的替代品选择。特别地,通过考虑EHR内部的时间上下文,它通过使用每次访问的贡献分数和每个代码的显著性分数来初始化其RL位置选择策略,这可以很好地与分数变化决定的确定性替代选择过程相结合。在实验中,当在多个真实数据集的黑盒环境中攻击三个高级健康风险预测模型时,MedAttacker始终获得最高的平均成功率,在某些情况下甚至优于最近的白盒EHR对抗性攻击技术。此外,根据实验结果,我们还讨论了如何防御EHR对抗性攻击。 摘要:Deep neural networks (DNNs) have been broadly adopted in health risk prediction to provide healthcare diagnoses and treatments. To evaluate their robustness, existing research conducts adversarial attacks in the white/gray-box setting where model parameters are accessible. However, a more realistic black-box adversarial attack is ignored even though most real-world models are trained with private data and released as black-box services on the cloud. To fill this gap, we propose the first black-box adversarial attack method against health risk prediction models named MedAttacker to investigate their vulnerability. MedAttacker addresses the challenges brought by EHR data via two steps: hierarchical position selection which selects the attacked positions in a reinforcement learning (RL) framework and substitute selection which identifies substitute with a score-based principle. Particularly, by considering the temporal context inside EHRs, it initializes its RL position selection policy by using the contribution score of each visit and the saliency score of each code, which can be well integrated with the deterministic substitute selection process decided by the score changes. In experiments, MedAttacker consistently achieves the highest average success rate and even outperforms a recent white-box EHR adversarial attack technique in certain cases when attacking three advanced health risk prediction models in the black-box setting across multiple real-world datasets. In addition, based on the experiment results we include a discussion on defending EHR adversarial attacks.

【52】 Towards Autonomous Satellite Communications: An AI-based Framework to Address System-level Challenges 标题:走向自主卫星通信:基于人工智能的应对系统级挑战的框架 链接:https://arxiv.org/abs/2112.06055

作者:Juan Jose Garau-Luis,Skylar Eiskowitz,Nils Pachler,Edward Crawley,Bruce Cameron 机构:Engineering Systems Laboratory, Massachusetts Institute of Technology 备注:AAAI Workshop on AI to Accelerate Science and Engineering, at AAAI Conference 2022 摘要:下一代卫星星座旨在更好地满足我们互联社会的未来需求:高度可变的数据需求、移动连接,以及到达更多服务不足的地区。鉴于当前资源分配机制的可扩展性差、反应速度慢,人工智能(AI)和基于学习的方法有望成为该行业的关键参与者。虽然人工智能框架已经针对孤立的通信任务或子问题进行了验证,但仍然没有实现完全自主卫星系统的明确途径。这个问题的部分原因是在设计模型时关注子问题,而不是必要的系统级透视图。在本文中,我们试图通过描述提高卫星自主性必须满足的系统级需求来弥合这一差距,并引入三个基于AI的组件(需求估计器、离线规划器和实时引擎),共同解决这些问题。我们首先对不同的子问题进行广泛的文献回顾,并确定系统级目标缺失的环节。针对这些差距,我们概述了三个必要的组成部分,并强调了它们之间的相互作用。我们还将讨论如何将当前模型纳入框架以及未来工作的可能方向。 摘要:The next generation of satellite constellations is designed to better address the future needs of our connected society: highly-variable data demand, mobile connectivity, and reaching more under-served regions. Artificial Intelligence (AI) and learning-based methods are expected to become key players in the industry, given the poor scalability and slow reaction time of current resource allocation mechanisms. While AI frameworks have been validated for isolated communication tasks or subproblems, there is still not a clear path to achieve fully-autonomous satellite systems. Part of this issue results from the focus on subproblems when designing models, instead of the necessary system-level perspective. In this paper we try to bridge this gap by characterizing the system-level needs that must be met to increase satellite autonomy, and introduce three AI-based components (Demand Estimator, Offline Planner, and Real Time Engine) that jointly address them. We first do a broad literature review on the different subproblems and identify the missing links to the system-level goals. In response to these gaps, we outline the three necessary components and highlight their interactions. We also discuss how current models can be incorporated into the framework and possible directions of future work.

【53】 Retrosynthetic Planning with Experience-Guided Monte Carlo Tree Search 标题:基于经验引导蒙特卡罗树搜索的逆向综合规划 链接:https://arxiv.org/abs/2112.06028

作者:Siqi Hong,Hankz Hankui Zhuo,Kebing Jin,Zhanwen Zhou 机构:School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China 摘要:逆合成规划问题是分析一个复杂的分子,并用简单的基础构建单元给出一条合成路线。大量的化学反应导致各种可能性的组合爆炸,即使是经验丰富的化学家也无法选择最有希望的转化。目前的方法依赖于人类定义的或机器训练的分数函数,这些分数函数具有有限的化学知识,或者使用昂贵的估计方法(如rollout)来指导搜索。在本文中,我们提出了一种新的基于MCTS的逆合成规划方法来处理这一问题。我们建立了一个经验指导网络,在搜索过程中从合成经验中学习知识,而不是依赖rollout。在USPTO基准数据集上的实验表明,我们的方法在效率和有效性方面都比最先进的方法有显著改进。 摘要:Retrosynthetic planning problem is to analyze a complex molecule and give a synthetic route using simple building blocks. The huge number of chemical reactions leads to a combinatorial explosion of possibilities, and even the experienced chemists could not select the most promising transformations. The current approaches rely on human-defined or machine-trained score functions which have limited chemical knowledge or use expensive estimation methods such as rollout to guide the search. In this paper, we propose a novel MCTS-based retrosynthetic planning approach to deal with the retrosynthetic planning problem. Instead of exploiting rollout, we build an Experience Guidance Network to learn knowledge from synthetic experiences during the search. Experiments on benchmark USPTO datasets show that our approach gains significant improvement over state-of-the-art approaches both in efficiency and effectiveness.

【54】 Curvature-guided dynamic scale networks for Multi-view Stereo 标题:曲率引导的多视点立体动态比例尺网络 链接:https://arxiv.org/abs/2112.05999

作者:Khang Truong Giang,Soohwan Song,Sungho Jo 摘要:多视点立体(MVS)是精确三维重建的关键任务。最新的研究试图通过设计聚合的3D成本量及其正则化来提高MVS中匹配成本量的性能。本文主要研究学习一个鲁棒的特征提取网络,以提高匹配性能,而无需在其他步骤中进行大量计算。特别地,我们提出了一种动态尺度特征提取网络,即CDSFNet。它由多个新的卷积层组成,每个卷积层可以根据图像表面的法曲率为每个像素选择合适的面片比例。因此,CDSFNet可以估计最佳的面片尺度来学习鉴别特征,以便在参考图像和源图像之间进行精确的匹配计算。通过将稳健的提取特征与适当的成本公式策略相结合,我们得到的MVS体系结构可以更精确地估计深度图。大量实验表明,在复杂的室外场景中,该方法的性能优于其他先进的方法。它显著提高了重建模型的完整性。因此,与其他MVS方法相比,该方法可以在更快的运行时间和更低的内存内处理更高分辨率的输入。我们的源代码见:https://github.com/TruongKhang/cds-mvsnet 摘要:Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most recent studies tried to improve the performance of matching cost volume in MVS by designing aggregated 3D cost volumes and their regularization. This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation in the other steps. In particular, we present a dynamic scale feature extraction network, namely, CDSFNet. It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface. As a result, CDFSNet can estimate the optimal patch scales to learn discriminative features for accurate matching computation between reference and source images. By combining the robust extracted features with an appropriate cost formulation strategy, our resulting MVS architecture can estimate depth maps more precisely. Extensive experiments showed that the proposed method outperforms other state-of-the-art methods on complex outdoor scenes. It significantly improves the completeness of reconstructed models. As a result, the method can process higher resolution inputs within faster run-time and lower memory than other MVS methods. Our source code is available at https://github.com/TruongKhang/cds-mvsnet.

【55】 Formalising the Foundations of Discrete Reinforcement Learning in Isabelle/HOL 标题:在Isabelle/HOL中形式化离散强化学习的基础 链接:https://arxiv.org/abs/2112.05996

作者:Mark Chevallier,Jacques Fleuriot 机构:ukArtificialIntelligenceanditsApplicationsInstitute(AIAI), SchoolofInformatics, University of Edinburgh 摘要:在Isabelle定理证明器中,我们给出了带报酬的有限Markov决策过程的形式化。我们关注动态规划所需的基础,以及在这些过程中使用强化学习代理。特别是,我们从第一性原理(标量和向量形式)推导了Bellman方程,推导了产生任何政策p的期望值的向量计算,并进一步证明了贴现因子小于1的普遍最优政策的存在性。最后,我们证明了值迭代和策略迭代算法在有限时间内工作,分别产生一个epsilon最优和一个完全最优策略。 摘要:We present a formalisation of finite Markov decision processes with rewards in the Isabelle theorem prover. We focus on the foundations required for dynamic programming and the use of reinforcement learning agents over such processes. In particular, we derive the Bellman equation from first principles (in both scalar and vector form), derive a vector calculation that produces the expected value of any policy p, and go on to prove the existence of a universally optimal policy where there is a discounting factor less than one. Lastly, we prove that the value iteration and the policy iteration algorithms work in finite time, producing an epsilon-optimal and a fully optimal policy respectively.
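摘要提到从第一性原理推导Bellman方程的标量与向量形式;按教科书常用记号(与论文在Isabelle/HOL中的具体形式化记号可能不同),它们即:

```latex
v_\pi(s) \;=\; \sum_{a} \pi(a\mid s) \sum_{s'} P(s'\mid s,a)\,\bigl[r(s,a,s') + \gamma\, v_\pi(s')\bigr],
\qquad
\mathbf{v}_\pi \;=\; \mathbf{r}_\pi + \gamma P_\pi \mathbf{v}_\pi
\;\Longrightarrow\;
\mathbf{v}_\pi = (I - \gamma P_\pi)^{-1}\mathbf{r}_\pi .
```

其中P_π与r_π分别是策略π诱导的转移矩阵与期望即时奖励向量;由于折扣因子γ<1,(I − γP_π)可逆,因此每个策略的期望值向量存在且唯一,这与摘要中“向量计算产生任意策略的期望值”的表述相对应。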

【56】 Overview of The MediaEval 2021 Predicting Media Memorability Task 标题:MediaEval 2021预测媒体记忆性任务概述 链接:https://arxiv.org/abs/2112.05982

作者:Rukiye Savran Kiziltepe,Mihai Gabriel Constantin,Claire-Helene Demarty,Graham Healy,Camilo Fosco,Alba Garcia Seco de Herrera,Sebastian Halder,Bogdan Ionescu,Ana Matran-Fernandez,Alan F. Smeaton,Lorin Sweeney 机构:University of Essex, UK, University Politehnica of Bucharest, Romania, InterDigital, France, Dublin City University, Ireland, Massachusetts Institute of Technology Cambridge, Massachusetts, USA. 备注:3 pages, to appear in Proceedings of MediaEval 2021, December 13-15 2021, Online 摘要:本文描述了MediaEval 2021预测媒体记忆性任务。该任务今年已举办至第四届,因为短期和长期视频记忆性的预测仍然是一项具有挑战性的任务。2021年使用了两个视频数据集:其一是TRECVid 2019视频到文本数据集的子集;其二是Memento10K数据集,以便为探索跨数据集泛化提供机会。此外,还引入了一个基于脑电图(EEG)的预测试点子任务。在本文中,我们概述了任务的主要方面,并描述了数据集、评估指标以及对参与者提交的要求。 摘要:This paper describes the MediaEval 2021 Predicting Media Memorability task, which is in its 4th edition this year, as the prediction of short-term and long-term video memorability remains a challenging task. In 2021, two datasets of videos are used: first, a subset of the TRECVid 2019 Video-to-Text dataset; second, the Memento10K dataset in order to provide opportunities to explore cross-dataset generalisation. In addition, an Electroencephalography (EEG)-based prediction pilot subtask is introduced. In this paper, we outline the main aspects of the task and describe the datasets, evaluation metrics, and requirements for participants' submissions.

【57】 Server-Side Local Gradient Averaging and Learning Rate Acceleration for Scalable Split Learning 标题:可伸缩分裂学习的服务器端局部梯度平均和学习速率加速 链接:https://arxiv.org/abs/2112.05929

作者:Shraman Pal,Mansi Uniyal,Jihong Park,Praneeth Vepakomma,Ramesh Raskar,Mehdi Bennis,Moongu Jeon,Jinho Choi 机构:IIT Kharagpur, India,Deakin University, Australia,MIT Media Lab, USA,GIST, Korea,University of Oulu, Finland, ∗Equal contribution 备注:9 pages, 3 figures, 6 tables 摘要:近年来,在使用私有数据的分散学习领域取得了巨大进展。联邦学习(FL)和分割学习(SL)是两种各有优缺点的先驱方法,分别适用于大量用户客户端和大型模型。为了兼得两者的好处,SplitFed等混合方法最近才出现,但其基本原理仍不清晰。在这项工作中,我们首先确定了SL的基本瓶颈,并由此提出了一个可扩展的SL框架,即SGLR。SGLR中的服务器在切分层广播经过平均的公共梯度以模拟FL,且与SplitFed不同,它不需要任何额外的跨客户端通信。同时,SGLR将学习率拆分为服务器端速率和客户端速率,并分别调整它们以支持多个并行客户端。仿真结果证实,SGLR的精度高于包括SplitFed在内的其他基线SL方法,甚至可与消耗更高能量和通信成本的FL相当。作为次要结果,我们通过互信息度量观察到,相比各基线方法,SGLR能更大程度地减少敏感信息的泄漏。 摘要:In recent years, there have been great advances in the field of decentralized learning with private data. Federated learning (FL) and split learning (SL) are two spearheads possessing their pros and cons, and are suited for many user clients and large models, respectively. To enjoy both benefits, hybrid approaches such as SplitFed have emerged of late, yet their fundamentals have still been illusive. In this work, we first identify the fundamental bottlenecks of SL, and thereby propose a scalable SL framework, coined SGLR. The server under SGLR broadcasts a common gradient averaged at the split-layer, emulating FL without any additional communication across clients as opposed to SplitFed. Meanwhile, SGLR splits the learning rate into its server-side and client-side rates, and separately adjusts them to support many clients in parallel. Simulation results corroborate that SGLR achieves higher accuracy than other baseline SL methods including SplitFed, which is even on par with FL consuming higher energy and communication costs. As a secondary result, we observe greater reduction in leakage of sensitive information via mutual information using SGLR over the baselines.

【58】 Efficient Device Scheduling with Multi-Job Federated Learning 标题:基于多作业联合学习的高效设备调度 链接:https://arxiv.org/abs/2112.05928

作者:Chendi Zhou,Ji Liu,Juncheng Jia,Jingbo Zhou,Yang Zhou,Huaiyu Dai,Dejing Dou 机构:Soochow University,Baidu Inc., China, Auburn University,North Carolina State University, United States 备注:14 pages, 7 figures, 6 tables 摘要:近年来,在终端用户的多个(边缘)设备中出现了大量分散数据,而由于法律或法规的原因,分散数据的聚合对于机器学习工作来说仍然很困难。联邦学习(FL)是一种在不共享敏感原始数据的情况下处理分散数据,同时协作训练全局机器学习模型的有效方法。FL中的服务器需要在训练过程中选择(并安排)设备。然而,具有FL的多个作业的设备调度仍然是一个关键和开放的问题。在本文中,我们提出了一个新的多任务FL框架,以支持多任务的并行训练过程。该框架由一个系统模型和两种调度方法组成。在系统模型中,我们提出了多个作业的并行训练过程,并基于不同作业训练过程中各种设备的训练时间和数据公平性构建了成本模型。我们提出了一种基于强化学习的方法和一种基于贝叶斯优化的方法来调度多个任务的设备,同时最小化成本。我们对多个作业和数据集进行了广泛的实验。实验结果表明,我们提出的方法在训练时间(快8.67倍)和准确度(高44.6%)方面明显优于基线方法。 摘要:Recent years have witnessed a large amount of decentralized data in multiple (edge) devices of end-users, while the aggregation of the decentralized data remains difficult for machine learning jobs due to laws or regulations. Federated Learning (FL) emerges as an effective approach to handling decentralized data without sharing the sensitive raw data, while collaboratively training global machine learning models. The servers in FL need to select (and schedule) devices during the training process. However, the scheduling of devices for multiple jobs with FL remains a critical and open problem. In this paper, we propose a novel multi-job FL framework to enable the parallel training process of multiple jobs. The framework consists of a system model and two scheduling methods. In the system model, we propose a parallel training process of multiple jobs, and construct a cost model based on the training time and the data fairness of various devices during the training process of diverse jobs. We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost. We conduct extensive experimentation with multiple jobs and datasets. The experimental results show that our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).

【59】 ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning 标题:ElegantRL-Podracer:云本地深度强化学习的可伸缩弹性库 链接:https://arxiv.org/abs/2112.05923

作者:Xiao-Yang Liu,Zechu Li,Zhuoran Yang,Jiahao Zheng,Zhaoran Wang,Anwar Walid,Jian Guo,Michael I. Jordan 机构:Columbia University; ,University of California, Berkeley; ,Shenzhen Inst. of Advanced Tech.;, Northwestern University; ,Amazon & Columbia University; ,IDEA Research. 备注:None 摘要:深度强化学习(DRL)已经彻底改变了游戏和机器人控制等应用中的学习和驱动。数据收集的成本,即从代理环境交互生成转换,仍然是复杂现实世界问题中更广泛采用DRL的主要挑战。遵循云本地模式在GPU云平台上训练DRL代理是一个很有前途的解决方案。在本文中,我们提出了一个可扩展的弹性库ElegantRL podracer,用于云本机深度强化学习,它有效地支持数百万个GPU内核在多个级别上执行大规模并行训练。在高层,ElegantRL podracer采用基于锦标赛的集成方案,在数百甚至数千个GPU上协调训练过程,安排排行榜和数百个Pod的训练池之间的交互。在低级别上,每个pod通过在单个GPU中充分利用近7000个GPU CUDA核来并行模拟agent环境交互。我们的ElegantRL podracer库遵循集装箱化、微服务和MLOP的开发原则,具有高可扩展性、弹性和可访问性。使用NVIDIA DGX SuperPOD云,我们对移动和股票交易中的各种任务进行了广泛的实验,结果表明,ElegantRL podracer的性能明显优于RLlib。我们的代码可以在GitHub上找到。 摘要:Deep reinforcement learning (DRL) has revolutionized learning and actuation in applications such as game playing and robotic control. The cost of data collection, i.e., generating transitions from agent-environment interactions, remains a major challenge for wider DRL adoption in complex real-world problems. Following a cloud-native paradigm to train DRL agents on a GPU cloud platform is a promising solution. In this paper, we present a scalable and elastic library ElegantRL-podracer for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels. At a high-level, ElegantRL-podracer employs a tournament-based ensemble scheme to orchestrate the training process on hundreds or even thousands of GPUs, scheduling the interactions between a leaderboard and a training pool with hundreds of pods. At a low-level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU CUDA cores in a single GPU. Our ElegantRL-podracer library features high scalability, elasticity and accessibility by following the development principles of containerization, microservices and MLOps. Using an NVIDIA DGX SuperPOD cloud, we conduct extensive experiments on various tasks in locomotion and stock trading and show that ElegantRL-podracer substantially outperforms RLlib. Our codes are available on GitHub.

【60】 Neural Attention Models in Deep Learning: Survey and Taxonomy 标题:深度学习中的神经注意模型:综述与分类学 链接:https://arxiv.org/abs/2112.05909

作者:Alana Santana,Esther Colombini 机构: acting from the perception ofLaboratory of Robotics and Cogntive Systems (LaRoCS) Institute ofComputing, University of Campinas 摘要:注意是一种唤醒状态,能够通过选择性地关注一条信息而忽略其他可感知信息来处理人类有限的加工瓶颈。几十年来,注意力的概念和功能一直在哲学、心理学、神经科学和计算机领域进行研究。目前,这一特性在深度神经网络中得到了广泛的研究。现在有许多不同的神经注意模型可用,并且在过去六年中一直是一个非常活跃的研究领域。从注意的理论观点来看,这项调查对主要的神经注意模型进行了批判性分析。在这里,我们提出了一个分类法,它与深度学习之前的理论方面相一致。我们的分类法提供了一个组织结构,提出了新的问题,并构建了对现有注意机制的理解。特别是,从心理学和神经科学经典研究中得出的17个标准被制定出来,用于对650多篇分析论文中发现的51个主要模型进行定性比较和批判性分析。此外,我们还强调了一些尚未探讨的理论问题,包括关于生物学合理性的讨论,强调了当前的研究趋势,并为未来提供了见解。 摘要:Attention is a state of arousal capable of dealing with limited processing bottlenecks in human beings by focusing selectively on one piece of information while ignoring other perceptible information. For decades, concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. Currently, this property has been widely explored in deep neural networks. Many different neural attention models are now available and have been a very active research area over the past six years. From the theoretical standpoint of attention, this survey provides a critical analysis of major neural attention models. Here we propose a taxonomy that corroborates with theoretical aspects that predate Deep Learning. Our taxonomy provides an organizational structure that asks new questions and structures the understanding of existing attentional mechanisms. In particular, 17 criteria derived from psychology and neuroscience classic studies are formulated for qualitative comparison and critical analysis on the 51 main models found on a set of more than 650 papers analyzed. Also, we highlight several theoretical issues that have not yet been explored, including discussions about biological plausibility, highlight current research trends, and provide insights for the future.

【61】 Deep Q-Network with Proximal Iteration 标题:带有近端迭代的深度Q网络 链接:https://arxiv.org/abs/2112.05848

作者:Kavosh Asadi,Rasool Fakoor,Omer Gottesman,Michael L. Littman,Alexander J. Smola 备注:Work in Progress 摘要:在强化学习中,我们采用近端迭代(Proximal Iteration)来优化值函数。近端迭代是一种计算高效的技术,它使我们能够让优化过程偏向更理想的解。作为近端迭代在深度强化学习中的一个具体应用,我们在深度Q网络(DQN)智能体的目标函数中加入一个近端项,以确保DQN的在线网络部分保持在目标网络附近。由此得到的智能体,我们称之为带近端迭代的DQN(DQNPro),在Atari基准上比原始DQN有显著改进。我们的结果凸显了在深度强化学习中采用可靠(sound)优化技术的威力。 摘要:We employ Proximal Iteration for value-function optimization in reinforcement learning. Proximal Iteration is a computationally efficient technique that enables us to bias the optimization procedure towards more desirable solutions. As a concrete application of Proximal Iteration in deep reinforcement learning, we endow the objective function of the Deep Q-Network (DQN) agent with a proximal term to ensure that the online-network component of DQN remains in the vicinity of the target network. The resultant agent, which we call DQN with Proximal Iteration, or DQNPro, exhibits significant improvements over the original DQN on the Atari benchmark. Our results accentuate the power of employing sound optimization techniques for deep reinforcement learning.
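下面是一个基于PyTorch的示意代码,演示"在DQN的TD损失上叠加近端项、把在线网络拉向目标网络"的做法;其中近端系数 c、损失形式与网络结构均为假设,不一定与DQNPro论文中的具体实现一致。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 示意性草图:标准 TD 损失 + 近端项(在线参数与目标参数的平方距离)。
def dqn_pro_loss(online_net, target_net, batch, gamma=0.99, c=0.1):
    s, a, r, s_next, done = batch
    q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * q_next
    td_loss = F.smooth_l1_loss(q, td_target)

    # 近端项:约束在线网络停留在目标网络附近(系数 c 为假设值)
    prox = sum(((p - p_t.detach()) ** 2).sum()
               for p, p_t in zip(online_net.parameters(), target_net.parameters()))
    return td_loss + 0.5 * c * prox

# 用法示意
online = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target.load_state_dict(online.state_dict())
batch = (torch.randn(32, 4), torch.randint(0, 2, (32,)),
         torch.randn(32), torch.randn(32, 4), torch.zeros(32))
loss = dqn_pro_loss(online, target, batch)
loss.backward()
```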

【62】 A Novel Gaussian Process Based Ground Segmentation Algorithm with Local-Smoothness Estimation 标题:一种新的基于高斯过程且带有局部平滑度估计的地面分割算法 链接:https://arxiv.org/abs/2112.05847

作者:Pouria Mehrabi,Hamid D. Taghirad 机构:Toosi University of Technology 备注:arXiv admin note: substantial text overlap with arXiv:2111.10638 摘要:自主陆地车辆(ALV)应能在未知环境中高效地识别地面。本文提出了一种新的基于$\mathcal{GP}$(高斯过程)的方法,用于粗糙驾驶场景下的地面分割任务,并采用非平稳协方差函数作为$\mathcal{GP}$的核。我们假定地面仅表现出局部平滑性,因此对核的长度尺度进行逐点估计。为此,引入两个高斯过程来分别对数据的观测和局部特征建模:观测过程(observation process)用于对地面建模,而潜在过程(latent process)作用于长度尺度,用来估计每个输入位置上的长度尺度点值。潜在过程的输入位置按照具有物理意义的流程选取,以体现对地面状况的直觉认识。此外,通过假设环境中存在一些假想曲面,并认为每一簇数据点都来自对这些曲面的测量,可以给出长度尺度值的直观初始猜测。贝叶斯推断采用最大后验(maximum a posteriori)准则实现,并将对数边际似然视为一个多任务目标函数,以便在每一帧中获得对地面的整帧无偏视图。仿真结果表明,即使在不平坦、粗糙的场景中,该方法也优于类似的基于高斯过程的地面分割方法:当不平坦场景中相邻分段不具有相似的地面结构时,该方法能基于整帧视角给出有效的地面估计,而不是仅逐段估计可能的地面。 摘要:Autonomous Land Vehicles (ALV) shall efficiently recognize the ground in unknown environments. A novel $\mathcal{GP}$-based method is proposed for the ground segmentation task in rough driving scenarios. A non-stationary covariance function is utilized as the kernel for the $\mathcal{GP}$. The ground surface behavior is assumed to only demonstrate local-smoothness. Thus, point estimates of the kernel's length-scales are obtained. Thus, two Gaussian processes are introduced to separately model the observation and local characteristics of the data. While, the \textit{observation process} is used to model the ground, the \textit{latent process} is put on length-scale values to estimate point values of length-scales at each input location. Input locations for this latent process are chosen in a physically-motivated procedure to represent an intuition about ground condition. Furthermore, an intuitive guess of length-scale value is represented by assuming the existence of hypothetical surfaces in the environment that every bunch of data points may be assumed to be resulted from measurements from this surfaces. Bayesian inference is implemented using \textit{maximum a Posteriori} criterion. The log-marginal likelihood function is assumed to be a multi-task objective function, to represent a whole-frame unbiased view of the ground at each frame. Simulation results shows the effectiveness of the proposed method even in an uneven, rough scene which outperforms similar Gaussian process based ground segmentation methods. While adjacent segments do not have similar ground structure in an uneven scene, the proposed method gives an efficient ground estimation based on a whole-frame viewpoint instead of just estimating segment-wise probable ground surfaces.
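为帮助理解"长度尺度随输入位置变化"的非平稳核,下面用NumPy给出一个常见的Gibbs非平稳核的示意实现;论文实际使用的核函数与长度尺度潜在过程的估计方式可能不同,这里的长度尺度函数只是一个假设的例子。

```python
import numpy as np

# 示意性草图:Gibbs 非平稳核,长度尺度 l(x) 随输入变化。
def gibbs_kernel(x1, x2, lengthscale_fn, sigma_f=1.0):
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    l1 = lengthscale_fn(x1)          # 每个输入位置各自的长度尺度
    l2 = lengthscale_fn(x2)
    denom = l1 ** 2 + l2 ** 2
    prefactor = np.sqrt(2.0 * l1 * l2 / denom)
    return sigma_f ** 2 * prefactor * np.exp(-((x1 - x2) ** 2) / denom)

# 假设的长度尺度函数:近处地面平坦(长度尺度大),远处起伏(长度尺度小)
lengthscale = lambda x: 2.0 / (1.0 + 0.2 * np.abs(x))
x = np.linspace(0.0, 20.0, 50)
K = gibbs_kernel(x, x, lengthscale)
print(K.shape)  # (50, 50)
```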

【63】 Logical Boltzmann Machines 标题:逻辑玻尔兹曼机 链接:https://arxiv.org/abs/2112.05841

作者:Son N. Tran,Artur d'Avila Garcez 机构:University of Tasmania; University of London 备注:15 pages, 5 figures, 2 tables 摘要:在连接主义系统中表示符号知识是一项长期的努力,最近因其将机器学习与可扩展的可靠(sound)推理相结合的目标而引起广泛关注。早期的研究表明,命题逻辑和对称神经网络之间存在对应关系,但对称神经网络不能很好地随变量数量扩展,其训练机制也效率低下。本文介绍了逻辑玻尔兹曼机(LBM),这是一种神经符号系统,能够表示任意严格析取范式的命题逻辑公式。我们证明了LBM中的能量最小化与逻辑可满足性之间的等价性,从而表明LBM能够进行可靠推理。我们对推理进行了实证评估,结果表明,LBM只需搜索不到0.75%的可能赋值(约10亿个),就能找到一类逻辑公式的所有满足赋值。我们将LBM中的学习与一个符号归纳逻辑编程系统、一个最先进的神经符号系统和一个纯神经网络系统进行比较,在七个数据集中的五个上取得了更好的学习性能。 摘要:The idea of representing symbolic knowledge in connectionist systems has been a long-standing endeavour which has attracted much attention recently with the objective of combining machine learning and scalable sound reasoning. Early work has shown a correspondence between propositional logic and symmetrical neural networks which nevertheless did not scale well with the number of variables and whose training regime was inefficient. In this paper, we introduce Logical Boltzmann Machines (LBM), a neurosymbolic system that can represent any propositional logic formula in strict disjunctive normal form. We prove equivalence between energy minimization in LBM and logical satisfiability thus showing that LBM is capable of sound reasoning. We evaluate reasoning empirically to show that LBM is capable of finding all satisfying assignments of a class of logical formulae by searching fewer than 0.75% of the possible (approximately 1 billion) assignments. We compare learning in LBM with a symbolic inductive logic programming system, a state-of-the-art neurosymbolic system and a purely neural network-based system, achieving better learning performance in five out of seven data sets.
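下面用一段纯Python代码示意"能量最小化与可满足性等价"这一思想:对严格析取范式(DNF)公式定义一个在满足赋值处取零、否则为正的能量函数,然后枚举赋值寻找能量为零的点。注意这只是概念演示,所用能量函数并非论文中LBM的实际能量定义。

```python
from itertools import product

# 示意性能量函数:对每个合取子句统计被违反的文字数,再对所有子句取最小值;
# 能量为 0 当且仅当该赋值满足整条 DNF 公式(并非 LBM 的真实能量)。
formula = [
    {"x1": True, "x2": False},   # (x1 AND NOT x2)
    {"x2": True, "x3": True},    # OR (x2 AND x3)
]

def energy(assignment, dnf):
    return min(sum(assignment[v] != val for v, val in clause.items())
               for clause in dnf)

variables = sorted({v for clause in formula for v in clause})
satisfying = []
for values in product([False, True], repeat=len(variables)):
    assignment = dict(zip(variables, values))
    if energy(assignment, formula) == 0:   # 能量最小(为零)的点即满足赋值
        satisfying.append(assignment)

print(len(satisfying), "satisfying assignments")
```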

【64】 Sequence-level self-learning with multiple hypotheses 标题:具有多个假设的序列级自学习 链接:https://arxiv.org/abs/2112.05826

作者:Kenichi Kumatani,Dimitrios Dimitriadis,Yashesh Gaur,Robert Gmyr,Sefik Emre Eskimez,Jinyu Li,Michael Zeng 机构:Microsoft, WA, USA 备注:Published in Interspeech 2020: this https URL 摘要:在这项工作中,我们为基于注意力的序列到序列(seq2seq)自动语音识别(ASR)模型开发了新的自学习技术。对于未转写(untranscribed)的语音数据,必须将ASR系统给出的假设用作标签。然而,不完善的ASR结果使得无监督学习难以持续提高识别性能,尤其是在没有多个强大教师模型可用的情况下。与传统的无监督学习方法不同,我们采用多任务学习(MTL)框架,其中第n优(n-best)的ASR假设被用作对应任务的标签。seq2seq网络通过MTL框架进行更新,以找到能够覆盖多个假设的共同表示,从而减轻硬判决(hard-decision)错误的影响。我们首先通过美式英语与英式英语之间口音自适应任务上的ASR实验验证了自学习方法的有效性。实验结果表明,与仅用美式英语数据训练的基线模型相比,我们的方法可以将英式英语语音数据上的WER从14.55%降低到10.36%。此外,我们还研究了所提方法在联邦学习场景中的效果。 摘要:In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. However, the imperfect ASR result makes unsupervised learning difficult to consistently improve recognition performance especially in the case that multiple powerful teacher models are unavailable. In contrast to conventional unsupervised learning approaches, we adopt the \emph{multi-task learning} (MTL) framework where the $n$-th best ASR hypothesis is used as the label of each task. The seq2seq network is updated through the MTL framework so as to find the common representation that can cover multiple hypotheses. By doing so, the effect of the \emph{hard-decision} errors can be alleviated. We first demonstrate the effectiveness of our self-learning methods through ASR experiments in an accent adaptation task between the US and British English speech. Our experiment results show that our method can reduce the WER on the British speech data from 14.55\% to 10.36\% compared to the baseline model trained with the US English data only. Moreover, we investigate the effect of our proposed methods in a federated learning scenario.
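下面的PyTorch示意代码演示"把同一条无标注样本的n-best伪标签当作n个任务的标签、对交叉熵取加权平均"的多任务自学习损失;真实方法作用于seq2seq ASR的序列输出,这里为简洁起见用一个玩具分类器代替,权重方案也是假设的。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 示意性草图:n-best 假设作为 n 个任务的标签,共享同一次前向计算。
def n_best_self_learning_loss(model, x, n_best_labels, weights=None):
    logits = model(x)
    n = len(n_best_labels)
    weights = weights or [1.0 / n] * n        # 假设:各假设等权
    loss = 0.0
    for w, labels in zip(weights, n_best_labels):
        loss = loss + w * F.cross_entropy(logits, labels)
    return loss

model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(16, 40)                        # 16 条无转写样本的特征
# 假设 ASR 系统给出 3 个候选假设(这里用随机伪标签代替)
hypotheses = [torch.randint(0, 10, (16,)) for _ in range(3)]
loss = n_best_self_learning_loss(model, x, hypotheses)
loss.backward()
```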

【65】 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition 标题:利用稀疏门控混合专家构建优秀的多语种语音识别教师模型 链接:https://arxiv.org/abs/2112.05820

作者:Kenichi Kumatani,Robert Gmyr,Felipe Cruz Salinas,Linquan Liu,Wei Zuo,Devang Patel,Eric Sun,Yu Shi 机构:Microsoft 摘要:稀疏门控混合专家(MoE)可以在只付出少量额外计算复杂度的情况下扩大网络容量。在这项工作中,我们研究了如何通过一个简单的路由算法来扩展多语种自动语音识别(ASR)网络,以获得更好的准确率。更具体地说,我们将稀疏门控MoE技术应用于两类网络:序列到序列Transformer(S2S-T)和Transformer转导器(Transformer Transducer,T-T)。我们在多语种数据上的一组ASR实验表明,MoE网络在S2S-T和T-T上可分别将相对词错误率降低16.5%和4.7%。此外,我们还深入研究了MoE在不同条件下对T-T结构的影响:流式模式、非流式模式、是否使用语言ID,以及在标签解码器中使用MoE。 摘要:The sparsely-gated Mixture of Experts (MoE) can magnify a network capacity with a little computational complexity. In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. More specifically, we apply the sparsely-gated MoE technique to two types of networks: Sequence-to-Sequence Transformer (S2S-T) and Transformer Transducer (T-T). We demonstrate through a set of ASR experiments on multiple language data that the MoE networks can reduce the relative word error rates by 16.5\% and 4.7\% with the S2S-T and T-T, respectively. Moreover, we thoroughly investigate the effect of the MoE on the T-T architecture in various conditions: streaming mode, non-streaming mode, the use of language ID and the label decoder with the MoE.
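下面给出一个PyTorch的稀疏门控MoE层(top-k路由)示意实现,演示"先打分、再选出k个专家并加权"的基本机制;为清晰起见这里计算了所有专家的输出再按稀疏门控权重加权,真实系统会按路由结果稀疏分发以节省计算,且与论文中S2S-T/T-T的具体嵌入方式无关。

```python
import torch
import torch.nn as nn

# 示意性草图:top-k 稀疏门控 MoE 层。
class SparseMoE(nn.Module):
    def __init__(self, d_model, d_ff, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.gate(x)                  # (batch, seq, num_experts)
        topk_val, topk_idx = scores.topk(self.top_k, dim=2)
        # 只保留被选中专家的(归一化)门控权重,其余为 0
        gates = torch.zeros_like(scores).scatter_(2, topk_idx, topk_val.softmax(dim=2))
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, d, E)
        return torch.einsum("bsde,bse->bsd", expert_out, gates)

moe = SparseMoE(d_model=256, d_ff=1024)
y = moe(torch.randn(2, 10, 256))
print(y.shape)  # torch.Size([2, 10, 256])
```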

【66】 Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets 标题:自然场景中人类视觉搜索计算模型的基准测试:模型比较与参考数据集 链接:https://arxiv.org/abs/2112.05808

作者:F. Travi,G. Ruarte,G. Bujia,J. E. Kamienkowski 机构:Universidad de Buenos Aires - CONICET 摘要:视觉搜索几乎是任何日常人类与环境的目标导向交互的重要组成部分。目前,有几种算法能够预测简单观察时的注视位置,但很少有模型试图模拟自然场景中视觉搜索时的人类行为。此外,这些模型在设计上差异很大,在评估它们时所用的数据集和指标上也存在差异。因此,需要一个参考点,在该点上可以测试每个模型,并从中得出潜在的改进。在目前的工作中,我们在自然场景中选择公开可用的最先进的视觉搜索模型,并在不同的数据集上对其进行评估,使用相同的度量来估计其效率和与人类受试者的相似性。特别是,我们通过结合基于神经网络的可视化搜索模型,对理想的贝叶斯搜索器进行了改进,使其能够推广到其他数据集。目前的工作揭示了当前模型的局限性,以及如何通过组合方法实现潜在的改进。此外,它还提供了一个解决方案,以满足对基准数据和度量的迫切需要,从而支持开发更通用的人类视觉搜索计算模型。 摘要:Visual search is an essential part of almost any everyday human goal-directed interaction with the environment. Nowadays, several algorithms are able to predict gaze positions during simple observation, but few models attempt to simulate human behavior during visual search in natural scenes. Furthermore, these models vary widely in their design and exhibit differences in the datasets and metrics with which they were evaluated. Thus, there is a need for a reference point, on which each model can be tested and from where potential improvements can be derived. In the present work, we select publicly available state-of-the-art visual search models in natural scenes and evaluate them on different datasets, employing the same metrics to estimate their efficiency and similarity with human subjects. In particular, we propose an improvement to the Ideal Bayesian Searcher through a combination with a neural network-based visual search model, enabling it to generalize to other datasets. The present work sheds light on the limitations of current models and how potential improvements can be accomplished by combining approaches. Moreover, it moves forward on providing a solution for the urgent need for benchmarking data and metrics to support the development of more general human visual search computational models.

【67】 Computer-Assisted Creation of Boolean Search Rules for Text Classification in the Legal Domain 标题:法律领域文本分类布尔搜索规则的计算机辅助生成 链接:https://arxiv.org/abs/2112.05807

作者:Hannes Westermann,Jaromir Savelka,Vern R. Walker,Kevin D. Ashley,Karim Benyekhlef 机构:Cyberjustice Laboratory, Facult´e de droit, Universit´e de Montr´eal, ISP, School of Computing and Information, University of Pittsburgh, LLT Lab, Maurice A. Deane School of Law, Hofstra University 备注:None 摘要:在本文中,我们提出了一种以布尔搜索规则的形式构建强大的、可解释的分类器的方法。我们开发了一个称为CASE(计算机辅助语义探索)的交互式环境,该环境利用单词共现来指导注释者选择相关的搜索词。该系统无缝地促进了分类规则的迭代评估和改进。该过程使人类注释者能够利用统计信息的优势,同时将他们的专家直觉融入到此类规则的创建中。我们在4个数据集上评估了使用我们的案例系统创建的分类器,并将结果与机器学习方法进行比较,包括SKOPE规则、随机森林、支持向量机和fastText分类器。这些结果推动了关于布尔搜索规则优越的紧凑性、简单性和直观性与用于文本分类的最先进的机器学习模型的更好性能之间的权衡的讨论。 摘要:In this paper, we present a method of building strong, explainable classifiers in the form of Boolean search rules. We developed an interactive environment called CASE (Computer Assisted Semantic Exploration) which exploits word co-occurrence to guide human annotators in selection of relevant search terms. The system seamlessly facilitates iterative evaluation and improvement of the classification rules. The process enables the human annotators to leverage the benefits of statistical information while incorporating their expert intuition into the creation of such rules. We evaluate classifiers created with our CASE system on 4 datasets, and compare the results to machine learning methods, including SKOPE rules, Random forest, Support Vector Machine, and fastText classifiers. The results drive the discussion on trade-offs between superior compactness, simplicity, and intuitiveness of the Boolean search rules versus the better performance of state-of-the-art machine learning models for text classification.
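下面用几行Python代码示意一条布尔检索规则如何充当可解释的文本分类器;规则内容与示例文本均为虚构,仅用于说明形式,并非CASE系统的真实输出。

```python
import re

# 示意性草图:基于词共现的可解释布尔规则分类器。
def contains(term, text):
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

def boolean_rule(text):
    # 规则:contract AND (terminate OR breach)
    return contains("contract", text) and (
        contains("terminate", text) or contains("breach", text))

documents = [
    "The contract was rescinded for material breach.",
    "The parties signed a new lease agreement.",
]
print([boolean_rule(d) for d in documents])   # [True, False]
```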

【68】 Guided Generative Models using Weak Supervision for Detecting Object Spatial Arrangement in Overhead Images 标题:基于弱监督的引导式生成模型在高空图像目标空间排列检测中的应用 链接:https://arxiv.org/abs/2112.05786

作者:Weiwei Duan,Yao-Yi Chiang,Stefan Leyk,Johannes H. Uhl,Craig A. Knoblock 机构:University of Southern California, University of Minnesota, University of Colorado Boulder 摘要:越来越多的高空图像的可用性和可访问性使我们能够估计和评估地理空间目标对象组的空间排列,这有助于许多应用,如交通监测和农业监测。空间排列估计是识别头顶图像中包含所需对象的区域的过程。传统的有监督目标检测方法可以估计精确的空间排列,但需要大量的边界框标注。最近的半监督聚类方法可以减少手动标记,但仍然需要对图像中的所有对象类别进行注释。在变分自动编码器(VAE)框架下,提出了目标引导生成模型(TGGM),该模型使用高斯混合模型(GMM)来估计VAE中隐藏变量和解码变量的分布。通过GMM对隐藏变量和解码器变量进行建模,大大减少了空间排列估计所需的手动注释。与现有方法不同的是,训练过程只能在优化迭代中整体更新GMM(例如,“小批量”),TGGM允许在同一优化迭代中单独更新单个GMM组件。单独优化GMM组件使TGGM能够利用空间数据中的语义关系,并且只需要几个标签就可以启动和指导生成过程。我们的实验表明,TGGM实现了与最先进的半监督方法相当的结果,并且基于$F_{1}$分数,TGGM的性能比无监督方法高出10%,同时需要的标记数据要少得多。 摘要:The increasing availability and accessibility of numerous overhead images allows us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas which contain the desired objects in overhead images. Traditional supervised object detection approaches can estimate accurate spatial arrangement but require large amounts of bounding box annotations. Recent semi-supervised clustering approaches can reduce manual labeling but still require annotations for all object categories in the image. This paper presents the target-guided generative model (TGGM), under the Variational Auto-encoder (VAE) framework, which uses Gaussian Mixture Models (GMM) to estimate the distributions of both hidden and decoder variables in VAE. Modeling both hidden and decoder variables by GMM reduces the required manual annotations significantly for spatial arrangement estimation. Unlike existing approaches that the training process can only update the GMM as a whole in the optimization iterations (e.g., a "minibatch"), TGGM allows the update of individual GMM components separately in the same optimization iteration. Optimizing GMM components separately allows TGGM to exploit the semantic relationships in spatial data and requires only a few labels to initiate and guide the generative process. Our experiments shows that TGGM achieves results comparable to the state-of-the-art semi-supervised methods and outperforms unsupervised methods by 10% based on the $F_{1}$ scores, while requiring significantly fewer labeled data.

【69】 TempoQR: Temporal Question Reasoning over Knowledge Graphs 标题:TempoQR:基于知识图的时态问题推理 链接:https://arxiv.org/abs/2112.05785

作者:Costas Mavromatis,Prasanna Lakkur Subramanyam,Vassilis N. Ioannidis,Soji Adeshina,Phillip R. Howard,Tetiana Grinberg,Nagib Hakim,George Karypis 机构:University of Minnesota,University of Massachusetts Amherst,Amazon Web Services,Intel Labs 备注:AAAI 2022 摘要:知识图问答(KGQA)涉及使用自然语言查询从知识图(KG)中检索事实。KG是一组经过策划的事实,由关系连接的实体组成。某些事实还包括形成时间KG(TKG)的时间信息。尽管许多自然问题涉及明确或隐含的时间限制,但TKGs上的问题回答(QA)一直是一个相对未开发的领域。现有的解决方案主要是针对简单的时间问题设计的,这些问题可以由一个TKG事实直接回答。本文提出了一个基于嵌入的综合框架,用于回答TKGs上的复杂问题。我们称为时态问题推理(TempoQR)的方法利用TKG嵌入将问题定位到它所指的特定实体和时间范围。它通过使用三个专门的模块,使用上下文、实体和时间感知信息来扩充问题嵌入。第一种方法计算给定问题的文本表示,第二种方法将其与问题中涉及的实体的实体嵌入相结合,第三种方法生成特定于问题的时间嵌入。最后,基于转换器的编码器学习将生成的时间信息与用于答案预测的问题表示融合。大量实验表明,与最先进的方法相比,TempoQR在复杂时间问题上的准确率提高了25-45个百分点,并且更好地推广到不可见的问题类型。 摘要:Knowledge Graph Question Answering (KGQA) involves retrieving facts from a Knowledge Graph (KG) using natural language queries. A KG is a curated set of facts consisting of entities linked by relations. Certain facts include also temporal information forming a Temporal KG (TKG). Although many natural questions involve explicit or implicit time constraints, question answering (QA) over TKGs has been a relatively unexplored area. Existing solutions are mainly designed for simple temporal questions that can be answered directly by a single TKG fact. This paper puts forth a comprehensive embedding-based framework for answering complex questions over TKGs. Our method termed temporal question reasoning (TempoQR) exploits TKG embeddings to ground the question to the specific entities and time scope it refers to. It does so by augmenting the question embeddings with context, entity and time-aware information by employing three specialized modules. The first computes a textual representation of a given question, the second combines it with the entity embeddings for entities involved in the question, and the third generates question-specific time embeddings. Finally, a transformer-based encoder learns to fuse the generated temporal information with the question representation, which is used for answer predictions. Extensive experiments show that TempoQR improves accuracy by 25--45 percentage points on complex temporal questions over state-of-the-art approaches and it generalizes better to unseen question types.

【70】 A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing 标题:临床自然语言处理中公开可用语言任务的范围综述 链接:https://arxiv.org/abs/2112.05780

作者:Yanjun Gao,Dmitriy Dligach,Leslie Christensen,Samuel Tesch,Ryan Laffin,Dongfang Xu,Timothy Miller,Ozlem Uzuner,Matthew M Churpek,Majid Afshar 机构:ICU Data Science Lab, School of Medicine and Public Health, University of Wisconsin, Madison, WI; Department of Computer Science, Loyola University Chicago, Chicago, IL 备注:Paper submitted to Journal of American Medical Informatics Association (JAMIA) 摘要:目的:对使用来自患者队列的公开电子健康记录数据的临床自然语言处理(NLP)任务论文进行范围综述。材料与方法:我们检索了包括生物医学研究和计算机科学文献数据库在内的6个数据库,由两名评审员进行了一轮标题/摘要筛选和全文筛选。我们的方法遵循系统综述和荟萃分析首选报告条目(PRISMA)指南。结果:2007年至2021年间,共有35篇文献、47项临床NLP任务符合纳入标准。我们按NLP问题类型对任务进行分类,包括命名实体识别、文本摘要和其他NLP任务;部分任务围绕临床决策支持应用提出,例如药物滥用、表型分析和临床试验队列筛选。我们还按出版物和数据集信息对任务进行了汇总。讨论:随着语言系统的进步和NLP领域的发展,临床NLP任务的范围不断扩大。然而,通用领域NLP社区与临床信息学社区之间的兴趣分歧,以及数据来源的可泛化性方面仍存在差距。我们还发现了数据选择和准备中的问题,包括缺乏时间敏感数据,以及问题规模和评估方式不合理。结论:现有的临床NLP任务涵盖了广泛的主题,该领域将继续发展,并吸引通用领域NLP和临床信息学界的更多关注。我们鼓励未来的工作纳入多学科协作、提高报告透明度,并将数据准备标准化。 摘要:Objective: to provide a scoping review of papers on clinical natural language processing (NLP) tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods: We searched six databases, including biomedical research and computer science literature database. A round of title/abstract screening and full-text screening were conducted by two reviewers. Our method followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines. Results: A total of 35 papers with 47 clinical NLP tasks met inclusion criteria between 2007 and 2021. We categorized the tasks by the type of NLP problems, including name entity recognition, summarization, and other NLP tasks. Some tasks were introduced with a topic of clinical decision support applications, such as substance abuse, phenotyping, cohort selection for clinical trial. We summarized the tasks by publication and dataset information. Discussion: The breadth of clinical NLP tasks keeps growing as the field of NLP evolves with advancements in language systems. However, gaps exist in divergent interests between general domain NLP community and clinical informatics community, and in generalizability of the data sources. We also identified issues in data selection and preparation including the lack of time-sensitive data, and invalidity of problem size and evaluation. Conclusions: The existing clinical NLP tasks cover a wide range of topics and the field will continue to grow and attract more attention from both general domain NLP and clinical informatics community. We encourage future work to incorporate multi-disciplinary collaboration, reporting transparency, and standardization in data preparation.

【71】 Accoustate: Auto-annotation of IMU-generated Activity Signatures under Smart Infrastructure 标题:Accoustate:智能基础设施下IMU生成的活动签名的自动标注 链接:https://arxiv.org/abs/2112.06651

作者:Soumyajit Chatterjee,Arun Singh,Bivas Mitra,Sandip Chakraborty 备注:10 pages, 7 figures 摘要:智能基础设施中的人类活动通过个人穿戴的可穿戴设备生成大量IMU数据。许多现有的研究依赖于这些感官数据来进行人类活动识别(HAR);然而,主要的瓶颈之一是它们依赖于预先注释或标记的数据。人工驱动的注释既不可伸缩也不高效,而现有的自动注释技术严重依赖于视频签名。尽管如此,基于视频的自动注释需要高计算资源,并且在将来自个人空间(如智能家居)的数据传输到云端时,存在隐私问题。本文利用人类活动产生的声学特征在边缘标记可穿戴设备的IMU数据,从而缓解资源需求和数据隐私问题。我们利用基于声学的预训练HAR模型对IMU数据进行跨模态标记,即使两个人在相同的环境背景下同时进行不同的活动。我们观察到,在环境的声学环境中,两个人同时进行的活动中,非重叠的声学间隙存在的概率很高,这有助于我们解决重叠的活动特征,以单独标记它们。在两个真实的内部数据集上对提议的方法进行原则性评估,进一步扩大以创建双乘员设置,结果表明,对于车间和厨房环境,该框架能够正确注释来自两个人的大量未标记IMU数据,准确度分别为$\mathbf{82.59\%}$($\mathbf{\pm 17.94\%}$)和$\mathbf{98.32\%}$($\mathbf{\pm 3.68\%}$)。 摘要:Human activities within smart infrastructures generate a vast amount of IMU data from the wearables worn by individuals. Many existing studies rely on such sensory data for human activity recognition (HAR); however, one of the major bottlenecks is their reliance on pre-annotated or labeled data. Manual human-driven annotations are neither scalable nor efficient, whereas existing auto-annotation techniques heavily depend on video signatures. Still, video-based auto-annotation needs high computation resources and has privacy concerns when the data from a personal space, like a smart-home, is transferred to the cloud. This paper exploits the acoustic signatures generated from human activities to label the wearables' IMU data at the edge, thus mitigating resource requirement and data privacy concerns. We utilize acoustic-based pre-trained HAR models for cross-modal labeling of the IMU data even when two individuals perform simultaneous but different activities under the same environmental context. We observe that non-overlapping acoustic gaps exist with a high probability during the simultaneous activities performed by two individuals in the environment's acoustic context, which helps us resolve the overlapping activity signatures to label them individually. A principled evaluation of the proposed approach on two real-life in-house datasets further augmented to create a dual occupant setup, shows that the framework can correctly annotate a significant volume of unlabeled IMU data from both individuals with an accuracy of $\mathbf{82.59\%}$ ($\mathbf{\pm 17.94\%}$) and $\mathbf{98.32\%}$ ($\mathbf{\pm 3.68\%}$), respectively, for a workshop and a kitchen environment.

【72】 Goedel's Incompleteness Theorem 标题:哥德尔不完全性定理 链接:https://arxiv.org/abs/2112.06641

作者:Serafim Batzoglou 备注:20 pages 摘要:我以直观的方式给出了哥德尔第一不完全性定理的证明,同时涵盖了所有技术上具有挑战性的步骤。我将哥德尔的不动点引理推广到双句和多句版本,从而可以通过说谎者悖论的循环版本来证明不完全性。我还讨论了哥德尔第一、第二不完全性定理与哥德尔完全性定理之间的关系,最后总结了这些结果对数学、计算、心智理论和人工智能的意义。 摘要:I present the proof of Goedel's First Incompleteness theorem in an intuitive manner, while covering all technically challenging steps. I present generalizations of Goedel's fixed point lemma to two-sentence and multi-sentence versions, which allow proof of incompleteness through circular versions of the liar's paradox. I discuss the relation of Goedel's First and Second Incompleteness theorems to Goedel's Completeness theorems, and conclude with remarks on implications of these results for mathematics, computation, theory of mind and AI.

【73】 Efficient Training of Volterra Series-Based Pre-distortion Filter Using Neural Networks 标题:用神经网络高效训练基于Volterra级数的预失真滤波器 链接:https://arxiv.org/abs/2112.06637

作者:Vinod Bajaj,Mathieu Chagnon,Sander Wahls,Vahid Aref 机构: Delft Center for Systems and Control, Delft University of Technology, CN Delft, Netherlands, Nokia Bell Labs, Lorenzstr. , Stuttgart, Germany 备注:Accepted for presentation in OFC 2022 摘要:我们提出了一种简单、高效的“直接学习”方法,用神经网络训练基于Volterra级数的数字预失真滤波器。我们使用64-QAM 64 GBaud模拟发射机,在不同的发射机非线性和噪声条件下,显示了其优于传统训练方法的性能。 摘要:We present a simple, efficient "direct learning" approach to train Volterra series-based digital pre-distortion filters using neural networks. We show its superior performance over conventional training methods using a 64-QAM 64-GBaud simulated transmitter with varying transmitter nonlinearity and noisy conditions.

【74】 Solving the non-preemptive two queue polling model with generally distributed service and switch-over durations and Poisson arrivals as a Semi-Markov Decision Process 标题:具有一般分布服务和切换持续时间的泊松到达非抢占式双队列轮询模型的半马尔可夫决策过程 链接:https://arxiv.org/abs/2112.06578

作者:Dylan Solms 机构:Department of Decision Sciences, University of South Africa 摘要:具有切换时间的轮询系统是一种有多种实际应用的有用模型。它被归类为离散事件动态系统(DEDS),目前尚无公认统一的建模方法,而且DEDS本身相当复杂。迄今为止,对所研究的轮询系统建模的最精细方法是连续时间马尔可夫决策过程(CTMDP)。本文提出了该轮询系统的半马尔可夫决策过程(SMDP)形式化,以引入额外的建模能力。这种能力以截断误差和昂贵的数值积分为代价,这自然引出一个问题:SMDP策略是否带来了值得的优势。在此基础上,本文还展示了如何利用CTMDP中的稀疏性来构建计算高效的模型。我们使用半马尔可夫过程模拟器评估SMDP和CTMDP策略的折扣性能,并将这两种策略与一个专门为该轮询系统设计的启发式策略以及一个穷尽服务策略进行比较。最后,利用参数和非参数假设检验来检验性能差异是否具有统计显著性。 摘要:The polling system with switch-over durations is a useful model with several practical applications. It is classified as a Discrete Event Dynamic System (DEDS) for which no one agreed upon modelling approach exists. Furthermore, DEDS are quite complex. To date, the most sophisticated approach to modelling the polling system of interest has been a Continuous-time Markov Decision Process (CTMDP). This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system as to introduce additional modelling power. Such power comes at the expense of truncation errors and expensive numerical integrals which naturally leads to the question of whether the SMDP policy provides a worthwhile advantage. To further add to this scenario, it is shown how sparsity can be exploited in the CTMDP to develop a computationally efficient model. The discounted performance of the SMDP and CTMDP policies are evaluated using a Semi-Markov Process simulator. The two policies are accompanied by a heuristic policy specifically developed for this polling system as well as an exhaustive service policy. Parametric and non-parametric hypothesis tests are used to test whether differences in performance are statistically significant.

【75】 Gamifying optimization: a Wasserstein distance-based analysis of human search 标题:优化的游戏化:基于Wasserstein距离的人类搜索分析 链接:https://arxiv.org/abs/2112.06292

作者:Antonio Candelieri,Andrea Ponti,Francesco Archetti 机构:University of Milano-Bicocca, Department of Economics, Management and Statistics, Milan, Italy; University of Milano-Bicocca, Department of Computer Science, Systems and Communication, Milan, Italy 备注:49 pages, 39 figures. arXiv admin note: substantial text overlap with arXiv:2102.07647 摘要:本文的主要目的是勾勒一个理论框架,用以刻画人类在不确定性下的决策策略,特别是黑箱优化任务中的主动学习,以及信息收集(探索)与奖励寻求(利用)之间的权衡。人类依据这两个目标做出的决策可以用帕累托理性来建模:如果一个决策集合包含帕累托有效策略,理性的决策者应当总是选择占优策略,而不是被其支配的备选方案;到帕累托前沿的距离决定了一个选择是否符合帕累托理性。为了收集有关人类策略的数据,我们使用了一个游戏应用,它会显示游戏场地、先前的决策与观测以及获得的分数。本文的关键在于将人类学习者的行为模式表示为离散概率分布,从而把人类行为的刻画问题映射到一个空间中:该空间的元素是概率分布,并由直方图之间的距离,即Wasserstein距离(WST)赋予结构。这种分布层面的分析为人类搜索策略及其对帕累托理性的偏离提供了新的见解。由于不确定性是定义帕累托前沿的两个目标之一,我们针对三种不同的不确定性量化指标进行了分析,以确定哪一种能更好地解释符合帕累托理性的行为模式。除了对单个模式的分析,WST还支持通过计算重心(barycenter)和WST k-均值聚类进行全局分析。我们进一步利用决策树,将以过度利用(exploitation)为特征的非帕累托行为与奖励寻求过程的演化动力学联系起来。 摘要:The main objective of this paper is to outline a theoretical framework to characterise humans' decision-making strategies under uncertainty, in particular active learning in a black-box optimization task and trading-off between information gathering (exploration) and reward seeking (exploitation). Humans' decisions making according to these two objectives can be modelled in terms of Pareto rationality. If a decision set contains a Pareto efficient strategy, a rational decision maker should always select the dominant strategy over its dominated alternatives. A distance from the Pareto frontier determines whether a choice is Pareto rational. To collect data about humans' strategies we have used a gaming application that shows the game field, with previous decisions and observations, as well as the score obtained. The key element in this paper is the representation of behavioural patterns of human learners as a discrete probability distribution. This maps the problem of the characterization of humans' behaviour into a space whose elements are probability distributions structured by a distance between histograms, namely the Wasserstein distance (WST). The distributional analysis gives new insights about human search strategies and their deviations from Pareto rationality. Since the uncertainty is one of the two objectives defining the Pareto frontier, the analysis has been performed for three different uncertainty quantification measures to identify which better explains the Pareto compliant behavioural patterns. Beside the analysis of individual patterns WST has also enabled a global analysis computing the barycenters and WST k-means clustering. A further analysis has been performed by a decision tree to relate non-Paretian behaviour, characterized by exasperated exploitation, to the dynamics of the evolution of the reward seeking process.
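下面的Python示意代码演示如何用一维Wasserstein距离比较两名(虚构)被试的行为直方图,这正是文中用来构造行为模式空间的基本度量;分桶方式与数据均为假设,论文中的特征定义和聚类流程要复杂得多。

```python
import numpy as np
from scipy.stats import wasserstein_distance

# 示意性草图:把行为模式表示为离散直方图并计算 WST 距离。
bins = np.arange(10)    # 假设:按某种行为特征分成 10 个桶
subject_a = np.array([0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.03, 0.02, 0.01, 0.01])
subject_b = np.array([0.05, 0.05, 0.10, 0.10, 0.15, 0.15, 0.15, 0.10, 0.10, 0.05])

d = wasserstein_distance(bins, bins, u_weights=subject_a, v_weights=subject_b)
print(f"WST distance between behavioural histograms: {d:.3f}")
```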

【76】 Improving Performance of Federated Learning based Medical Image Analysis in Non-IID Settings using Image Augmentation 标题:利用图像增强提高非IID环境下基于联邦学习的医学图像分析性能 链接:https://arxiv.org/abs/2112.06194

作者:Alper Emin Cetinkaya,Dr. Murat Akin,Prof. Dr. Seref Sagiroglu 机构:Information Security Program, Ankara, Turkey; Gazi AI Center of Gazi University; Basarsoft Information Systems Inc.; Computer Engineering Dept., Gazi AI Center, Gazi University 摘要:联邦学习(FL)是利用属于患者、个人、公司或行业、且必须在严格隐私约束下使用的敏感数据的一种合适方案。FL在很大程度上或部分地缓解了数据隐私与安全问题,它让多个边缘设备或组织无需集中共享本地数据,就能利用各自的大量本地数据共同训练一个全局模型。然而,由于FL的分布式特性,非IID数据会导致显著的性能下降和训练不稳定。针对FL的非IID数据问题,本文提出了一种通过图像增强动态平衡各客户端数据分布的新方法。在高度非IID的FL设置下,该方法显著稳定了模型训练,并将胸部X光图像多种胸部疾病检测的测试精度从83.22%提高到89.43%。IID、非IID以及结合所提方法的非IID联邦训练结果表明,该方法有望鼓励组织或研究人员在兼顾数据隐私的前提下从数据中挖掘价值、开发更好的系统,这不仅适用于医疗领域,也适用于其他领域。 摘要:Federated Learning (FL) is a suitable solution for making use of sensitive data belonging to patients, people, companies, or industries that are obligatory to work under rigid privacy constraints. FL mainly or partially supports data privacy and security issues and provides an alternative to model problems facilitating multiple edge devices or organizations to contribute a training of a global model using a number of local data without having them. Non-IID data of FL caused from its distributed nature presents a significant performance degradation and stabilization skews. This paper introduces a novel method dynamically balancing the data distributions of clients by augmenting images to address the non-IID data problem of FL. The introduced method remarkably stabilizes the model training and improves the model's test accuracy from 83.22% to 89.43% for multi-chest diseases detection of chest X-ray images in highly non-IID FL setting. The results of IID, non-IID and non-IID with proposed method federated trainings demonstrated that the proposed method might help to encourage organizations or researchers in developing better systems to get values from data with respect to data privacy not only for healthcare but also other fields.
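下面的PyTorch示意代码演示"在客户端本地通过图像增强过采样少数类、使标签分布趋于均衡"的思路;这里只用水平翻转作为增强、并以样本最多的类别为目标数量,均为假设,论文实际采用的增强方式与均衡策略以其原文为准。

```python
import torch
from collections import Counter

# 示意性草图:用简单增强(水平翻转)过采样少数类,平衡客户端标签分布。
def balance_client_data(images, labels, num_classes):
    counts = Counter(labels.tolist())
    target = max(counts.values())                    # 以样本最多的类别为目标数量
    aug_images, aug_labels = [images], [labels]
    for c in range(num_classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if len(idx) == 0:
            continue
        deficit = target - counts.get(c, 0)
        for i in range(deficit):
            src = images[idx[i % len(idx)]]          # (C, H, W)
            aug = torch.flip(src, dims=[2])          # 沿宽度方向翻转生成新样本
            aug_images.append(aug.unsqueeze(0))
            aug_labels.append(torch.tensor([c]))
    return torch.cat(aug_images), torch.cat(aug_labels)

# 一个高度非IID的客户端:类别 0 有 90 张,类别 1 只有 10 张(用随机张量代替 X 光图像)
images = torch.randn(100, 1, 64, 64)
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
x_bal, y_bal = balance_client_data(images, labels, num_classes=2)
print(Counter(y_bal.tolist()))   # 两个类别的样本数大致相等
```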

【77】 Quantum Architecture Search via Continual Reinforcement Learning 标题:基于持续强化学习的量子架构搜索 链接:https://arxiv.org/abs/2112.05779

作者:Esther Ye,Samuel Yen-Chi Chen 机构:Department of Electrical and Computer Engineering, Boston University, Boston, MA , USA, Computational Science Initiative, Brookhaven National Laboratory, Upton, NY , USA, ) 摘要:与经典计算机相比,量子计算有望在解决复杂计算任务方面取得重大进步。然而,为实际用途设计量子电路并不是一个简单的目标,需要专家级的知识。为了帮助这一努力,本文提出了一种基于机器学习的方法来构建量子电路体系结构。以前的工作已经证明,经典的深度强化学习(DRL)算法可以在没有编码物理知识的情况下成功构建量子电路结构。然而,这些基于DRL的工作不能推广到设备噪声不断变化的环境中,因此需要大量的训练资源来保持RL模型的最新。考虑到这一点,我们结合了持续学习来提高算法的性能。在本文中,我们提出了基于深度Q学习的概率策略重用(PPR-DQL)框架来解决这一电路设计难题。通过对各种噪声模式进行数值模拟,我们证明了具有PPR的RL代理能够比从头开始训练的代理更快地找到量子门序列来生成双量子比特贝尔态。该框架具有通用性,可应用于其他量子门合成或控制问题,包括量子器件的自动校准。 摘要:Quantum computing has promised significant improvement in solving difficult computational tasks over classical computers. Designing quantum circuits for practical use, however, is not a trivial objective and requires expert-level knowledge. To aid this endeavor, this paper proposes a machine learning-based method to construct quantum circuit architectures. Previous works have demonstrated that classical deep reinforcement learning (DRL) algorithms can successfully construct quantum circuit architectures without encoded physics knowledge. However, these DRL-based works are not generalizable to settings with changing device noises, thus requiring considerable amounts of training resources to keep the RL models up-to-date. With this in mind, we incorporated continual learning to enhance the performance of our algorithm. In this paper, we present the Probabilistic Policy Reuse with deep Q-learning (PPR-DQL) framework to tackle this circuit design challenge. By conducting numerical simulations over various noise patterns, we demonstrate that the RL agent with PPR was able to find the quantum gate sequence to generate the two-qubit Bell state faster than the agent that was trained from scratch. The proposed framework is general and can be applied to other quantum gate synthesis or control problems -- including the automatic calibration of quantum devices.

【78】 Edge-Enhanced Dual Discriminator Generative Adversarial Network for Fast MRI with Parallel Imaging Using Multi-view Information 标题:利用多视角信息的边缘增强型双鉴别器生成对抗网络用于并行成像快速MRI重建 链接:https://arxiv.org/abs/2112.05758

作者:Jiahao Huang,Weiping Ding,Jun Lv,Jingwen Yang,Hao Dong,Javier Del Ser,Jun Xia,Tiaojuan Ren,Stephen Wong,Guang Yang 机构:College of Information Science and Technology, Zhejiang Shuren University, Hangzhou; National Heart and Lung Institute, Imperial College London, London, United Kingdom 备注:33 pages, 13 figures, Applied Intelligence 摘要:在临床医学中,磁共振成像(MRI)是诊断、分诊、预后和治疗计划中最重要的工具之一。然而,由于数据是在k空间中顺序采集的,MRI存在固有的数据采集缓慢问题。近年来,文献中提出的大多数MRI重建方法侧重于整体图像重建,而不是增强边缘信息。本工作跳出这一总体趋势,着重研究边缘信息的增强。具体而言,我们提出了一种新的并行成像耦合双鉴别器生成对抗网络(PIDD-GAN),通过融合多视角信息实现快速多通道MRI重建。双鉴别器设计旨在改善MRI重建中的边缘信息:一个鉴别器负责整体图像重建,另一个负责增强边缘信息。生成器采用带有局部和全局残差学习的改进U-Net,并嵌入频率通道注意力块(FCA块)以引入注意力机制,同时引入内容损失(content loss)来训练生成器以获得更好的重建质量。我们在Calgary-Campinas公开脑部MR数据集上进行了综合实验,并将我们的方法与最先进的MRI重建方法进行了比较,还在MICCAI13数据集上对残差学习进行了消融研究以验证所提模块。结果表明,我们的PIDD-GAN可以提供高质量的重建MR图像,并很好地保留边缘信息;单幅图像的重建时间小于5ms,满足快速处理的需求。 摘要:In clinical medicine, magnetic resonance imaging (MRI) is one of the most important tools for diagnosis, triage, prognosis, and treatment planning. However, MRI suffers from an inherent slow data acquisition process because data is collected sequentially in k-space. In recent years, most MRI reconstruction methods proposed in the literature focus on holistic image reconstruction rather than enhancing the edge information. This work steps aside this general trend by elaborating on the enhancement of edge information. Specifically, we introduce a novel parallel imaging coupled dual discriminator generative adversarial network (PIDD-GAN) for fast multi-channel MRI reconstruction by incorporating multi-view information. The dual discriminator design aims to improve the edge information in MRI reconstruction. One discriminator is used for holistic image reconstruction, whereas the other one is responsible for enhancing edge information. An improved U-Net with local and global residual learning is proposed for the generator. Frequency channel attention blocks (FCA Blocks) are embedded in the generator for incorporating attention mechanisms. Content loss is introduced to train the generator for better reconstruction quality. We performed comprehensive experiments on Calgary-Campinas public brain MR dataset and compared our method with state-of-the-art MRI reconstruction methods. Ablation studies of residual learning were conducted on the MICCAI13 dataset to validate the proposed modules. Results show that our PIDD-GAN provides high-quality reconstructed MR images, with well-preserved edge information. The time of single-image reconstruction is below 5ms, which meets the demand of faster processing.
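下面的PyTorch示意代码演示双鉴别器GAN的生成器损失如何组合:一个鉴别器看整幅重建图像,另一个看其(这里用Sobel算子近似提取的)边缘图,再加上内容损失;网络结构、对抗损失形式与各项权重均为假设,并非PIDD-GAN的原始实现。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 示意性草图:整体图像鉴别器 + 边缘鉴别器 + 内容(L1)损失。
def sobel_edges(x):                                    # x: (N, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def generator_loss(G, D_img, D_edge, undersampled, fully_sampled,
                   w_adv=0.01, w_edge=0.01, w_content=1.0):
    recon = G(undersampled)
    adv_img = -D_img(recon).mean()                     # 整体图像鉴别器的对抗项(形式为假设)
    adv_edge = -D_edge(sobel_edges(recon)).mean()      # 边缘鉴别器的对抗项
    content = F.l1_loss(recon, fully_sampled)          # 内容损失
    return w_adv * adv_img + w_edge * adv_edge + w_content * content

# 用法示意(用极简网络代替真实的 U-Net 生成器与鉴别器)
G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
D_img = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))
D_edge = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))
x = torch.randn(2, 1, 64, 64)
y = torch.randn(2, 1, 64, 64)
loss = generator_loss(G, D_img, D_edge, x, y)
loss.backward()
```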

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-12-14,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号,前往查看
