Finance / Speech / Audio Processing Academic Digest [8.19]

Author: WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published: 2021-08-24 16:34:37

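Update: the H5 page now supports collapsible abstracts for a better reading experience. Click "Read the original" to visit arxivdaily.com, which covers CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, and offers search, bookmarking, and other features.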

q-fin (Finance): 7 papers

cs.SD (Speech): 1 paper

eess.AS (Audio Processing): 3 papers

1. q-fin (Finance):

【1】 Tilted Platforms: Rental Housing Technology and the Rise of Urban Big Data Oligopolies Link: https://arxiv.org/abs/2108.08229

Authors: Geoff Boeing, Max Besbris, David Wachsmuth, Jake Wegmann
Affiliations: University of Southern California, Los Angeles, USA (full list of author information is available at the end of the article)
Abstract: This article interprets emerging scholarship on rental housing platforms, particularly the most well-known and widely used short- and long-term rental housing platforms, and considers how the technological processes connecting both short-term and long-term rentals to the platform economy are transforming cities. It discusses potential policy approaches to more equitably distribute benefits and mitigate harms. We argue that information technology is not value-neutral. While rental housing platforms may empower data analysts and certain market participants, the same cannot be said for all users or society at large. First, user-generated online data frequently reproduce the systematic biases found in traditional sources of housing information. Evidence is growing that the information broadcasting potential of rental housing platforms may increase rather than mitigate sociospatial inequality. Second, technology platforms curate and shape information according to their creators' own financial and political interests. The question of which data (and people) are hidden or marginalized on these platforms is just as important as the question of which data are available. Finally, important differences in benefits and drawbacks exist between short-term and long-term rental housing platforms, but are underexplored in the literature: this article unpacks these differences and proposes policy recommendations.

【2】 Why North Korean Refugees are Reluctant to Compete: The Roles of Cognitive Ability Link: https://arxiv.org/abs/2108.08097

Authors: Syngjoo Choi, Byung-Yeon Kim, Jungmin Lee, Sokbae Lee
Affiliations: Columbia University and Institute for Fiscal Studies
Abstract: This paper investigates the development of competitiveness by comparing three Korean groups in South Korea, born and raised in three countries with distinct institutional environments: South Korea, North Korea, and China. Results based on laboratory experiments show that North Korean refugees are significantly less competitive than South Koreans or Korean-Chinese immigrants. Furthermore, analyses through the lens of a choice model with probability weighting suggest that lower cognitive ability may be associated with lower levels of expected performance, more pessimistic subjective beliefs, and greater aversion to competition.

【3】 The Generalized Gamma distribution as a useful RND under Heston's stochastic volatility model Link: https://arxiv.org/abs/2108.07937

Authors: Ben Boukai
Affiliations: Department of Mathematical Sciences, IUPUI, Indianapolis, IN, USA
Notes: 21 pages with 5 figures and 3 tables
Abstract: Following Boukai (2021), we present the Generalized Gamma Distribution as a possible RND for modeling European option prices under Heston's (1993) stochastic volatility model. This distribution is seen as especially useful in situations in which the spot price follows a negatively skewed distribution and hence Black-Scholes based modeling (i.e., the log-normal distribution) is largely inapt. We apply the Generalized Gamma Distribution to modeling current market option data on three large market ETFs, namely SPY, IWM and QQQ. The current option chain for these three ETFs shows a pronounced skew of their volatility 'smile', which indicates a likely distortion in the Black-Scholes modeling of such option data. We provide a thorough modeling of the available option data we have on each ETF (the October 15, 2021 expiry, with 63 days to expiration) based on the Generalized Gamma Distribution and compare it to the option pricing and RND modeling obtained directly from a well-calibrated Heston (1993) SV model (both theoretically and empirically, using Monte Carlo simulations of the spot price). All three ETFs exhibit negatively skewed distributions which are well matched with those derived from the Generalized Gamma Distribution. The inadequacy of the Black-Scholes modeling in such instances with negatively skewed distributions is further illustrated by its impact on the hedging factor, delta, and the immediate implications for the retail trader.
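For readers unfamiliar with the log-normal baseline that the abstract above calls inapt, the following is a minimal sketch of the textbook Black-Scholes European call price and its delta. It is not the paper's Generalized Gamma or Heston model; the function names and all numeric inputs are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, years):
    """Return (price, delta) of a European call under Black-Scholes.

    Assumes a log-normal spot price, constant volatility `sigma`,
    a constant risk-free `rate`, and no dividends.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * years) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    price = spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)
    delta = norm_cdf(d1)  # sensitivity of the call price to the spot price
    return price, delta


if __name__ == "__main__":
    # Hypothetical inputs; 63 days to expiry mirrors the horizon in the abstract.
    price, delta = black_scholes_call(spot=440.0, strike=450.0, rate=0.001,
                                      sigma=0.18, years=63 / 365)
    print(f"call price = {price:.4f}, delta = {delta:.4f}")
```

Under a negatively skewed risk-neutral density, the delta implied by this formula can differ noticeably from the model-consistent one, which is the hedging concern the abstract raises.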

【4】 Wealth disparities and economic flow: Assessment using an asset exchange model with the surplus stock of the wealthy Link: https://arxiv.org/abs/2108.07888

Authors: Takeshi Kato, Yoshinori Hiroi
Affiliations: Hitachi Kyoto University Laboratory, Open Innovation Institute, Kyoto University; Kokoro Research Center, Kyoto University, Kyoto, Japan
Notes: 31 pages, 7 figures
Abstract: How can we limit wealth disparities while stimulating economic flows in sustainable societies? To examine the link between these concepts, we propose an econophysics asset exchange model with the surplus stock of the wealthy. The wealthy are one of the two exchange agents and have more assets than the poor. Our simulation model converts the surplus contribution rate of the wealthy to a new variable parameter alongside the saving rate and introduces the total exchange (flow) and rank correlation coefficient (metabolism) as new evaluation indexes, adding to the Gini index (disparities), thereby assessing both wealth distribution and the relationships among the disparities, flow, and metabolism. We show that these result in a gamma-like wealth distribution, and our model reveals a trade-off between limiting disparities and vitalizing the market. To limit disparities and increase flow and metabolism, we also find the need to restrain savings and use the wealthy surplus stock. This relationship is explicitly expressed in the new equation introduced herein. The insights gained by uncovering the root of disparities may present a persuasive case for investments in social security measures or social businesses involving stock redistribution or sharing.
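To make the ingredients named in the abstract above concrete, here is a minimal sketch of a generic pairwise asset exchange simulation with a saving rate, together with a Gini index calculation. It is a simplified, well-known kinetic exchange scheme swapped in for illustration, not the authors' model: it has no surplus contribution rate, and the function names and parameter values are assumptions.

```python
import random


def gini(wealth):
    """Gini index of non-negative wealth values (0 means perfect equality)."""
    xs = sorted(wealth)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-data formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def simulate(n_agents=200, n_steps=20000, saving_rate=0.5, seed=0):
    """Random pairwise exchange in which each agent keeps a fixed saved fraction."""
    rng = random.Random(seed)
    wealth = [1.0] * n_agents
    for _ in range(n_steps):
        i, j = rng.sample(range(n_agents), 2)
        # The unsaved part of both agents' wealth is pooled and split at random,
        # so the pair's combined wealth is conserved.
        pool = (1.0 - saving_rate) * (wealth[i] + wealth[j])
        share = rng.random()
        wealth[i] = saving_rate * wealth[i] + share * pool
        wealth[j] = saving_rate * wealth[j] + (1.0 - share) * pool
    return wealth


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        print(f"saving_rate={rate}: Gini={gini(simulate(saving_rate=rate)):.3f}")
```

Running the loop at several saving rates gives a feel for how a single parameter shifts the Gini index; the paper's additional indexes (total exchange and rank correlation) would need extra bookkeeping that is not sketched here.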

【5】 Economists' erroneous estimates of damages from climate change Link: https://arxiv.org/abs/2108.07847

Authors: Stephen Keen, Timothy M. Lenton, Antoine Godin, Devrim Yilmaz, Matheus Grasselli, Timothy J. Garrett
Affiliations: Agence Francaise de Developpement
Abstract: Economists have predicted that damages from global warming will be as low as 2.1% of global economic production for a 3°C rise in global average surface temperature, and 7.9% for a 6°C rise. Such relatively trivial estimates of economic damages, made when these economists otherwise assume that human economic productivity will be an order of magnitude higher than today, contrast strongly with predictions made by scientists of significantly reduced human habitability from climate change. Nonetheless, the coupled economic and climate models used to make such predictions have been influential in the international climate change debate and policy prescriptions. Here we review the empirical work done by economists and show that it severely underestimates damages from climate change by committing several methodological errors, including neglecting tipping points and assuming that economic sectors not exposed to the weather are insulated from climate change. Most fundamentally, the influential Integrated Assessment Model DICE is shown to be incapable of generating an economic collapse, regardless of the level of damages. Given these flaws, economists' empirical estimates of economic damages from global warming should be rejected as unscientific, and models that have been calibrated to them, such as DICE, should not be used to evaluate economic risks from climate change or in the development of policy to attenuate damages.

【6】 Simulation and estimation of an agent-based market-model with a matching engine Link: https://arxiv.org/abs/2108.07806

Authors: Ivan Jericevich, Patrick Chang, Tim Gebbie
Affiliations: Department of Statistical Sciences, University of Cape Town, Rondebosch, South Africa; Department of Engineering Science, University of Oxford, Oxford, United Kingdom
Notes: 29 pages, 30 figures
Abstract: An agent-based model with interacting low-frequency liquidity takers intermediated by high-frequency liquidity providers acting collectively as market makers can be used to provide realistic simulated price impact curves. This is possible when agent-based model interactions occur asynchronously via order matching using a matching engine in event time to replace sequential calendar time market clearing. Here the matching engine infrastructure has been modified to provide a continuous feed of order confirmations and updates as message streams in order to conform more closely to live trading environments. The resulting trade and quote message data from the simulations are then aggregated, calibrated and visualised. Various stylised facts are presented along with event visualisations and price impact curves. We argue that additional realism in modelling can be achieved with a small set of agent parameters and simple interaction rules once interactions are reactive, asynchronous and in event time. We argue that the reactive nature of market agents may be a fundamental property of financial markets and, when accounted for, can allow for parsimonious modelling without recourse to additional sources of noise.

【7】 Stochastic loss reserving with mixture density neural networks Link: https://arxiv.org/abs/2108.07924

Authors: Muhammed Taher Al-Mudafer, Benjamin Avanzi, Greg Taylor, Bernard Wong
Affiliations: School of Risk and Actuarial Studies, UNSW Australia Business School, UNSW Sydney, NSW, Australia; Centre for Actuarial Studies, Department of Economics, University of Melbourne, VIC, Australia
Abstract: Neural networks offer a versatile, flexible and accurate approach to loss reserving. However, such applications have focused primarily on the (important) problem of fitting accurate central estimates of the outstanding claims. In practice, properties regarding the variability of outstanding claims are equally important (e.g., quantiles for regulatory purposes). In this paper we fill this gap by applying a Mixture Density Network ("MDN") to loss reserving. The approach combines a neural network architecture with a mixture Gaussian distribution to achieve simultaneously an accurate central estimate along with flexible distributional choice. Model fitting is done using a rolling-origin approach. Our approach consistently outperforms the classical over-dispersed model both for central estimates and quantiles of interest, when applied to a wide range of simulated environments of various complexity and specifications. We further extend the MDN approach by proposing two extensions. Firstly, we present a hybrid GLM-MDN approach called "ResMDN". This hybrid approach balances the tractability and ease of understanding of a traditional GLM model on one hand, with the additional accuracy and distributional flexibility provided by the MDN on the other. We show that it can successfully improve the errors of the baseline ccODP, although there is generally a loss of performance when compared to the MDN in the examples we considered. Secondly, we allow for explicit projection constraints, so that actuarial judgement can be directly incorporated in the modelling process. Throughout, we focus on aggregate loss triangles, and show that our methodologies are tractable, and that they outperform traditional approaches even with relatively limited amounts of data. We use both simulated data, to validate properties, and real data, to illustrate and ascertain the practicality of the approaches.
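The abstract above relies on a Gaussian mixture as the output distribution, from which both a central estimate and quantiles are read. The sketch below shows, under assumed mixture parameters, how the mean, a quantile, and the negative log-likelihood (the usual training loss for a mixture density network) can be computed. The neural network that would produce the weights, means and standard deviations is omitted, and all function names are hypothetical.

```python
import math


def mixture_pdf(x, weights, means, sigmas):
    """Density of a Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )


def mixture_cdf(x, weights, means, sigmas):
    """Cumulative distribution function of a Gaussian mixture at x."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sigmas)
    )


def mixture_mean(weights, means):
    """Central estimate: the weighted average of the component means."""
    return sum(w * m for w, m in zip(weights, means))


def mixture_quantile(p, weights, means, sigmas, tol=1e-9):
    """Quantile found by bisection on the CDF, which is monotone in x."""
    lo = min(m - 10.0 * s for m, s in zip(means, sigmas))
    hi = max(m + 10.0 * s for m, s in zip(means, sigmas))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def negative_log_likelihood(observations, weights, means, sigmas):
    """Sum of -log density over observations."""
    return -sum(math.log(mixture_pdf(x, weights, means, sigmas)) for x in observations)


if __name__ == "__main__":
    # Hypothetical two-component mixture for one cell of a loss triangle.
    w, mu, sd = [0.7, 0.3], [100.0, 160.0], [10.0, 25.0]
    print("central estimate:", mixture_mean(w, mu))
    print("75% quantile:", round(mixture_quantile(0.75, w, mu, sd), 2))
```

Bisection is used for the quantile because a Gaussian mixture has no closed-form inverse CDF; any monotone root finder would serve equally well.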

2. cs.SD (Speech):

【1】 Joint Multiple Intent Detection and Slot Filling via Self-distillation Link: https://arxiv.org/abs/2108.08042

Authors: Lisong Chen, Peilin Zhou, Yuexian Zou
Affiliations: ADSPLAB, School of ECE, Peking University, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China
Abstract: Intent detection and slot filling are two main tasks in natural language understanding (NLU) for identifying users' needs from their utterances. These two tasks are highly related and often trained jointly. However, most previous works assume that each utterance corresponds to only one intent, ignoring the fact that a user utterance in many cases could include multiple intents. In this paper, we propose a novel Self-Distillation Joint NLU model (SDJN) for multi-intent NLU. First, we formulate multiple intent detection as a weakly supervised problem and approach it with multiple instance learning (MIL). Then, we design an auxiliary loop via self-distillation with three sequentially arranged decoders: Initial Slot Decoder, MIL Intent Decoder, and Final Slot Decoder. The output of each decoder serves as auxiliary information for the next decoder. With the auxiliary knowledge provided by the MIL Intent Decoder, we set the Final Slot Decoder as the teacher model that imparts knowledge back to the Initial Slot Decoder to complete the loop. The auxiliary loop enables intents and slots to guide each other in depth and further boosts the overall NLU performance. Experimental results on two public multi-intent datasets indicate that our model achieves strong performance compared to others.

3. eess.AS (Audio Processing):

【1】 Two Streams and Two Resolution Spectrograms Model for End-to-end Automatic Speech Recognition Link: https://arxiv.org/abs/2108.07980

Authors: Jin Li, Xurong Xie, Nan Yan, Lan Wang
Affiliations: CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Abstract: The Transformer has shown tremendous progress in Automatic Speech Recognition (ASR), outperforming recurrent neural network-based approaches. The Transformer architecture is good at parallelizing computation for acceleration as well as capturing content-based global interactions. However, most studies with the Transformer have utilized only shallow features extracted from the backbone, without taking advantage of deep features that possess invariant properties. In this paper, we propose a novel framework with two streams, each taking a spectrogram of a different resolution, aiming to capture both shallow and deep features. The feature extraction module consists of a deep network for the small-resolution spectrogram and a shallow network for the large-resolution spectrogram. The backbone obtains not only detailed acoustic information for speech-text alignment but also sentence-invariant features such as speaker information. Both features are fused with our proposed fusion method and then input into the Transformer encoder-decoder. With our method, the proposed framework shows competitive performance on a Mandarin corpus. It outperforms various current state-of-the-art results on the HKUST Mandarin telephone ASR benchmark with a CER of 21.08. To the best of our knowledge, this is the first investigation of incorporating deep features into the backbone.

【2】 FDN: Finite Difference Network with Hierachical Convolutional Features for Text-independent Speaker verification Link: https://arxiv.org/abs/2108.07974

Authors: Jin Li, Nan Yan, Lan Wang
Affiliations: CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems; Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Abstract: Recently, directly using raw waveforms as input has been widely explored for speaker verification systems. For example, RawNet [1] and RawNet2 [2] extract feature embeddings from raw waveforms, which largely reduces the front-end computation and achieves state-of-the-art performance. However, they do not consider the influence of speech speed, which differs from person to person. In this paper, we propose a novel finite-difference network to obtain speaker embeddings. It incorporates the speaker's speech speed by computing the finite difference between temporally adjacent speech pieces. Furthermore, we design a hierarchical layer to capture multiscale speech speed features to improve system accuracy. The speaker embeddings are then input into a GRU to aggregate utterance-level features before the softmax loss. Experimental results on the official VoxCeleb1 test data and an expanded evaluation on the VoxCeleb1-E and VoxCeleb-H protocols show that our method outperforms existing state-of-the-art systems. To facilitate further research, code is available at https://github.com/happyjin/FDN
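The core operation named in the abstract above, a finite difference between temporally adjacent speech pieces, can be illustrated in a few lines. The sketch below operates on plain lists of per-frame feature vectors and also shows differences at several strides as a rough stand-in for a multiscale hierarchy. It is not the FDN architecture from the linked repository: the function names, the stride values, and the use of raw lists rather than learned convolutional features are all assumptions.

```python
def finite_difference(frames):
    """First-order forward difference: d[t] = frames[t + 1] - frames[t]."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def multiscale_differences(frames, strides=(1, 2, 4)):
    """Differences between frames that are `k` steps apart, for each stride k.

    Strides that are not shorter than the sequence are skipped.
    """
    return {
        k: [
            [b - a for a, b in zip(frames[t], frames[t + k])]
            for t in range(len(frames) - k)
        ]
        for k in strides
        if k < len(frames)
    }


if __name__ == "__main__":
    # Hypothetical 2-dimensional features for three consecutive frames.
    frames = [[0.0, 1.0], [1.0, 3.0], [3.0, 6.0]]
    print(finite_difference(frames))
    print(multiscale_differences(frames, strides=(1, 2)))
```

Larger differences between adjacent frames indicate faster change in the features, which is one simple way to expose rate-of-speech information to a downstream model.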

【3】 Joint Multiple Intent Detection and Slot Filling via Self-distillation Link: https://arxiv.org/abs/2108.08042

Authors: Lisong Chen, Peilin Zhou, Yuexian Zou
Affiliations: ADSPLAB, School of ECE, Peking University, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China
Abstract: This paper is cross-listed; its abstract is identical to that of entry 【1】 in the cs.SD section.

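This article is part of the Tencent Cloud self-media syndication program and is shared from the WeChat official account arXiv每日学术速递.
Originally published: 2021-08-19. In case of copyright infringement, contact cloudcommunity@tencent.com for removal.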
