访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!
stat统计学,共计22篇
【1】 Identification of parameters in the torsional dynamics of a drilling process through Bayesian statistics 标题:基于贝叶斯统计的钻井过程扭转动力学参数辨识
作者:Mario Germán Sandoval,Americo Cunha Jr,Rubens Sampaio 机构:Department of Mechanical Engineering, PUC–Rio, Rua Marquês de São Vicente, Gávea, Rio de Janeiro - RJ, Brazil., Departamento de Física, Universidad Nacional del Sur, Avenida Alem , - Bahía Blanca (CP ,) - Pcia. de Buenos Aires - Argentina. 备注:None 链接:https://arxiv.org/abs/2107.13535 摘要:这项工作提出了一个实验装置的参数估计,该装置被建模为一个三自由度系统,由一根轴、两个转子和一个直流电机组成,模拟钻井过程。在估计过程中使用了贝叶斯技术,以考虑测量固有的不确定性和变异性,这些不确定性和变异性被建模为高斯噪声。通过该程序,预计将检查试验台物理参数标称值的可靠性。假设实验装置的九个参数未知,进行了估算过程,结果表明,对于某些量,相对于标称值的相对偏差非常大。这种偏差表明用于描述实验装置动态行为的数学模型存在严重缺陷。 摘要:This work presents the estimation of the parameters of an experimental setup, which is modeled as a system with three degrees of freedom, composed by a shaft, two rotors, and a DC motor, that emulates a drilling process. A Bayesian technique is used in the estimation process, to take into account the uncertainties and variabilities intrinsic to the measurement taken, which are modeled as a noise of Gaussian nature. With this procedure it is expected to check the reliability of the nominal values of the physical parameters of the test rig. An estimation process assuming that nine parameters of the experimental apparatus are unknown is conducted, and the results show that for some quantities the relative deviation with respect to the nominal values is very high. This deviation evidentiates a strong deficiency in the mathematical model used to describe the dynamic behavior of the experimental apparatus.
【2】 Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications 标题:光滑1-Wasserstein距离的极限分布理论及其应用
作者:Ritwik Sadhu,Ziv Goldfeld,Kengo Kato 链接:https://arxiv.org/abs/2107.13494 摘要:光滑1-Wasserstein距离(SWD)$W_1^\sigma$最近被提出作为一种在保持Wasserstein结构的同时减轻经验近似中维数灾难的方法。事实上,SWD具有参数收敛速度,并继承了经典Wasserstein距离的度量和拓扑结构。基于上述动机,本研究对SWD进行了深入的统计研究,包括经验$W_1^\sigma$的高维极限分布结果、bootstrap一致性、集中不等式和Berry-Esseen型界。导出的非简并极限与经典的经验$W_1$形成鲜明对比,后者的类似结果仅在一维情况下已知。我们还探讨了当平滑参数$\sigma$以$n$缩放,以足够慢的速度收敛到$0$时的渐近性和极限分布的特征。采样分布的维数仅通过预因子(即常数)进入经验SWD收敛边界。我们提供了该预因子对平滑参数和内禀维数依赖性的清晰表征。然后利用这一结果推导出经典$W_1$在内禀维方面的新的经验收敛率。作为极限分布理论的应用,我们研究了$W_1^\sigma$下的两样本检验和最小距离估计(MDE)。我们建立了SWD检验的渐近有效性,而对于MDE,我们证明了最优估计量及其相应的$W_1^\sigma$误差的可测性、几乎必然收敛性和极限分布。我们的结果表明,SWD非常适合高维统计学习和推理。 摘要:The smooth 1-Wasserstein distance (SWD) $W_1^\sigma$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation while preserving the Wasserstein structure. Indeed, SWD exhibits parametric convergence rates and inherits the metric and topological structure of the classic Wasserstein distance. Motivated by the above, this work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result for empirical $W_1^\sigma$, bootstrap consistency, concentration inequalities, and Berry-Esseen type bounds. The derived nondegenerate limit stands in sharp contrast with the classic empirical $W_1$, for which a similar result is known only in the one-dimensional case. We also explore asymptotics and characterize the limit distribution when the smoothing parameter $\sigma$ is scaled with $n$, converging to $0$ at a sufficiently slow rate. The dimensionality of the sampled distribution enters empirical SWD convergence bounds only through the prefactor (i.e., the constant). We provide a sharp characterization of this prefactor's dependence on the smoothing parameter and the intrinsic dimension. This result is then used to derive new empirical convergence rates for classic $W_1$ in terms of the intrinsic dimension.
As applications of the limit distribution theory, we study two-sample testing and minimum distance estimation (MDE) under $W_1^\sigma$. We establish asymptotic validity of SWD testing, while for MDE, we prove measurability, almost sure convergence, and limit distributions for optimal estimators and their corresponding $W_1^\sigma$ error. Our results suggest that the SWD is well suited for high-dimensional statistical learning and inference.
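For orientation, the smooth 1-Wasserstein distance referred to throughout is, in the standard Gaussian-smoothing construction (notation assumed here, not quoted from the paper), the classical $W_1$ applied to smoothed measures:

```latex
W_1^{\sigma}(\mu,\nu) \;:=\; W_1\!\left(\mu * \mathcal{N}_\sigma,\; \nu * \mathcal{N}_\sigma\right),
\qquad \mathcal{N}_\sigma = \mathcal{N}\!\left(0,\sigma^2 I_d\right).
```

Convolving both arguments with the same isotropic Gaussian is what yields the parametric empirical convergence rates mentioned in the abstract while preserving the metric and topological structure of $W_1$.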
【3】 Survival stacking: casting survival analysis as a classification problem 标题:生存堆积:将生存分析作为一个分类问题
作者:Erin Craig,Chenyang Zhong,Robert Tibshirani 机构:Department of Biomedical Data Science, Stanford University, Department of Statistics, Stanford University, Departments of Biomedical Data Science and Statistics, Stanford University 链接:https://arxiv.org/abs/2107.13480 摘要:虽然有许多成熟的数据科学方法用于分类和回归,但处理右删失数据的方法相对较少。在这里,我们提出了“生存叠加”:一种将生存分析问题转化为分类问题的方法,从而允许在生存环境中使用一般分类方法和软件。受Cox偏似然理论的启发,生存叠加在一个具有二进制结果的大数据帧中收集生存数据的特征和结果。我们证明了logistic回归的生存叠加近似等于Cox比例风险模型。我们进一步推荐了在生存叠加环境中评估模型性能的方法,并在真实和模拟数据上演示了生存叠加。通过将生存问题重新定义为分类问题,我们使数据科学家能够在生存环境中使用著名的学习算法(包括随机森林、梯度提升机和神经网络),并降低灵活生存建模的障碍。 摘要:While there are many well-developed data science methods for classification and regression, there are relatively few methods for working with right-censored data. Here, we present "survival stacking": a method for casting survival analysis problems as classification problems, thereby allowing the use of general classification methods and software in a survival setting. Inspired by the Cox partial likelihood, survival stacking collects features and outcomes of survival data in a large data frame with a binary outcome. We show that survival stacking with logistic regression is approximately equivalent to the Cox proportional hazards model. We further recommend methods for evaluating model performance in the survival stacked setting, and we illustrate survival stacking on real and simulated data. By reframing survival problems as classification problems, we make it possible for data scientists to use well-known learning algorithms (including random forests, gradient boosting machines and neural networks) in a survival setting, and lower the barrier for flexible survival modeling.
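The stacking construction described above can be sketched in a few lines of plain Python. This is our illustration, not code from the paper: the function name and the choice to encode the risk-set time as an extra feature are assumptions, and ties/evaluation details are glossed over.

```python
def survival_stack(X, time, event):
    """Convert right-censored survival data into a stacked binary
    classification dataset, one row per (subject, risk set) pair.

    For every distinct event time t, each subject still at risk at t
    contributes a row with features (x, t), labelled 1 if that subject's
    event occurred at t and 0 otherwise.  Minimal sketch of the idea.
    """
    event_times = sorted({t for t, e in zip(time, event) if e == 1})
    rows, labels = [], []
    for t in event_times:
        for x, ti, ei in zip(X, time, event):
            if ti >= t:  # subject is still at risk at time t
                rows.append(list(x) + [t])  # risk-set time as a feature
                labels.append(1 if (ei == 1 and ti == t) else 0)
    return rows, labels
```

Fitting any probabilistic classifier to `rows`/`labels` then yields discrete-hazard estimates; per the abstract, logistic regression on this stacked frame approximately recovers the Cox proportional hazards model.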
【4】 MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple Seasonal Patterns 标题:MSTL:一种多季节模式时间序列的季节趋势分解算法
作者:Kasun Bandara,Rob J Hyndman,Christoph Bergmeir 机构:School of Computing and Information Systems, Melbourne Centre for Data Science, University of, Department of Econometrics and Business Statistics, Monash University, Department of Data Science and AI, Monash University 链接:https://arxiv.org/abs/2107.13462 摘要:将时间序列分解为组件是一项重要的任务,它有助于理解时间序列并实现更好的预测。如今,由于高采样率导致了高频数据(如每日、每小时或每分钟数据),许多真实世界的数据集包含可以显示多种季节模式的时间序列数据。虽然已经提出了几种方法来更好地分解这些情况下的时间序列,但它们往往计算效率低下或不准确。在本研究中,我们提出了基于Loess的多季节趋势分解(MSTL),这是传统的基于Loess的季节趋势分解(STL)程序的扩展,允许分解具有多个季节模式的时间序列。在对合成数据集和一个经扰动的真实时间序列数据集的评估中,与其他分解基准相比,MSTL以更低的计算成本取得了具有竞争力的结果。MSTL的实现在R包forecast中提供。 摘要:The decomposition of time series into components is an important task that helps to understand time series and can enable better forecasting. Nowadays, with high sampling rates leading to high-frequency data (such as daily, hourly, or minutely data), many real-world datasets contain time series data that can exhibit multiple seasonal patterns. Although several methods have been proposed to decompose time series better under these circumstances, they are often computationally inefficient or inaccurate. In this study, we propose Multiple Seasonal-Trend decomposition using Loess (MSTL), an extension to the traditional Seasonal-Trend decomposition using Loess (STL) procedure, allowing the decomposition of time series with multiple seasonal patterns. In our evaluation on synthetic and a perturbed real-world time series dataset, compared to other decomposition benchmarks, MSTL demonstrates competitive results with lower computational cost. The implementation of MSTL is available in the R package forecast.
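Since the digest cites only the R implementation, a language-agnostic toy may help convey the idea. The sketch below imitates MSTL's iterative back-fitting loop over seasonal periods, but, purely for illustration, replaces each loess-based STL pass by period-wise averaging and the loess trend by a linear fit; this is not the authors' algorithm, and all names are ours.

```python
import numpy as np

def mstl_sketch(y, periods, n_iter=3):
    """Toy multi-seasonal decomposition in the spirit of MSTL: cycle over
    the seasonal periods, each time removing all other components and
    re-estimating the current seasonal from the residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    seasonals = {p: np.zeros(n) for p in periods}
    trend = np.zeros(n)
    for _ in range(n_iter):
        for p in periods:
            # remove every other component, then re-estimate this seasonal
            resid = y - trend - sum(s for q, s in seasonals.items() if q != p)
            means = np.array([resid[i::p].mean() for i in range(p)])
            means -= means.mean()              # centre: seasonal sums to ~0
            seasonals[p] = np.tile(means, n // p + 1)[:n]
        deseason = y - sum(seasonals.values())
        slope, intercept = np.polyfit(np.arange(n), deseason, 1)
        trend = intercept + slope * np.arange(n)
    remainder = y - trend - sum(seasonals.values())
    return trend, seasonals, remainder
```

On a series with weekly (period 7) and monthly (period 30) components plus a linear trend, a few back-fitting passes suffice to separate the two seasonalities.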
【5】 Kernel Density Estimation by Stagewise Algorithm with a Simple Dictionary 标题:基于简单字典的分段核密度估计
作者:Kiheiji Nishida,Kanta Naito 链接:https://arxiv.org/abs/2107.13430 摘要:本文利用一个简单的U-散度字典,研究了基于分段极小化算法的核密度估计。我们将一个i.i.d.样本随机分成两个不相交的集合,一个用于构造字典中的核,另一个用于评估估计器,并实现该算法。由此产生的估计器为我们带来了数据自适应加权参数和带宽矩阵,并实现了核密度估计的稀疏表示。我们给出了估计量的非渐近误差界,并通过与直接插入式带宽矩阵和约化集密度估计量的比较,验证了其性能。 摘要:This paper studies kernel density estimation by stagewise minimization algorithm with a simple dictionary on U-divergence. We randomly split an i.i.d. sample into the two disjoint sets, one to be used for constructing the kernels in the dictionary and the other for evaluating the estimator, and implement the algorithm. The resulting estimator brings us data-adaptive weighting parameters and bandwidth matrices, and realizes a sparse representation of kernel density estimation. We present the non-asymptotic error bounds of our estimator and confirm its performance by simulations compared with the direct plug-in bandwidth matrices and the reduced set density estimator.
【6】 Sparse approximation of triangular transports. Part II: the infinite dimensional case 标题:三角形传输的稀疏近似。第二部分:无限维情形
作者:Jakob Zech,Youssef Marzouk 备注:The original manuscript arXiv:2006.06994v1 has been split into two parts; the present paper is the second part 链接:https://arxiv.org/abs/2107.13422 摘要:对于$[-1,1]^{\mathbb{N}}$上的两个概率测度$\rho$和$\pi$,我们研究三角Knothe-Rosenblatt传输$T:[-1,1]^{\mathbb{N}}\to[-1,1]^{\mathbb{N}}$的近似,它将$\rho$前推到$\pi$。在适当的假设下,我们证明了$T$可以用有理函数近似而不受维数灾难的影响。我们的结果适用于某些推理问题中出现的后验测度,其中未知量属于(无限维)Banach空间。特别地,我们证明了通过变换低维的潜在变量,可以有效地从某些高维测度中近似采样。 摘要:For two probability measures $\rho$ and $\pi$ on $[-1,1]^{\mathbb{N}}$ we investigate the approximation of the triangular Knothe-Rosenblatt transport $T:[-1,1]^{\mathbb{N}}\to [-1,1]^{\mathbb{N}}$ that pushes forward $\rho$ to $\pi$. Under suitable assumptions, we show that $T$ can be approximated by rational functions without suffering from the curse of dimension. Our results are applicable to posterior measures arising in certain inference problems where the unknown belongs to an (infinite dimensional) Banach space. In particular, we show that it is possible to efficiently approximately sample from certain high-dimensional measures by transforming a lower-dimensional latent variable.
【7】 Nonlinear State Space Modeling and Control of the Impact of Patients' Modifiable Lifestyle Behaviors on the Emergence of Multiple Chronic Conditions 标题:患者可改变生活方式行为对多种慢性病发生影响的非线性状态空间建模与控制
作者:Syed Hasib Akhter Faruqui,Adel Alaeddini,Jing Wang,Susan P Fisher-Hoch,Joseph B Mccormic 机构:School of Nursing, UT Health San Antonio, San Antonio, TX ,., School of Public Health Brownsville, The University of Texas Health Science Center at Houston, TX ,. 备注:Submitted to IEEE Access for review 链接:https://arxiv.org/abs/2107.13394 摘要:随着时间的推移,多种慢性病(MCC)的出现和进展通常形成一个动态网络,该网络取决于患者可改变的风险因素及其与不可改变的风险因素和现有条件的相互作用。连续时间贝叶斯网络(CTBNs)是对复杂的MCC关系网络进行建模的有效方法。然而,CTBN不能有效地描述患者可改变的危险因素对MCC发生和进展的动态影响。考虑到功能性CTBN(FCTBN)代表MCC与个人风险因素和现有条件之间关系的基本结构,我们提出了一个基于扩展卡尔曼滤波(EKF)的非线性状态空间模型,以捕捉患者可改变的风险因素和现有条件对MCC随时间演化的动态影响。我们还开发了张量控制图,以动态监测个体患者可改变的风险因素变化对新慢性病出现风险的影响。我们基于模拟数据与来自卡梅隆县拉美裔队列(CCHC)385名患者多年真实数据相结合的数据集,验证了所提出的方法。该数据集基于代表生活方式行为的4个可改变风险因素(饮食、锻炼、吸烟习惯和饮酒习惯)以及包括人口统计信息(年龄、性别、教育程度)在内的3个不可改变风险因素,考察5种慢性病(糖尿病、肥胖、认知障碍、高脂血症和高血压)的出现。结果表明,所提出的方法对于动态预测和监测个体患者出现MCC的风险是有效的。 摘要:The emergence and progression of multiple chronic conditions (MCC) over time often form a dynamic network that depends on patient's modifiable risk factors and their interaction with non-modifiable risk factors and existing conditions. Continuous time Bayesian networks (CTBNs) are effective methods for modeling the complex network of MCC relationships over time. However, CTBNs are not able to effectively formulate the dynamic impact of patient's modifiable risk factors on the emergence and progression of MCC. Considering a functional CTBN (FCTBN) to represent the underlying structure of the MCC relationships with respect to individuals' risk factors and existing conditions, we propose a nonlinear state-space model based on Extended Kalman filter (EKF) to capture the dynamics of the patients' modifiable risk factors and existing conditions on the MCC evolution over time. We also develop a tensor control chart to dynamically monitor the effect of changes in the modifiable risk factors of individual patients on the risk of new chronic conditions emergence.
We validate the proposed approach based on a combination of simulation and real data from a dataset of 385 patients from Cameron County Hispanic Cohort (CCHC) over multiple years. The dataset examines the emergence of 5 chronic conditions (Diabetes, Obesity, Cognitive Impairment, Hyperlipidemia, and Hypertension) based on 4 modifiable risk factors representing lifestyle behaviors (Diet, Exercise, Smoking Habit, and Drinking Habit) and 3 non-modifiable risk factors, including demographic information (Age, Gender, Education). The results demonstrate the effectiveness of the proposed methodology for dynamic prediction and monitoring of the risk of MCC emergence in individual patients.
【8】 One-step ahead sequential Super Learning from short time series of many slightly dependent data, and anticipating the cost of natural disasters 标题:超前一步从许多轻微相依数据的短时间序列中进行序贯超级学习,并预测自然灾害的成本
作者:Geoffrey Ecoto,Aurélien Bibaut,Antoine Chambaz 机构: Caisse Centrale de Réassurance, MAP, (UMR CNRS ,), Université de Paris, Netflix 链接:https://arxiv.org/abs/2107.13291 摘要:假设我们观察到一个短时间序列,其中每个特定于时间t的数据结构由许多由a索引的轻微相依数据组成,并且我们希望估计既不依赖于t也不依赖于a的实验定律的特征。我们开发并研究了一种算法,用于在用户提供的集合中顺序学习哪一个基础算法在超额风险和oracle不等式的意义下最适合该估计任务。该分析使用依赖图来刻画每个特定于t的数据结构内部的条件独立程度,并借助Janson [2004]的一个集中不等式,在特定于t的数据结构数量较少的情况下,利用不同a的数量与依赖图的度之间的较大比值。所谓的一步超前超级学习器(one-step ahead Super Learner)被应用于激励示例,其中的挑战是预测法国自然灾害的成本。 摘要:Suppose that we observe a short time series where each time-t-specific data-structure consists of many slightly dependent data indexed by a and that we want to estimate a feature of the law of the experiment that depends neither on t nor on a. We develop and study an algorithm to learn sequentially which base algorithm in a user-supplied collection best carries out the estimation task in terms of excess risk and oracular inequalities. The analysis, which uses dependency graph to model the amount of conditional independence within each t-specific data-structure and a concentration inequality by Janson [2004], leverages a large ratio of the number of distinct a's to the degree of the dependency graph in the face of a small number of t-specific data-structures. The so-called one-step ahead Super Learner is applied to the motivating example where the challenge is to anticipate the cost of natural disasters in France.
【9】 Global minimizers, strict and non-strict saddle points, and implicit regularization for deep linear neural networks 标题:深线性神经网络的全局极小化、严格鞍点和非严格鞍点及隐式正则化
作者:El Mehdi Achour,François Malgouyres,Sébastien Gerchinovitz 机构:Institut de Mathématiques de Toulouse ; UMR, Université de Toulouse ; CNRS , UPS IMT, F-, Toulouse Cedex , France, IRT Saint Exupéry, rue Tarfaya, Toulouse, France 链接:https://arxiv.org/abs/2107.13289 摘要:在非凸环境中,基于梯度的算法在目标函数的局部结构(如严格和非严格鞍点、局部和全局极小值和极大值)附近的行为是不同的。因此,对非凸问题的描述至关重要。也就是说,尽可能地描述上述每个类别的点集。在这项工作中,我们研究了与深度线性神经网络和平方损失相关的经验风险。众所周知,在弱假设下,该目标函数没有虚假的局部极小值和局部极大值。我们更进一步,在所有临界点中,刻画了全局极小值、严格鞍点和非严格鞍点。我们列举了所有相关的临界值。该特征描述简单,涉及部分矩阵乘积秩的条件,并揭示了在优化线性神经网络时已证明或观察到的全局收敛性或隐式正则化。顺便说一下,我们还提供了所有全局极小值集的显式参数化,并展示了大量严格和非严格鞍点集。 摘要:In non-convex settings, it is established that the behavior of gradient-based algorithms is different in the vicinity of local structures of the objective function such as strict and non-strict saddle points, local and global minima and maxima. It is therefore crucial to describe the landscape of non-convex problems. That is, to describe as well as possible the set of points of each of the above categories. In this work, we study the landscape of the empirical risk associated with deep linear neural networks and the square loss. It is known that, under weak assumptions, this objective function has no spurious local minima and no local maxima. We go a step further and characterize, among all critical points, which are global minimizers, strict saddle points, and non-strict saddle points. We enumerate all the associated critical values. The characterization is simple, involves conditions on the ranks of partial matrix products, and sheds some light on global convergence or implicit regularization that have been proved or observed when optimizing a linear neural network. In passing, we also provide an explicit parameterization of the set of all global minimizers and exhibit large sets of strict and non-strict saddle points.
【10】 Non-stationarity in correlation matrices for wind turbine SCADA-data and implications for failure detection 标题:风力机SCADA数据相关矩阵的非平稳性及其对故障检测的启示
作者:Henrik M. Bette,Edgar Jungblut,Thomas Guhr 机构:Fakultät für Physik, University of Duisburg-Essen, Duisburg, Germany 备注:16 pages, 11 figures 链接:https://arxiv.org/abs/2107.13256 摘要:现代公用事业规模的风力涡轮机配备了监控和数据采集(SCADA)系统,该系统收集了大量的运行数据,可用于故障分析和预测,以改善涡轮机的运行和维护。我们分析了海上风电场的高频SCADA数据,并评估了具有移动时间窗的各种观测值的皮尔逊相关矩阵。这使得对不同类型数据的相互依赖性进行非平稳性评估成为可能。根据我们在其他复杂系统(如金融市场和交通)中的经验,我们通过在相关矩阵上采用分层的$k$-均值聚类算法来证明这一点。不同的团簇表现出不同的典型关联结构,我们称之为态。首先看一个涡轮机,然后看多个涡轮机,这些状态的主要依赖性显示为风速。因此,我们根据可用风速将其识别为涡轮机控制系统中不同设置引起的运行状态。我们基于聚类解对分离状态的边界风速进行建模。这使得我们可以通过基于风速对新数据进行排序并将其与各自的运行状态进行比较,从而考虑到非平稳性,从而使用我们的方法进行故障分析或预测。 摘要:Modern utility-scale wind turbines are equipped with a Supervisory Control And Data Acquisition (SCADA) system gathering vast amounts of operational data that can be used for failure analysis and prediction to improve operation and maintenance of turbines. We analyse high-frequency SCADA-data from an offshore wind park and evaluate Pearson correlation matrices for a variety of observables with a moving time window. This renders possible an assessment of non-stationarity in mutual dependencies of different types of data. Drawing from our experience in other complex systems, such as financial markets and traffic, we show this by employing a hierarchical $k$-means clustering algorithm on the correlation matrices. The different clusters exhibit distinct typical correlation structures to which we refer as states. Looking first at only one and later at multiple turbines, the main dependence of these states is shown to be on wind speed. In accordance, we identify them as operational states arising from different settings in the turbine control system based on the available wind speed. We model the boundary wind speeds separating the states based on the clustering solution. This allows the usage of our methodology for failure analysis or prediction by sorting new data based on wind speed and comparing it to the respective operational state, thereby taking the non-stationarity into account.
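The pipeline described in the abstract (moving-window Pearson correlation matrices, then clustering them into "states") can be sketched compactly. The paper uses a hierarchical $k$-means variant; the sketch below uses plain $k$-means with farthest-point initialisation, and all function names are ours.

```python
import numpy as np

def window_correlations(data, window, step):
    """Pearson correlation matrices of a multivariate time series
    `data` (shape: n_times x n_observables) over a moving window."""
    mats = [np.corrcoef(data[s:s + window].T)
            for s in range(0, data.shape[0] - window + 1, step)]
    return np.array(mats)

def kmeans_states(corr_mats, k, n_iter=50):
    """Cluster flattened correlation matrices with plain k-means;
    each cluster centre is a typical correlation structure ('state')."""
    X = corr_mats.reshape(len(corr_mats), -1)
    centres = [X[0]]                      # farthest-point initialisation
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(0)
    return labels, centres.reshape(k, *corr_mats.shape[1:])
```

On synthetic data with a correlation-regime change halfway through, the windows before and after the change land in different clusters, mimicking the operational states found in the paper.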
【11】 Modeling the systematic behavior at the micro and nano length scale 标题:在微米和纳米尺度上对系统行为进行建模
作者:Danilo Quagliotti 备注:41 pages, 18 figures 链接:https://arxiv.org/abs/2107.13075 摘要:工业数字化创新的迅猛发展,导致制造技术的高度自动化和大数据传输,要求不断开发适当的离线计量方法,以支持过程质量,并对测量不确定度进行可容忍的评估。一方面,特定领域的参考文献提出了尚未针对变化背景进行优化的方法,另一方面,国际通用建议指导了有效的不确定度评估,但提出了在微观和纳米尺度上不一定有效的程序。分析了著名的GUM方法(即频率统计法),目的是一致地测试其对微/纳米尺寸和表面形貌测量的适用性。调查评估了三种不同的澄清情况,产生了一致的模型方程,并实现了可追溯性。案例的选择提供了一些影响因素,这些因素是微观和纳米尺度上的典型责任,与系统行为的纠正有关,即。重复测量的数量、获取的显微照片的时间顺序和使用的仪器。这种方法使GUM方法得以成功地应用于微/纳米尺寸和地形测量,并对该方法的有效性水平、应用限制和未来可能发展的提示进行了评估。 摘要:The brisk progression of the industrial digital innovation, leading to high degree of automation and big data transfer in manufacturing technologies, demands continuous development of appropriate off-line metrology methods to support processes' quality with a tolerable assessment of the measurement uncertainty. On the one hand specific-area references propose methods that are not yet well optimized to the changed background, and on the other, international general recommendations guide to effective uncertainty evaluation, but suggesting procedures that are not necessarily proven efficient at the micro- and nano-dimensional scale. The well-known GUM approach (i.e. frequentist statistics) was analyzed with the aim to test consistently its applicability to micro/nano dimensional and surface topography measurements. The investigation assessed three different clarifying situations, giving rise to consistent model equations, and to the achievement of the traceability. The choice of the cases provided a number of influence factors, which are typical liabilities at the micro and nano-length scale, and that have been related to the correction of the systematic behavior, viz. the amount of repeated measurements, the time sequence of the acquired micrographs and the instruments used. 
Such approach allowed the successful implementation of the GUM approach to micro/nano dimensional and topographic measurements, and also the appraisal of the level of efficacy of the method, its application limits and hints on possible future developments.
【12】 An Aggregation Scheme for Increased Power in Primary Outcome Analysis 标题:一种提高主要结局分析功效的聚合方案
作者:Timothy Lycurgus,Ben B. Hansen 备注:33 pages 链接:https://arxiv.org/abs/2107.13070 摘要:当干预具有稳健且清晰的变化理论时,一种新的聚合方案可提高随机对照试验和准实验的功效。分析干预的纵向数据通常包括对个体的多次观察,其中一些观察比其他观察更可能表现出治疗效果。干预的变化理论提供了指导,说明哪些观察结果最适合显示治疗效果。我们的延迟效应重复测量功效最大化加权方案(PWRD聚合)将变化理论转化为具有更优Pitman效率的检验统计量,从而提供统计功效更高的检验。我们在IES资助的一项随机整群试验中说明了这种方法,该试验测试了旨在帮助有落后于同龄人风险的早期小学生的阅读干预的有效性。相应的变化理论认为,项目收益是延迟且非均匀的,在学生表现停滞之后才会显现。未发现这种干预具有效果,但PWRD技术对功效的提升被发现与(整群层面)样本量加倍的效果相当。 摘要:A novel aggregation scheme increases power in randomized controlled trials and quasi-experiments when the intervention possesses a robust and well-articulated theory of change. Longitudinal data analyzing interventions often include multiple observations on individuals, some of which may be more likely to manifest a treatment effect than others. An intervention's theory of change provides guidance as to which of those observations are best situated to exhibit that treatment effect. Our power-maximizing weighting for repeated-measurements with delayed-effects scheme, PWRD aggregation, converts the theory of change into a test statistic with improved Pitman efficiency, delivering tests with greater statistical power. We illustrate this method on an IES-funded cluster randomized trial testing the efficacy of a reading intervention designed to assist early elementary students at risk of falling behind their peers. The salient theory of change holds program benefits to be delayed and non-uniform, experienced after a student's performance stalls. This intervention is not found to have an effect, but the PWRD technique's effect on power is found to be comparable to that of a doubling of (cluster-level) sample size.
【13】 Recursive Estimation of a Failure Probability for a Lipschitz Function 标题:Lipschitz函数失效概率的递归估计
作者:Lucie Bernard,Albert Cohen,Arnaud Guyader,Florent Malrieu 机构:IDP, Université de Tours, France, LJLL, Sorbonne Université, France, LPSM, Sorbonne Université & CERMICS, France 链接:https://arxiv.org/abs/2107.13369 摘要:设$g:\Omega=[0,1]^d\rightarrow\mathbb{R}$表示一个Lipschitz函数,它可以在每个点求值,但代价是很长的计算时间。设X表示取值于$\Omega$的随机变量,使得至少可以近似地按照X的分布在$\Omega$任意子集上的限制进行模拟。例如,借助马尔可夫链蒙特卡罗技术,当X的密度已知至一个归一化常数时,这总是可能的。在这种情况下,给定一个确定性阈值T,使得失效概率p := P(g(X) > T)可能非常低,我们的目标是用尽可能少的对g的调用次数来估计后者。为此,在Cohen等人[9]的基础上,我们提出了一种递归的最优算法,动态选取感兴趣的区域并估计它们各自的概率。 摘要:Let $g : \Omega = [0, 1]^d \rightarrow \mathbb{R}$ denote a Lipschitz function that can be evaluated at each point, but at the price of a heavy computational time. Let X stand for a random variable with values in $\Omega$ such that one is able to simulate, at least approximately, according to the restriction of the law of X to any subset of $\Omega$. For example, thanks to Markov chain Monte Carlo techniques, this is always possible when X admits a density that is known up to a normalizing constant. In this context, given a deterministic threshold T such that the failure probability p := P(g(X) > T) may be very low, our goal is to estimate the latter with a minimal number of calls to g. In this aim, building on Cohen et al. [9], we propose a recursive and optimal algorithm that selects on the fly areas of interest and estimate their respective probabilities.
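To convey why Lipschitz continuity makes this problem tractable, here is a minimal one-dimensional bracketing sketch for uniform X. This illustrates only the underlying principle (a single evaluation of g at an interval's midpoint can certify the whole interval as failing or safe); it is our illustration, not the authors' algorithm, which handles general laws of X.

```python
def failure_prob_bounds(g, lip, threshold, lo=0.0, hi=1.0, tol=1e-3):
    """Bracket p = P(g(X) > threshold) for X uniform on [lo, hi] and
    g Lipschitz with constant `lip`.  On an interval with midpoint m and
    half-width r: g(m) - lip*r > threshold rules the interval in,
    g(m) + lip*r <= threshold rules it out; only undecided intervals
    are split further, down to width `tol`."""
    p_min = p_max = 0.0
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        m, r = 0.5 * (a + b), 0.5 * (b - a)
        gm = g(m)                        # one (expensive) call per interval
        mass = (b - a) / (hi - lo)
        if gm - lip * r > threshold:     # whole interval fails
            p_min += mass
            p_max += mass
        elif gm + lip * r <= threshold:  # whole interval is safe
            pass
        elif 2 * r <= tol:               # undecided but tiny
            p_max += mass
        else:
            stack.append((a, m))
            stack.append((m, b))
    return p_min, p_max
```

For g(x) = x and threshold 0.9 the true failure probability is 0.1, and the returned bounds pinch it to within a few multiples of `tol`.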
【14】 Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators 标题:擅长估计CATE(条件平均处理效应)吗?论治疗效应估计器基准比较中被忽视的假设
作者:Alicia Curth,Mihaela van der Schaar 机构:University of Cambridge, UCLA, The Alan Turing Institute 备注:Workshop on the Neglected Assumptions in Causal Inference at the International Conference on Machine Learning (ICML), 2021 链接:https://arxiv.org/abs/2107.13346 摘要:从观测数据估计异质治疗效果的机器学习工具箱正在迅速扩展,但其许多算法仅在非常有限的半合成基准数据集上进行了评估。在本文中,我们表明,即使在可以说是最简单的情况下——在可忽略假设下的估计——如果(i)基准数据集中数据生成机制的基础假设和(ii)它们与基线算法的相互作用没有得到充分的讨论,这种经验评估的结果也会产生误导。我们详细考察了两种用于评估异质性治疗效果估计器的流行机器学习基准数据集——IHDP和ACIC2016数据集。我们发现了它们当前使用中存在的问题,并强调了基准数据集的固有特征有利于某些算法而非其他算法——这一事实很少得到承认,但对于解释实证结果具有极大的相关性。最后,我们讨论影响和可能的后续步骤。 摘要:The machine learning toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we show that even in arguably the simplest setting -- estimation under ignorability assumptions -- the results of such empirical evaluations can be misleading if (i) the assumptions underlying the data-generating mechanisms in benchmark datasets and (ii) their interplay with baseline algorithms are inadequately discussed. We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators -- the IHDP and ACIC2016 datasets -- in detail. We identify problems with their current use and highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others -- a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. We close by discussing implications and possible next steps.
【15】 Bayesian Autoencoders: Analysing and Fixing the Bernoulli likelihood for Out-of-Distribution Detection 标题:贝叶斯自动编码器:面向分布外检测的伯努利似然分析与修正
作者:Bang Xiang Yong,Tim Pearce,Alexandra Brintrup 机构:Institute for Manufacturing, University of Cambridge 备注:Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning 链接:https://arxiv.org/abs/2107.13304 摘要:在自动编码器(AE)学会重建一个数据集之后,可以预期分布外(OOD)输入的似然会很低。这已被作为一种检测OOD输入的方法加以研究。最近的研究表明,这种直观的方法对于FashionMNIST与MNIST这对数据集可能会失败。本文认为这是由于使用了伯努利似然,分析了出现这种情况的原因,并提出两种修正:1)使用贝叶斯版本的AE计算似然估计的不确定性;2)使用替代分布对似然进行建模。 摘要:After an autoencoder (AE) has learnt to reconstruct one dataset, it might be expected that the likelihood on an out-of-distribution (OOD) input would be low. This has been studied as an approach to detect OOD inputs. Recent work showed this intuitive approach can fail for the dataset pairs FashionMNIST vs MNIST. This paper suggests this is due to the use of Bernoulli likelihood and analyses why this is the case, proposing two fixes: 1) Compute the uncertainty of likelihood estimate by using a Bayesian version of the AE. 2) Use alternative distributions to model the likelihood.
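The role of the Bernoulli likelihood can be made concrete with a small numerical sketch (toy values, ours, not from the paper): the likelihood is maximised at saturated pixel values, so a mostly-black OOD image can score far above a perfectly reconstructed mid-grey in-distribution image.

```python
import numpy as np

def bernoulli_loglik(x, x_hat, eps=1e-7):
    """Per-example Bernoulli log-likelihood of pixels x in [0, 1] under
    decoder means x_hat (the likelihood the paper scrutinises)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return (x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)).sum(axis=-1)

# Perfect reconstruction of a mid-grey input: log-likelihood 10*log(0.5).
in_dist = bernoulli_loglik(np.full(10, 0.5), np.full(10, 0.5))
# Mostly-black OOD input with near-black decoder means scores much higher.
ood = bernoulli_loglik(np.zeros(10), np.full(10, 0.01))
```

This asymmetry, rather than reconstruction quality, is what drives the FashionMNIST-vs-MNIST failure discussed in the abstract, motivating the Bayesian-uncertainty and alternative-distribution fixes.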
【16】 Refined Cramér Type Moderate Deviation Theorems for General Self-normalized Sums with Applications to Dependent Random Variables and Winsorized Mean 标题:一般自归一化和的精化Cramér型中偏差定理及其对相依随机变量和Winsorized均值的应用
作者:Lan Gao,Qi-Man Shao,Jiasheng Shi 链接:https://arxiv.org/abs/2107.13205 摘要:设$\{(X_i,Y_i)\}_{i=1}^n$为独立二元随机向量序列。本文建立了一般自规范化和$\sum_{i=1}^n X_i/(\sum_{i=1}^n Y_i^2)^{1/2}$的精细Cramér型中偏差定理,统一并推广了经典的Cramér (1938)定理、Jing、Shao与Wang (2003)的自规范化Cramér型中偏差定理,以及Wang (2011)进一步精细化的版本。通过对弱相依随机变量和自规范化winsorized均值的成功应用,体现了我们结果的优越性。具体地说,通过应用我们关于一般自规范化和的新框架,我们显著改进了单相依随机变量、几何$\beta$-混合随机变量以及几何矩压缩条件下因果过程的Cramér型中偏差定理。作为附加应用,我们还导出了自规范化winsorized均值的Cramér型中偏差定理。 摘要:Let $\{(X_i,Y_i)\}_{i=1}^n$ be a sequence of independent bivariate random vectors. In this paper, we establish a refined Cramér type moderate deviation theorem for the general self-normalized sum $\sum_{i=1}^n X_i/(\sum_{i=1}^n Y_i^2)^{1/2}$, which unifies and extends the classical Cramér (1938) theorem and the self-normalized Cramér type moderate deviation theorems by Jing, Shao and Wang (2003) as well as the further refined version by Wang (2011). The advantage of our result is evidenced through successful applications to weakly dependent random variables and self-normalized winsorized mean. Specifically, by applying our new framework on general self-normalized sum, we significantly improve Cramér type moderate deviation theorems for one-dependent random variables, geometrically $\beta$-mixing random variables and causal processes under geometrical moment contraction. As an additional application, we also derive the Cramér type moderate deviation theorems for self-normalized winsorized mean.
【17】 Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers 标题:具有统计意义的逼近:以Transformer逼近图灵机为例
作者:Colin Wei,Yining Chen,Tengyu Ma 机构:Stanford University 链接:https://arxiv.org/abs/2107.13163 摘要:从理论上研究神经网络结构的一个常见视角是分析它们可以近似的函数。然而,近似理论的构造往往有不切实际的方面,例如,依赖于无限精度来记忆目标函数值,这使得这些结果可能没有什么意义。为了解决这些问题,这项工作提出了一个有统计意义的近似的正式定义,它要求近似网络表现出良好的统计可学习性。我们提出了两类函数的统计意义近似的案例研究:布尔电路和图灵机。我们证明了过参数化前馈神经网络可以在统计上有意义地逼近布尔电路,其样本复杂度仅依赖于电路大小的多项式,而不是逼近网络的大小。此外,我们还表明transformers可以在统计上有意义地逼近图灵机,计算时间以$T$为界,需要字母表大小、状态空间大小和$\log(T)$的样本复杂度多项式。我们的分析引入了新的泛化边界工具,它比典型的VC维或基于范数的边界提供了更严格的样本复杂度保证,这可能是独立的兴趣。 摘要:A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, the constructions from approximation theory often have unrealistic aspects, for example, reliance on infinite precision to memorize target function values, which make these results potentially less meaningful. To address these issues, this work proposes a formal definition of statistically meaningful approximation which requires the approximating network to exhibit good statistical learnability. We present case studies on statistically meaningful approximation for two classes of functions: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can statistically meaningfully approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the approximating network. In addition, we show that transformers can statistically meaningfully approximate Turing machines with computation time bounded by $T$, requiring sample complexity polynomial in the alphabet size, state space size, and $\log (T)$. Our analysis introduces new tools for generalization bounds that provide much tighter sample complexity guarantees than the typical VC-dimension or norm-based bounds, which may be of independent interest.
【18】 Dynamic Programming and Linear Programming for Odds Problem 标题:求解赔率问题的动态规划和线性规划
作者:Sachika Kurokawa,Tomomi Matsui 备注:12 pages, 1 figure 链接:https://arxiv.org/abs/2107.13146 摘要:本文讨论了Bruss在2000年提出的赔率问题及其变体。利用称为动态规划(DP)方程的递推关系,可以求得赔率问题及其变体的最优停止策略。2013年,Buchbinder、Jain和Singh提出了一个线性规划(LP)公式,用于求解经典秘书问题(赔率问题的一个特例)的最优停止策略。所提出的线性规划问题最大化获胜概率,不同于长期以来已知的DP方程。本文证明,普通的DP方程是线性规划(包括Buchbinder、Jain和Singh提出的LP公式)对偶问题的一种变形。 摘要:This paper discusses the odds problem, proposed by Bruss in 2000, and its variants. A recurrence relation called a dynamic programming (DP) equation is used to find an optimal stopping policy of the odds problem and its variants. In 2013, Buchbinder, Jain, and Singh proposed a linear programming (LP) formulation for finding an optimal stopping policy of the classical secretary problem, which is a special case of the odds problem. The proposed linear programming problem, which maximizes the probability of a win, differs from the DP equations known for long time periods. This paper shows that an ordinary DP equation is a modification of the dual problem of linear programming including the LP formulation proposed by Buchbinder, Jain, and Singh.
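Bruss's odds algorithm, whose DP and LP formulations the paper relates, is simple enough to sketch directly (a standard implementation of the published rule, not code from the paper): sum the odds $r_i = p_i/q_i$ backwards from the last stage until the running sum first reaches 1, then stop at the first success from that stage onward.

```python
def odds_strategy(p):
    """Bruss's odds algorithm (2000): given success probabilities
    p[0..n-1] of independent indicator events, return the optimal
    stopping threshold s (1-based: stop at the first success from
    stage s on) and the resulting win probability."""
    n = len(p)
    q = [1.0 - pi for pi in p]
    r = [pi / qi if qi > 0 else float("inf") for pi, qi in zip(p, q)]
    s, odds_sum = n, 0.0
    for i in range(n - 1, -1, -1):   # sum odds from the back
        odds_sum += r[i]
        s = i + 1
        if odds_sum >= 1.0:
            break
    prod_q = 1.0
    for qi in q[s - 1:]:
        prod_q *= qi
    return s, prod_q * odds_sum
```

For the classical secretary problem with n = 100 (record probabilities p_j = 1/j), the rule recovers the familiar answer: skip roughly the first n/e candidates and win with probability close to 1/e.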
【19】 Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games 标题:策略梯度法求解N人一般和线性二次对策的纳什均衡
Authors: Ben Hambly, Renyuan Xu, Huining Yang Link: https://arxiv.org/abs/2107.13090 Abstract: We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. To prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, that guarantees convergence. We illustrate our results with numerical experiments showing that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
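The flavor of the method is easy to illustrate in the single-agent, scalar case: policy gradient for a linear-quadratic problem is gradient descent on the cost J(K) of the linear feedback u_t = -K x_t. The toy sketch below uses our own parameters and a finite-difference gradient, not the paper's N-player natural-gradient method; the sigma2 term is the process-noise variance whose lower bound the paper's condition concerns:

```python
def lqr_cost(K, a=0.9, b=0.5, q=1.0, r=0.1, sigma2=0.04, x0_sq=1.0, T=20):
    """Finite-horizon cost of u_t = -K x_t for the scalar system
    x_{t+1} = a x_t + b u_t + w_t with Var(w_t) = sigma2, propagating
    the second moment E[x_t^2] in closed form."""
    cost, ex2 = 0.0, x0_sq
    for _ in range(T):
        cost += (q + r * K * K) * ex2
        ex2 = (a - b * K) ** 2 * ex2 + sigma2
    return cost

def policy_gradient(K=0.0, lr=0.01, iters=500, eps=1e-5):
    """Vanilla policy gradient: descend J(K) with a central-difference gradient."""
    for _ in range(iters):
        grad = (lqr_cost(K + eps) - lqr_cost(K - eps)) / (2.0 * eps)
        K -= lr * grad
    return K
```

Starting from the uncontrolled policy K = 0, the iterates move toward a stabilizing gain (|a - bK| < 1) that trades off state cost q against control cost r.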
【20】 End-to-End Balancing for Causal Continuous Treatment-Effect Estimation
Authors: Mohammad Taha Bahadori, Eric Tchetgen Tchetgen, David E. Heckerman Affiliations: Amazon Link: https://arxiv.org/abs/2107.13068 Abstract: We study the problem of observational causal inference with continuous treatment. We focus on the challenge of estimating the causal response curve for infrequently observed treatment values. We design a new algorithm based on the entropy balancing framework which learns weights that directly maximize causal inference accuracy using end-to-end optimization. Our weights can be customized for different datasets and causal inference algorithms. We propose a new theory for the consistency of entropy balancing for continuous treatments. Using synthetic and real-world data, we show that our proposed algorithm outperforms standard entropy balancing in terms of causal inference accuracy.
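Classical entropy balancing picks the weights closest to uniform in KL divergence subject to moment-matching constraints; by duality these are exponentially tilted weights w_i proportional to exp(lam * x_i). A minimal one-covariate sketch (illustrative code of our own, not the paper's end-to-end variant, which learns the objective rather than fixing a moment constraint):

```python
import math

def entropy_balance(x, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Entropy balancing with a single covariate: return weights
    w_i proportional to exp(lam * x_i) whose weighted mean of x equals
    target_mean, found by bisection on the dual parameter lam (the
    tilted mean is monotonically increasing in lam).

    target_mean must lie strictly between min(x) and max(x)."""
    def tilted_mean(lam):
        ws = [math.exp(lam * xi) for xi in x]
        s = sum(ws)
        return sum(w * xi for w, xi in zip(ws, x)) / s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    ws = [math.exp(lam * xi) for xi in x]
    s = sum(ws)
    return [w / s for w in ws]
```

With several covariates the dual parameter becomes a vector and the bisection is replaced by a convex solver, but the exponential-tilt structure of the weights is the same.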
【21】 Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised Node Classification
Authors: Yu Wang, Yuesong Shen, Daniel Cremers Affiliations: Technical University of Munich, Germany; University of Illinois at Chicago, USA Link: https://arxiv.org/abs/2107.13059 Abstract: Node features and the structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, typically determining output labels through feature aggregation. This can be problematic, as it implies conditional independence of the output nodes given their hidden representations, despite their direct connections in the graph. To learn the direct influence among output nodes in a graph, we propose the Explicit Pairwise Factorized Graph Neural Network (EPFGNN), which models the whole graph as a partially observed Markov random field. It contains explicit pairwise factors to model output-output relations and uses a GNN backbone to model input-output relations. To balance model complexity and expressivity, the pairwise factors have a shared component and a separate scaling coefficient for each edge. We apply the EM algorithm to train our model, and utilize a star-shaped piecewise likelihood as a tractable surrogate objective. Experiments on various datasets show that our model can effectively improve the performance of semi-supervised node classification on graphs.
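The surrogate objective is easiest to see in the plain pseudo-likelihood case, where each node contributes the log-probability of its label conditioned on its neighbors under the pairwise factors. A small hedged sketch of that idea (toy interface of our own; EPFGNN itself uses a star-shaped piecewise variant with a GNN backbone supplying the unary scores and per-edge scaling coefficients):

```python
import math

def pseudo_log_likelihood(labels, unary, edges, pairwise):
    """Pseudo-log-likelihood of a labeling under a pairwise MRF:
    sum over nodes i of log P(y_i | y_neighbors), with unary[i][c]
    node scores and a shared pairwise[c][c2] compatibility table."""
    n_labels = len(pairwise)
    nbrs = {i: [] for i in range(len(labels))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    total = 0.0
    for i, yi in enumerate(labels):
        # unnormalized log-score of each candidate label at node i
        logits = [unary[i][c] + sum(pairwise[c][labels[j]] for j in nbrs[i])
                  for c in range(n_labels)]
        m = max(logits)  # log-sum-exp with max-shift for numerical stability
        logz = m + math.log(sum(math.exp(l - m) for l in logits))
        total += logits[yi] - logz
    return total
```

With an attractive pairwise table, labelings where connected nodes agree score higher than those where they disagree, which is exactly the output-output coupling that pure feature aggregation cannot express.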
【22】 Hierarchical Associative Memory
Authors: Dmitry Krotov Affiliations: MIT-IBM Watson AI Lab, IBM Research Link: https://arxiv.org/abs/2107.06446 Abstract: Dense Associative Memories, or Modern Hopfield Networks, have many appealing properties of associative memory. They can do pattern completion, store a large number of memories, and can be described by a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. At the same time, up until now all models of this class have had only one hidden layer, and have only been formulated with densely connected network architectures, two aspects that hinder their machine learning applications. This paper tackles this gap and describes a fully recurrent model of associative memory with an arbitrarily large number of layers, some of which can be locally connected (convolutional), and a corresponding energy function that decreases along the dynamical trajectory of the neurons' activations. The memories of the full network are dynamically "assembled" using primitives encoded in the synaptic weights of the lower layers, with the "assembling rules" encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information typical of commonly used feedforward neural networks, the described model has rich top-down feedback from higher layers that helps the lower-layer neurons decide on their response to the input stimuli.
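For reference, the retrieval dynamics of a single-hidden-layer Dense Associative Memory with softmax energy can be written as the attention-like update xi <- X^T softmax(beta * X xi) over the stored patterns X. The sketch below is a standard formulation of that baseline, not code from the paper, whose contribution is the multi-layer generalization of such energies:

```python
import math

def hopfield_retrieve(patterns, query, beta=4.0, steps=5):
    """Modern Hopfield retrieval: iterate xi <- X^T softmax(beta * X xi),
    where the rows of X are the stored patterns. With large beta the
    state snaps to the stored pattern most similar to the query."""
    xi = list(query)
    for _ in range(steps):
        # similarity of the current state to each stored pattern
        scores = [beta * sum(pj * xj for pj, xj in zip(p, xi)) for p in patterns]
        m = max(scores)  # softmax with max-shift for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        # new state: convex combination of the stored patterns
        xi = [sum(a * p[j] for a, p in zip(attn, patterns))
              for j in range(len(xi))]
    return xi
```

Given a noisy probe of a stored pattern, a few iterations complete it, which is the pattern-completion property the abstract describes.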