『Kaggle Picks』
『Deep Learning Tips』
《Why does batch normalization help?》
Batch normalization potentially helps in two ways: faster learning and higher overall accuracy. It also allows you to use a higher learning rate, potentially providing another boost in speed.
Why does this work? Well, we know that normalization (shifting inputs to zero-mean and unit variance) is often used as a pre-processing step (http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing#Data_Normalization) to make the data comparable across features. As the data flows through a deep network, the weights and parameters adjust those values, sometimes making the data too big or too small again - a problem the authors refer to as "internal covariate shift". By normalizing the data in each mini-batch, this problem is largely avoided.
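For concreteness, here is a minimal sketch of that one-off pre-processing step (the array name, shape, and epsilon are illustrative assumptions):

```python
import numpy as np

# Hypothetical raw feature matrix: rows are examples, columns are features.
X = np.random.randn(1000, 20) * 5.0 + 3.0

# Zero-mean, unit-variance normalization applied once, per feature, before training.
X_norm = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
```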
Basically, rather than just performing normalization once in the beginning, you're doing it all over the place. Of course, this is a drastically simplified view of the matter (since for one thing, I'm completely ignoring the post-processing updates applied to the entire network), but hopefully this gives a good high-level overview.
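To make the "normalize every mini-batch, everywhere in the network" idea concrete, here is a minimal NumPy sketch of the batch-norm forward step for the simplest fully-connected case; the function name and shapes are assumptions for this illustration, with the learnable scale and shift written as `gamma` and `beta` as in the paper:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features), then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero-mean, unit-variance activations
    out = gamma * x_hat + beta               # learnable scale and shift
    cache = (x, x_hat, mu, var, gamma, eps)  # kept for the backward pass
    return out, cache

# Illustrative usage on a random mini-batch of hidden activations.
x = np.random.randn(32, 64)
gamma, beta = np.ones(64), np.zeros(64)
out, cache = batch_norm_forward(x, gamma, beta)
```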
Update: For a more detailed breakdown of gradient calculations, check out: Understanding the backward pass through Batch Normalization Layer (http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html)
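In the same spirit as that post, here is a hedged sketch of the backward pass through the layer above; it assumes the `cache` returned by the forward sketch and is only an illustration of the gradient flow, not the linked post's code:

```python
import numpy as np

def batch_norm_backward(dout, cache):
    """Gradients of the forward sketch above, given the upstream gradient dout."""
    x, x_hat, mu, var, gamma, eps = cache
    n = x.shape[0]
    std_inv = 1.0 / np.sqrt(var + eps)

    dbeta = dout.sum(axis=0)              # gradient w.r.t. the shift
    dgamma = (dout * x_hat).sum(axis=0)   # gradient w.r.t. the scale
    dx_hat = dout * gamma                 # back through the scale-and-shift

    # Back through the normalization itself: the mean and the variance both depend on x.
    dvar = np.sum(dx_hat * (x - mu) * -0.5 * std_inv**3, axis=0)
    dmu = np.sum(-dx_hat * std_inv, axis=0) + dvar * np.mean(-2.0 * (x - mu), axis=0)
    dx = dx_hat * std_inv + dvar * 2.0 * (x - mu) / n + dmu / n
    return dx, dgamma, dbeta
```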
Naturally, neural networks, including deep networks, require careful tuning of weight initialization and learning parameters. Batch normalization helps relax these requirements a little.
Weights problem:
During backpropagation, outlier activations caused by poorly scaled weights distract the gradients: the gradients first have to compensate for the outliers before they can learn the weights that produce the required outputs. This costs extra epochs to converge.
Batch normalization keeps these gradients from being distracted by the outliers and lets them flow toward the common goal (by normalizing the activations within the range of each mini-batch), which accelerates the learning process.
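As a rough numeric illustration of this point (the shapes, the weight scale, and the random seed are made-up assumptions): pre-activations computed with badly scaled weights have a huge spread, while normalizing them within the mini-batch pins the spread near one, so the gradients no longer have to compensate for the scale of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 100))                    # a mini-batch of 64 examples
w_bad = rng.normal(scale=10.0, size=(100, 100))   # deliberately badly scaled weights

pre_act = x @ w_bad
print(pre_act.std(axis=0).mean())   # large spread: outliers dominate the gradients

# Normalizing per feature within the mini-batch (the batch-norm step) restores
# a spread of roughly 1, regardless of how the weights were scaled.
normed = (pre_act - pre_act.mean(axis=0)) / (pre_act.std(axis=0) + 1e-5)
print(normed.std(axis=0).mean())    # close to 1.0
```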
Learning rate problem:
Generally, learning rates are kept small so that only a small portion of each gradient update corrects the weights; the reason is that gradients from outlier activations should not disturb the activations that have already been learned. With batch normalization, these outlier activations are reduced, so higher learning rates can be used to accelerate the learning process.
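As a rough sketch of how this plays out in practice (the layer sizes and the specific learning rates below are illustrative assumptions, not tuned values), the batch-normalized network is typically given a noticeably larger learning rate than its plain counterpart:

```python
import torch
import torch.nn as nn

def make_mlp(use_bn: bool) -> nn.Sequential:
    """A small hypothetical MLP, with or without batch norm after the hidden layer."""
    layers = [nn.Linear(784, 256)]
    if use_bn:
        layers.append(nn.BatchNorm1d(256))  # normalize hidden activations per mini-batch
    layers += [nn.ReLU(), nn.Linear(256, 10)]
    return nn.Sequential(*layers)

plain_model = make_mlp(use_bn=False)
bn_model = make_mlp(use_bn=True)

# Conservative rate for the plain network, a larger one for the batch-normalized one.
plain_opt = torch.optim.SGD(plain_model.parameters(), lr=0.01)
bn_opt = torch.optim.SGD(bn_model.parameters(), lr=0.1)
```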