The claim that layer normalization normalizes the input across the features was difficult for me to understand at first. Here's what made it click for me, straight from the Layer Normalization paper's abstract:

"In this paper, we transpose batch normalization into layer normalization by **computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case**. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times."

Alex Bussan

Machine Learning Engineer. Interested in how ML/AI can be applied to both business and art.
