What are the numbers in torch.transforms.normalize and how to select them?



torchvision.transforms.Normalize is a transform that normalizes a tensor image using per-channel mean and standard deviation values.

In this context, normalization means shifting and scaling the data; when you use the dataset's own statistics, each channel ends up with a mean of 0 and a variance of 1.

Normalize takes two required parameters, mean and std, each a sequence with one value per channel, plus an optional inplace flag. It applies output = (input - mean) / std to every channel.

The numbers you pass to Normalize depend on your dataset. There are two common choices: either use the per-channel mean and standard deviation computed over the training set (for MNIST the widely quoted values are 0.1307 and 0.3081), which standardizes each channel to roughly zero mean and unit variance, or simply use 0.5 for both, which maps the [0, 1] output of ToTensor into [-1, 1].
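As a minimal sketch of the first option (assuming the standard torchvision API; 0.1307 and 0.3081 are the MNIST statistics quoted in the official PyTorch examples):

```python
from torchvision import transforms

# ToTensor scales pixels from [0, 255] to [0.0, 1.0];
# Normalize then applies (input - mean) / std per channel.
# 0.1307 and 0.3081 are the commonly quoted MNIST training-set
# mean and std, not magic constants.
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),
])
```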


Normalize in the PyTorch context subtracts from each instance (each MNIST image in your case) the mean (the first number) and divides by the standard deviation (the second number). This happens for each channel separately, so for MNIST you only need two numbers because the images are grayscale, whereas for, say, CIFAR-10, which has color images, you would use something along the lines of your last transform (three numbers for the mean and three for the std), as in the sketch below.
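A quick sketch of the channel-count difference (the CIFAR-10 numbers below are the commonly quoted statistics, an assumption here; recompute them on your own data if exact values matter):

```python
from torchvision import transforms

# Grayscale images (e.g. MNIST) have 1 channel -> one mean, one std.
gray_norm = transforms.Normalize(mean=(0.1307,), std=(0.3081,))

# Color images (e.g. CIFAR-10) have 3 channels -> three means, three stds.
# Commonly quoted CIFAR-10 statistics (verify on your own data).
rgb_norm = transforms.Normalize(mean=(0.4914, 0.4822, 0.4465),
                                std=(0.2470, 0.2435, 0.2616))
```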

So basically each input image in MNIST first gets transformed from [0, 255] to [0, 1] because ToTensor converts the image to a tensor (source: https://pytorch.org/docs/stable/torchvision/transforms.html -- "Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8").
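You can verify that conversion yourself; a small sketch assuming NumPy and torchvision are available:

```python
import numpy as np
from torchvision import transforms

# A tiny H x W x C uint8 "image" with pixel values 0, 128, 255.
img = np.array([[[0], [128], [255]]], dtype=np.uint8)  # shape (1, 3, 1)

t = transforms.ToTensor()(img)
print(t.shape)  # torch.Size([1, 1, 3]) -- reordered to C x H x W
print(t)        # tensor([[[0.0000, 0.5020, 1.0000]]]) -- scaled into [0, 1]
```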

After that, you want your input image to have values in a range like [0, 1] or [-1, 1] to help your model converge in the right direction (there are many reasons scaling takes place; e.g., neural networks prefer inputs around that range to avoid gradient saturation). Now, as you probably noticed, passing 0.5 and 0.5 to Normalize yields values in this range:

Min of input image = 0 -> ToTensor -> 0 -> (0 - 0.5) / 0.5 -> -1

Max of input image = 255 -> ToTensor -> 1 -> (1 - 0.5) / 0.5 -> 1

so it transforms your data into the range [-1, 1].
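A short sanity check of that arithmetic (a sketch; the tensor below stands in for a 1-channel ToTensor output already scaled to [0, 1]):

```python
import torch
from torchvision import transforms

# Stand-in for an image after ToTensor: min pixel 0.0, max pixel 1.0.
x = torch.tensor([[[0.0, 1.0]]])  # shape (C=1, H=1, W=2)

out = transforms.Normalize(mean=(0.5,), std=(0.5,))(x)
print(out)  # tensor([[[-1., 1.]]]) -- [0, 1] mapped onto [-1, 1]
```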


Thank you for reading the article.