
Commit 3a278d7

NicolasHug and fmassa authored
Update to transforms docs (#3646)

* Fixed return docstrings
* Added some refs and corrected some parts
* more refs, and a note about dtypes

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
1 parent 7f4ae8c commit 3a278d7

File tree

3 files changed

+45
-17
lines changed


docs/source/transforms.rst

Lines changed: 33 additions & 7 deletions
@@ -4,15 +4,34 @@ torchvision.transforms
 .. currentmodule:: torchvision.transforms

 Transforms are common image transformations. They can be chained together using :class:`Compose`.
-Additionally, there is the :mod:`torchvision.transforms.functional` module.
-Functional transforms give fine-grained control over the transformations.
+Most transform classes have a function equivalent: :ref:`functional
+transforms <functional_transforms>` give fine-grained control over the
+transformations.
 This is useful if you have to build a more complex transformation pipeline
 (e.g. in the case of segmentation tasks).

-All transformations accept PIL Image, Tensor Image or batch of Tensor Images as input. Tensor Image is a tensor with
-``(C, H, W)`` shape, where ``C`` is a number of channels, ``H`` and ``W`` are image height and width. Batch of
-Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number of images in the batch. Deterministic or
-random transformations applied on the batch of Tensor Images identically transform all the images of the batch.
+Most transformations accept both `PIL <https://pillow.readthedocs.io>`_
+images and tensor images, although some transformations are :ref:`PIL-only
+<transforms_pil_only>` and some are :ref:`tensor-only
+<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
+convert to and from PIL images.
+
+The transformations that accept tensor images also accept batches of tensor
+images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is a
+number of channels, ``H`` and ``W`` are image height and width. A batch of
+Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number
+of images in the batch.
+
+The expected range of the values of a tensor image is implicitly defined by
+the tensor dtype. Tensor images with a float dtype are expected to have
+values in ``[0, 1)``. Tensor images with an integer dtype are expected to
+have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
+that can be represented in that dtype.
+
+Randomized transformations will apply the same transformation to all the
+images of a given batch, but they will produce different transformations
+across calls. For reproducible transformations across calls, you may use
+:ref:`functional transforms <functional_transforms>`.

 .. warning::

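The dtype convention introduced in this hunk can be sketched with a small helper. This is a hypothetical illustration, not part of the commit; only `torch.iinfo` and the `dtype.is_floating_point` attribute are real API.

```python
import torch

def expected_range(dtype: torch.dtype):
    """Expected value range of a tensor image, implied by its dtype."""
    if dtype.is_floating_point:
        # Float tensor images are expected to hold values in [0, 1).
        return (0.0, 1.0)
    # Integer tensor images are expected to hold values in [0, MAX_DTYPE],
    # where MAX_DTYPE is the largest value representable in that dtype.
    return (0, torch.iinfo(dtype).max)

print(expected_range(torch.float32))  # (0.0, 1.0)
print(expected_range(torch.uint8))    # (0, 255)
```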
@@ -117,13 +136,16 @@ Transforms on PIL Image and torch.\*Tensor
 .. autoclass:: GaussianBlur
    :members:

+.. _transforms_pil_only:
+
 Transforms on PIL Image only
 ----------------------------

 .. autoclass:: RandomChoice

 .. autoclass:: RandomOrder

+.. _transforms_tensor_only:

 Transforms on torch.\*Tensor only
 ---------------------------------
@@ -139,6 +161,7 @@ Transforms on torch.\*Tensor only

 .. autoclass:: ConvertImageDtype

+.. _conversion_transforms:

 Conversion Transforms
 ---------------------
@@ -173,13 +196,16 @@ The new transform can be used standalone or mixed-and-matched with existing tran
    :members:


+.. _functional_transforms:
+
 Functional Transforms
 ---------------------

 Functional transforms give you fine-grained control of the transformation pipeline.
 As opposed to the transformations above, functional transforms don't contain a random number
 generator for their parameters.
-That means you have to specify/generate all parameters, but you can reuse the functional transform.
+That means you have to specify/generate all parameters, but the functional transform will give you
+reproducible results across calls.

 Example:
 you can apply a functional transform with the same parameters to multiple images like this:

torchvision/transforms/functional.py

Lines changed: 8 additions & 7 deletions
@@ -671,7 +671,7 @@ def five_crop(img: Tensor, size: List[int]) -> Tuple[Tensor, Tensor, Tensor, Ten

     Returns:
        tuple: tuple (tl, tr, bl, br, center)
-            Corresponding top left, top right, bottom left, bottom right and center crop.
+        Corresponding top left, top right, bottom left, bottom right and center crop.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
@@ -717,8 +717,8 @@ def ten_crop(img: Tensor, size: List[int], vertical_flip: bool = False) -> List[

     Returns:
        tuple: tuple (tl, tr, bl, br, center, tl_flip, tr_flip, bl_flip, br_flip, center_flip)
-            Corresponding top left, top right, bottom left, bottom right and
-            center crop and same for the flipped image.
+        Corresponding top left, top right, bottom left, bottom right and
+        center crop and same for the flipped image.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
@@ -1103,9 +1103,9 @@ def to_grayscale(img, num_output_channels=1):

     Returns:
         PIL Image: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel
 
-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if isinstance(img, Image.Image):
         return F_pil.to_grayscale(img, num_output_channels)
@@ -1128,9 +1128,9 @@ def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

     Returns:
         PIL Image or Tensor: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel
 
-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if not isinstance(img, torch.Tensor):
         return F_pil.to_grayscale(img, num_output_channels)
@@ -1330,6 +1330,7 @@ def equalize(img: Tensor) -> Tensor:
         img (PIL Image or Tensor): Image on which equalize is applied.
             If img is torch Tensor, it is expected to be in [..., 1 or 3, H, W] format,
             where ... means it can have an arbitrary number of leading dimensions.
+            The tensor dtype must be ``torch.uint8`` and values are expected to be in ``[0, 255]``.
         If img is PIL Image, it is expected to be in mode "P", "L" or "RGB".

     Returns:

torchvision/transforms/transforms.py

Lines changed: 4 additions & 3 deletions
@@ -841,7 +841,7 @@ def get_params(

     Returns:
         tuple: params (i, j, h, w) to be passed to ``crop`` for a random
-            sized crop.
+        sized crop.
     """
     width, height = F._get_image_size(img)
     area = height * width
@@ -1464,8 +1464,9 @@ class Grayscale(torch.nn.Module):

     Returns:
         PIL Image: Grayscale version of the input.
-        - If ``num_output_channels == 1`` : returned image is single channel
-        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b
+
+        - If ``num_output_channels == 1`` : returned image is single channel
+        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b

     """
