
Commit 3a278d7

NicolasHug and fmassa authored
Update to transforms docs (#3646)

* Fixed return docstrings
* Added some refs and corrected some parts
* more refs, and a note about dtypes

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
1 parent 7f4ae8c commit 3a278d7

File tree

3 files changed

+45
-17
lines changed


docs/source/transforms.rst

Lines changed: 33 additions & 7 deletions
@@ -4,15 +4,34 @@ torchvision.transforms
 .. currentmodule:: torchvision.transforms

 Transforms are common image transformations. They can be chained together using :class:`Compose`.
-Additionally, there is the :mod:`torchvision.transforms.functional` module.
-Functional transforms give fine-grained control over the transformations.
+Most transform classes have a function equivalent: :ref:`functional
+transforms <functional_transforms>` give fine-grained control over the
+transformations.
 This is useful if you have to build a more complex transformation pipeline
 (e.g. in the case of segmentation tasks).

-All transformations accept PIL Image, Tensor Image or batch of Tensor Images as input. Tensor Image is a tensor with
-``(C, H, W)`` shape, where ``C`` is a number of channels, ``H`` and ``W`` are image height and width. Batch of
-Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number of images in the batch. Deterministic or
-random transformations applied on the batch of Tensor Images identically transform all the images of the batch.
+Most transformations accept both `PIL <https://pillow.readthedocs.io>`_
+images and tensor images, although some transformations are :ref:`PIL-only
+<transforms_pil_only>` and some are :ref:`tensor-only
+<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
+convert to and from PIL images.
+
+The transformations that accept tensor images also accept batches of tensor
+images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is a
+number of channels, ``H`` and ``W`` are image height and width. A batch of
+Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number
+of images in the batch.
+
+The expected range of the values of a tensor image is implicitly defined by
+the tensor dtype. Tensor images with a float dtype are expected to have
+values in ``[0, 1)``. Tensor images with an integer dtype are expected to
+have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
+that can be represented in that dtype.
+
+Randomized transformations will apply the same transformation to all the
+images of a given batch, but they will produce different transformations
+across calls. For reproducible transformations across calls, you may use
+:ref:`functional transforms <functional_transforms>`.

 .. warning::

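The dtype convention introduced in this hunk can be sketched with a small helper. This is a hypothetical illustration, not part of the commit; only `torch.iinfo` and the `dtype.is_floating_point` attribute are real API.

```python
import torch

def expected_range(dtype: torch.dtype):
    """Expected value range of a tensor image, implied by its dtype."""
    if dtype.is_floating_point:
        # Float tensor images are expected to hold values in [0, 1).
        return (0.0, 1.0)
    # Integer tensor images are expected to hold values in [0, MAX_DTYPE],
    # where MAX_DTYPE is the largest value representable in that dtype.
    return (0, torch.iinfo(dtype).max)

print(expected_range(torch.float32))  # (0.0, 1.0)
print(expected_range(torch.uint8))    # (0, 255)
```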
@@ -117,13 +136,16 @@ Transforms on PIL Image and torch.\*Tensor
 .. autoclass:: GaussianBlur
    :members:

+.. _transforms_pil_only:
+
 Transforms on PIL Image only
 ----------------------------

 .. autoclass:: RandomChoice

 .. autoclass:: RandomOrder

+.. _transforms_tensor_only:

 Transforms on torch.\*Tensor only
 ---------------------------------
@@ -139,6 +161,7 @@ Transforms on torch.\*Tensor only

 .. autoclass:: ConvertImageDtype

+.. _conversion_transforms:

 Conversion Transforms
 ---------------------
@@ -173,13 +196,16 @@ The new transform can be used standalone or mixed-and-matched with existing tran
    :members:


+.. _functional_transforms:
+
 Functional Transforms
 ---------------------

 Functional transforms give you fine-grained control of the transformation pipeline.
 As opposed to the transformations above, functional transforms don't contain a random number
 generator for their parameters.
-That means you have to specify/generate all parameters, but you can reuse the functional transform.
+That means you have to specify/generate all parameters, but the functional transform will give you
+reproducible results across calls.

 Example:
 you can apply a functional transform with the same parameters to multiple images like this:

torchvision/transforms/functional.py

Lines changed: 8 additions & 7 deletions
@@ -671,7 +671,7 @@ def five_crop(img: Tensor, size: List[int]) -> Tuple[Tensor, Tensor, Tensor, Ten

     Returns:
        tuple: tuple (tl, tr, bl, br, center)
-            Corresponding top left, top right, bottom left, bottom right and center crop.
+        Corresponding top left, top right, bottom left, bottom right and center crop.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
@@ -717,8 +717,8 @@ def ten_crop(img: Tensor, size: List[int], vertical_flip: bool = False) -> List[

     Returns:
        tuple: tuple (tl, tr, bl, br, center, tl_flip, tr_flip, bl_flip, br_flip, center_flip)
-            Corresponding top left, top right, bottom left, bottom right and
-            center crop and same for the flipped image.
+        Corresponding top left, top right, bottom left, bottom right and
+        center crop and same for the flipped image.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
@@ -1103,9 +1103,9 @@ def to_grayscale(img, num_output_channels=1):

     Returns:
         PIL Image: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel
 
-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if isinstance(img, Image.Image):
         return F_pil.to_grayscale(img, num_output_channels)
@@ -1128,9 +1128,9 @@ def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

     Returns:
         PIL Image or Tensor: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel
 
-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if not isinstance(img, torch.Tensor):
         return F_pil.to_grayscale(img, num_output_channels)
@@ -1330,6 +1330,7 @@ def equalize(img: Tensor) -> Tensor:
         img (PIL Image or Tensor): Image on which equalize is applied.
             If img is torch Tensor, it is expected to be in [..., 1 or 3, H, W] format,
             where ... means it can have an arbitrary number of leading dimensions.
+            The tensor dtype must be ``torch.uint8`` and values are expected to be in ``[0, 255]``.
         If img is PIL Image, it is expected to be in mode "P", "L" or "RGB".

     Returns:

torchvision/transforms/transforms.py

Lines changed: 4 additions & 3 deletions
@@ -841,7 +841,7 @@ def get_params(

     Returns:
         tuple: params (i, j, h, w) to be passed to ``crop`` for a random
-            sized crop.
+        sized crop.
     """
     width, height = F._get_image_size(img)
     area = height * width
@@ -1464,8 +1464,9 @@ class Grayscale(torch.nn.Module):

     Returns:
         PIL Image: Grayscale version of the input.
-        - If ``num_output_channels == 1`` : returned image is single channel
-        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b
+
+        - If ``num_output_channels == 1`` : returned image is single channel
+        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b

     """
