Description
🚀 Feature
Replace the channels parameter of the read_image(), decode_image(), decode_png() and decode_jpeg() methods with a mode parameter that gives users better control over image conversions.
Motivation
Issue #2948 proposed specifying the number of output channels when loading images, and PR #2988 introduced the change in the above methods. Though the API provides similar functionality to other libraries, it requires implicit assumptions about the mapping between the number of channels and the output formats. For example, we assumed that channels=1 means Grayscale. However, since both Palette images and Grayscale images use 1 channel, we had to introduce logic to handle such corner cases.
A better approach would be to give control to the users to explicitly define what type of conversions they want to make.
Pitch
Building on top of @vfdev-5's proposal, we could replace channels with a mode parameter. For example:
# old:
def read_image(path: str, channels: int = 0) -> torch.Tensor: ...
# new:
def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) -> torch.Tensor: ...
The mode will be an enum with the following values:
ImageReadMode.UNCHANGED
ImageReadMode.GRAY
ImageReadMode.GRAY_ALPHA
ImageReadMode.RGB
ImageReadMode.RGB_ALPHA
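As a minimal sketch, the values listed above could be declared as a standard Python enum. The member names come from this proposal; the integer values assigned to them are an assumption made for illustration and are not specified here.

```python
from enum import Enum


class ImageReadMode(Enum):
    """Sketch of the proposed enum; the integer values are assumed."""

    UNCHANGED = 0   # load the image as stored, no conversion
    GRAY = 1        # convert to grayscale
    GRAY_ALPHA = 2  # convert to grayscale with an alpha channel
    RGB = 3         # convert to RGB
    RGB_ALPHA = 4   # convert to RGB with an alpha channel
```

Callers would then write `ImageReadMode.RGB` rather than a bare integer, which makes the requested conversion explicit at the call site.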
The default value of mode will be ImageReadMode.UNCHANGED and it will behave similarly to the current channels=0. It will load the image without making any modification and, to ensure BC, it will additionally support Palette, CMYK and other currently supported formats.
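To illustrate how existing call sites might migrate, the sketch below maps a legacy channels value to a mode. The helper `mode_from_channels` is hypothetical and not part of this proposal, and the mapping for values 1 to 4 is an assumption about how channels is currently interpreted; only the correspondence between channels=0 and UNCHANGED is stated above.

```python
from enum import Enum

# Compact re-declaration of the proposed enum so the sketch is self-contained;
# the integer values (starting at 0) are assumed, not specified by the proposal.
ImageReadMode = Enum(
    "ImageReadMode",
    ["UNCHANGED", "GRAY", "GRAY_ALPHA", "RGB", "RGB_ALPHA"],
    start=0,
)

# Assumed mapping from the legacy `channels` argument to the new modes.
_LEGACY_CHANNELS_TO_MODE = {
    0: ImageReadMode.UNCHANGED,
    1: ImageReadMode.GRAY,
    2: ImageReadMode.GRAY_ALPHA,
    3: ImageReadMode.RGB,
    4: ImageReadMode.RGB_ALPHA,
}


def mode_from_channels(channels: int) -> ImageReadMode:
    """Hypothetical helper: translate a legacy `channels` value to a mode."""
    try:
        return _LEGACY_CHANNELS_TO_MODE[channels]
    except KeyError:
        raise ValueError(f"unsupported number of channels: {channels}") from None
```

A shim like this would only be useful during a transition period; the point of the proposal is that new code names the conversion directly instead of encoding it as a channel count.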
Note: The scope of this proposal is to change this experimental API to allow for better image format support in the future. Adding support for converting images to Palette, from/to CMYK, etc. is not within the scope of this proposal. Many such conversions are not supported by LibJPEG and LibPNG and require writing custom conversion code, which should be handled in a separate ticket.