Commit 2d8a288

[Cherry-Pick for 0.20] Revamp decoding docs (#8633) (#8666)
1 parent 7f4d561 commit 2d8a288

4 files changed, +131 -88 lines changed

docs/source/io.rst

+61-46
@@ -3,33 +3,46 @@ Decoding / Encoding images and videos
33

44
.. currentmodule:: torchvision.io
55

6-
The :mod:`torchvision.io` package provides functions for performing IO
7-
operations. They are currently specific to reading and writing images and
8-
videos.
6+
The :mod:`torchvision.io` module provides utilities for decoding and encoding
7+
images and videos.
98

10-
Images
11-
------
9+
Image Decoding
10+
--------------
1211

1312
Torchvision currently supports decoding JPEG, PNG, WEBP and GIF images. JPEG
1413
decoding can also be done on CUDA GPUs.
1514

16-
For encoding, JPEG (cpu and CUDA) and PNG are supported.
15+
The main entry point is the :func:`~torchvision.io.decode_image` function, which
16+
you can use as an alternative to ``PIL.Image.open()``. It will decode images
17+
straight into image Tensors, thus saving you the conversion and allowing you to
18+
run transforms/preproc natively on tensors.
19+
20+
.. code::
21+
22+
from torchvision.io import decode_image
23+
24+
img = decode_image("path_to_image", mode="RGB")
25+
img.dtype # torch.uint8
26+
27+
# Or
28+
raw_encoded_bytes = ... # read encoded bytes from your file system
29+
img = decode_image(raw_encoded_bytes, mode="RGB")
30+
31+
32+
:func:`~torchvision.io.decode_image` will automatically detect the image format,
33+
and call the corresponding decoder. You can also use the lower-level
34+
format-specific decoders which can be more powerful, e.g. if you want to
35+
encode/decode JPEGs on CUDA.
1736

1837
.. autosummary::
1938
:toctree: generated/
2039
:template: function.rst
2140

2241
decode_image
23-
encode_jpeg
2442
decode_jpeg
25-
write_jpeg
43+
encode_png
2644
decode_gif
2745
decode_webp
28-
encode_png
29-
decode_png
30-
write_png
31-
read_file
32-
write_file
3346

3447
.. autosummary::
3548
:toctree: generated/
@@ -41,14 +54,47 @@ Obsolete decoding function:
4154

4255
.. autosummary::
4356
:toctree: generated/
44-
:template: class.rst
57+
:template: function.rst
4558

4659
read_image
4760

61+
Image Encoding
62+
--------------
63+
64+
For encoding, JPEG (cpu and CUDA) and PNG are supported.
65+
66+
67+
.. autosummary::
68+
:toctree: generated/
69+
:template: function.rst
70+
71+
encode_jpeg
72+
write_jpeg
73+
encode_png
74+
write_png
75+
76+
IO operations
77+
-------------
78+
79+
.. autosummary::
80+
:toctree: generated/
81+
:template: function.rst
82+
83+
read_file
84+
write_file
4885

4986
Video
5087
-----
5188

89+
.. warning::
90+
91+
Torchvision supports video decoding through different APIs listed below,
92+
some of which are still in BETA stage. In the near future, we intend to
93+
centralize PyTorch's video decoding capabilities within the `torchcodec
94+
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to try
95+
it out and share your feedback, as the torchvision video decoders will
96+
eventually be deprecated.
97+
5298
.. autosummary::
5399
:toctree: generated/
54100
:template: function.rst
@@ -58,45 +104,14 @@ Video
58104
write_video
59105

60106

61-
Fine-grained video API
62-
^^^^^^^^^^^^^^^^^^^^^^
107+
**Fine-grained video API**
63108

64109
In addition to the :func:`read_video` function, we provide a high-performance
65110
lower-level API for more fine-grained control compared to the :func:`read_video` function.
66111
It does all this whilst fully supporting torchscript.
67112

68-
.. betastatus:: fine-grained video API
69-
70113
.. autosummary::
71114
:toctree: generated/
72115
:template: class.rst
73116

74117
VideoReader
75-
76-
77-
Example of inspecting a video:
78-
79-
.. code:: python
80-
81-
import torchvision
82-
video_path = "path to a test video"
83-
# Constructor allocates memory and a threaded decoder
84-
# instance per video. At the moment it takes two arguments:
85-
# path to the video file, and a wanted stream.
86-
reader = torchvision.io.VideoReader(video_path, "video")
87-
88-
# The information about the video can be retrieved using the
89-
# `get_metadata()` method. It returns a dictionary for every stream, with
90-
# duration and other relevant metadata (often frame rate)
91-
reader_md = reader.get_metadata()
92-
93-
# metadata is structured as a dict of dicts with following structure
94-
# {"stream_type": {"attribute": [attribute per stream]}}
95-
#
96-
# following would print out the list of frame rates for every present video stream
97-
print(reader_md["video"]["fps"])
98-
99-
# we explicitly select the stream we would like to operate on. In
100-
# the constructor we select a default video stream, but
101-
# in practice, we can set whichever stream we would like
102-
video.set_current_stream("video:0")
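
Note on the io.rst change above: the new text says ``decode_image`` detects the image format automatically and dispatches to the matching decoder. The sketch below illustrates one common way such detection can work, by inspecting the leading "magic" bytes of the encoded data. It is an assumption-laden illustration using only the standard library, not torchvision's actual implementation; the name ``sniff_image_format`` is hypothetical.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from its leading magic bytes.

    Hypothetical helper for illustration only; it covers the four formats
    the docs list as supported (JPEG, PNG, GIF, WEBP).
    """
    # JPEG streams start with the SOI marker FF D8, followed by another FF.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # GIF files start with a version string.
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WEBP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A caller could read the first dozen bytes of a file and pass them to ``sniff_image_format`` before choosing a format-specific decoder such as ``decode_jpeg`` or ``decode_webp``.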

torchvision/io/image.py

+38-42
@@ -20,19 +20,25 @@
2020

2121

2222
class ImageReadMode(Enum):
23-
"""
24-
Support for various modes while reading images.
23+
"""Allow automatic conversion to RGB, RGBA, etc. while decoding.
24+
25+
.. note::
26+
27+
You don't need to use this struct, you can just pass strings to all
28+
``mode`` parameters, e.g. ``mode="RGB"``.
2529
26-
Use ``ImageReadMode.UNCHANGED`` for loading the image as-is,
27-
``ImageReadMode.GRAY`` for converting to grayscale,
28-
``ImageReadMode.GRAY_ALPHA`` for grayscale with transparency,
29-
``ImageReadMode.RGB`` for RGB and ``ImageReadMode.RGB_ALPHA`` for
30-
RGB with transparency.
30+
The different available modes are the following.
31+
32+
- UNCHANGED: loads the image as-is
33+
- RGB: converts to RGB
34+
- RGBA: converts to RGB with transparency (also aliased as RGB_ALPHA)
35+
- GRAY: converts to grayscale
36+
- GRAY_ALPHA: converts to grayscale with transparency
3137
3238
.. note::
3339
34-
Some decoders won't support all possible values, e.g. a decoder may only
35-
support "RGB" and "RGBA" mode.
40+
Some decoders won't support all possible values, e.g. GRAY and
41+
GRAY_ALPHA are only supported for PNG and JPEG images.
3642
"""
3743

3844
UNCHANGED = 0
@@ -45,8 +51,7 @@ class ImageReadMode(Enum):
4551

4652
def read_file(path: str) -> torch.Tensor:
4753
"""
48-
Reads and outputs the bytes contents of a file as a uint8 Tensor
49-
with one dimension.
54+
Return the bytes contents of a file as a uint8 1D Tensor.
5055
5156
Args:
5257
path (str or ``pathlib.Path``): the path to the file to be read
@@ -62,8 +67,7 @@ def read_file(path: str) -> torch.Tensor:
6267

6368
def write_file(filename: str, data: torch.Tensor) -> None:
6469
"""
65-
Writes the contents of an uint8 tensor with one dimension to a
66-
file.
70+
Write the content of a uint8 1D tensor to a file.
6771
6872
Args:
6973
filename (str or ``pathlib.Path``): the path to the file to be written
@@ -93,10 +97,9 @@ def decode_png(
9397
Args:
9498
input (Tensor[1]): a one dimensional uint8 tensor containing
9599
the raw bytes of the PNG image.
96-
mode (str or ImageReadMode): the read mode used for optionally
97-
converting the image. Default: ``ImageReadMode.UNCHANGED``.
98-
See `ImageReadMode` class for more information on various
99-
available modes.
100+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
101+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
102+
for available modes.
100103
apply_exif_orientation (bool): apply EXIF orientation transformation to the output tensor.
101104
Default: False.
102105
@@ -156,8 +159,7 @@ def decode_jpeg(
156159
device: Union[str, torch.device] = "cpu",
157160
apply_exif_orientation: bool = False,
158161
) -> Union[torch.Tensor, List[torch.Tensor]]:
159-
"""
160-
Decode JPEG image(s) into 3 dimensional RGB or grayscale Tensor(s).
162+
"""Decode JPEG image(s) into 3D RGB or grayscale Tensor(s), on CPU or CUDA.
161163
162164
The values of the output tensor are uint8 between 0 and 255.
163165
@@ -171,12 +173,9 @@ def decode_jpeg(
171173
input (Tensor[1] or list[Tensor[1]]): a (list of) one dimensional uint8 tensor(s) containing
172174
the raw bytes of the JPEG image. The tensor(s) must be on CPU,
173175
regardless of the ``device`` parameter.
174-
mode (str or ImageReadMode): the read mode used for optionally
175-
converting the image(s). The supported modes are: ``ImageReadMode.UNCHANGED``,
176-
``ImageReadMode.GRAY`` and ``ImageReadMode.RGB``
177-
Default: ``ImageReadMode.UNCHANGED``.
178-
See ``ImageReadMode`` class for more information on various
179-
available modes.
176+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
177+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
178+
for available modes.
180179
device (str or torch.device): The device on which the decoded image will
181180
be stored. If a cuda device is specified, the image will be decoded
182181
with `nvjpeg <https://developer.nvidia.com/nvjpeg>`_. This is only
@@ -228,9 +227,7 @@ def decode_jpeg(
228227
def encode_jpeg(
229228
input: Union[torch.Tensor, List[torch.Tensor]], quality: int = 75
230229
) -> Union[torch.Tensor, List[torch.Tensor]]:
231-
"""
232-
Takes a (list of) input tensor(s) in CHW layout and returns a (list of) buffer(s) with the contents
233-
of the corresponding JPEG file(s).
230+
"""Encode RGB tensor(s) into raw encoded jpeg bytes, on CPU or CUDA.
234231
235232
.. note::
236233
Passing a list of CUDA tensors is more efficient than repeated individual calls to ``encode_jpeg``.
@@ -286,7 +283,7 @@ def decode_image(
286283
mode: ImageReadMode = ImageReadMode.UNCHANGED,
287284
apply_exif_orientation: bool = False,
288285
) -> torch.Tensor:
289-
"""Decode an image into a tensor.
286+
"""Decode an image into a uint8 tensor, from a path or from raw encoded bytes.
290287
291288
Currently supported image formats are jpeg, png, gif and webp.
292289
@@ -303,10 +300,9 @@ def decode_image(
303300
input (Tensor or str or ``pathlib.Path``): The image to decode. If a
304301
tensor is passed, it must be one dimensional uint8 tensor containing
305302
the raw bytes of the image. Otherwise, this must be a path to the image file.
306-
mode (str or ImageReadMode): the read mode used for optionally converting the image.
307-
Default: ``ImageReadMode.UNCHANGED``.
308-
See ``ImageReadMode`` class for more information on various
309-
available modes. Only applies to JPEG and PNG images.
303+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
304+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
305+
for available modes.
310306
apply_exif_orientation (bool): apply EXIF orientation transformation to the output tensor.
311307
Only applies to JPEG and PNG images. Default: False.
312308
@@ -367,9 +363,9 @@ def decode_webp(
367363
Args:
368364
input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
369365
the raw bytes of the WEBP image.
370-
mode (str or ImageReadMode): The read mode used for optionally
371-
converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
372-
Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
366+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
367+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
368+
for available modes.
373369
374370
Returns:
375371
Decoded image (Tensor[image_channels, image_height, image_width])
@@ -398,9 +394,9 @@ def _decode_avif(
398394
Args:
399395
input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
400396
the raw bytes of the AVIF image.
401-
mode (str or ImageReadMode): The read mode used for optionally
402-
converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
403-
Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
397+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
398+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
399+
for available modes.
404400
405401
Returns:
406402
Decoded image (Tensor[image_channels, image_height, image_width])
@@ -426,9 +422,9 @@ def _decode_heic(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHAN
426422
Args:
427423
input (Tensor[1]): a one dimensional contiguous uint8 tensor containing
428424
the raw bytes of the HEIC image.
429-
mode (str or ImageReadMode): The read mode used for optionally
430-
converting the image color space. Default: ``ImageReadMode.UNCHANGED``.
431-
Other supported values are ``ImageReadMode.RGB`` and ``ImageReadMode.RGB_ALPHA``.
425+
mode (str or ImageReadMode): The mode to convert the image to, e.g. "RGB".
426+
Default is "UNCHANGED". See :class:`~torchvision.io.ImageReadMode`
427+
for available modes.
432428
433429
Returns:
434430
Decoded image (Tensor[image_channels, image_height, image_width])
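
Note on the ``ImageReadMode`` docstring above: it states that plain strings such as ``mode="RGB"`` are accepted wherever a ``mode`` parameter exists, and that ``RGBA`` is an alias of ``RGB_ALPHA``. The sketch below shows how string-to-enum normalization and enum aliasing can work in Python. ``ReadMode`` and ``normalize_mode`` are hypothetical stand-ins, not the torchvision class or its code, and every numeric value other than ``UNCHANGED = 0`` (which appears in the diff) is an arbitrary choice for illustration.

```python
from enum import Enum
from typing import Union


class ReadMode(Enum):
    """Illustrative stand-in for ImageReadMode (not the torchvision class)."""

    UNCHANGED = 0
    GRAY = 1          # assumed value, for illustration only
    GRAY_ALPHA = 2    # assumed value, for illustration only
    RGB = 3           # assumed value, for illustration only
    RGB_ALPHA = 4     # assumed value, for illustration only
    RGBA = RGB_ALPHA  # same value as RGB_ALPHA, so Enum makes it an alias


def normalize_mode(mode: Union[str, ReadMode]) -> ReadMode:
    """Accept either an enum member or its name as a string."""
    if isinstance(mode, ReadMode):
        return mode
    # Look up by member name, case-insensitively; unknown names raise KeyError.
    return ReadMode[mode.upper()]
```

With this shape, ``normalize_mode("rgba")`` and ``normalize_mode(ReadMode.RGB_ALPHA)`` resolve to the same member, which is what lets a decoder treat the string and enum forms interchangeably.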

torchvision/io/video.py

+24
@@ -64,6 +64,14 @@ def write_video(
6464
"""
6565
Writes a 4d tensor in [T, H, W, C] format in a video file
6666
67+
.. warning::
68+
69+
In the near future, we intend to centralize PyTorch's video decoding
70+
capabilities within the `torchcodec
71+
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to
72+
try it out and share your feedback, as the torchvision video decoders
73+
will eventually be deprecated.
74+
6775
Args:
6876
filename (str): path where the video will be saved
6977
video_array (Tensor[T, H, W, C]): tensor containing the individual frames,
@@ -243,6 +251,14 @@ def read_video(
243251
"""
244252
Reads a video from a file, returning both the video frames and the audio frames
245253
254+
.. warning::
255+
256+
In the near future, we intend to centralize PyTorch's video decoding
257+
capabilities within the `torchcodec
258+
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to
259+
try it out and share your feedback, as the torchvision video decoders
260+
will eventually be deprecated.
261+
246262
Args:
247263
filename (str): path to the video file. If using the pyav backend, this can be whatever ``av.open`` accepts.
248264
start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional):
@@ -367,6 +383,14 @@ def read_video_timestamps(filename: str, pts_unit: str = "pts") -> Tuple[List[in
367383
"""
368384
List the video frames timestamps.
369385
386+
.. warning::
387+
388+
In the near future, we intend to centralize PyTorch's video decoding
389+
capabilities within the `torchcodec
390+
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to
391+
try it out and share your feedback, as the torchvision video decoders
392+
will eventually be deprecated.
393+
370394
Note that the function decodes the whole video frame-by-frame.
371395
372396
Args:
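
Note on the video.py docstrings above: ``read_video`` and ``read_video_timestamps`` take a ``pts_unit`` of either ``'pts'`` (integer presentation timestamps) or ``'sec'`` (float or ``Fraction`` seconds). The two units are related through the stream's time base: seconds equal pts multiplied by the time base. The sketch below shows that conversion with exact rational arithmetic. The helper names and the 1/90000 time base are assumptions chosen for illustration, not values taken from torchvision.

```python
from fractions import Fraction
from typing import Union


def pts_to_seconds(pts: int, time_base: Fraction) -> Fraction:
    """Convert an integer presentation timestamp to seconds."""
    return pts * time_base


def seconds_to_pts(seconds: Union[int, float, Fraction], time_base: Fraction) -> int:
    """Convert seconds to the nearest integer presentation timestamp."""
    # Fraction(...) keeps the division exact; round() then picks the closest tick.
    return round(Fraction(seconds) / time_base)


# Assumed time base of 1/90000 s per tick, purely as an example.
time_base = Fraction(1, 90000)
two_seconds = pts_to_seconds(180000, time_base)
half_second_pts = seconds_to_pts(Fraction(1, 2), time_base)
```

Using ``Fraction`` avoids the rounding drift that repeated float arithmetic can introduce when seeking by timestamp.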

torchvision/io/video_reader.py

+8
@@ -52,6 +52,14 @@ class VideoReader:
5252
backends: video_reader, pyav, and cuda.
5353
Backends can be set via `torchvision.set_video_backend` function.
5454
55+
.. warning::
56+
57+
In the near future, we intend to centralize PyTorch's video decoding
58+
capabilities within the `torchcodec
59+
<https://github.com/pytorch/torchcodec>`_ project. We encourage you to
60+
try it out and share your feedback, as the torchvision video decoders
61+
will eventually be deprecated.
62+
5563
.. betastatus:: VideoReader class
5664
5765
Example:
