torchvision.io

The torchvision.io package provides functions for performing IO operations. They are currently specific to reading and writing video.

Video

torchvision.io.read_video(filename, start_pts=0, end_pts=None, pts_unit='pts')

Reads a video from a file and returns both the video frames and the audio frames.

Parameters

filename (str) – path to the video file
start_pts (int if pts_unit = ‘pts’, float / Fraction if pts_unit = ‘sec’, optional) – the start presentation time of the video
end_pts (int if pts_unit = ‘pts’, float / Fraction if pts_unit = ‘sec’, optional) – the end presentation time of the video
pts_unit (str, optional) – unit in which start_pts and end_pts values will be interpreted, either ‘pts’ or ‘sec’. Defaults to ‘pts’.

Returns

vframes (Tensor[T, H, W, C]) – the T video frames
aframes (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
info (Dict) – metadata for the video and audio. Can contain the fields video_fps (float) and audio_fps (int)
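
As a rough illustration of the signature above, the following sketch reads a time range given in seconds by passing pts_unit='sec'. The helper name read_clip_seconds and the file name in the usage note are hypothetical, and the code assumes torchvision is installed with a working video backend.

```python
def read_clip_seconds(path, start_sec=0, end_sec=None):
    """Read the frames between start_sec and end_sec (both in seconds).

    Hypothetical helper for illustration; it is not part of torchvision.
    """
    # Imported lazily so this helper can be defined even where
    # torchvision is not installed.
    from torchvision.io import read_video

    vframes, aframes, info = read_video(
        path, start_pts=start_sec, end_pts=end_sec, pts_unit="sec"
    )
    # The documentation says info *can* contain video_fps,
    # so do not assume the key is present.
    video_fps = info.get("video_fps")
    return vframes, aframes, video_fps
```

Calling read_clip_seconds('clip.mp4', 1, 3) on a hypothetical file should give a vframes tensor laid out as [T, H, W, C]. Converting it to a channels-first layout, if a model expects that, is left to the caller.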

torchvision.io.read_video_timestamps(filename, pts_unit='pts')

Lists the timestamps of the video frames.

Note that the function decodes the whole video frame-by-frame.

Parameters

filename (str) – path to the video file
pts_unit (str, optional) – unit in which timestamp values will be returned, either ‘pts’ or ‘sec’. Defaults to ‘pts’.

Returns

pts (List[int] if pts_unit = ‘pts’, List[Fraction] if pts_unit = ‘sec’) – presentation timestamps for each one of the frames in the video.
video_fps (float) – the frame rate of the video
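
To make the 'sec' unit more concrete, here is a small sketch of what can be done with a list of Fraction timestamps of the kind described above. The helper names frame_intervals and average_fps are hypothetical, and the sample values are made up for illustration rather than read from a real file.

```python
from fractions import Fraction


def frame_intervals(pts):
    """Return the gap between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(pts, pts[1:])]


def average_fps(pts_sec):
    """Estimate frames per second from timestamps expressed in seconds."""
    if len(pts_sec) < 2:
        raise ValueError("need at least two timestamps")
    duration = pts_sec[-1] - pts_sec[0]
    if duration <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    # N timestamps span N - 1 frame intervals.
    return (len(pts_sec) - 1) / duration


# Made-up timestamps shaped like the documented pts_unit='sec' result.
sample_pts = [Fraction(0), Fraction(1, 25), Fraction(2, 25), Fraction(3, 25)]
print(frame_intervals(sample_pts))
print(average_fps(sample_pts))
```

With a real file, the list would instead come from something like pts, video_fps = read_video_timestamps('clip.mp4', pts_unit='sec'), where the file name is again hypothetical. Keeping the values as Fraction avoids floating-point rounding when comparing or subtracting timestamps.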

torchvision.io.write_video(filename, video_array, fps, video_codec='libx264', options=None)

Writes a 4D tensor in [T, H, W, C] format to a video file.

Parameters

filename (str) – path where the video will be saved
video_array (Tensor[T, H, W, C]) – tensor containing the individual frames, as a uint8 tensor in [T, H, W, C] format
fps (Number) – frames per second
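
The layout requirements above can be checked before writing. The sketch below is one possible wrapper, not the library's own validation: the name save_frames is hypothetical, and the code assumes torch and torchvision are installed with a working video backend.

```python
def save_frames(path, frames, fps):
    """Check `frames` against the documented layout, then write it to `path`.

    Hypothetical helper for illustration; it is not part of torchvision.
    """
    # Imported lazily so this helper can be defined even where
    # torch and torchvision are not installed.
    import torch
    from torchvision.io import write_video

    # The documentation asks for a 4D tensor in [T, H, W, C] format ...
    if frames.ndim != 4:
        raise ValueError(f"expected a 4D [T, H, W, C] tensor, got {frames.ndim} dims")
    # ... holding uint8 values.
    if frames.dtype != torch.uint8:
        raise ValueError(f"expected a uint8 tensor, got {frames.dtype}")

    # video_codec and options are left at their defaults.
    write_video(path, frames, fps)
```

For instance, a tensor built with torch.zeros(10, 64, 64, 3, dtype=torch.uint8) would pass both checks, while a float tensor or a single [H, W, C] image would be rejected before any encoding is attempted.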