torchvision.models¶

The models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation and person keypoint detection.

Classification¶

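The models subpackage contains definitions for the following model architectures for image classification: AlexNet, VGG, ResNet, SqueezeNet, DenseNet, Inception v3, GoogLeNet, ShuffleNet v2, MobileNet v2 and ResNeXt.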

You can construct a model with random weights by calling its constructor:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet = models.mobilenet_v2()
resnext50_32x4d = models.resnext50_32x4d()

We provide pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet = models.mobilenet_v2(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)

Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

An example of such normalization can be found in the ImageNet example.

ImageNet 1-crop error rates (224x224)

Network	Top-1 error	Top-5 error
AlexNet	43.45	20.91
VGG-11	30.98	11.37
VGG-13	30.07	10.75
VGG-16	28.41	9.62
VGG-19	27.62	9.12
VGG-11 with batch normalization	29.62	10.19
VGG-13 with batch normalization	28.45	9.63
VGG-16 with batch normalization	26.63	8.50
VGG-19 with batch normalization	25.76	8.15
ResNet-18	30.24	10.92
ResNet-34	26.70	8.58
ResNet-50	23.85	7.13
ResNet-101	22.63	6.44
ResNet-152	21.69	5.94
SqueezeNet 1.0	41.90	19.58
SqueezeNet 1.1	41.81	19.38
Densenet-121	25.35	7.83
Densenet-169	24.00	7.00
Densenet-201	22.80	6.43
Densenet-161	22.35	6.20
Inception v3	22.55	6.44
GoogLeNet	30.22	10.47
ShuffleNet V2	30.64	11.68
MobileNet V2	28.12	9.71
ResNeXt-50-32x4d	22.38	6.30
ResNeXt-101-32x8d	20.69	5.47
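
For reference, top-k error is the percentage of samples whose true class is not among the k highest-scoring predictions. The sketch below implements this standard definition on hand-made scores; it is not the evaluation script that produced the table, and the helper name topk_error is hypothetical.

```python
import torch

def topk_error(logits, targets, k):
    """Percentage of samples whose true class is not among the k highest scores."""
    topk = logits.topk(k, dim=1).indices             # [N, k] predicted class indices
    hit = (topk == targets.unsqueeze(1)).any(dim=1)  # [N] True if the target is in the top k
    return 100.0 * (1.0 - hit.float().mean().item())

# Four samples, three classes; values chosen so that there are no ties.
logits = torch.tensor([[0.10, 0.90, 0.00],
                       [0.80, 0.15, 0.05],
                       [0.20, 0.30, 0.50],
                       [0.30, 0.45, 0.25]])
targets = torch.tensor([1, 1, 2, 0])

top1 = topk_error(logits, targets, k=1)  # two of four samples are missed
top2 = topk_error(logits, targets, k=2)  # every target is within the top two
```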

Alexnet¶

torchvision.models.alexnet(pretrained=False, progress=True, **kwargs)[source]¶

AlexNet model architecture from the “One weird trick…” paper.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

VGG¶

torchvision.models.vgg11(pretrained=False, progress=True, **kwargs)[source]¶

VGG 11-layer model (configuration “A”)

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg11_bn(pretrained=False, progress=True, **kwargs)[source]¶

VGG 11-layer model (configuration “A”) with batch normalization

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13(pretrained=False, progress=True, **kwargs)[source]¶

VGG 13-layer model (configuration “B”)

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13_bn(pretrained=False, progress=True, **kwargs)[source]¶

VGG 13-layer model (configuration “B”) with batch normalization

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16(pretrained=False, progress=True, **kwargs)[source]¶

VGG 16-layer model (configuration “D”)

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16_bn(pretrained=False, progress=True, **kwargs)[source]¶

VGG 16-layer model (configuration “D”) with batch normalization

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19(pretrained=False, progress=True, **kwargs)[source]¶

VGG 19-layer model (configuration “E”)

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19_bn(pretrained=False, progress=True, **kwargs)[source]¶

VGG 19-layer model (configuration “E”) with batch normalization

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

ResNet¶

torchvision.models.resnet18(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNet-18 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet34(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNet-34 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet50(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNet-50 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet101(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNet-101 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet152(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNet-152 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

SqueezeNet¶

torchvision.models.squeezenet1_0(pretrained=False, progress=True, **kwargs)[source]¶

SqueezeNet model architecture from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.squeezenet1_1(pretrained=False, progress=True, **kwargs)[source]¶

SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

DenseNet¶

torchvision.models.densenet121(pretrained=False, progress=True, **kwargs)[source]¶

Densenet-121 model from “Densely Connected Convolutional Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.densenet169(pretrained=False, progress=True, **kwargs)[source]¶

Densenet-169 model from “Densely Connected Convolutional Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.densenet161(pretrained=False, progress=True, **kwargs)[source]¶

Densenet-161 model from “Densely Connected Convolutional Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.densenet201(pretrained=False, progress=True, **kwargs)[source]¶

Densenet-201 model from “Densely Connected Convolutional Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

Inception v3¶

torchvision.models.inception_v3(pretrained=False, progress=True, **kwargs)[source]¶

Inception v3 model architecture from “Rethinking the Inception Architecture for Computer Vision”.

Note

Important: In contrast to the other models the inception_v3 expects tensors with a size of N x 3 x 299 x 299, so ensure your images are sized accordingly.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr
aux_logits (bool) – If True, add an auxiliary branch that can improve training. Default: True
transform_input (bool) – If True, preprocesses the input according to the method with which it was trained on ImageNet. Default: False

GoogLeNet¶

torchvision.models.googlenet(pretrained=False, progress=True, **kwargs)[source]¶

GoogLeNet (Inception v1) model architecture from “Going Deeper with Convolutions”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr
aux_logits (bool) – If True, adds two auxiliary branches that can improve training. Default: False when pretrained is True otherwise True
transform_input (bool) – If True, preprocesses the input according to the method with which it was trained on ImageNet. Default: False

ShuffleNet v2¶

torchvision.models.shufflenet_v2_x0_5(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ShuffleNetV2 with 0.5x output channels, as described in “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.shufflenet_v2_x1_0(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ShuffleNetV2 with 1.0x output channels, as described in “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.shufflenet_v2_x1_5(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ShuffleNetV2 with 1.5x output channels, as described in “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.shufflenet_v2_x2_0(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ShuffleNetV2 with 2.0x output channels, as described in “ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

MobileNet v2¶

torchvision.models.mobilenet_v2(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a MobileNetV2 architecture from “MobileNetV2: Inverted Residuals and Linear Bottlenecks”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

ResNeXt¶

torchvision.models.resnext50_32x4d(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNeXt-50 32x4d model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnext101_32x8d(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a ResNeXt-101 32x8d model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

Semantic Segmentation¶

The models subpackage contains definitions for the following model architectures for semantic segmentation: Fully Convolutional Networks (FCN) and DeepLabV3, each with a ResNet-50 or ResNet-101 backbone.

As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. They have been trained on images resized such that their minimum size is 520.

The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:

['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
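
The class order matters when decoding predictions. The sketch below assumes the segmentation models return a dictionary whose "out" entry holds per-class scores of shape [N, 21, H, W]; a random tensor stands in for that output so the example runs without a model. The name VOC_CLASSES is hypothetical.

```python
import torch

VOC_CLASSES = [
    '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# Stand-in for model(batch)["out"] (assumed output layout): [N, 21, H, W].
torch.manual_seed(0)
scores = torch.randn(1, len(VOC_CLASSES), 4, 6)

# The predicted class of each pixel is the index of its highest score.
class_map = scores.argmax(dim=1)  # [N, H, W]

# Names of the classes that occur anywhere in the prediction.
present = sorted({VOC_CLASSES[i] for i in class_map.unique().tolist()})
```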

The accuracies of the pre-trained models evaluated on COCO val2017 are as follows:

Network	mean IoU	global pixelwise acc
FCN ResNet101	63.7	91.9
DeepLabV3 ResNet101	67.4	92.4
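
Both metrics in the table can be derived from a confusion matrix. The sketch below uses the usual definitions (global accuracy is the fraction of correctly labeled pixels; IoU per class is true positives divided by the union of predicted and ground-truth pixels); it is an illustration under those definitions, not the script used to produce the numbers above, and the helper names are hypothetical.

```python
import torch

def confusion_matrix(pred, target, num_classes):
    """Entry [t, p] counts pixels of true class t predicted as class p."""
    idx = target.flatten() * num_classes + pred.flatten()
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_acc_and_miou(conf):
    tp = conf.diag().float()
    global_acc = tp.sum() / conf.sum()
    # Union = predicted as class c + labeled as class c - intersection.
    union = conf.sum(0).float() + conf.sum(1).float() - tp
    iou = tp / union  # NaN for classes absent from both pred and target
    return global_acc.item(), iou.mean().item()

target = torch.tensor([[0, 0], [1, 1]])
pred = torch.tensor([[0, 1], [1, 1]])

conf = confusion_matrix(pred, target, num_classes=2)
global_acc, mean_iou = pixel_acc_and_miou(conf)
```

In this toy case three of four pixels are correct, class 0 has IoU 1/2 and class 1 has IoU 2/3.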

Fully Convolutional Networks¶

torchvision.models.segmentation.fcn_resnet50(pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)[source]¶

Constructs a Fully-Convolutional Network model with a ResNet-50 backbone.

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017 which contains the same classes as Pascal VOC
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.segmentation.fcn_resnet101(pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)[source]¶

Constructs a Fully-Convolutional Network model with a ResNet-101 backbone.

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017 which contains the same classes as Pascal VOC
progress (bool) – If True, displays a progress bar of the download to stderr

DeepLabV3¶

torchvision.models.segmentation.deeplabv3_resnet50(pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)[source]¶

Constructs a DeepLabV3 model with a ResNet-50 backbone.

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017 which contains the same classes as Pascal VOC
progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.segmentation.deeplabv3_resnet101(pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)[source]¶

Constructs a DeepLabV3 model with a ResNet-101 backbone.

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017 which contains the same classes as Pascal VOC
progress (bool) – If True, displays a progress bar of the download to stderr

Object Detection, Instance Segmentation and Person Keypoint Detection¶

The models subpackage contains definitions for the following model architectures for detection: Faster R-CNN, Mask R-CNN and Keypoint R-CNN, each with a ResNet-50 FPN backbone.

The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.

The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This can be changed by passing min_size to the constructor of the models.
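
To make the resizing rule concrete, the sketch below estimates the size an image is scaled to. It assumes the shorter side is scaled to min_size and that the longer side is capped by a max_size of 1333; max_size and the rounding are assumptions not stated above, so the real result may differ by a pixel. The helper name resized_shape is hypothetical.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Approximate (height, width) after the internal resize.

    max_size=1333 and the rounding mode are assumptions for illustration.
    """
    scale = min_size / min(height, width)
    # If the longer side would exceed max_size, scale by that limit instead.
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)

landscape = resized_shape(480, 640)   # shorter side reaches 800
panorama = resized_shape(300, 1200)   # longer side is capped at 1333
```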

For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

Here is a summary of the accuracies for the models trained on the instances set of COCO train2017 and evaluated on COCO val2017.

Network	box AP	mask AP	keypoint AP
Faster R-CNN ResNet-50 FPN	37.0	-	-
Mask R-CNN ResNet-50 FPN	37.9	34.6	-

For person keypoint detection, the accuracies for the pre-trained models are as follows

Network	box AP	mask AP	keypoint AP
Keypoint R-CNN ResNet-50 FPN	54.6	-	65.0

For person keypoint detection, the pre-trained model returns the keypoints in the following order:

COCO_PERSON_KEYPOINT_NAMES = [
    'nose',
    'left_eye',
    'right_eye',
    'left_ear',
    'right_ear',
    'left_shoulder',
    'right_shoulder',
    'left_elbow',
    'right_elbow',
    'left_wrist',
    'right_wrist',
    'left_hip',
    'right_hip',
    'left_knee',
    'right_knee',
    'left_ankle',
    'right_ankle'
]

Runtime characteristics¶

The implementations of the models for object detection, instance segmentation and keypoint detection are efficient.

In the following table, we use 8 V100 GPUs, with CUDA 10.0 and CUDNN 7.4 to report the results. During training, we use a batch size of 2 per GPU, and during testing a batch size of 1 is used.

For test time, we report the time for the model evaluation and postprocessing (including mask pasting in image), but not the time for computing the precision-recall.

Network	train time (s / it)	test time (s / it)	memory (GB)
Faster R-CNN ResNet-50 FPN	0.2288	0.0590	5.2
Mask R-CNN ResNet-50 FPN	0.2728	0.0903	5.4
Keypoint R-CNN ResNet-50 FPN	0.3789	0.1242	6.8

Faster R-CNN¶

torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, **kwargs)[source]¶

Constructs a Faster R-CNN model with a ResNet-50-FPN backbone.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and the targets (a list of dictionaries, one per image), each containing:

boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the class label for each ground-truth box

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN.
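
The training inputs described above can be assembled as in the sketch below. It assumes targets are passed as a list with one dictionary per image, with float32 boxes and int64 labels (the dtypes are an assumption); the helper make_target is hypothetical and only validates the box format. No model is built here, so nothing is downloaded.

```python
import torch

def make_target(boxes, labels, height, width):
    """Build one per-image target dict and check the [x0, y0, x1, y1] format."""
    boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
    labels = torch.as_tensor(labels, dtype=torch.int64)
    if boxes.shape[0] != labels.shape[0]:
        raise ValueError("need exactly one label per box")
    if not ((boxes[:, 2] > boxes[:, 0]).all() and (boxes[:, 3] > boxes[:, 1]).all()):
        raise ValueError("boxes must satisfy x0 < x1 and y0 < y1")
    inside = (boxes >= 0).all() and (boxes[:, [0, 2]] <= width).all() \
        and (boxes[:, [1, 3]] <= height).all()
    if not inside:
        raise ValueError("boxes must lie inside the image")
    return {"boxes": boxes, "labels": labels}

# Two images of different sizes, as [C, H, W] tensors in the 0-1 range.
images = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
targets = [
    make_target([[30, 40, 200, 250]], [1], height=300, width=400),
    make_target([[10, 10, 100, 120], [150, 200, 390, 480]], [18, 3],
                height=500, width=400),
]

# With a model in training mode one would then call:
#     losses = model(images, targets)
```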

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the predicted label for each detection

scores (Tensor[N]): the score of each detection

Example:

>>> model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017
progress (bool) – If True, displays a progress bar of the download to stderr
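
The per-image prediction dictionaries returned by Faster R-CNN in evaluation mode are usually filtered by score before use. The sketch below works on a hand-made dictionary standing in for one element of predictions; the 0.5 threshold, the helper filter_detections and the three-entry LABEL_NAMES excerpt of COCO_INSTANCE_CATEGORY_NAMES are illustrative choices.

```python
import torch

# Excerpt of COCO_INSTANCE_CATEGORY_NAMES, keyed by label index.
LABEL_NAMES = {1: "person", 3: "car", 18: "dog"}

# Stand-in for predictions[0] as returned in evaluation mode.
prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [50.0, 60.0, 80.0, 90.0],
                           [0.0, 0.0, 30.0, 40.0]]),
    "labels": torch.tensor([1, 18, 3]),
    "scores": torch.tensor([0.98, 0.35, 0.72]),
}

def filter_detections(pred, score_threshold=0.5):
    """Keep only detections whose score reaches the threshold."""
    keep = pred["scores"] >= score_threshold
    return {key: value[keep] for key, value in pred.items()}

kept = filter_detections(prediction)
names = [LABEL_NAMES[i] for i in kept["labels"].tolist()]
```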

Mask R-CNN¶

torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, **kwargs)[source]¶

Constructs a Mask R-CNN model with a ResNet-50-FPN backbone.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and the targets (a list of dictionaries, one per image), each containing:

boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the class label for each ground-truth box

masks (Tensor[N, H, W]): the segmentation binary masks for each instance

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN, and the mask loss.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the predicted label for each detection

scores (Tensor[N]): the score of each detection

masks (Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to obtain the final segmentation masks, the soft masks can be thresholded, generally with a value of 0.5 (mask >= 0.5)
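
The thresholding step mentioned for the masks field can be sketched as follows. A tiny hand-made tensor stands in for the predicted soft masks; the comparison is elementwise, so it applies regardless of whether the mask tensor carries an extra singleton dimension.

```python
import torch

# Stand-in for the soft masks of one detected instance (values in 0-1).
soft_masks = torch.tensor([[[[0.1, 0.6],
                             [0.5, 0.4]]]])

# Binary masks: True where the soft mask reaches the threshold.
binary_masks = soft_masks >= 0.5

# Pixel area of each instance mask, one value per instance.
areas = binary_masks.flatten(1).sum(dim=1)
```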

Example:

>>> model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017
progress (bool) – If True, displays a progress bar of the download to stderr

Keypoint R-CNN¶

torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=2, num_keypoints=17, pretrained_backbone=True, **kwargs)[source]¶

Constructs a Keypoint R-CNN model with a ResNet-50-FPN backbone.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and the targets (a list of dictionaries, one per image), each containing:

boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the class label for each ground-truth box

keypoints (Tensor[N, K, 3]): the locations of the K keypoints for each of the N instances, in the format [x, y, visibility], where visibility=0 means that the keypoint is not visible.

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN, and the keypoint loss.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H

labels (Tensor[N]): the predicted label for each detection

scores (Tensor[N]): the score of each detection

keypoints (Tensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.
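
The [x, y, v] rows of the keypoints field follow the order of COCO_PERSON_KEYPOINT_NAMES, so they can be paired with names directly. The sketch below uses a hand-made tensor for a single person and treats v > 0 as visible, an assumption that mirrors the visibility=0 convention of the training targets.

```python
import torch

COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

# Stand-in for predictions[0]["keypoints"][0]: K = 17 rows of [x, y, v].
keypoints = torch.zeros(17, 3)
keypoints[0] = torch.tensor([120.0, 80.0, 1.0])   # nose
keypoints[5] = torch.tensor([100.0, 140.0, 1.0])  # left_shoulder

# Pair each row with its name and keep the visible ones (v > 0, assumed).
visible = {
    name: (x, y)
    for name, (x, y, v) in zip(COCO_PERSON_KEYPOINT_NAMES, keypoints.tolist())
    if v > 0
}
```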

Example:

>>> model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017
progress (bool) – If True, displays a progress bar of the download to stderr
