otx.algo.classification.backbones#
Backbone modules for OTX custom models.
Classes
- EfficientNetBackbone – Backbone architecture of EfficientNet models.
- TimmBackbone – Timm backbone model.
- MobileNetV3Backbone – Backbone architecture of MobileNetV3.
- VisionTransformer – Implementation of Vision Transformer from Timm.
- TorchvisionBackbone – Backbone model from the torchvision library.
- class otx.algo.classification.backbones.EfficientNetBackbone(version: Literal['b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8'], input_size: tuple[int, int] | None = None, pretrained: bool = True, **kwargs)[source]#
Bases:
object
EfficientNetBackbone class represents the backbone architecture of EfficientNet models.
- EFFICIENTNET_CFG#
A dictionary containing configuration parameters for different versions of EfficientNet.
- init_block_channels#
The number of channels in the initial block of the backbone.
- Type:
ClassVar[int]
- layers#
A list specifying the number of layers in each stage of the backbone.
- kernel_sizes_per_layers#
A list specifying the kernel size in each stage of the backbone.
- strides_per_stage#
A list specifying the stride in each stage of the backbone.
- final_block_channels#
The number of channels in the final block of the backbone.
- Type:
ClassVar[int]
Create a new instance of the EfficientNet class.
- Parameters:
version (EFFICIENTNET_VERSION) – The version of EfficientNet to use.
input_size (tuple[int, int] | None, optional) – The input size of the model. Defaults to None.
pretrained (bool, optional) – Whether to load pretrained weights. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the EfficientNet constructor.
- Returns:
The created EfficientNet model instance.
- Return type:
EfficientNet
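A minimal construction sketch (the version and input shape below are illustrative; the forward call assumes the returned EfficientNet behaves as a standard nn.Module on an image batch):

import torch

from otx.algo.classification.backbones import EfficientNetBackbone

# Build a b0 backbone; pretrained=False skips the weight download.
backbone = EfficientNetBackbone(version="b0", pretrained=False)

# Forward a dummy batch; the feature shape depends on the chosen version.
features = backbone(torch.randn(1, 3, 224, 224))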
- class otx.algo.classification.backbones.MobileNetV3Backbone(mode: Literal['small', 'large'] = 'large', width_mult: float = 1.0, pretrained: bool = True, **kwargs)[source]#
Bases:
object
MobileNetV3Backbone class represents the backbone architecture of MobileNetV3.
- Parameters:
mode (Literal["small", "large"], optional) – The mode of the backbone architecture. Defaults to “large”.
width_mult (float, optional) – Width multiplier for the backbone architecture. Defaults to 1.0.
pretrained (bool, optional) – Whether to load pretrained weights. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the MobileNetV3 model.
- Returns:
An instance of the MobileNetV3 model.
- Return type:
MobileNetV3
Examples

# Create a MobileNetV3Backbone instance
backbone = MobileNetV3Backbone(mode="small", width_mult=0.75, pretrained=False)

# Create a MobileNetV3 model with the specified backbone
model = MobileNetV3(backbone=backbone)
Create a new instance of the MobileNetV3 class.
- Parameters:
mode (Literal["small", "large"], optional) – The mode of the MobileNetV3 model. Defaults to “large”.
width_mult (float, optional) – Width multiplier for the MobileNetV3 model. Defaults to 1.0.
pretrained (bool, optional) – Whether to load pretrained weights for the MobileNetV3 model. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the MobileNetV3 constructor.
- Returns:
A new instance of the MobileNetV3 class.
- Return type:
MobileNetV3
- class otx.algo.classification.backbones.TimmBackbone(model_name: str, pretrained: bool = False, **kwargs)[source]#
Bases:
Module
Timm backbone model.
- Parameters:
model_name (str) – Name of the timm model to use as the backbone.
pretrained (bool, optional) – Whether to load pretrained weights. Defaults to False.
**kwargs – Additional keyword arguments passed to the timm model constructor.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
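A usage sketch, assuming the module forwards an image batch to the wrapped timm model ("resnet18" is an illustrative timm model name):

import torch

from otx.algo.classification.backbones import TimmBackbone

# Wrap a timm model as a backbone; pretrained=False avoids downloading weights.
backbone = TimmBackbone(model_name="resnet18", pretrained=False)

# Forward a dummy batch (assumed signature: image tensor in, features out).
features = backbone(torch.randn(1, 3, 224, 224))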
- class otx.algo.classification.backbones.TorchvisionBackbone(backbone: Literal['alexnet', 'convnext_base', 'convnext_large', 'convnext_small', 'convnext_tiny', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'efficientnet_b5', 'efficientnet_b6', 'efficientnet_b7', 'efficientnet_v2_l', 'efficientnet_v2_m', 'efficientnet_v2_s', 'googlenet', 'mobilenet_v3_large', 'mobilenet_v3_small', 'regnet_x_16gf', 'regnet_x_1_6gf', 'regnet_x_32gf', 'regnet_x_3_2gf', 'regnet_x_400mf', 'regnet_x_800mf', 'regnet_x_8gf', 'regnet_y_128gf', 'regnet_y_16gf', 'regnet_y_1_6gf', 'regnet_y_32gf', 'regnet_y_3_2gf', 'regnet_y_400mf', 'regnet_y_800mf', 'regnet_y_8gf', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext101_64x4d', 'resnext50_32x4d', 'swin_b', 'swin_s', 'swin_t', 'swin_v2_b', 'swin_v2_s', 'swin_v2_t', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'wide_resnet101_2', 'wide_resnet50_2'], pretrained: bool = False, **kwargs)[source]#
Bases:
Module
TorchvisionBackbone is a class that represents a backbone model from the torchvision library.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
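A usage sketch along the same lines ("resnet50" is one of the accepted Literal values; the forward call is an assumption about the module's interface):

import torch

from otx.algo.classification.backbones import TorchvisionBackbone

# Build a torchvision-based backbone without pretrained weights.
backbone = TorchvisionBackbone(backbone="resnet50", pretrained=False)

# Forward a dummy batch (assumed to yield the backbone's feature output).
features = backbone(torch.randn(1, 3, 224, 224))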
- class otx.algo.classification.backbones.VisionTransformer(arch: ~typing.Literal['vit-t', 'vit-tiny', 'vit-s', 'vit-small', 'vit-b', 'vit-base', 'vit-l', 'vit-large', 'vit-h', 'vit-huge', 'dinov2-s', 'dinov2-small', 'dinov2-small-seg', 'dinov2-b', 'dinov2-base', 'dinov2-l', 'dinov2-large', 'dinov2-g', 'dinov2-giant'] | str = 'vit-base', img_size: int | tuple[int, int] = 224, patch_size: int | None = None, in_chans: int = 3, num_classes: int = 1000, embed_dim: int | None = None, depth: int | None = None, num_heads: int | None = None, mlp_ratio: float | None = None, qkv_bias: bool = True, qk_norm: bool = False, init_values: float | None = None, class_token: bool = True, no_embed_class: bool | None = None, reg_tokens: int | None = None, pre_norm: bool = False, dynamic_img_size: bool = False, dynamic_img_pad: bool = False, pos_drop_rate: float = 0.0, patch_drop_rate: float = 0.0, proj_drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, embed_layer: ~typing.Callable = <class 'timm.layers.patch_embed.PatchEmbed'>, block_fn: ~torch.nn.modules.module.Module = <class 'timm.models.vision_transformer.Block'>, mlp_layer: ~torch.nn.modules.module.Module | None = None, act_layer: str | ~typing.Callable | ~typing.Type[~torch.nn.modules.module.Module] | None = None, norm_layer: str | ~typing.Callable | ~typing.Type[~torch.nn.modules.module.Module] | None = None, interpolate_offset: float = 0.1, lora: bool = False)[source]#
Bases:
BaseModule
Implementation of Vision Transformer from Timm.
- A PyTorch implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Parameters:
arch – Vision Transformer architecture.
img_size – Input image size.
patch_size – Patch size.
in_chans – Number of image input channels.
num_classes – Number of classes for classification head.
embed_dim – Transformer embedding dimension.
depth – Depth of transformer.
num_heads – Number of attention heads.
mlp_ratio – Ratio of mlp hidden dim to embedding dim.
qkv_bias – Enable bias for qkv projections if True.
init_values – Layer-scale init values (layer-scale enabled if not None).
class_token – Use class token.
no_embed_class – Don’t include position embeddings for class (or reg) tokens.
reg_tokens – Number of register tokens.
pos_drop_rate – Position embedding dropout rate.
patch_drop_rate – Patch dropout rate.
proj_drop_rate – Projection dropout rate.
attn_drop_rate – Attention dropout rate.
drop_path_rate – Stochastic depth rate.
embed_layer – Patch embedding layer.
norm_layer – Normalization layer.
act_layer – MLP activation layer.
block_fn – Transformer block layer.
interpolate_offset – Work-around offset applied when interpolating positional embeddings.
lora – Enable LoRA training.
Initialize BaseModule, inherited from torch.nn.Module.
- forward(x: Tensor, out_type: Literal['raw', 'cls_token', 'featmap', 'avg_featmap'] = 'cls_token') → tuple [source]#
Forward pass of the VisionTransformer model.
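A hedged sketch of construction and forward (the arch preset is assumed to fill in patch size, depth, heads, and embedding dimension; out_type="cls_token" is the documented default):

import torch

from otx.algo.classification.backbones import VisionTransformer

# Build a ViT-Base backbone; unspecified sizes are presumably taken from the arch preset.
vit = VisionTransformer(arch="vit-base", img_size=224)

# The forward pass returns a tuple; out_type selects the representation it contains.
out = vit(torch.randn(1, 3, 224, 224), out_type="cls_token")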
- get_intermediate_layers(x: Tensor, n: int = 1, reshape: bool = False, return_class_token: bool = False, norm: bool = True) → tuple [source]#
Get intermediate layers of the VisionTransformer.
- Parameters:
x (torch.Tensor) – Input tensor.
n (int) – Number of last blocks to take. If it’s a list, take the specified blocks.
reshape (bool) – Whether to reshape the output feature maps.
return_class_token (bool) – Whether to return the class token.
norm (bool) – Whether to apply normalization to the outputs.
- Returns:
A tuple containing the intermediate layer outputs.
- Return type:
tuple
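For instance, a sketch that takes the last two blocks as spatial feature maps (reshape=True is assumed to fold the patch tokens back into a 2D grid):

import torch

from otx.algo.classification.backbones import VisionTransformer

vit = VisionTransformer(arch="vit-base", img_size=224)

# Collect the outputs of the last two transformer blocks as feature maps.
feats = vit.get_intermediate_layers(
    torch.randn(1, 3, 224, 224),
    n=2,
    reshape=True,
)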
- interpolate_pos_encoding(x: Tensor, w: int, h: int) → Tensor [source]#
Interpolates the positional encoding to match the input dimensions.
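A hypothetical sketch, assuming the DINOv2-style convention that x is a token sequence with the class token first and that w and h are the input image dimensions in pixels (the 768-dim embedding and 16-pixel patches are assumed from the vit-base preset):

import torch

from otx.algo.classification.backbones import VisionTransformer

vit = VisionTransformer(arch="vit-base", img_size=224)

# A 320x320 input with 16x16 patches yields 20x20 = 400 patch tokens plus the class token.
tokens = torch.randn(1, 1 + 400, 768)
pos = vit.interpolate_pos_encoding(tokens, w=320, h=320)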