otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit#

TinyViT for MobileSAM.

Functions

build_tiny_vit([img_size, drop_path_rate])

Build TinyViT backbone.

Classes

Attention(dim, key_dim[, num_heads, ...])

Attention block for TinyViT.

BasicLayer(dim, input_resolution, depth, ...)

A basic TinyViT layer for one stage.

Conv2d_BN(a, b[, ks, stride, pad, dilation, ...])

Conv2d_BN for TinyViT.

ConvLayer(dim, input_resolution, depth, ...)

ConvLayer for TinyViT.

DropPath([drop_prob])

DropPath for TinyViT.

LayerNorm2d(num_channels[, eps])

2D layer normalization for TinyViT.

MBConv(in_chans, out_chans, expand_ratio, ...)

MBConv for TinyViT.

Mlp(in_features, hidden_features, ...)

MLP for TinyViT.

PatchEmbed(in_chans, embed_dim, resolution, ...)

PatchEmbed for TinyViT.

PatchMerging(input_resolution, dim, out_dim, ...)

PatchMerging for TinyViT.

TinyViT([img_size, in_chans, num_classes, ...])

TinyViT for MobileSAM.

TinyViTBlock(dim, input_resolution, num_heads, ...)

TinyViT Block.

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Attention(dim: int, key_dim: int, num_heads: int = 8, attn_ratio: int = 4, resolution: Tuple[int, int] = (14, 14))[source]#

Bases: Module

Attention block for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.

train(mode: bool = True) → None[source]#

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module
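
A minimal usage sketch (the dimensions and batch size below are illustrative assumptions; the token count must equal resolution[0] * resolution[1] so the cached attention biases line up with the sequence):

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import Attention

    # 4 heads over 64-dim tokens on a 7x7 grid -> sequence length 49
    attn = Attention(dim=64, key_dim=16, num_heads=4, attn_ratio=4, resolution=(7, 7))
    tokens = torch.randn(2, 49, 64)   # (B, N, dim) with N = 7 * 7
    out = attn(tokens)                # output keeps the (B, N, dim) shape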

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.BasicLayer(dim: int, input_resolution: ~typing.Tuple[int, int], depth: int, num_heads: int, window_size: int, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, downsample: ~torch.nn.modules.module.Module | None = None, use_checkpoint: bool = False, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, out_dim: int | None = None)[source]#

Bases: Module

A basic TinyViT layer for one stage.

Parameters:
  • dim (int) – Number of input channels.

  • input_resolution (tuple[int, int]) – Input resolution.

  • depth (int) – Number of blocks.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Local window size.

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.

  • drop (float, optional) – Dropout rate. Default: 0.0

  • drop_path (float | tuple[float], optional) – Stochastic depth rate. Default: 0.0

  • downsample (nn.Module | None, optional) – Downsample layer at the end of the layer. Default: None

  • use_checkpoint (bool) – Whether to use checkpointing to save memory. Default: False.

  • local_conv_size – the kernel size of the depthwise convolution between attention and MLP. Default: 3

  • activation – the activation function. Default: nn.GELU

  • out_dim – the output dimension of the layer. Default: dim

Initializes internal Module state, shared by both nn.Module and ScriptModule.

extra_repr() → str[source]#

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: Tensor) → Tensor[source]#

Forward call.
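
A minimal usage sketch for one stage (hyper-parameters and shapes are illustrative assumptions; the input is a flattened token sequence of length input_resolution[0] * input_resolution[1]):

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import BasicLayer

    # one transformer stage: 2 blocks, 3 heads, 7x7 local windows on a 14x14 token grid
    stage = BasicLayer(dim=96, input_resolution=(14, 14), depth=2, num_heads=3, window_size=7)
    tokens = torch.randn(1, 14 * 14, 96)   # (B, H * W, dim)
    out = stage(tokens)                    # same shape when no downsample layer is set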

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Conv2d_BN(a: int, b: int, ks: int = 1, stride: int = 1, pad: int = 0, dilation: int = 1, groups: int = 1, bn_weight_init: float = 1.0)[source]#

Bases: Sequential

Conv2d_BN for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

fuse() → Module[source]#

Fuse weights and biases.
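
A minimal sketch of fusing, with illustrative layer sizes; fuse() folds the BatchNorm statistics into a single Conv2d, which in eval mode is expected to reproduce the Conv+BN output up to numerical error:

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import Conv2d_BN

    conv_bn = Conv2d_BN(a=3, b=16, ks=3, stride=2, pad=1).eval()
    fused = conv_bn.fuse()            # plain Conv2d with BN folded into its weight and bias
    x = torch.randn(1, 3, 32, 32)
    with torch.no_grad():
        y_ref = conv_bn(x)
        y_fused = fused(x)            # should match y_ref closely in eval mode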

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.ConvLayer(dim: int, input_resolution: int, depth: int, activation: Module, drop_path: List[float] | float = 0.0, downsample: Module | None = None, use_checkpoint: bool = False, out_dim: int | None = None, conv_expand_ratio: float = 4.0)[source]#

Bases: Module

ConvLayer for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
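
A minimal usage sketch (sizes are illustrative assumptions); the convolutional stage stacks MBConv blocks over a channel-first feature map:

    import torch
    from torch import nn
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import ConvLayer

    # convolutional stage of 2 MBConv blocks
    stage = ConvLayer(dim=64, input_resolution=56, depth=2, activation=nn.GELU)
    x = torch.randn(1, 64, 56, 56)    # (B, dim, H, W)
    out = stage(x)                    # same shape when no downsample layer is set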

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.DropPath(drop_prob: List[float] | float | None = None)[source]#

Bases: DropPath

DropPath for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.LayerNorm2d(num_channels: int, eps: float = 1e-06)[source]#

Bases: Module

2D layer normalization for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
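
A minimal usage sketch (the channel count and spatial size are illustrative assumptions); normalization is applied over the channel dimension of a (B, C, H, W) feature map:

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import LayerNorm2d

    norm = LayerNorm2d(num_channels=64)
    x = torch.randn(1, 64, 32, 32)
    out = norm(x)                     # shape preserved: (1, 64, 32, 32)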

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.MBConv(in_chans: int, out_chans: int, expand_ratio: float, activation: Module, drop_path: List[float] | float)[source]#

Bases: Module

MBConv for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
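
A minimal usage sketch (values are illustrative assumptions); the block uses a residual connection, so this sketch keeps in_chans equal to out_chans so the addition lines up:

    import torch
    from torch import nn
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import MBConv

    block = MBConv(in_chans=64, out_chans=64, expand_ratio=4.0, activation=nn.GELU, drop_path=0.0)
    x = torch.randn(1, 64, 56, 56)    # (B, C, H, W)
    out = block(x)                    # shape preserved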

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Mlp(in_features: int, hidden_features: int | None = None, out_features: int | None = None, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, drop: float = 0.0)[source]#

Bases: Module

MLP for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
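
A minimal usage sketch (feature sizes are illustrative assumptions); the MLP is applied token-wise over the last dimension, and out_features defaults to in_features:

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import Mlp

    mlp = Mlp(in_features=96, hidden_features=384)
    tokens = torch.randn(1, 196, 96)  # (B, N, in_features)
    out = mlp(tokens)                 # (1, 196, 96)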

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchEmbed(in_chans: int, embed_dim: int, resolution: int, activation: Module)[source]#

Bases: Module

PatchEmbed for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
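
A minimal usage sketch (image size and embedding width are illustrative assumptions); two stride-2 convolutions embed the image at 1/4 of the input resolution:

    import torch
    from torch import nn
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import PatchEmbed

    embed = PatchEmbed(in_chans=3, embed_dim=96, resolution=224, activation=nn.GELU)
    image = torch.randn(1, 3, 224, 224)
    out = embed(image)                # expected (1, 96, 56, 56)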

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchMerging(input_resolution: Tuple[int, int], dim: int, out_dim: int, activation: Module)[source]#

Bases: Module

PatchMerging for TinyViT.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.
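
A minimal usage sketch (sizes are illustrative assumptions); the layer is expected to halve the spatial resolution of a flattened token sequence while changing its channel width:

    import torch
    from torch import nn
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import PatchMerging

    merge = PatchMerging(input_resolution=(14, 14), dim=96, out_dim=128, activation=nn.GELU)
    tokens = torch.randn(1, 14 * 14, 96)   # (B, H * W, dim)
    out = merge(tokens)                    # expected (1, 7 * 7, 128)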

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViT(img_size: int = 224, in_chans: int = 3, num_classes: int = 1000, embed_dims: List[int] = [96, 192, 384, 768], depths: List[int] = [2, 2, 6, 2], num_heads: List[int] = [3, 6, 12, 24], window_sizes: List[int] = [7, 7, 14, 7], mlp_ratio: float = 4.0, drop_rate: float = 0.0, drop_path_rate: float = 0.1, use_checkpoint: bool = False, mbconv_expand_ratio: float = 4.0, local_conv_size: int = 3, layer_lr_decay: float = 1.0)[source]#

Bases: Module

TinyViT for MobileSAM.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]#

Forward call.

Parameters:

x (Tensor) – Input image tensor with shape (B, C, H, W).

Returns:

Output tensor with shape (B, H’, W’, C’).

Return type:

Tensor

forward_features(x: Tensor) → Tensor[source]#

Forward call.

Parameters:

x (Tensor) – Input image tensor with shape (B, C, H, W).

Returns:

Output tensor with shape (B, H’, W’, C’).

Return type:

Tensor

no_weight_decay_keywords() → Set[str][source]#

Keywords for parameters excluded from weight decay.

set_layer_lr_decay(layer_lr_decay: float) → None[source]#

Set layer-wise learning rate decay.

class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViTBlock(dim: int, input_resolution: ~typing.Tuple[int, int], num_heads: int, window_size: int = 7, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>)[source]#

Bases: Module

TinyViT Block.

Parameters:
  • dim (int) – Number of input channels.

  • input_resolution (tuple[int, int]) – Input resolution.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – Window size.

  • mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.

  • drop (float, optional) – Dropout rate. Default: 0.0

  • drop_path (float, optional) – Stochastic depth rate. Default: 0.0

  • local_conv_size (int) – the kernel size of the convolution between Attention and MLP. Default: 3

  • activation – the activation function. Default: nn.GELU

Initializes internal Module state, shared by both nn.Module and ScriptModule.

extra_repr() → str[source]#

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: Tensor) → Tensor[source]#

Forward call.
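
A minimal usage sketch (hyper-parameters and shapes are illustrative assumptions); the block applies windowed attention, a local depth-wise convolution, and an MLP over a flattened token grid:

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import TinyViTBlock

    block = TinyViTBlock(dim=96, input_resolution=(14, 14), num_heads=3, window_size=7)
    tokens = torch.randn(1, 14 * 14, 96)   # (B, H * W, dim)
    out = block(tokens)                    # shape preserved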

otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.build_tiny_vit(img_size: int = 1024, drop_path_rate: float = 0.0)[source]#

Build TinyViT backbone.

Parameters:
  • img_size (int) – Input image size.

  • drop_path_rate (float) – Drop path rate for stochastic depth.

Returns:

TinyViT backbone.

Return type:

TinyViT
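
A minimal usage sketch, assuming a random 1024x1024 input; the batch size is illustrative and the exact output feature-map shape depends on the built backbone configuration:

    import torch
    from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import build_tiny_vit

    # build the MobileSAM image encoder backbone for 1024x1024 inputs
    backbone = build_tiny_vit(img_size=1024, drop_path_rate=0.0)
    backbone.eval()
    with torch.no_grad():
        features = backbone(torch.randn(1, 3, 1024, 1024))   # (B, C, H, W) image in, feature map out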