otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit#
TinyViT for MobileSAM.
Functions

- Build TinyViT backbone.

Classes

- Attention – Attention block for TinyViT.
- BasicLayer – A basic TinyViT layer for one stage.
- Conv2d_BN – Conv2d_BN for TinyViT.
- ConvLayer – ConvLayer for TinyViT.
- DropPath – DropPath for TinyViT.
- LayerNorm2d – 2D-Layer Normalize for TinyViT.
- MBConv – MBConv for TinyViT.
- Mlp – MLP for TinyViT.
- PatchEmbed – PatchEmbed for TinyViT.
- PatchMerging – PatchMerging for TinyViT.
- TinyViT – TinyViT for MobileSAM.
- TinyViTBlock – TinyViT Block.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Attention(dim: int, key_dim: int, num_heads: int = 8, attn_ratio: int = 4, resolution: Tuple[int, int] = (14, 14))[source]#
Bases:
Module
Attention block for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → None [source]#
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of the particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
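A minimal sketch of toggling the overridden train()/eval() modes on an Attention block. The constructor values and the (B, N, dim) input layout with N = resolution[0] * resolution[1] are illustrative assumptions, not values prescribed by this page:

import torch
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import Attention

# Hypothetical configuration: 96-dim tokens over a 7x7 grid (assumption).
attn = Attention(dim=96, key_dim=16, num_heads=4, attn_ratio=4, resolution=(7, 7))

attn.eval()        # switch to evaluation mode
attn.train(True)   # back to training mode (mode defaults to True)

# Assumed input layout: (batch, tokens, channels) with tokens = 7 * 7.
x = torch.randn(1, 49, 96)
out = attn(x)      # the same (1, 49, 96) token layout is expected back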
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.BasicLayer(dim: int, input_resolution: ~typing.Tuple[int, int], depth: int, num_heads: int, window_size: int, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, downsample: ~torch.nn.modules.module.Module | None = None, use_checkpoint: bool = False, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, out_dim: int | None = None)[source]#
Bases:
Module
A basic TinyViT layer for one stage.
- Parameters:
dim (int) – Number of input channels.
depth (int) – Number of blocks.
num_heads (int) – Number of attention heads.
window_size (int) – Local window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
drop (float, optional) – Dropout rate. Default: 0.0
drop_path (float | tuple[float], optional) – Stochastic depth rate. Default: 0.0
downsample (nn.Module | None, optional) – Downsample layer at the end of the layer. Default: None
use_checkpoint (bool) – Whether to use checkpointing to save memory. Default: False.
local_conv_size (int) – The kernel size of the depthwise convolution between attention and MLP. Default: 3
activation – The activation function. Default: nn.GELU
out_dim (int | None) – The output dimension of the layer. Default: dim
Initializes internal Module state, shared by both nn.Module and ScriptModule.
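A construction sketch for a single stage built from the parameters documented above. The concrete sizes, the per-block drop_path list, and the (B, H * W, C) token layout of the input are assumptions chosen for illustration:

import torch
from torch import nn
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import BasicLayer

# Hypothetical stage: 192 channels on a 14x14 token grid, two blocks, no downsampling (assumption).
layer = BasicLayer(
    dim=192,
    input_resolution=(14, 14),
    depth=2,
    num_heads=6,
    window_size=7,
    mlp_ratio=4.0,
    drop_path=[0.0, 0.1],   # one stochastic-depth rate per block
    downsample=None,
    activation=nn.GELU,
)

x = torch.randn(1, 14 * 14, 192)   # assumed (B, H * W, C) token layout
out = layer(x)                     # expected to stay (1, 196, 192) when no downsample layer is attached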
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Conv2d_BN(a: int, b: int, ks: int = 1, stride: int = 1, pad: int = 0, dilation: int = 1, groups: int = 1, bn_weight_init: float = 1.0)[source]#
Bases:
Sequential
Conv2d_BN for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.ConvLayer(dim: int, input_resolution: int, depth: int, activation: Module, drop_path: List[float] | float = 0.0, downsample: Module | None = None, use_checkpoint: bool = False, out_dim: int | None = None, conv_expand_ratio: float = 4.0)[source]#
Bases:
Module
ConvLayer for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.DropPath(drop_prob: List[float] | float | None = None)[source]#
Bases:
DropPath
DropPath for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.LayerNorm2d(num_channels: int, eps: float = 1e-06)[source]#
Bases:
Module
2D-Layer Normalize for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.MBConv(in_chans: int, out_chans: int, expand_ratio: float, activation: Module, drop_path: List[float] | float)[source]#
Bases:
Module
MBConv for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Mlp(in_features: int, hidden_features: int | None = None, out_features: int | None = None, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, drop: float = 0.0)[source]#
Bases:
Module
MLP for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchEmbed(in_chans: int, embed_dim: int, resolution: int, activation: Module)[source]#
Bases:
Module
PatchEmbed for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchMerging(input_resolution: Tuple[int, int], dim: int, out_dim: int, activation: Module)[source]#
Bases:
Module
PatchMerging for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViT(img_size: int = 224, in_chans: int = 3, num_classes: int = 1000, embed_dims: List[int] = [96, 192, 384, 768], depths: List[int] = [2, 2, 6, 2], num_heads: List[int] = [3, 6, 12, 24], window_sizes: List[int] = [7, 7, 14, 7], mlp_ratio: float = 4.0, drop_rate: float = 0.0, drop_path_rate: float = 0.1, use_checkpoint: bool = False, mbconv_expand_ratio: float = 4.0, local_conv_size: int = 3, layer_lr_decay: float = 1.0)[source]#
Bases:
Module
TinyViT for MobileSAM.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: Tensor) → Tensor [source]#
Forward call.
- Parameters:
x (Tensor) – Input image tensor with shape (B, C, H, W).
- Returns:
Output tensor with shape (B, H', W', C').
- Return type:
Tensor
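A usage sketch for the full backbone. The hyper-parameters below follow a MobileSAM-style TinyViT configuration (1024x1024 input, [64, 128, 160, 320] embedding dims); they are assumptions for illustration rather than values mandated by this page, and the resulting feature-map shape should be checked against the forward() contract above:

import torch
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import TinyViT

# MobileSAM-style configuration (assumption; see the constructor defaults above for the generic variant).
backbone = TinyViT(
    img_size=1024,
    embed_dims=[64, 128, 160, 320],
    depths=[2, 2, 6, 2],
    num_heads=[2, 4, 5, 10],
    window_sizes=[7, 7, 14, 7],
    drop_path_rate=0.0,
)
backbone.eval()

# (B, C, H, W) image batch, as documented in forward().
x = torch.randn(1, 3, 1024, 1024)
with torch.no_grad():
    features = backbone(x)
print(features.shape)  # spatial feature map produced by the backbone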
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViTBlock(dim: int, input_resolution: ~typing.Tuple[int, int], num_heads: int, window_size: int = 7, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>)[source]#
Bases:
Module
TinyViT Block.
- Parameters:
dim (int) – Number of input channels.
num_heads (int) – Number of attention heads.
window_size (int) – Window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
drop (float, optional) – Dropout rate. Default: 0.0
drop_path (float, optional) – Stochastic depth rate. Default: 0.0
local_conv_size (int) – The kernel size of the convolution between Attention and MLP. Default: 3
activation – The activation function. Default: nn.GELU
Initializes internal Module state, shared by both nn.Module and ScriptModule.
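A minimal construction sketch for a single block, mirroring the parameters listed above; the sizes and the (B, H * W, C) input layout are illustrative assumptions:

import torch
from torch import nn
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import TinyViTBlock

# Hypothetical block: 192 channels, 14x14 token grid, 7x7 attention windows (assumption).
block = TinyViTBlock(
    dim=192,
    input_resolution=(14, 14),
    num_heads=6,
    window_size=7,
    mlp_ratio=4.0,
    drop_path=0.1,
    activation=nn.GELU,
)

x = torch.randn(1, 14 * 14, 192)   # assumed (B, H * W, C) token layout
out = block(x)                     # expected to keep the (1, 196, 192) token layout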