otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit#
TinyViT for MobileSAM.
Functions

- Build TinyViT backbone.

Classes

- Attention – Attention block for TinyViT.
- BasicLayer – A basic TinyViT layer for one stage.
- Conv2d_BN – Conv2d_BN for TinyViT.
- ConvLayer – ConvLayer for TinyViT.
- DropPath – DropPath for TinyViT.
- LayerNorm2d – 2D-Layer Normalize for TinyViT.
- MBConv – MBConv for TinyViT.
- Mlp – MLP for TinyViT.
- PatchEmbed – PatchEmbed for TinyViT.
- PatchMerging – PatchMerging for TinyViT.
- TinyViT – TinyViT for MobileSAM.
- TinyViTBlock – TinyViT Block.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Attention(dim: int, key_dim: int, num_heads: int = 8, attn_ratio: int = 4, resolution: Tuple[int, int] = (14, 14))[source]#
Bases:
Module
Attention block for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → None [source]#
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of the particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
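A minimal sketch of toggling the overridden train()/eval() modes on an Attention block. The constructor values and the (B, N, dim) input layout with N = resolution[0] * resolution[1] are illustrative assumptions, not values prescribed by this page:

import torch
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import Attention

# Hypothetical configuration: 96-dim tokens over a 7x7 grid (assumption).
attn = Attention(dim=96, key_dim=16, num_heads=4, attn_ratio=4, resolution=(7, 7))

attn.eval()        # switch to evaluation mode
attn.train(True)   # back to training mode (mode defaults to True)

# Assumed input layout: (batch, tokens, channels) with tokens = 7 * 7.
x = torch.randn(1, 49, 96)
out = attn(x)      # the same (1, 49, 96) token layout is expected back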
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.BasicLayer(dim: int, input_resolution: ~typing.Tuple[int, int], depth: int, num_heads: int, window_size: int, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, downsample: ~torch.nn.modules.module.Module | None = None, use_checkpoint: bool = False, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, out_dim: int | None = None)[source]#
Bases:
Module
A basic TinyViT layer for one stage.
- Parameters:
dim (int) – Number of input channels.
depth (int) – Number of blocks.
num_heads (int) – Number of attention heads.
window_size (int) – Local window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
drop (float, optional) – Dropout rate. Default: 0.0
drop_path (float | tuple[float], optional) – Stochastic depth rate. Default: 0.0
downsample (nn.Module | None, optional) – Downsample layer at the end of the layer. Default: None
use_checkpoint (bool) – Whether to use checkpointing to save memory. Default: False.
local_conv_size (int) – The kernel size of the depthwise convolution between attention and MLP. Default: 3
activation – The activation function. Default: nn.GELU
out_dim (int | None) – The output dimension of the layer. Default: dim
Initializes internal Module state, shared by both nn.Module and ScriptModule.
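A construction sketch for a single stage built from the parameters documented above. The concrete sizes, the per-block drop_path list, and the (B, H * W, C) token layout of the input are assumptions chosen for illustration:

import torch
from torch import nn
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import BasicLayer

# Hypothetical stage: 192 channels on a 14x14 token grid, two blocks, no downsampling (assumption).
layer = BasicLayer(
    dim=192,
    input_resolution=(14, 14),
    depth=2,
    num_heads=6,
    window_size=7,
    mlp_ratio=4.0,
    drop_path=[0.0, 0.1],   # one stochastic-depth rate per block
    downsample=None,
    activation=nn.GELU,
)

x = torch.randn(1, 14 * 14, 192)   # assumed (B, H * W, C) token layout
out = layer(x)                     # expected to stay (1, 196, 192) when no downsample layer is attached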
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Conv2d_BN(a: int, b: int, ks: int = 1, stride: int = 1, pad: int = 0, dilation: int = 1, groups: int = 1, bn_weight_init: float = 1.0)[source]#
Bases:
Sequential
Conv2d_BN for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.ConvLayer(dim: int, input_resolution: int, depth: int, activation: Module, drop_path: List[float] | float = 0.0, downsample: Module | None = None, use_checkpoint: bool = False, out_dim: int | None = None, conv_expand_ratio: float = 4.0)[source]#
Bases:
Module
ConvLayer for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.DropPath(drop_prob: List[float] | float | None = None)[source]#
Bases:
DropPath
DropPath for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.LayerNorm2d(num_channels: int, eps: float = 1e-06)[source]#
Bases:
Module
2D-Layer Normalize for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.MBConv(in_chans: int, out_chans: int, expand_ratio: float, activation: Module, drop_path: List[float] | float)[source]#
Bases:
Module
MBConv for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.Mlp(in_features: int, hidden_features: int | None = None, out_features: int | None = None, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>, drop: float = 0.0)[source]#
Bases:
Module
MLP for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchEmbed(in_chans: int, embed_dim: int, resolution: int, activation: Module)[source]#
Bases:
Module
PatchEmbed for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.PatchMerging(input_resolution: Tuple[int, int], dim: int, out_dim: int, activation: Module)[source]#
Bases:
Module
PatchMerging for TinyViT.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViT(img_size: int = 224, in_chans: int = 3, num_classes: int = 1000, embed_dims: List[int] = [96, 192, 384, 768], depths: List[int] = [2, 2, 6, 2], num_heads: List[int] = [3, 6, 12, 24], window_sizes: List[int] = [7, 7, 14, 7], mlp_ratio: float = 4.0, drop_rate: float = 0.0, drop_path_rate: float = 0.1, use_checkpoint: bool = False, mbconv_expand_ratio: float = 4.0, local_conv_size: int = 3, layer_lr_decay: float = 1.0)[source]#
Bases:
Module
TinyViT for MobileSAM.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: Tensor) → Tensor [source]#
Forward call.
- Parameters:
x (Tensor) – Input image tensor with shape (B, C, H, W).
- Returns:
Output tensor with shape (B, H', W', C').
- Return type:
Tensor
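A usage sketch for the full backbone. The hyper-parameters below follow a MobileSAM-style TinyViT configuration (1024x1024 input, [64, 128, 160, 320] embedding dims); they are assumptions for illustration rather than values mandated by this page, and the resulting feature-map shape should be checked against the forward() contract above:

import torch
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import TinyViT

# MobileSAM-style configuration (assumption; see the constructor defaults above for the generic variant).
backbone = TinyViT(
    img_size=1024,
    embed_dims=[64, 128, 160, 320],
    depths=[2, 2, 6, 2],
    num_heads=[2, 4, 5, 10],
    window_sizes=[7, 7, 14, 7],
    drop_path_rate=0.0,
)
backbone.eval()

# (B, C, H, W) image batch, as documented in forward().
x = torch.randn(1, 3, 1024, 1024)
with torch.no_grad():
    features = backbone(x)
print(features.shape)  # spatial feature map produced by the backbone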
- class otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit.TinyViTBlock(dim: int, input_resolution: ~typing.Tuple[int, int], num_heads: int, window_size: int = 7, mlp_ratio: float = 4.0, drop: float = 0.0, drop_path: ~typing.List[float] | float = 0.0, local_conv_size: int = 3, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.GELU'>)[source]#
Bases:
Module
TinyViT Block.
- Parameters:
dim (int) – Number of input channels.
num_heads (int) – Number of attention heads.
window_size (int) – Window size.
mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim.
drop (float, optional) – Dropout rate. Default: 0.0
drop_path (float, optional) – Stochastic depth rate. Default: 0.0
local_conv_size (int) – The kernel size of the convolution between Attention and MLP. Default: 3
activation – The activation function. Default: nn.GELU
Initializes internal Module state, shared by both nn.Module and ScriptModule.
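A minimal construction sketch for a single block, mirroring the parameters listed above; the sizes and the (B, H * W, C) input layout are illustrative assumptions:

import torch
from torch import nn
from otx.algorithms.visual_prompting.adapters.pytorch_lightning.models.backbones.tiny_vit import TinyViTBlock

# Hypothetical block: 192 channels, 14x14 token grid, 7x7 attention windows (assumption).
block = TinyViTBlock(
    dim=192,
    input_resolution=(14, 14),
    num_heads=6,
    window_size=7,
    mlp_ratio=4.0,
    drop_path=0.1,
    activation=nn.GELU,
)

x = torch.randn(1, 14 * 14, 192)   # assumed (B, H * W, C) token layout
out = block(x)                     # expected to keep the (1, 196, 192) token layout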