geti_sdk.deployment
Introduction
The deployment package allows creating a deployment of any Intel® Geti™ project. A project deployment can run inference on an image or video frame locally, i.e. without any connection to the Intel® Geti™ server.
Deployments can be created for both single task and task chain projects; the API is the same in both cases.
Creating a deployment for a project is done through the Geti class, which provides a convenience method deploy_project().
The following code snippet shows:
- How to create a deployment for a project
- How to use it to run local inference for an image
- How to save the deployment to disk
import cv2
from geti_sdk import Geti
geti = Geti(
host="https://0.0.0.0", username="dummy_user", password="dummy_password"
)
# Download the model data and create a `Deployment`
deployment = geti.deploy_project(project_name="dummy_project")
# Load the inference models for all tasks in the project, for CPU inference
deployment.load_inference_models(device='CPU')
# Run inference
dummy_image = cv2.imread('dummy_image.png')
dummy_image = cv2.cvtColor(dummy_image, cv2.COLOR_BGR2RGB)
prediction = deployment.infer(image=dummy_image)
# Save the deployment to disk
deployment.save(path_to_folder="deployment_dummy_project")
A saved Deployment can be loaded from its containing folder using the from_folder() method, like so:
from geti_sdk.deployment import Deployment
local_deployment = Deployment.from_folder("deployment_dummy_project")
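Note that, just like a freshly created deployment, a deployment loaded from disk still needs its inference models loaded onto a device before it can run inference. A minimal sketch, reusing dummy_image from the snippet above:
# Load the inference models onto the CPU, then run inference as before
local_deployment.load_inference_models(device="CPU")
prediction = local_deployment.infer(image=dummy_image)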
Module contents
- class geti_sdk.deployment.deployed_model.DeployedModel(name: str, precision: List[str], creation_date: str | datetime | None, latency: str | None = None, fps_throughput: float | None = None, purge_info: ModelPurgeInfo | None = None, size: int | None = None, target_device: str | None = None, target_device_type: str | None = None, previous_revision_id: str | None = None, previous_trained_revision_id: str | None = None, performance: Performance | None = None, id: str | None = None, label_schema_in_sync: bool | None = None, total_disk_size: int | None = None, training_framework: TrainingFramework | None = None, learning_approach: str | None = None, model_format: str | None = None, has_xai_head: bool = False, *, model_status: str | EnumType, optimization_methods: List[str], optimization_objectives: Dict[str, Any], optimization_type: str | EnumType, version: int | None = None, configurations: List[OptimizationConfigurationParameter] | None = None, hyper_parameters: TaskConfiguration | None = None)
Bases:
OptimizedModel
Representation of an Intel® Geti™ model that has been deployed for inference. It can be loaded onto a device to generate predictions.
- hyper_parameters: TaskConfiguration | None
- property model_data_path: str
Return the path to the raw model data
- Returns:
path to the directory containing the raw model data
- get_data(source: str | PathLike | GetiSession)
Load the model weights from a data source. The source can be one of the following:
The Intel® Geti™ platform (if a GetiSession instance is passed). In this case the weights will be downloaded and extracted to a temporary directory.
A zip file on local disk; in this case the weights will be extracted to a temporary directory.
A folder on local disk containing the .xml and .bin files for the model.
- Parameters:
source – Data source to load the weights from
- load_inference_model(device: str = 'CPU', configuration: Dict[str, Any] | None = None, project: Project | None = None, plugin_configuration: Dict[str, str] | None = None, max_async_infer_requests: int = 0, task_index: int = 0) None
Load the actual model weights to a specified device.
- Parameters:
device – Device (CPU or GPU) to load the model to. Defaults to ‘CPU’
configuration – Optional dictionary holding additional configuration parameters for the model
project – Optional project to which the model belongs. This is only used when the model is run on OVMS, in that case the project is needed to identify the correct model
plugin_configuration – Configuration for the OpenVINO execution mode and plugins. This can include for example specific performance hints. For further details, refer to the OpenVINO documentation here: https://docs.openvino.ai/2022.3/openvino_docs_OV_UG_Performance_Hints.html#doxid-openvino-docs-o-v-u-g-performance-hints
max_async_infer_requests – Maximum number of asynchronous infer requests that can be processed in parallel. This depends on the properties of the target device. If left at 0 (the default), the optimal number of requests will be selected automatically.
task_index – Index of the task within the project for which the model is trained.
- Returns:
OpenVINO inference engine model that can be used to make predictions on images
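For instance, a throughput-oriented OpenVINO performance hint can be passed through plugin_configuration. A minimal sketch, assuming model is a DeployedModel instance; the "PERFORMANCE_HINT" key is a standard OpenVINO property, not something specific to this SDK:
# Sketch: load the model on the CPU with a throughput performance hint, and let
# OpenVINO pick the optimal number of parallel infer requests (max_async_infer_requests=0)
model.load_inference_model(
    device="CPU",
    plugin_configuration={"PERFORMANCE_HINT": "THROUGHPUT"},
    max_async_infer_requests=0,
)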
- classmethod from_model_and_hypers(model: OptimizedModel, hyper_parameters: TaskConfiguration | None = None) DeployedModel
Create a DeployedModel instance out of an OptimizedModel and its corresponding set of hyper parameters.
- Parameters:
model – OptimizedModel to convert to a DeployedModel
hyper_parameters – TaskConfiguration instance containing the hyper parameters for the model
- Returns:
DeployedModel instance
- classmethod from_folder(path_to_folder: str | PathLike) DeployedModel
Create a DeployedModel instance from a folder containing the model data.
- Parameters:
path_to_folder – Path to the folder that holds the model data
- Returns:
DeployedModel instance
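A minimal sketch of loading a single model from disk and preparing it for inference; "dummy_model_folder" is a placeholder for a folder that holds the model data:
from geti_sdk.deployment.deployed_model import DeployedModel

# "dummy_model_folder" is a placeholder path to a folder containing the model data
model = DeployedModel.from_folder("dummy_model_folder")
model.load_inference_model(device="CPU")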
- save(path_to_folder: str | PathLike) bool
Save the DeployedModel instance to the designated folder.
- Parameters:
path_to_folder – Path to the folder to save the model to
- Returns:
True if the model was saved successfully, False otherwise
- infer(image: ndarray, explain: bool = False) Prediction
Run inference on an already preprocessed image.
- Parameters:
image – numpy array representing an image
explain – True to include saliency maps and feature maps in the returned Prediction. Note that these are only available if supported by the model.
- Returns:
Prediction containing the model outputs
- infer_async(image: ndarray, explain: bool = False, runtime_data: Any | None = None) None
Perform asynchronous inference on the image.
NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.
- Parameters:
image – numpy array representing an image
explain – True to include saliency maps and feature maps in the returned Prediction. Note that these are only available if supported by the model.
runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. You can for instance pass a dictionary, or a tuple/list of objects.
- set_asynchronous_callback(callback_function: Callable[[Prediction, Any | None], None]) None
Set the callback function to handle asynchronous inference results. This function is called whenever a result for an asynchronous inference request becomes available.
- Parameters:
callback_function –
Function that should be called to handle asynchronous inference results. The function should take the following input parameters:
The inference results (the Prediction). This is the primary input
Any additional data that will be passed with the infer request at runtime. For example, this could be a timestamp for the frame, or a title/filepath, etc. This can be in the form of any object: you can for instance pass a dictionary, or a tuple/list of multiple objects
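A minimal sketch of asynchronous inference with a single DeployedModel, assuming model has its inference model loaded and frame is an RGB numpy array:
# The callback receives the Prediction and the runtime_data passed to infer_async
def handle_result(prediction, runtime_data):
    print(f"Result for frame {runtime_data}: {prediction}")

model.set_asynchronous_callback(handle_result)
model.infer_async(image=frame, runtime_data=0)
# Block until all pending infer requests have been processed
model.await_all()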
- property labels: LabelList
Return the Labels for the model.
This requires the inference model to be loaded; accessing this property while the inference model is not loaded will raise a ValueError.
- Returns:
LabelList containing the SDK labels for the model
- infer_queue_full() bool
Return True if the queue for asynchronous infer requests is full, False otherwise
- Returns:
True if the infer queue is full, False otherwise
- await_all() None
Block execution until all asynchronous infer requests have finished processing.
This means that program execution will resume once the infer queue is empty.
This is a flow control function; it is only applicable when using asynchronous inference.
- await_any() None
Block execution until any of the asynchronous infer requests currently in the infer queue completes processing.
This means that program execution will resume once a single spot becomes available in the infer queue.
This is a flow control function; it is only applicable when using asynchronous inference.
- property asynchronous_mode
Return True if the DeployedModel is in asynchronous inference mode, False otherwise
- get_model_config() Dict[str, Any]
Return the model configuration as specified in the model.xml metadata file of the OpenVINO model
- Returns:
Dictionary containing the OpenVINO model configuration
- class geti_sdk.deployment.deployment.Deployment(project: Project, models: List[DeployedModel])
Bases:
object
Representation of a deployed Intel® Geti™ project that can be used to run inference locally
- models: List[DeployedModel]
- property is_single_task: bool
Return True if the deployment represents a project with only a single task.
- Returns:
True if the deployed project contains only one trainable task, False if it is a pipeline project
- property are_models_loaded: bool
Return True if all inference models for the Deployment are loaded and ready to infer.
- Returns:
True if all inference models for the deployed project are loaded in memory and ready for inference
- property asynchronous_mode: bool
Return True if the deployment is configured for asynchronous inference execution.
Asynchronous execution can result in a large increase in throughput for certain applications, for example video processing. However, it requires slightly more configuration compared to synchronous (the default) mode. For a more detailed overview of the differences between synchronous and asynchronous execution, please refer to the OpenVINO documentation at https://docs.openvino.ai/2024/notebooks/115-async-api-with-output.html
- Returns:
True if the deployment is set in asynchronous execution mode, False if it is in synchronous mode.
- save(path_to_folder: str | PathLike) bool
Save the Deployment instance to a folder on local disk.
- Parameters:
path_to_folder – Folder to save the deployment to
- Returns:
True if the deployment was saved successfully, False otherwise
- classmethod from_folder(path_to_folder: str | PathLike) Deployment
Create a Deployment instance from a specified path_to_folder.
- Parameters:
path_to_folder – Path to the folder containing the Deployment data
- Returns:
Deployment instance corresponding to the deployment data in the folder
- load_inference_models(device: str | Sequence[str] = 'CPU', max_async_infer_requests: int | Sequence[int] | None = None, openvino_configuration: Dict[str, str] | None = None)
Load the inference models for the deployment to the specified device.
Note: For a list of devices that are supported for OpenVINO inference, please see: https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html
- Parameters:
device –
Device to load the inference models to (e.g. ‘CPU’, ‘GPU’, ‘AUTO’, etc).
NOTE: For task chain deployments, it is possible to pass a list of device names instead. Each entry in the list is the target device for the model corresponding to its index, i.e. the first entry is applied to the first model, the second entry to the second model.
max_async_infer_requests –
Maximum number of infer requests to use in asynchronous mode. This parameter only takes effect when the asynchronous inference mode is used. It controls the maximum number of requests that will be handled in parallel. When set to 0, OpenVINO will attempt to determine the optimal number of requests for your system automatically. When left as None (the default), a single infer request per model will be used to conserve memory.
NOTE: For task chain deployments, it is possible to pass a list of integers. Each entry in the list is the maximum number of infer requests for the model corresponding to its index, i.e. the first number is applied to the first model, the second number to the second model.
openvino_configuration – Configuration for the OpenVINO execution mode and plugins. This can include for example specific performance hints. For further details, refer to the OpenVINO documentation here: https://docs.openvino.ai/2022.3/openvino_docs_OV_UG_Performance_Hints.html#doxid-openvino-docs-o-v-u-g-performance-hints
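For example, a task chain deployment with two models might be loaded as follows; this is a sketch, and which device names are valid depends on the hardware that is actually available:
# Sketch: run the first model in the chain on the CPU and the second on a GPU,
# letting OpenVINO choose the number of asynchronous infer requests for each
deployment.load_inference_models(
    device=["CPU", "GPU"],
    max_async_infer_requests=[0, 0],
)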
- infer(image: ndarray, name: str | None = None) Prediction
Run inference on an image for the full model chain in the deployment.
- Parameters:
image – Image to run inference on, as a numpy array containing the pixel data. The image is expected to have dimensions [height x width x channels], with the channels in RGB order
name – Optional name for the image, if specified this will be used in any post inference hooks belonging to the deployment.
- Returns:
inference results
- infer_async(image: ndarray, runtime_data: Any | None = None) None
Perform asynchronous inference on the image.
NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.
- Parameters:
image – numpy array representing an image
runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. Passing complex objects like a tuple/list or dictionary is also supported.
- explain(image: ndarray, name: str | None = None) Prediction
Run inference on an image for the full model chain in the deployment. The resulting prediction will also contain saliency maps and the feature vector for the input image.
- Parameters:
image – Image to run inference on, as a numpy array containing the pixel data. The image is expected to have dimensions [height x width x channels], with the channels in RGB order
name – Optional name for the image, if specified this will be used in any post inference hooks belonging to the deployment.
- Returns:
inference results
- explain_async(image: ndarray, runtime_data: Any | None = None) None
Perform asynchronous inference on the image, and generate saliency maps and feature vectors
NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.
- Parameters:
image – numpy array representing an image
runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. Passing complex objects like a tuple/list or dictionary is also supported.
- generate_ovms_config(output_folder: str | PathLike) None
Generate the configuration files needed to push the models for the Deployment instance to OVMS.
- Parameters:
output_folder – Target folder to save the configuration files to
- infer_queue_full() bool
Return True if the queue for asynchronous infer requests is full, False otherwise
- Returns:
True if the infer queue is full, False otherwise
- await_all() None
Block execution until all asynchronous infer requests have finished processing.
This means that program execution will resume once the infer queue is empty.
This is a flow control function; it is only applicable when using asynchronous inference.
- await_any() None
Block execution until any of the asynchronous infer requests currently in the infer queue completes processing.
This means that program execution will resume once a single spot becomes available in the infer queue.
This is a flow control function; it is only applicable when using asynchronous inference.
- set_asynchronous_callback(callback_function: Callable[[ndarray, Prediction, Any | None], None] | None = None) None
Set the callback function to handle asynchronous inference results. This function is called whenever a result for an asynchronous inference request becomes available.
- NOTE: Calling this method enables asynchronous inference mode for the deployment. The regular synchronous inference method will no longer be available, unless the deployment is reloaded.
- Parameters:
callback_function –
Function that should be called to handle asynchronous inference results. The function should take the following input parameters:
The image/video frame. This is the original image to infer
The inference results (the Prediction). This is the model output for the image
Any additional data that will be passed with the infer request at runtime. For example, this could be a timestamp for the frame, or a title/filepath, etc. This can be in the form of any object: you can for instance pass a dictionary, or a tuple/list of multiple objects
- NOTE: It is possible to call this method without specifying any callback function. In that case, the deployment will be switched to asynchronous mode but only the post-inference hooks will be executed after each infer request.
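Putting the asynchronous methods together for video processing, a sketch could look like the following; "dummy_video.mp4" is a placeholder path, and deployment is assumed to have its inference models loaded with more than one infer request per model:
import cv2

# The callback receives the original frame, the Prediction and the runtime_data
def on_frame_result(frame, prediction, runtime_data):
    print(f"Frame {runtime_data} processed")

# Switch the deployment to asynchronous mode and register the callback
deployment.set_asynchronous_callback(on_frame_result)

cap = cv2.VideoCapture("dummy_video.mp4")
frame_index = 0
while True:
    success, frame = cap.read()
    if not success:
        break
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Apply back-pressure: wait for a free spot if the infer queue is full
    if deployment.infer_queue_full():
        deployment.await_any()
    deployment.infer_async(image=rgb_frame, runtime_data=frame_index)
    frame_index += 1
cap.release()

# Block until all remaining infer requests have finished
deployment.await_all()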
- property post_inference_hooks: List[PostInferenceHookInterface]
Return the currently active post inference hooks for the deployment
- Returns:
list of PostInferenceHook objects
- clear_inference_hooks() None
Remove all post inference hooks for the deployment
- add_post_inference_hook(hook: PostInferenceHookInterface) None
Add a post inference hook, which will be executed after each call to Deployment.infer
- Parameters:
hook – PostInferenceHook to be added to the deployment