geti_sdk.deployment

Introduction

The deployment package allows creating a deployment of any Intel® Geti™ project. A project deployment can run inference on an image or video frame locally, i.e. without any connection to the Intel® Geti™ server.

Deployments can be created for single task and task chain projects alike; the API is the same in both cases.

Creating a deployment for a project is done through the Geti class, which provides a convenience method deploy_project().

The following code snippet shows:

  1. How to create a deployment for a project

  2. How to use it to run local inference for an image

  3. How to save the deployment to disk

import cv2

from geti_sdk import Geti

geti = Geti(
  host="https://0.0.0.0", username="dummy_user", password="dummy_password"
)

# Download the model data and create a `Deployment`
deployment = geti.deploy_project(project_name="dummy_project")

# Load the inference models for all tasks in the project, for CPU inference
deployment.load_inference_models(device='CPU')

# Run inference
dummy_image = cv2.imread('dummy_image.png')
dummy_image = cv2.cvtColor(dummy_image, cv2.COLOR_BGR2RGB)
prediction = deployment.infer(image=dummy_image)

# Save the deployment to disk
deployment.save(path_to_folder="deployment_dummy_project")

A saved Deployment can be loaded from its containing folder using the from_folder() method, like so:

from geti_sdk.deployment import Deployment

local_deployment = Deployment.from_folder("deployment_dummy_project")

Module contents

class geti_sdk.deployment.deployed_model.DeployedModel(name: str, precision: List[str], creation_date: str | datetime | None, latency: str | None = None, fps_throughput: float | None = None, purge_info: ModelPurgeInfo | None = None, size: int | None = None, target_device: str | None = None, target_device_type: str | None = None, previous_revision_id: str | None = None, previous_trained_revision_id: str | None = None, performance: Performance | None = None, id: str | None = None, label_schema_in_sync: bool | None = None, total_disk_size: int | None = None, training_framework: TrainingFramework | None = None, learning_approach: str | None = None, model_format: str | None = None, has_xai_head: bool = False, *, model_status: str | EnumType, optimization_methods: List[str], optimization_objectives: Dict[str, Any], optimization_type: str | EnumType, version: int | None = None, configurations: List[OptimizationConfigurationParameter] | None = None, hyper_parameters: TaskConfiguration | None = None)

Bases: OptimizedModel

Representation of an Intel® Geti™ model that has been deployed for inference. It can be loaded onto a device to generate predictions.

hyper_parameters: TaskConfiguration | None
property model_data_path: str

Return the path to the raw model data

Returns:

path to the directory containing the raw model data

get_data(source: str | PathLike | GetiSession)

Load the model weights from a data source. The source can be one of the following:

  1. The Intel® Geti™ platform (if a GetiSession instance is passed). In this case the weights will be downloaded and extracted to a temporary directory

  2. A zip file on local disk, in which case the weights will be extracted to a temporary directory

  3. A folder on local disk containing the .xml and .bin files for the model

Parameters:

source – Data source to load the weights from
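
For example, the weights can be loaded from local sources as in the sketch below. This is a minimal illustration; both paths are hypothetical placeholders.

from geti_sdk.deployment.deployed_model import DeployedModel

# Create the model from a folder holding the model data, then point it at a
# different local source. Both paths are placeholders.
model = DeployedModel.from_folder("dummy_model_folder")

# A folder containing the .xml and .bin files for the model ...
model.get_data(source="dummy_model_folder")

# ... or a zip archive; the weights are extracted to a temporary directory
model.get_data(source="dummy_model_weights.zip")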

load_inference_model(device: str = 'CPU', configuration: Dict[str, Any] | None = None, project: Project | None = None, plugin_configuration: Dict[str, str] | None = None, max_async_infer_requests: int = 0, task_index: int = 0) None

Load the actual model weights to a specified device.

Parameters:
  • device – Device (CPU or GPU) to load the model to. Defaults to ‘CPU’

  • configuration – Optional dictionary holding additional configuration parameters for the model

  • project – Optional project to which the model belongs. This is only used when the model is run on OVMS, in that case the project is needed to identify the correct model

  • plugin_configuration – Configuration for the OpenVINO execution mode and plugins. This can include for example specific performance hints. For further details, refer to the OpenVINO documentation here: https://docs.openvino.ai/2022.3/openvino_docs_OV_UG_Performance_Hints.html#doxid-openvino-docs-o-v-u-g-performance-hints

  • max_async_infer_requests – Maximum number of asynchronous infer requests that can be processed in parallel. This depends on the properties of the target device. If left at 0 (the default), the optimal number of requests will be selected automatically.

  • task_index – Index of the task within the project for which the model is trained.

Returns:

OpenVINO inference engine model that can be used to make predictions on images
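
A minimal usage sketch, assuming `model` is a DeployedModel instance and that the OpenVINO `PERFORMANCE_HINT` plugin key shown here is appropriate for the target device:

import cv2

# Load the model weights onto the CPU, with a throughput-oriented OpenVINO
# performance hint passed via the plugin configuration
model.load_inference_model(
    device="CPU",
    plugin_configuration={"PERFORMANCE_HINT": "THROUGHPUT"},
    max_async_infer_requests=0,  # 0: select the optimal number automatically
)

# Verify that the model is ready by running a single prediction
image = cv2.cvtColor(cv2.imread("dummy_image.png"), cv2.COLOR_BGR2RGB)
prediction = model.infer(image=image)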

classmethod from_model_and_hypers(model: OptimizedModel, hyper_parameters: TaskConfiguration | None = None) DeployedModel

Create a DeployedModel instance from an OptimizedModel and its corresponding set of hyper parameters.

Parameters:
  • model – OptimizedModel to convert to a DeployedModel

  • hyper_parameters – TaskConfiguration instance containing the hyper parameters for the model

Returns:

DeployedModel instance

classmethod from_folder(path_to_folder: str | PathLike) DeployedModel

Create a DeployedModel instance from a folder containing the model data.

Parameters:

path_to_folder – Path to the folder that holds the model data

Returns:

DeployedModel instance

save(path_to_folder: str | PathLike) bool

Save the DeployedModel instance to the designated folder.

Parameters:

path_to_folder – Path to the folder to save the model to

Returns:

True if the model was saved successfully, False otherwise
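
A brief sketch of a save and restore round trip, using a placeholder folder name and assuming `model` is an existing DeployedModel instance:

from geti_sdk.deployment.deployed_model import DeployedModel

# Save the model data to disk, then load it back into a new instance
success = model.save(path_to_folder="dummy_model_folder")
if success:
    restored_model = DeployedModel.from_folder("dummy_model_folder")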

infer(image: ndarray, explain: bool = False) Prediction

Run inference on an already preprocessed image.

Parameters:
  • image – numpy array representing an image

  • explain – True to include saliency maps and feature maps in the returned Prediction. Note that these are only available if supported by the model.

Returns:

Prediction object containing the model outputs

infer_async(image: ndarray, explain: bool = False, runtime_data: Any | None = None) None

Perform asynchronous inference on the image.

NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.

Parameters:
  • image – numpy array representing an image

  • explain – True to include saliency maps and feature maps in the Prediction passed to the asynchronous callback. Note that these are only available if supported by the model.

  • runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. You can for instance pass a dictionary, or a tuple/list of objects.

set_asynchronous_callback(callback_function: Callable[[Prediction, Any | None], None]) None

Set the callback function to handle asynchronous inference results. This function is called whenever a result for an asynchronous inference request becomes available.

Parameters:

callback_function

Function that should be called to handle asynchronous inference results. The function should take the following input parameters:

  1. The inference results (the Prediction). This is the primary input

  2. Any additional data that will be passed with the infer request at runtime. For example, this could be a timestamp for the frame, or a title/filepath, etc. This can be in the form of any object: You can for instance pass a dictionary, or a tuple/list of multiple objects
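
The sketch below shows the asynchronous workflow for a single DeployedModel, assuming `model` has already been loaded with load_inference_model, `image` is an RGB numpy array, and Prediction is importable from geti_sdk.data_models:

from typing import Any, Optional

from geti_sdk.data_models import Prediction

# Callback: receives the Prediction and the runtime_data passed to infer_async
def handle_result(prediction: Prediction, runtime_data: Optional[Any]) -> None:
    print(f"Received prediction for {runtime_data}")

model.set_asynchronous_callback(handle_result)

# Submit a request; the result is delivered to the callback, not returned here
model.infer_async(image=image, runtime_data="dummy_image.png")

# Wait until all submitted requests have been processed
model.await_all()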

property labels: LabelList

Return the Labels for the model.

This requires the inference model to be loaded; accessing this property while the inference model is not loaded will raise a ValueError.

Returns:

LabelList containing the SDK labels for the model

infer_queue_full() bool

Return True if the queue for asynchronous infer requests is full, False otherwise

Returns:

True if the infer queue is full, False otherwise

await_all() None

Block execution until all asynchronous infer requests have finished processing.

This means that program execution will resume once the infer queue is empty

This is a flow control function, it is only applicable when using asynchronous inference.

await_any() None

Block execution until any of the asynchronous infer requests currently in the infer queue completes processing

This means that program execution will resume once a single spot becomes available in the infer queue

This is a flow control function, it is only applicable when using asynchronous inference.

property asynchronous_mode

Return True if the DeployedModel is in asynchronous inference mode, False otherwise

get_model_config() Dict[str, Any]

Return the model configuration as specified in the model.xml metadata file of the OpenVINO model

Returns:

Dictionary containing the OpenVINO model configuration
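
For instance, to inspect the configuration of a loaded model:

# Print the OpenVINO model configuration of a loaded DeployedModel
config = model.get_model_config()
for key, value in config.items():
    print(f"{key}: {value}")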

class geti_sdk.deployment.deployment.Deployment(project: Project, models: List[DeployedModel])

Bases: object

Representation of a deployed Intel® Geti™ project that can be used to run inference locally

project: Project
models: List[DeployedModel]
property is_single_task: bool

Return True if the deployment represents a project with only a single task.

Returns:

True if the deployed project contains only one trainable task, False if it is a pipeline project

property are_models_loaded: bool

Return True if all inference models for the Deployment are loaded and ready to infer.

Returns:

True if all inference models for the deployed project are loaded in memory and ready for inference

property asynchronous_mode: bool

Return True if the deployment is configured for asynchronous inference execution.

Asynchronous execution can result in a large increase in throughput for certain applications, for example video processing. However, it requires slightly more configuration compared to synchronous (the default) mode. For a more detailed overview of the differences between synchronous and asynchronous execution, please refer to the OpenVINO documentation at https://docs.openvino.ai/2024/notebooks/115-async-api-with-output.html

Returns:

True if the deployment is set in asynchronous execution mode, False if it is in synchronous mode.

save(path_to_folder: str | PathLike) bool

Save the Deployment instance to a folder on local disk.

Parameters:

path_to_folder – Folder to save the deployment to

Returns:

True if the deployment was saved successfully, False otherwise

classmethod from_folder(path_to_folder: str | PathLike) Deployment

Create a Deployment instance from a specified path_to_folder.

Parameters:

path_to_folder – Path to the folder containing the Deployment data

Returns:

Deployment instance corresponding to the deployment data in the folder

load_inference_models(device: str | Sequence[str] = 'CPU', max_async_infer_requests: int | Sequence[int] | None = None, openvino_configuration: Dict[str, str] | None = None)

Load the inference models for the deployment to the specified device.

Note: For a list of devices that are supported for OpenVINO inference, please see: https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html

Parameters:
  • device

    Device to load the inference models to (e.g. ‘CPU’, ‘GPU’, ‘AUTO’, etc).

    NOTE: For task chain deployments, it is possible to pass a list of device names instead. Each entry in the list is the target device for the model at the corresponding index, i.e. the first entry applies to the first model, the second entry to the second model.

  • max_async_infer_requests

    Maximum number of infer requests to use in asynchronous mode. This parameter only takes effect when the asynchronous inference mode is used. It controls the maximum number of requests that will be handled in parallel. When set to 0, OpenVINO will attempt to determine the optimal number of requests for your system automatically. When left as None (the default), a single infer request per model will be used to conserve memory.

    NOTE: For task chain deployments, it is possible to pass a list of integers. Each entry in the list is the maximum number of infer requests for the model at the corresponding index, i.e. the first number applies to the first model, the second number to the second model.

  • openvino_configuration – Configuration for the OpenVINO execution mode and plugins. This can include for example specific performance hints. For further details, refer to the OpenVINO documentation here: https://docs.openvino.ai/2022.3/openvino_docs_OV_UG_Performance_Hints.html#doxid-openvino-docs-o-v-u-g-performance-hints
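
As a sketch for a two-task chain, assuming the target machine exposes a GPU device to OpenVINO and that the `PERFORMANCE_HINT` key is appropriate here:

# Load the first model on the GPU and the second on the CPU, each with its own
# maximum number of asynchronous infer requests
deployment.load_inference_models(
    device=["GPU", "CPU"],
    max_async_infer_requests=[4, 2],
    openvino_configuration={"PERFORMANCE_HINT": "LATENCY"},
)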

infer(image: ndarray, name: str | None = None) Prediction

Run inference on an image for the full model chain in the deployment.

Parameters:
  • image – Image to run inference on, as a numpy array containing the pixel data. The image is expected to have dimensions [height x width x channels], with the channels in RGB order

  • name – Optional name for the image, if specified this will be used in any post inference hooks belonging to the deployment.

Returns:

inference results
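
For example, synchronous inference over the frames of a video file could look like the sketch below; the video filename is a placeholder.

import cv2

capture = cv2.VideoCapture("dummy_video.mp4")
frame_index = 0
while True:
    success, frame = capture.read()
    if not success:
        break
    # OpenCV decodes to BGR, so convert to the expected RGB channel order
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    prediction = deployment.infer(image=rgb_frame, name=f"frame_{frame_index}")
    frame_index += 1
capture.release()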

infer_async(image: ndarray, runtime_data: Any | None = None) None

Perform asynchronous inference on the image.

NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.

Parameters:
  • image – numpy array representing an image

  • runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. Passing complex objects like a tuple/list or dictionary is also supported.

explain(image: ndarray, name: str | None = None) Prediction

Run inference on an image for the full model chain in the deployment. The resulting prediction will also contain saliency maps and the feature vector for the input image.

Parameters:
  • image – Image to run inference on, as a numpy array containing the pixel data. The image is expected to have dimensions [height x width x channels], with the channels in RGB order

  • name – Optional name for the image, if specified this will be used in any post inference hooks belonging to the deployment.

Returns:

inference results
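
A minimal sketch, reusing the `dummy_image` array from the introductory snippet; the saliency maps and feature vector are attached to the returned Prediction, provided the model supports explainability:

# Run explainable inference; the result includes saliency maps and the
# feature vector for the image, in addition to the regular predictions
prediction = deployment.explain(image=dummy_image, name="dummy_image.png")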

explain_async(image: ndarray, runtime_data: Any | None = None) None

Perform asynchronous inference on the image, and generate saliency maps and feature vectors

NOTE: Inference results are not returned directly! Instead, a post-inference callback should be defined to handle results, using the .set_asynchronous_callback method.

Parameters:
  • image – numpy array representing an image

  • runtime_data – An optional object containing any additional data that should be passed to the asynchronous callback for each infer request. This can for example be a timestamp or filename for the image to infer. Passing complex objects like a tuple/list or dictionary is also supported.

generate_ovms_config(output_folder: str | PathLike) None

Generate the configuration files needed to push the models for the Deployment instance to OVMS.

Parameters:

output_folder – Target folder to save the configuration files to
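
A minimal sketch, with a placeholder output folder:

# Write the configuration files needed to serve the deployment's models via OVMS
deployment.generate_ovms_config(output_folder="ovms_models")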

infer_queue_full() bool

Return True if the queue for asynchronous infer requests is full, False otherwise

Returns:

True if the infer queue is full, False otherwise

await_all() None

Block execution until all asynchronous infer requests have finished processing.

This means that program execution will resume once the infer queue is empty

This is a flow control function, it is only applicable when using asynchronous inference.

await_any() None

Block execution until any of the asynchronous infer requests currently in the infer queue completes processing

This means that program execution will resume once a single spot becomes available in the infer queue

This is a flow control function, it is only applicable when using asynchronous inference.
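
A sketch of throttling request submission in asynchronous mode, assuming a callback has been registered with set_asynchronous_callback (see below) and `frames` is an iterable of RGB numpy arrays:

for index, frame in enumerate(frames):
    if deployment.infer_queue_full():
        # Wait for a free slot before submitting the next request
        deployment.await_any()
    deployment.infer_async(image=frame, runtime_data=index)

# Wait for the remaining requests to finish before moving on
deployment.await_all()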

set_asynchronous_callback(callback_function: Callable[[ndarray, Prediction, Any | None], None] | None = None) None

Set the callback function to handle asynchronous inference results. This function is called whenever a result for an asynchronous inference request becomes available.

NOTE: Calling this method enables asynchronous inference mode for the deployment. The regular synchronous inference method will no longer be available, unless the deployment is reloaded.

Parameters:

callback_function

Function that should be called to handle asynchronous inference results. The function should take the following input parameters:

  1. The image/video frame. This is the original image to infer

  2. The inference results (the Prediction). This is the model output for the image

  3. Any additional data that will be passed with the infer request at runtime. For example, this could be a timestamp for the frame, or a title/filepath, etc. This can be in the form of any object: You can for instance pass a dictionary, or a tuple/list of multiple objects

NOTE: It is possible to call this method without specifying any callback function. In that case, the deployment will be switched to asynchronous mode, but only the post-inference hooks will be executed after each infer request.
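
A minimal callback sketch; the Prediction import path is assumed here, and the handler simply reports the frame identifier passed as runtime data:

from typing import Any, Optional

import numpy as np

from geti_sdk.data_models import Prediction

def handle_result(
    image: np.ndarray, prediction: Prediction, runtime_data: Optional[Any]
) -> None:
    # The original image, its prediction and the runtime data are all available here
    print(f"Frame {runtime_data}: received prediction for image of shape {image.shape}")

# Registering the callback switches the deployment to asynchronous mode
deployment.set_asynchronous_callback(handle_result)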

property post_inference_hooks: List[PostInferenceHookInterface]

Return the currently active post inference hooks for the deployment

Returns:

list of PostInferenceHook objects

clear_inference_hooks() None

Remove all post inference hooks for the deployment

add_post_inference_hook(hook: PostInferenceHookInterface) None

Add a post inference hook, which will be executed after each call to Deployment.infer

Parameters:

hook – PostInferenceHook to be added to the deployment

Subpackages