otx.hpo#

HPO package.

Functions

run_hpo_loop(hpo_algo, train_func[, ...])

Run the HPO loop.

Classes

TrialStatus(value)

Enum class for trial status.

HyperBand(search_space[, save_path, mode, ...])

Implements the Asynchronous HyperBand scheduler with iterations only.

class otx.hpo.HyperBand(search_space: dict[str, dict[str, Any]], save_path: str | None = None, mode: Literal['max', 'min'] = 'max', num_trials: int | None = None, num_workers: int = 1, num_full_iterations: int | float = 1, full_dataset_size: int = 0, expected_time_ratio: int | float | None = None, maximum_resource: int | float | None = None, resume: bool = False, prior_hyper_parameters: dict | list[dict] | None = None, acceptable_additional_time_ratio: float | int = 1.0, minimum_resource: int | float | None = None, reduction_factor: int = 3, asynchronous_sha: bool = True, asynchronous_bracket: bool = False)[source]#

Bases: HpoBase

Implements the Asynchronous HyperBand scheduler with iterations only.

Please refer to the papers below for the details of the algorithm.

[1] “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization”, JMLR 2018

https://arxiv.org/abs/1603.06560 https://homes.cs.washington.edu/~jamieson/hyperband.html

[2] “A System for Massively Parallel Hyperparameter Tuning”, MLSys 2020

https://arxiv.org/abs/1810.05934

Parameters:
  • search_space (dict[str, dict[str, Any]]) – Hyperparameter search space to search over.

  • save_path (str | None, optional) – Path where the HPO result is saved.

  • mode ("max" | "min", optional) – One of {"max", "min"}. Determines whether the objective is to maximize or minimize the score.

  • num_trials (int | None, optional) – Number of trials to run for HPO.

  • num_workers (int, optional) – Number of trials executed in parallel.

  • num_full_iterations (int | float, optional) – Number of epochs for training after HPO.

  • full_dataset_size (int, optional) – Size of the training dataset.

  • expected_time_ratio (int | float | None, optional) – Time budget for HPO. If HPO is configured automatically, it spends roughly expected_time_ratio times the training time expected after HPO.

  • maximum_resource (int | float | None, optional) – Maximum resource to use for training each trial.

  • resume (bool, optional) – Whether to reuse previous HPO results. If the previous HPO run completed, the optimized hyperparameters can be used directly; if it was stopped midway, HPO resumes from that point.

  • prior_hyper_parameters (dict | list[dict] | None, optional) – Hyperparameter configuration(s) to evaluate first. Defaults to None.

  • acceptable_additional_time_ratio (float | int, optional) – How much additional time, as a ratio of the expected time, is acceptable. Defaults to 1.0.

  • minimum_resource (float | int | None, optional) – Minimum resource to use for training a trial. Defaults to None.

  • reduction_factor (int, optional) – Decides how many trials are promoted to the next rung. Only the top 1 / reduction_factor of trials in a rung are promoted. Defaults to 3.

  • asynchronous_sha (bool, optional) – Whether to operate SHA asynchronously. Defaults to True.

  • asynchronous_bracket (bool, optional) – Whether SHAs (brackets) run in parallel. Defaults to False.
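To build intuition for how minimum_resource, maximum_resource, and reduction_factor interact, the following self-contained sketch (illustrative only, not OTX code) computes the rung schedule of a single successive-halving bracket:

```python
def rung_schedule(minimum_resource, maximum_resource, reduction_factor, num_trials):
    """Compute (resource, surviving-trials) pairs for one SHA bracket.

    Each rung trains its survivors with reduction_factor times more
    resource, keeping only the top 1 / reduction_factor of trials.
    """
    schedule = []
    resource, trials = minimum_resource, num_trials
    while resource < maximum_resource and trials >= 1:
        schedule.append((resource, trials))
        resource *= reduction_factor
        trials //= reduction_factor
    # Final rung: the remaining trials train up to the maximum resource.
    schedule.append((min(resource, maximum_resource), max(trials, 1)))
    return schedule

# With the default reduction_factor=3: 27 trials at 1 iteration,
# 9 at 3, 3 at 9, and a single survivor at the full 27 iterations.
print(rung_schedule(1, 27, 3, 27))  # [(1, 27), (3, 9), (9, 3), (27, 1)]
```

This is why a larger reduction_factor prunes more aggressively: fewer trials reach the expensive, high-resource rungs.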

auto_config() list[dict[str, Any]][source]#

Configure ASHA automatically aligning with possible resource.

Configure ASHA automatically. If the available resource is less than full ASHA requires, the ASHA scale is decreased; if it is more, the scale is increased.

Returns:

ASHA configuration. It’s used to make brackets.

Return type:

list[dict[str, Any]]

get_best_config() dict[str, Any] | None[source]#

Get best configuration in ASHA.

Returns:

Best configuration in ASHA. If there is no trial to select, return None.

Return type:

dict[str, Any] | None

get_next_sample() AshaTrial | None[source]#

Get next trial to train.

Returns:

Next trial to train. If there is no trial to train, then return None.

Return type:

AshaTrial | None

get_progress() int | float[source]#

Get current progress of ASHA.

is_done() bool[source]#

Check whether ASHA is done.

Returns:

Whether ASHA is done.

Return type:

bool

print_result() None[source]#

Print the ASHA result.

report_score(score: float | int, resource: float | int, trial_id: Hashable, done: bool = False) TrialStatus[source]#

Report a score to ASHA.

Parameters:
  • score (float | int) – Score to report.

  • resource (float | int) – Resource used to get score.

  • trial_id (Hashable) – Trial ID.

  • done (bool, optional) – Whether training trial is done. Defaults to False.

Returns:

Whether to continue training the trial.

Return type:

Literal[TrialStatus.STOP, TrialStatus.RUNNING]
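The report_score contract is that the training function reports after each unit of resource (e.g. each epoch) and stops as soon as the scheduler returns TrialStatus.STOP. The sketch below illustrates that loop with a toy stand-in scheduler; MockScheduler and its fixed budget are invented for illustration and are not part of the OTX API:

```python
from enum import IntEnum


class TrialStatus(IntEnum):
    # Mirrors the spirit of otx.hpo.TrialStatus; values are illustrative.
    RUNNING = 0
    STOP = 1


class MockScheduler:
    """Toy stand-in for HyperBand: stops a trial once its budget is spent."""

    def __init__(self, budget):
        self.budget = budget

    def report_score(self, score, resource, trial_id, done=False):
        if done or resource >= self.budget:
            return TrialStatus.STOP
        return TrialStatus.RUNNING


def train(scheduler, trial_id):
    # Training loop: report after every epoch, stop when told to.
    for epoch in range(1, 100):
        score = 1.0 - 1.0 / epoch  # pretend validation score
        status = scheduler.report_score(score, resource=epoch, trial_id=trial_id)
        if status == TrialStatus.STOP:
            return epoch
    return 99


print(train(MockScheduler(budget=5), "trial-0"))  # stops at epoch 5
```

The real scheduler makes the same STOP/RUNNING decision, but based on rung boundaries and the trial's rank within its rung rather than a fixed budget.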

save_results() None[source]#

Save a ASHA result.

class otx.hpo.TrialStatus(value)[source]#

Bases: IntEnum

Enum class for trial status.

otx.hpo.run_hpo_loop(hpo_algo: HpoBase, train_func: Callable, resource_type: Literal[DeviceType.cpu, DeviceType.gpu, DeviceType.xpu] = DeviceType.gpu, num_parallel_trial: int | None = None, num_devices_per_trial: int = 1) None[source]#

Run the HPO loop.

Parameters:
  • hpo_algo (HpoBase) – HPO algorithm to run.

  • train_func (Callable) – Function to train a model.

  • resource_type (DeviceType.cpu | DeviceType.gpu | DeviceType.xpu, optional) – Which type of resource to use. It can be changed depending on the environment. Defaults to DeviceType.gpu.

  • num_parallel_trial (int | None, optional) – How many trials to run in parallel. It’s used for CPUResourceManager. Defaults to None.

  • num_devices_per_trial (int, optional) – How many GPUs are used for a single trial. It’s used for GPUResourceManager. Defaults to 1.
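Conceptually, run_hpo_loop keeps asking the algorithm for configurations, trains them, and tracks the best result. The self-contained sketch below (pure Python, not OTX code; simple_hpo_loop and the (low, high) search-space format are invented for illustration) shows that sample-train-select pattern sequentially, without the parallelism and resource management the real loop provides:

```python
import random


def simple_hpo_loop(search_space, train_func, num_trials=5, seed=0):
    """Minimal sequential stand-in for an HPO loop: sample configs,
    train each one, and keep the best-scoring configuration."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {name: rng.uniform(low, high)
                  for name, (low, high) in search_space.items()}
        score = train_func(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# A toy train_func: the score peaks when lr is close to 0.1.
def train_func(config):
    return -abs(config["lr"] - 0.1)


best, score = simple_hpo_loop({"lr": (0.001, 1.0)}, train_func, num_trials=20)
print(best, score)
```

The real run_hpo_loop additionally dispatches trials to workers via a resource manager (CPU, GPU, or XPU) and lets hpo_algo stop unpromising trials early.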