HPO#

HPO package.

class otx.hpo.HyperBand(minimum_resource: Optional[Union[float, int]] = None, reduction_factor: int = 3, asynchronous_sha: bool = True, asynchronous_bracket: bool = False, **kwargs)#

It implements the asynchronous HyperBand scheduler with iterations only.

Please refer to the papers below for the detailed algorithm.

[1] “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization”, JMLR 2018

https://arxiv.org/abs/1603.06560 https://homes.cs.washington.edu/~jamieson/hyperband.html

[2] “A System for Massively Parallel Hyperparameter Tuning”, MLSys 2020

https://arxiv.org/abs/1810.05934

Args:

minimum_resource (Union[float, int]): Minimum resource to use for training a trial. Defaults to None.
reduction_factor (int, optional): Decides how many trials to promote to the next rung. Only the top 1 / reduction_factor of the trials in a rung can be promoted. Defaults to 3.
asynchronous_sha (bool, optional): Whether to operate SHA asynchronously. Defaults to True.
asynchronous_bracket (bool, optional): Whether SHAs (brackets) run in parallel or not. Defaults to False.
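The promotion rule described for reduction_factor can be illustrated with a small standalone sketch. It is not the OTX implementation: the function name `promotable`, the `maximize` flag, and the sample scores are assumptions made for illustration, and it only shows how "top 1 / reduction_factor of a rung" translates into a selection.

```python
from typing import Dict, List


def promotable(scores: Dict[str, float], reduction_factor: int = 3, maximize: bool = True) -> List[str]:
    """Return the ids of the trials in the top 1 / reduction_factor of a rung.

    `scores` maps a trial id to the score it reported at this rung.
    This helper is hypothetical and only mirrors the rule in the text.
    """
    # Number of trials allowed to move up: floor(len / reduction_factor).
    num_promote = len(scores) // reduction_factor
    # Rank the trial ids from best to worst score.
    ranked = sorted(scores, key=scores.get, reverse=maximize)
    return ranked[:num_promote]


# Six trials in one rung with reduction_factor=3: two can be promoted.
rung = {"t0": 0.61, "t1": 0.74, "t2": 0.58, "t3": 0.80, "t4": 0.69, "t5": 0.77}
print(promotable(rung))  # ['t3', 't5']
```

With fewer trials than reduction_factor in a rung, this sketch promotes nothing; how the real scheduler treats that case, especially in asynchronous mode, is not stated here.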

auto_config() → List[Dict[str, Any]]#

Configure ASHA automatically to match the available resource.

If the available resource is less than a full ASHA requires, the ASHA scale is decreased. Conversely, if the available resource is more than a full ASHA requires, the ASHA scale is increased.

Returns: List[Dict[str, Any]]: ASHA configuration. It’s used to make brackets.

get_best_config() → Optional[Dict[str, Any]]#

Get best configuration in ASHA.

Returns: Optional[Dict[str, Any]]: Best configuration in ASHA. If there is no trial to select, returns None.

get_next_sample() → Optional[AshaTrial]#

Get next trial to train.

Returns: Optional[AshaTrial]: Next trial to train. If there is no trial to train, returns None.

get_progress() → Union[int, float]#

Get current progress of ASHA.

is_done() → bool#

Check whether ASHA is done.

Returns: bool: Whether ASHA is done.

print_result()#

Print an ASHA result.

report_score(score: Union[float, int], resource: Union[float, int], trial_id: str, done: bool = False) → Literal[TrialStatus.STOP, TrialStatus.RUNNING]#

Report a score to ASHA.

Args:

score (Union[float, int]): Score to report.
resource (Union[float, int]): Resource used to get score.
trial_id (str): Trial id.
done (bool, optional): Whether training trial is done. Defaults to False.

Returns: Literal[TrialStatus.STOP, TrialStatus.RUNNING]: Decide whether to continue training or not.

save_results()#

Save an ASHA result.

class otx.hpo.TrialStatus(value)#

Enum class for trial status.

CUDAOOM = 3#

READY = 0#

RUNNING = 1#

STOP = 2#

UNKNOWN = -1#
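The scheduler methods and the TrialStatus values documented above suggest a driver loop of roughly the following shape. The sketch runs without OTX: `ToyScheduler` and `ToyTrial` are hypothetical stand-ins that only reuse the documented method names, `TrialStatus` is a local copy of the documented values, and `fake_score` replaces real training. The toy scheduler does not implement HyperBand or ASHA promotion, and the loop is sequential, whereas a real HPO run would typically train trials in parallel.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class TrialStatus(Enum):
    """Local copy of the documented values so the sketch is self-contained."""
    UNKNOWN = -1
    READY = 0
    RUNNING = 1
    STOP = 2
    CUDAOOM = 3


class ToyTrial:
    """Hypothetical stand-in for AshaTrial; its real attributes are not documented here."""

    def __init__(self, trial_id: str, lr: float) -> None:
        self.id = trial_id
        self.lr = lr


class ToyScheduler:
    """Hypothetical scheduler with the documented method names (not HyperBand)."""

    def __init__(self, learning_rates: List[float], max_resource: int = 3) -> None:
        self._pending = [ToyTrial(str(i), lr) for i, lr in enumerate(learning_rates)]
        self._trials = {t.id: t for t in self._pending}
        self._max_resource = max_resource
        self._best: Dict[str, float] = {}  # trial id -> best reported score
        self._finished: Set[str] = set()

    def get_next_sample(self) -> Optional[ToyTrial]:
        return self._pending.pop(0) if self._pending else None

    def report_score(self, score: float, resource: int, trial_id: str, done: bool = False) -> TrialStatus:
        self._best[trial_id] = max(score, self._best.get(trial_id, float("-inf")))
        # Stop once the trial is done or has used its whole resource budget.
        if done or resource >= self._max_resource:
            self._finished.add(trial_id)
            return TrialStatus.STOP
        return TrialStatus.RUNNING

    def is_done(self) -> bool:
        return len(self._finished) == len(self._trials)

    def get_best_config(self) -> Optional[Dict[str, float]]:
        if not self._best:
            return None
        best_id = max(self._best, key=self._best.get)
        return {"lr": self._trials[best_id].lr}


def fake_score(lr: float, epoch: int) -> float:
    """Made-up score that grows with epochs and peaks at lr == 0.1."""
    return epoch * (1.0 - abs(lr - 0.1))


def drive(scheduler: ToyScheduler) -> Optional[Dict[str, float]]:
    while not scheduler.is_done():
        trial = scheduler.get_next_sample()
        if trial is None:
            break
        epoch = 0
        status = TrialStatus.RUNNING
        # Keep training for as long as the scheduler answers RUNNING.
        while status == TrialStatus.RUNNING:
            epoch += 1
            status = scheduler.report_score(fake_score(trial.lr, epoch), epoch, trial.id)
    return scheduler.get_best_config()


best = drive(ToyScheduler([0.01, 0.1, 0.5]))
print(best)  # {'lr': 0.1}
```

The point of the sketch is the control flow: the return value of report_score decides whether a trial keeps training, and get_best_config is read once is_done reports completion.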

otx.hpo.run_hpo_loop(hpo_algo: HpoBase, train_func: Callable, resource_type: Literal['gpu', 'cpu'] = 'gpu', num_parallel_trial: Optional[int] = None, num_gpu_for_single_trial: Optional[int] = None, available_gpu: Optional[str] = None)#

Run the HPO loop.

Args:

hpo_algo (HpoBase): HPO algorithms.
train_func (Callable): Function to train a model.
resource_type (Literal['gpu', 'cpu'], optional): Which type of resource to use. It can be changed depending on environment. Defaults to "gpu".
num_parallel_trial (Optional[int], optional): How many trials to run in parallel. It’s used for CPUResourceManager. Defaults to None.
num_gpu_for_single_trial (Optional[int], optional): How many GPUs are used for a single trial. It’s used for GPUResourceManager. Defaults to None.
available_gpu (Optional[str], optional): How many GPUs are available. It’s used for GPUResourceManager. Defaults to None.