datumaro.components.algorithms.hash_key_inference.prune

Functions

match_num_item_for_cluster(ratio, ...)

Classes

Centroid()

Select items through clustering with centers targeting the desired number.

ClusteredRandom()

Select items through clustering and choose randomly within each cluster.

Entropy()

Select items through clustering and choose them based on label entropy in each cluster.

NDRSelect()

Select items based on NDR (near-duplicate removal) within each subset.

Prune(dataset[, cluster_method, hash_type])

Prune makes a representative and manageable subset of a dataset.

PruneBase()

QueryClust()

Select items through clustering, with initial centers derived from each label.

RandomSelect()

Select items randomly from the dataset.

datumaro.components.algorithms.hash_key_inference.prune.match_num_item_for_cluster(ratio, dataset_len, cluster_num_item_list)

class datumaro.components.algorithms.hash_key_inference.prune.PruneBase

Bases: ABC

abstract base(ratio: float, num_centers: int | None, labels: List[int] | None, database_keys: ndarray | None, item_list: List[DatasetItem], source: Dataset | None) → Tuple[List[DatasetItem], Dict | None]

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).

class datumaro.components.algorithms.hash_key_inference.prune.RandomSelect

Bases: PruneBase

Select items randomly from the dataset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).
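
As a rough illustration of ratio-based random selection, the following standalone sketch keeps a fraction of a list of items. It does not use Datumaro; the function name random_select, the seed parameter, and the rounding-up rule are assumptions made for this example, not the library's implementation.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_select(items: Sequence[T], ratio: float, seed: int = 0) -> List[T]:
    """Keep roughly ratio * len(items) items, chosen uniformly at random."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Assumption: round up, so a non-empty input always keeps at least one item.
    num_selected = math.ceil(len(items) * ratio)
    rng = random.Random(seed)  # seeded for reproducible selections
    return rng.sample(list(items), num_selected)


items = [f"item_{i}" for i in range(10)]
selected = random_select(items, ratio=0.5)
```

Here selected holds five distinct entries of items; which five depends on the seed.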

class datumaro.components.algorithms.hash_key_inference.prune.Centroid

Bases: PruneBase

Select items through clustering with centers targeting the desired number.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).
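
One way to read "centers targeting the desired number" is to build as many clusters as there are items to keep and then keep the item closest to each center. The sketch below illustrates that idea only. It uses a small pure-Python k-means on 2-D float points with Euclidean distance as a stand-in for hash keys, and the helper names kmeans and centroid_select are hypothetical, not part of Datumaro.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's algorithm; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # initial centers are data points
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(dim) / len(group) for dim in zip(*group)) if group else centers[i]
            for i, group in enumerate(groups)
        ]
    return centers


def centroid_select(points: Sequence[Point], ratio: float, seed: int = 0) -> List[int]:
    """Return indices of the points nearest to each of ceil(n * ratio) centers."""
    k = math.ceil(len(points) * ratio)  # assumption: one center per kept item
    centers = kmeans(points, k, seed=seed)
    selected: List[int] = []
    for center in centers:
        # Skip already chosen points so every center contributes a distinct item.
        candidates = [i for i in range(len(points)) if i not in selected]
        selected.append(min(candidates, key=lambda i: squared_distance(points[i], center)))
    return sorted(selected)


# Two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
selected_indices = centroid_select(points, ratio=0.25)
```

With ratio=0.25 the sketch builds two clusters and keeps one point from each group.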

class datumaro.components.algorithms.hash_key_inference.prune.ClusteredRandom

Bases: PruneBase

Select items through clustering and choose randomly within each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).
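
The idea of sampling randomly inside each cluster can be sketched as follows, assuming cluster assignments are already available. The function clustered_random_select and its per-cluster rounding rule are assumptions for illustration. Judging by its signature, match_num_item_for_cluster may play a similar budgeting role in the library, but its exact behavior is not documented on this page.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def clustered_random_select(cluster_ids: Sequence[int], ratio: float, seed: int = 0) -> List[int]:
    """Return item indices sampled at random from each cluster, in proportion to its size."""
    rng = random.Random(seed)
    members: Dict[int, List[int]] = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        members[cluster_id].append(index)

    selected: List[int] = []
    for cluster_id in sorted(members):
        indices = members[cluster_id]
        # Assumption: round the per-cluster quota up, so small clusters keep one item.
        quota = min(len(indices), math.ceil(len(indices) * ratio))
        selected.extend(rng.sample(indices, quota))
    return sorted(selected)


# Cluster id of each of ten items (clusters of size 4, 2, and 4).
cluster_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
selected = clustered_random_select(cluster_ids, ratio=0.5)
```

With ratio=0.5 the quotas are 2, 1, and 2, so every cluster stays represented in the five selected indices.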

class datumaro.components.algorithms.hash_key_inference.prune.QueryClust

Bases: PruneBase

Select items through clustering, with initial centers derived from each label.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).

class datumaro.components.algorithms.hash_key_inference.prune.Entropy

Bases: PruneBase

Select items through clustering and choose them based on label entropy in each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).
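
Label entropy measures how mixed the labels inside a cluster are. The sketch below computes the Shannon entropy of the labels in each cluster. The helper names are hypothetical, and how the library turns these values into a selection is not stated on this page, so the example stops at the entropy computation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def label_entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a collection of labels."""
    total = len(labels)
    counts = Counter(labels)
    # H = sum over labels of p * log2(1 / p), with p the label's relative frequency.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_per_cluster(cluster_ids: Sequence[int], labels: Sequence[int]) -> Dict[int, float]:
    """Group labels by cluster id and compute the entropy of each group."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, labels):
        grouped[cluster_id].append(label)
    return {cluster_id: label_entropy(group) for cluster_id, group in grouped.items()}


cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
entropies = entropy_per_cluster(cluster_ids, labels)
```

Cluster 0 holds a single label and has entropy 0 bits, while cluster 1 is split evenly between two labels and has entropy 1 bit.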

class datumaro.components.algorithms.hash_key_inference.prune.NDRSelect

Bases: PruneBase

Select items based on NDR (near-duplicate removal) within each subset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Run the pruning method and select the items to keep.

Parameters:
  • ratio – Fraction of the dataset to keep after pruning.

  • num_centers – Number of cluster centers to use.

  • labels – One annotation label for each dataset item.

  • database_keys – Batch of hash keys as a NumPy array.

  • item_list – List of the dataset's DatasetItem objects.

  • source – The whole dataset.

Returns:

A tuple of the selected items and, where available, a dictionary of the distances between each item and the clusters (otherwise None).

class datumaro.components.algorithms.hash_key_inference.prune.Prune(dataset: Dataset, cluster_method: str = 'random', hash_type: str = 'img')

Bases: HashInference

Prune makes a representative and manageable subset of a dataset.

get_pruned(ratio: float = 0.5) → Dataset
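
A minimal usage sketch, based only on the signatures shown above, might look like this. The wrapper prune_dataset and its parameter names are hypothetical, the dataset is assumed to be loadable with Dataset.import_from, and only the default cluster_method and hash_type values from the signature are used.

```python
def prune_dataset(path: str, data_format: str, ratio: float = 0.5):
    """Load a dataset and return a pruned Dataset holding roughly `ratio` of its items."""
    # Imported lazily so the sketch can be read and loaded without Datumaro installed.
    from datumaro.components.dataset import Dataset
    from datumaro.components.algorithms.hash_key_inference.prune import Prune

    dataset = Dataset.import_from(path, data_format)
    # 'random' and 'img' are the defaults shown in the Prune signature.
    pruner = Prune(dataset, cluster_method="random", hash_type="img")
    return pruner.get_pruned(ratio=ratio)
```

A call such as prune_dataset("path/to/dataset", "imagenet") would then return the pruned dataset; the path and format here are placeholders.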