datumaro.components.algorithms.hash_key_inference.prune

Functions

match_num_item_for_cluster(ratio, ...)

Classes

`Centroid`()	Select items through clustering with centers targeting the desired number.
`ClusteredRandom`()	Select items through clustering and choose randomly within each cluster.
`Entropy`()	Select items through clustering and choose them based on label entropy in each cluster.
`NDRSelect`()	Select items based on NDR among each subset.
`Prune`(dataset[, cluster_method, hash_type])	Prune a dataset into a representative and manageable subset.
`PruneBase`()
`QueryClust`()	Select items through clustering with inits that imply each label.
`RandomSelect`()	Select items randomly from the dataset.

datumaro.components.algorithms.hash_key_inference.prune.match_num_item_for_cluster(ratio, dataset_len, cluster_num_item_list)
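This page gives only the signature of `match_num_item_for_cluster`. One plausible reading of its arguments is that it splits the number of items to keep across clusters in proportion to cluster size. The sketch below illustrates that idea under stated assumptions: the name `allocate_per_cluster`, the ceiling rounding, and the largest-remainder tie-breaking are all hypothetical and are not taken from datumaro.

```python
import math


def allocate_per_cluster(ratio, dataset_len, cluster_num_item_list):
    """Hypothetical helper: split a keep-budget across clusters by size."""
    # Total number of items to keep (assumed rounding: ceiling).
    budget = math.ceil(dataset_len * ratio)
    total = sum(cluster_num_item_list)
    if total == 0 or budget <= 0:
        return [0] * len(cluster_num_item_list)

    # Exact proportional shares, then their integer floors (capped by size).
    shares = [budget * n / total for n in cluster_num_item_list]
    quotas = [min(int(s), n) for s, n in zip(shares, cluster_num_item_list)]

    # Give leftover slots to the clusters with the largest fractional
    # remainders, never exceeding a cluster's actual size.
    order = sorted(
        range(len(shares)),
        key=lambda i: shares[i] - int(shares[i]),
        reverse=True,
    )
    leftover = budget - sum(quotas)
    for i in order:
        if leftover == 0:
            break
        if quotas[i] < cluster_num_item_list[i]:
            quotas[i] += 1
            leftover -= 1
    return quotas
```

With `ratio=0.5`, `dataset_len=10`, and cluster sizes `[6, 3, 1]`, this sketch keeps five items in total and never asks a cluster for more items than it holds.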

class datumaro.components.algorithms.hash_key_inference.prune.PruneBase

Bases: ABC

Runs a pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.RandomSelect

Bases: PruneBase

Select items randomly from the dataset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
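The idea of random selection by ratio can be sketched in a few lines. This is an illustration only, not datumaro's implementation: the function name `random_select`, the `seed` argument, and the ceiling rounding are assumptions, and the sketch returns only the selected items rather than the tuple described above.

```python
import math
import random


def random_select(item_list, ratio, seed=0):
    """Keep roughly `ratio` of the items, chosen uniformly at random."""
    # Assumed rounding: ceiling of len(item_list) * ratio.
    num_selected = math.ceil(len(item_list) * ratio)
    # A local RNG keeps the selection reproducible without touching
    # the global random state.
    rng = random.Random(seed)
    return rng.sample(item_list, num_selected)
```

Calling `random_select(items, 0.5)` on a list of ten items returns five distinct items from that list.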

class datumaro.components.algorithms.hash_key_inference.prune.Centroid

Bases: PruneBase

Select items through clustering with centers targeting the desired number.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
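One way to read "clustering with centers targeting the desired number" is that the number of clusters equals the number of items to keep, and the item closest to each center is selected. The sketch below shows only that final step, under assumptions not confirmed by this page: hash keys are treated as equal-length bit sequences, distance is Hamming distance, and the centers are already computed. The name `nearest_to_centers` is hypothetical.

```python
def nearest_to_centers(keys, centers):
    """For each center, return the index of the closest key."""

    def hamming(a, b):
        # Number of positions where the two bit sequences differ.
        return sum(x != y for x, y in zip(a, b))

    # min() returns the first index on ties, so results are deterministic.
    return [
        min(range(len(keys)), key=lambda i: hamming(keys[i], center))
        for center in centers
    ]
```

The returned indices can then be used to look up the corresponding entries in `item_list`.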

class datumaro.components.algorithms.hash_key_inference.prune.ClusteredRandom

Bases: PruneBase

Select items through clustering and choose randomly within each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
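Choosing randomly within each cluster can be sketched as follows. This is a minimal illustration, not datumaro's code: it assumes the cluster assignment of every item (`cluster_ids`) and a per-cluster quota (`quotas`) are already known, and the function name `clustered_random_select` is hypothetical.

```python
import random
from collections import defaultdict


def clustered_random_select(item_list, cluster_ids, quotas, seed=0):
    """Pick quotas[c] random items from each cluster c."""
    rng = random.Random(seed)

    # Group items by the cluster they were assigned to.
    clusters = defaultdict(list)
    for item, cid in zip(item_list, cluster_ids):
        clusters[cid].append(item)

    selected = []
    for cid, members in clusters.items():
        # Never request more items than the cluster contains.
        k = min(quotas.get(cid, 0), len(members))
        selected.extend(rng.sample(members, k))
    return selected
```

Compared with plain random selection, drawing per cluster guarantees that every cluster with a non-zero quota is represented in the result.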

class datumaro.components.algorithms.hash_key_inference.prune.QueryClust

Bases: PruneBase

Select items through clustering with inits that imply each label.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.Entropy

Bases: PruneBase

Select items through clustering and choose them based on label entropy in each cluster.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.
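This page does not say how label entropy feeds into the selection, so the sketch below shows only the quantity itself: the Shannon entropy of the labels found in one cluster. The function name `label_entropy` and the choice of base-2 logarithm are assumptions made for illustration.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels in one cluster."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # H = -sum(p * log2(p)) over the observed label frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A cluster whose items all share one label has entropy 0, while a cluster split evenly between two labels has entropy 1 bit, so higher values indicate a more mixed cluster.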

class datumaro.components.algorithms.hash_key_inference.prune.NDRSelect

Bases: PruneBase

Select items based on NDR among each subset.

base(ratio, num_centers, labels, database_keys, item_list, source)

Runs this pruning method.

Parameters:

ratio – Fraction of the dataset to keep after pruning.
num_centers – Number of cluster centers.
labels – The label of one annotation for each dataset item.
database_keys – Batch of hash keys in NumPy format.
item_list – List of the dataset's items.
source – The whole dataset.

Returns:

A tuple of the selected items and the distances between each item and the clusters.

class datumaro.components.algorithms.hash_key_inference.prune.Prune(dataset: Dataset, cluster_method: str = 'random', hash_type: str = 'img')

Bases: HashInference

Prune a dataset into a representative and manageable subset.

get_pruned(ratio: float = 0.5) → Dataset
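A typical call sequence, based only on the constructor and `get_pruned` signatures shown above, might look like the sketch below. The wrapper name `prune_to_ratio` is hypothetical, and the only `cluster_method` value used is the documented default `'random'`, since this page does not list the other accepted names. The import is done inside the function so the snippet can be loaded without datumaro installed.

```python
def prune_to_ratio(dataset, ratio=0.5, cluster_method="random", hash_type="img"):
    """Return the Dataset produced by Prune.get_pruned for `dataset`."""
    # Lazy import: datumaro is only required when the function is called.
    from datumaro.components.algorithms.hash_key_inference.prune import Prune

    # Arguments mirror the documented signature:
    # Prune(dataset, cluster_method='random', hash_type='img')
    pruner = Prune(dataset, cluster_method=cluster_method, hash_type=hash_type)

    # get_pruned(ratio=0.5) returns a Dataset.
    return pruner.get_pruned(ratio=ratio)
```

Here `dataset` is expected to be an already loaded datumaro `Dataset`; with the defaults, the result keeps about half of its items.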