Refine
######

This section provides examples of dataset validation, correction, query-based filtration, and pruning.
Datumaro's validator detects 22 anomalies, such as missing or undefined labels and far-from-mean outliers,
and generates a validation report that categorizes each anomaly as ``info``, ``warning``, or ``error``.
Datumaro can also correct a dataset based on this validation report: the Correct API automatically
refines the reported ``errors`` and ``warnings``.

The Filter API lets you filter a dataset so that it satisfies given conditions.
Queries are written in XML XPath.
For instance, given the XML file below, we can filter a dataset by subset name with
``/item[subset="minival2014"]``, by media id with ``/item[id="290768"]``, by image size with
``/item[image/width=image/height]``, and by annotation information such as id (``id``), type (``type``),
label (``label_id``), bounding box (``x, y, w, h``), etc.

With the Prune API, you can create representative subsets of the entire dataset using various supported methods.

.. code-block::

    <item>
        <id>290768</id>
        <subset>minival2014</subset>
        <image>
            <width>612</width>
            <height>612</height>
            <depth>3</depth>
        </image>
        <annotation>
            <id>80154</id>
            <type>bbox</type>
            <label_id>39</label_id>
            <x>264.59</x>
            <y>150.25</y>
            <w>11.19</w>
            <h>42.31</h>
            <area>473.87</area>
        </annotation>
        <annotation>
            <id>669839</id>
            <type>bbox</type>
            <label_id>41</label_id>
            <x>163.58</x>
            <y>191.75</y>
            <w>76.98</w>
            <h>73.63</h>
            <area>5668.77</area>
        </annotation>
        ...
    </item>

For annotation-based filtration, set the ``filter_annotations`` argument to ``True``.
The ``remove_empty`` argument removes all media items that have no annotations.
Note that datasets are updated in place by default.

.. toctree::
   :maxdepth: 1
   :hidden:

   notebooks/11_validate
   notebooks/12_correct_dataset
   notebooks/04_filter
   notebooks/17_data_pruning

.. grid:: 1 2 2 2
   :gutter: 2

   .. grid-item-card::

      .. button-ref:: notebooks/11_validate
         :color: primary
         :outline:
         :expand:

   .. grid-item-card::

      .. button-ref:: notebooks/12_correct_dataset
         :color: primary
         :outline:
         :expand:

   .. grid-item-card::

      .. button-ref:: notebooks/04_filter
         :color: primary
         :outline:
         :expand:

   .. grid-item-card::

      .. button-ref:: notebooks/17_data_pruning
         :color: primary
         :outline:
         :expand:
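
To illustrate how the XPath queries mentioned in this section select items, the following minimal sketch evaluates similar predicates with Python's standard ``xml.etree.ElementTree`` module. It is not Datumaro's own filtering code. The ``<dataset>`` wrapper element, the second item, and the helper ``item_ids`` are hypothetical and exist only for this illustration. ElementTree supports only a subset of XPath, so the width/height comparison is done in Python here.

```python
import xml.etree.ElementTree as ET

# XML view of two dataset items. The first mirrors the example in this
# section (abridged); the second item and the <dataset> wrapper are
# hypothetical, added only so the queries have something to exclude.
ITEMS_XML = """
<dataset>
  <item>
    <id>290768</id>
    <subset>minival2014</subset>
    <image><width>612</width><height>612</height></image>
    <annotation><id>80154</id><type>bbox</type><label_id>39</label_id></annotation>
    <annotation><id>669839</id><type>bbox</type><label_id>41</label_id></annotation>
  </item>
  <item>
    <id>1</id>
    <subset>train</subset>
    <image><width>640</width><height>480</height></image>
    <annotation><id>7</id><type>bbox</type><label_id>39</label_id></annotation>
  </item>
</dataset>
"""

root = ET.fromstring(ITEMS_XML)


def item_ids(items):
    """Return the <id> text of each matched element (hypothetical helper)."""
    return [item.findtext("id") for item in items]


# Item-level predicates: keep items whose child element has the given text.
by_subset = root.findall("./item[subset='minival2014']")
by_id = root.findall("./item[id='290768']")

# ElementTree cannot compare two child nodes inside a predicate, so the
# "square image" condition is checked in plain Python instead.
square = [
    item
    for item in root.findall("./item")
    if item.findtext("image/width") == item.findtext("image/height")
]

# Annotation-level predicate: select individual annotations by label id.
label_41 = root.findall("./item/annotation[label_id='41']")

print(item_ids(by_subset), item_ids(by_id), item_ids(square), item_ids(label_41))
```

The last query matches annotations rather than whole items, which is the distinction the ``filter_annotations`` argument controls in the text above.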