Detect

Detect dataset format

This command attempts to detect the format of a dataset in a directory. Currently, only local directories are supported.

The detection result is one of the following:

  • exactly one format is detected.

  • no format is detected (the dataset doesn't match any known format).

  • multiple formats are detected (the dataset is ambiguous).
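A script that consumes the result usually needs to tell these three outcomes apart. The sketch below does this from a list of detected format names. It is an illustration only: `classify_detection` is a hypothetical helper, not part of Datumaro, and the format names in the example are placeholders.

```python
from typing import List


def classify_detection(detected_formats: List[str]) -> str:
    """Map a list of detected format names to one of the three outcomes."""
    if not detected_formats:
        # The dataset did not match any known format.
        return "none"
    if len(detected_formats) == 1:
        # Exactly one format was detected: the result is unambiguous.
        return "single"
    # More than one format was detected: the dataset is ambiguous.
    return "ambiguous"


print(classify_detection(["coco"]))
print(classify_detection([]))
print(classify_detection(["voc", "coco"]))
```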

The command outputs this result in a human-readable form and optionally as a machine-readable JSON report (see --json-report).

The format of the machine-readable report is as follows:

{
    "detected_formats": [
        "detected-format-name-1", "detected-format-name-2", ...
    ],
    "rejected_formats": {
        "rejected-format-name-1": {
            "reason": <reason-code>,
            "message": "line 1\nline 2\n...\nline N"
        },
        "rejected-format-name-2": ...,
        ...
    }
}
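As a rough sketch of how a report in this shape might be read back, the following parses the JSON text into small dataclasses and splits each multi-line message into lines. The names `parse_report`, `DetectReport` and `Rejection` are hypothetical, and the sample report is made up to match the schema above; it is not real output of the command.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rejection:
    reason: str          # a reason code, e.g. "unmet_requirements"
    message: List[str]   # the multi-line message, split into lines


@dataclass
class DetectReport:
    detected: List[str]
    rejected: Dict[str, Rejection]


def parse_report(text: str) -> DetectReport:
    """Parse a JSON report with "detected_formats" and "rejected_formats" keys."""
    data = json.loads(text)
    detected = list(data["detected_formats"])
    rejected = {
        name: Rejection(reason=info["reason"], message=info["message"].splitlines())
        for name, info in data["rejected_formats"].items()
    }
    return DetectReport(detected=detected, rejected=rejected)


# Hypothetical report contents, shaped like the schema above.
sample = """
{
  "detected_formats": ["coco"],
  "rejected_formats": {
    "voc": {"reason": "unmet_requirements", "message": "line 1\\nline 2"}
  }
}
"""
report = parse_report(sample)
print(report.detected)
print(report.rejected["voc"].message)
```

In practice the text would come from the file passed to --json-report, for example via `open(path).read()`.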

The <reason-code> can be one of:

  • "detection_unsupported": the corresponding format does not support detection.

  • "insufficient_confidence": the dataset matched the corresponding format, but it matched at least one other format better.

  • "unmet_requirements": the dataset didn’t meet at least one requirement of the corresponding format.

Other reason codes may be defined in the future.
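Because new reason codes may appear, a consumer of the report should not fail on a code it does not recognize. One possible approach, sketched below with hypothetical helper names (`group_rejections`, `describe`) and made-up format entries, groups rejected formats by reason code and falls back to a generic description for unknown codes.

```python
from collections import defaultdict
from typing import Dict, List

# Reason codes documented above; other codes may be added in the future.
KNOWN_REASONS = {
    "detection_unsupported": "format does not support detection",
    "insufficient_confidence": "another format matched better",
    "unmet_requirements": "dataset did not meet a format requirement",
}


def group_rejections(rejected: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group rejected format names by reason code.

    Unknown codes get their own group instead of raising an error, so the
    script keeps working if new reason codes are introduced.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, info in rejected.items():
        groups[info["reason"]].append(name)
    # Sort names within each group for stable output.
    return {reason: sorted(names) for reason, names in groups.items()}


def describe(reason: str) -> str:
    """Return a short description, with a fallback for unrecognized codes."""
    return KNOWN_REASONS.get(reason, f"unrecognized reason code {reason!r}")


# Hypothetical "rejected_formats" contents.
rejected = {
    "voc": {"reason": "unmet_requirements", "message": "..."},
    "yolo": {"reason": "insufficient_confidence", "message": "..."},
    "custom": {"reason": "some_future_code", "message": "..."},
}
for reason, names in sorted(group_rejections(rejected).items()):
    print(f"{describe(reason)}: {', '.join(names)}")
```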

Usage:

datum detect [-h] [-p PROJECT_DIR] [--show-rejections]
             [--json-report JSON_REPORT] [--depth DEPTH] url

Parameters:

  • <url> - Path to the dataset to analyse.

  • -p, --project (string) - Directory of the project to operate on (default: current directory). The project might contain local plugins with custom formats, which will be used for detection.

  • --show-rejections - Describe why each supported format that wasn’t detected was rejected. This only affects the human-readable output; the machine-readable report always includes rejection information.

  • --json-report (string) - Path where a JSON report describing detected and rejected formats is saved. By default, no report is saved.

  • --depth (int) - Maximum depth of the recursive search (default: 2).

  • -h, --help - Print the help message and exit.
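When calling the command from a script, it can help to assemble the argument list from these options rather than concatenating a shell string. The following is a minimal sketch that uses only the options listed above; `build_detect_command` is a hypothetical helper, and actually running the command assumes `datum` is installed and on `PATH`.

```python
import shlex
from typing import List, Optional


def build_detect_command(
    url: str,
    project_dir: Optional[str] = None,
    show_rejections: bool = False,
    json_report: Optional[str] = None,
    depth: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for `datum detect` from the documented options."""
    cmd = ["datum", "detect"]
    if project_dir is not None:
        cmd += ["-p", project_dir]
    if show_rejections:
        cmd.append("--show-rejections")
    if json_report is not None:
        cmd += ["--json-report", json_report]
    if depth is not None:
        # Omitting --depth leaves the tool's default in effect.
        cmd += ["--depth", str(depth)]
    # The positional dataset path goes last, as in the usage line.
    cmd.append(url)
    return cmd


cmd = build_detect_command("path/to/dataset", show_rejections=True, json_report="report.json")
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; using a list instead of a single string avoids shell quoting problems with paths that contain spaces.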

Examples:

  • Detect the format of a dataset in a given directory, showing rejection information:

    datum detect --show-rejections <path/to/dataset/>