Datasets.Dataset_utils
Utilities for downloading and managing datasets.
Return the platform-specific cache directory path for the given dataset.
The default location is "~/.cache/ocannl/datasets/dataset_name/".
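As a rough illustration, the default path can be assembled from the HOME environment variable. This is a minimal sketch that assumes a POSIX-style home directory. The name `default_cache_dir` and the `?home` override are hypothetical, and the real function may resolve platform-specific locations differently.

```ocaml
(* Hypothetical sketch: build "~/.cache/ocannl/datasets/<dataset_name>/".
   Assumes a POSIX-style HOME; the real function may consult other,
   platform-specific locations. *)
let default_cache_dir ?home dataset_name =
  let home =
    match home with
    | Some h -> h
    | None -> (
        match Sys.getenv_opt "HOME" with
        | Some h -> h
        | None -> failwith "default_cache_dir: HOME is not set")
  in
  (* Join the components with the platform separator. *)
  let dir =
    List.fold_left Filename.concat home
      [ ".cache"; "ocannl"; "datasets"; dataset_name ]
  in
  (* Add the trailing separator shown in the documented default. *)
  dir ^ Filename.dir_sep
```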
Download a file from a URL to a destination path.
Creates parent directories as needed, downloads the file from url, and saves it to dest_path.
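The steps described above (create parents, fetch, write, report errors as Failure) can be sketched as follows. This is a hypothetical illustration, not the module's code: `fetch` is an assumed stand-in for whatever HTTP client the library uses, and `mkdir_parents` and `download_file_sketch` are illustrative names.

```ocaml
(* Hypothetical helper: create [path] and any missing parents. *)
let rec mkdir_parents path =
  if not (Sys.file_exists path) then begin
    mkdir_parents (Filename.dirname path);
    Sys.mkdir path 0o755
  end

(* Hypothetical sketch of the download step. [fetch] is a stand-in for the
   HTTP client the library actually uses; it returns the response body. *)
let download_file_sketch ~(fetch : string -> string) ~url ~dest_path =
  try
    mkdir_parents (Filename.dirname dest_path);
    let body = fetch url in
    (* Binary mode keeps archive bytes unchanged on every platform. *)
    let oc = open_out_bin dest_path in
    (try output_string oc body with e -> close_out_noerr oc; raise e);
    close_out oc
  with Failure msg | Sys_error msg ->
    (* Normalize fetch and I/O errors into a single Failure. *)
    failwith (Printf.sprintf "download %s -> %s failed: %s" url dest_path msg)
```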
Raises Failure on download or write error.
Ensure a file exists at the given path, downloading if necessary.
Checks whether dest_path exists. If not, downloads the file from url.
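The check-then-download logic can be sketched in a few lines. This assumes a downloader with labelled `~url` and `~dest_path` arguments, passed in as the hypothetical parameter `download`; the real signature is not shown here.

```ocaml
(* Hypothetical sketch: download only when the file is missing.
   [download] stands in for the module's downloader; its labelled
   signature is an assumption. *)
let ensure_file_sketch
    ~(download : url:string -> dest_path:string -> unit) ~url ~dest_path =
  if not (Sys.file_exists dest_path) then download ~url ~dest_path
```

In this sketch, existence is the only check, so a partially written file left by an interrupted download would be treated as present.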
Raises Failure on download or write error.
val ensure_extracted_archive :
url:string ->
archive_path:string ->
extract_dir:string ->
check_file:string ->
unit
Ensure an archive is downloaded and extracted, and that a given file exists in the extraction directory.
Checks whether check_file (relative to extract_dir) exists. If not, downloads the archive from url to archive_path, extracts it into extract_dir, and verifies that check_file is present. Only .tar.gz archives are currently supported.
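The control flow described above can be sketched as follows. This is a hypothetical illustration under two assumptions: `download` is a stand-in for the module's downloader, and extraction shells out to the system `tar` command. The library itself may do either step differently.

```ocaml
(* Hypothetical helper: create [path] and any missing parents. *)
let rec mkdir_parents path =
  if not (Sys.file_exists path) then begin
    mkdir_parents (Filename.dirname path);
    Sys.mkdir path 0o755
  end

(* Hypothetical sketch; [download] and the use of the tar CLI are assumptions. *)
let ensure_extracted_archive_sketch
    ~(download : url:string -> dest_path:string -> unit)
    ~url ~archive_path ~extract_dir ~check_file =
  let target = Filename.concat extract_dir check_file in
  if not (Sys.file_exists target) then begin
    (* Mirror the documented restriction to .tar.gz archives. *)
    if not (Filename.check_suffix archive_path ".tar.gz") then
      failwith ("unsupported archive format: " ^ archive_path);
    download ~url ~dest_path:archive_path;
    (* tar -C requires the destination directory to exist. *)
    mkdir_parents extract_dir;
    let cmd =
      Printf.sprintf "tar -xzf %s -C %s"
        (Filename.quote archive_path) (Filename.quote extract_dir)
    in
    if Sys.command cmd <> 0 then failwith ("extraction failed: " ^ archive_path);
    (* Verify that the archive actually contained the expected file. *)
    if not (Sys.file_exists target) then
      failwith ("missing after extraction: " ^ target)
  end
```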
Parameters: check_file is a path, relative to extract_dir, used to verify extraction.
Raises Failure on a download or extraction error, or if check_file is missing after extraction.
Ensure a gzip-compressed file is decompressed to a target path.
If target_path exists, does nothing and returns true. Otherwise, if gz_path exists, decompresses it to target_path.
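The three outcomes described above (already present, nothing to decompress, decompress now) can be sketched as follows. This is a hypothetical illustration that shells out to the `gzip` command-line tool; the real function may use a compression library instead, and the name `ensure_decompressed_sketch` is illustrative.

```ocaml
(* Hypothetical sketch: decompress [gz_path] into [target_path] via the
   gzip CLI; the real function may use a library instead. *)
let ensure_decompressed_sketch ~gz_path ~target_path =
  if Sys.file_exists target_path then true
  else if not (Sys.file_exists gz_path) then false
  else begin
    let cmd =
      Printf.sprintf "gzip -dc %s > %s"
        (Filename.quote gz_path) (Filename.quote target_path)
    in
    if Sys.command cmd <> 0 then begin
      (* Remove partial output so a later call does not mistake it for success. *)
      (try Sys.remove target_path with Sys_error _ -> ());
      failwith ("gzip decompression failed: " ^ gz_path)
    end;
    Sys.file_exists target_path
  end
```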
Returns true if target_path exists after the operation, or false if gz_path does not exist.
Raises Failure on gzip decompression error.
Parse a CSV cell as a float.
Attempts to convert value to a float. On failure, raises Failure with a descriptive message that includes context.
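A minimal sketch of this behaviour using the standard library's `float_of_string_opt`. The name `parse_float_cell`, the `~context` label, and the whitespace trimming are assumptions made for illustration.

```ocaml
(* Hypothetical sketch; the name, the [~context] label, and trimming
   surrounding whitespace are assumptions. *)
let parse_float_cell ~context value =
  match float_of_string_opt (String.trim value) with
  | Some f -> f
  | None ->
      failwith (Printf.sprintf "%s: cannot parse %S as a float" context value)
```

For example, `parse_float_cell ~context:"row 3, column price" "3.25"` returns `3.25`. Note that `float_of_string_opt` follows OCaml literal syntax, so inputs such as `1_000` or `nan` are also accepted.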
Raises Failure if value cannot be parsed as a float.
Parse a CSV cell as an integer.
Attempts to convert value to an int. On failure, raises Failure with a descriptive message that includes context.
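The integer variant can be sketched the same way with `int_of_string_opt`. As before, the name `parse_int_cell`, the `~context` label, and the trimming are illustrative assumptions.

```ocaml
(* Hypothetical sketch; the name, the [~context] label, and trimming
   surrounding whitespace are assumptions. *)
let parse_int_cell ~context value =
  match int_of_string_opt (String.trim value) with
  | Some n -> n
  | None ->
      failwith (Printf.sprintf "%s: cannot parse %S as an int" context value)
```

`int_of_string_opt` also accepts OCaml prefixes such as `0x` and `0b`, and underscores, so a stricter decimal-only parser would need its own validation.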
Raises Failure if value cannot be parsed as an int.
Recursively create a directory and its parents.
Creates the directory at path, along with any missing parent directories. If path already exists as a directory, does nothing.
Raises Unix.Unix_error if creation fails for any other reason.
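A recursive directory creator can be sketched with only the standard library's `Sys.mkdir`. This is a hypothetical illustration (`mkdir_p_sketch` is not the module's name). Unlike the documented function, it reports failures as `Sys_error` rather than `Unix.Unix_error`, because it avoids the Unix library.

```ocaml
(* Hypothetical stdlib-only sketch; failures surface as Sys_error,
   not Unix.Unix_error as in the documented function. *)
let rec mkdir_p_sketch path =
  if Sys.file_exists path then begin
    if not (Sys.is_directory path) then
      raise (Sys_error (path ^ ": exists and is not a directory"))
  end
  else begin
    (* Create ancestors first; dirname eventually reaches "." or "/". *)
    mkdir_p_sketch (Filename.dirname path);
    (* Tolerate another process creating the directory concurrently. *)
    try Sys.mkdir path 0o755
    with Sys_error _ when Sys.file_exists path && Sys.is_directory path -> ()
  end
```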