Datasets

TagmeClientAdvanced.create_dataset(name: str, access: str = 'organization', data_classification_level: str | None = None, data_source: str | None = None, organization_id: str | None = None, acl: List[str] | None = None, ) → Dataset

Create new dataset.

Parameters

* name – name of new dataset.

* data_classification_level – data classification level (‘K2’, ‘K3’).

* data_source – data source.

* organization_id – optional organization identifier. Defaults to None.

* access – access level.

* acl – access control list.

Returns

Information about the created dataset.
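
A minimal sketch of calling `create_dataset` with keyword arguments. It assumes `client` is an already constructed `TagmeClientAdvanced` instance (construction is not covered in this section). The helper name `create_classified_dataset` is hypothetical, and the early check treats ‘K2’ and ‘K3’ as the only valid levels, which is an assumption based on the parameter description above.

```python
from typing import Any, List, Optional

# Assumption: the two levels named in the parameter list are the only valid ones.
KNOWN_LEVELS = {"K2", "K3"}


def create_classified_dataset(
    client: Any,
    name: str,
    level: Optional[str] = None,
    acl: Optional[List[str]] = None,
) -> Any:
    """Create a dataset, rejecting an unknown classification level early."""
    if level is not None and level not in KNOWN_LEVELS:
        raise ValueError(f"unexpected data classification level: {level!r}")
    # Keyword arguments mirror the documented create_dataset signature;
    # access and organization_id are left at their defaults.
    return client.create_dataset(
        name=name,
        data_classification_level=level,
        acl=acl,
    )
```

Passing arguments by keyword keeps the call readable, since the signature has several optional string parameters in a row.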

TagmeClientAdvanced.download_dataset_data(dataset_id: str, destination_dir: str | Path, organization_id: str | None = None, ) → int

Download files from dataset.

Parameters

* dataset_id – id of dataset to download files from.

* destination_dir – path to directory where files will be downloaded.

* organization_id – optional organization identifier. Defaults to None.

Returns

Number of files downloaded successfully.
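
A short sketch of downloading a whole dataset into a local directory. Whether `download_dataset_data` creates a missing destination directory is not stated here, so the hypothetical helper `download_to_dir` creates it beforehand; `client` is assumed to be a `TagmeClientAdvanced` instance.

```python
from pathlib import Path
from typing import Any, Union


def download_to_dir(client: Any, dataset_id: str, destination: Union[str, Path]) -> int:
    """Download all files of a dataset and return the reported success count."""
    destination = Path(destination)
    # Create the directory up front; it is an assumption that this is needed.
    destination.mkdir(parents=True, exist_ok=True)
    return client.download_dataset_data(
        dataset_id=dataset_id,
        destination_dir=destination,
    )
```

The returned count can be compared with `len(client.get_dataset_files(dataset_id))` to detect files that failed to download.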

TagmeClientAdvanced.download_dataset_file(file_id: str, organization_id: str | None = None, ) → bytes
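
This method has no description in this section; its signature indicates that it returns `bytes`. The sketch below assumes those bytes are the raw content of the file and writes them to disk. The helper name `save_dataset_file` is hypothetical, and `client` is assumed to be a `TagmeClientAdvanced` instance.

```python
from pathlib import Path
from typing import Any, Union


def save_dataset_file(client: Any, file_id: str, target: Union[str, Path]) -> int:
    """Fetch one dataset file and write it to `target`; return bytes written."""
    # Assumption: the returned bytes are the file content itself.
    content = client.download_dataset_file(file_id=file_id)
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(content)
```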

TagmeClientAdvanced.get_dataset(dataset_id: str, organization_id: str | None = None, ) → Dataset

Get information about dataset.

Parameters

* dataset_id – dataset id to get information about.

* organization_id – optional organization identifier. Defaults to None.

Returns

information about dataset.

TagmeClientAdvanced.get_dataset_file(file_id: str, organization_id: str | None = None, ) → DatasetFile

Get dataset file by file_id (its id, name, url).

Parameters

* file_id – dataset file id to get.

* organization_id – optional organization identifier. Defaults to None.

Returns

file from dataset.

TagmeClientAdvanced.get_dataset_files(dataset_id: str, organization_id: str | None = None, ) → List[DatasetFile]

Get files from dataset (their id, name, url).

Parameters

* dataset_id – dataset id to get files from.

* organization_id – optional organization identifier. Defaults to None.

Returns

list of files from dataset.
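
A small sketch of turning the returned list into a lookup table. The description above says each file carries an id, name, and url; the attribute names `id` and `name` used below are assumptions about `DatasetFile`, and `files_by_name` is a hypothetical helper.

```python
from typing import Any, Dict


def files_by_name(client: Any, dataset_id: str) -> Dict[str, str]:
    """Map file name to file id for every file in the dataset."""
    files = client.get_dataset_files(dataset_id=dataset_id)
    # Assumption: DatasetFile exposes `.name` and `.id` attributes.
    # If two files share a name, the later one wins.
    return {f.name: f.id for f in files}
```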

TagmeClientAdvanced.get_datasets(page: int | None = None, size: int | None = None, query: str | None = None, organization_id: str | None = None, ) → List[Dataset]

Get organization’s datasets.

Parameters

* page – page number for pagination. Defaults to None.

* size – page size for pagination. Defaults to None.

* query – text query for filtering datasets.

* organization_id – optional organization identifier. Defaults to None.

Returns

list of information about datasets.
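
The `page` and `size` parameters suggest page-by-page retrieval. The sketch below walks through all pages under two assumptions that this section does not confirm: page numbering starts at `first_page` (1 by default here), and a page shorter than `size` (or empty) marks the end. `iter_all_datasets` is a hypothetical helper, and `client` is assumed to be a `TagmeClientAdvanced` instance.

```python
from typing import Any, Iterator, Optional


def iter_all_datasets(
    client: Any,
    size: int = 50,
    first_page: int = 1,
    query: Optional[str] = None,
) -> Iterator[Any]:
    """Yield datasets page by page until a short or empty page is returned."""
    page = first_page
    while True:
        batch = client.get_datasets(page=page, size=size, query=query)
        if not batch:
            return
        yield from batch
        # Assumption: a page with fewer than `size` items is the last one.
        if len(batch) < size:
            return
        page += 1
```

Writing this as a generator lets the caller stop early without requesting further pages.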

TagmeClientAdvanced.remove_dataset(dataset_id: str, organization_id: str | None = None, ) → None

TagmeClientAdvanced.replace_dataset_file(dataset_id: str, file_id: str, filepath: str | Path, organization_id: str | None = None, ) → UploadFilesResult

Replace a file's content while keeping its existing uid and link.

Parameters

* dataset_id – id of dataset that contains the file.

* file_id – id of file to replace.

* filepath – path to new file.

* organization_id – optional organization identifier. Defaults to None.

Returns

object which contains information about uploaded file and errors.
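
A sketch that combines `get_dataset_files` with `replace_dataset_file` to replace a file found by its name. It assumes that `DatasetFile` exposes `name` and `id` attributes and that `name` holds the base file name; neither is confirmed in this section. `replace_by_name` is a hypothetical helper.

```python
from pathlib import Path
from typing import Any, Union


def replace_by_name(client: Any, dataset_id: str, new_file: Union[str, Path]) -> Any:
    """Replace the dataset file whose name equals the local file's name."""
    new_file = Path(new_file)
    for existing in client.get_dataset_files(dataset_id=dataset_id):
        # Assumption: `existing.name` is the base name, e.g. "cat.jpg".
        if existing.name == new_file.name:
            return client.replace_dataset_file(
                dataset_id=dataset_id,
                file_id=existing.id,
                filepath=new_file,
            )
    raise LookupError(f"no file named {new_file.name!r} in dataset {dataset_id!r}")
```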

TagmeClientAdvanced.update_dataset(dataset_id: str, name: str, access: str = 'organization', data_classification_level: str | None = None, data_source: str | None = None, organization_id: str | None = None, acl: List[str] | None = None, ) → Dataset

Update dataset information.

Parameters

* dataset_id – dataset id to update.

* name – name of the dataset after the update.

* data_classification_level – data classification level (‘K2’, ‘K3’).

* data_source – data source.

* organization_id – optional organization identifier. Defaults to None.

* access – access level.

* acl – access control list.

Returns

Information about the updated dataset.
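
Because `name` is a required argument, changing another field still requires passing a name. The sketch below reads the current name first and reuses it. It assumes `Dataset` exposes a `name` attribute, and `set_dataset_acl` is a hypothetical helper. Whether fields left at their defaults (such as `access`) keep their previous values on the server is not stated in this section, so verify that before relying on it.

```python
from typing import Any, List


def set_dataset_acl(client: Any, dataset_id: str, acl: List[str]) -> Any:
    """Update only the ACL of a dataset, keeping its current name."""
    current = client.get_dataset(dataset_id=dataset_id)
    # Assumption: Dataset has a `.name` attribute holding the current name.
    return client.update_dataset(
        dataset_id=dataset_id,
        name=current.name,
        acl=acl,
    )
```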

TagmeClientAdvanced.upload_dataset_files(dataset_id: str, filepaths: Sequence[str | Path], batch_size: int = 16, organization_id: str | None = None, tqdm_on: str | bool = False, ) → UploadFilesResult

Upload selected files to dataset.

Parameters

* dataset_id – id of dataset to upload files to.

* filepaths – list of paths to files to upload.

* batch_size – size of batch used during uploading. Defaults to 16.

* organization_id – optional organization identifier. Defaults to None.

Returns

object which contains information about uploaded files and errors.
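
A sketch of selecting files locally and passing them to `upload_dataset_files`. The glob pattern and the helper name `upload_matching` are illustrative choices, not part of the library; `client` is assumed to be a `TagmeClientAdvanced` instance.

```python
from pathlib import Path
from typing import Any, Optional, Union


def upload_matching(
    client: Any,
    dataset_id: str,
    folder: Union[str, Path],
    pattern: str = "*.jpg",
    batch_size: int = 16,
) -> Optional[Any]:
    """Upload files in `folder` matching `pattern`; return None if none match."""
    # Sort for a deterministic upload order; skip directories.
    paths = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    if not paths:
        return None
    return client.upload_dataset_files(
        dataset_id=dataset_id,
        filepaths=paths,
        batch_size=batch_size,
    )
```

Selecting by pattern is the inverse of `upload_folder_to_dataset`, which uploads a folder and excludes files by extension.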

TagmeClientAdvanced.upload_folder_to_dataset(dataset_id: str, folder: str | Path, exclude_ext_filter: Set[str] | None = None, organization_id: str | None = None, ) → UploadFilesResult

Upload files from folder to dataset.

Parameters

* dataset_id – id of dataset to upload files to.

* folder – path to folder with files to upload.

* exclude_ext_filter – set of file extensions to exclude.

* organization_id – optional organization identifier. Defaults to None.

Returns

object which contains information about uploaded files and errors.